Nishith Agarwal
e83dde3b95
Returning empty Statuses for an empty Spark partition caused by incorrect bin packing
2018-12-04 11:41:38 -08:00
Balaji Varadarajan
f999e4960c
Avoid WriteStatus collect() call when committing batch
2018-11-28 10:41:49 -08:00
Nishith Agarwal
d0fde47458
Fixing number of insert buckets to be generated by rounding up to the next integer
2018-11-15 10:04:45 -08:00
jiale.tan
98fd97b65f
feature(HoodieGlobalBloomIndex): adds a new type of bloom index to allow global record key lookup
2018-09-29 19:55:20 +05:30
vinothchandar
9ca6f91e97
Perform consistency checks during write finalize
...
- Check to ensure written files are listable on storage
- Docs updated to capture how this helps with S3 storage
- Unit tests added, corrections to existing tests
- Fix DeltaStreamer to manage archived commits in a separate folder
2018-09-28 08:04:41 +05:30
Vinoth Chandar
eca49a255e
Rebasing and fixing conflicts against master
2018-09-11 11:03:30 +05:30
Nishith Agarwal
2b1af18941
Adding check for rolling stats not present to handle backwards compatibility of existing timeline
2018-09-10 11:53:46 +08:00
Nishith Agarwal
459e523d9e
1. Small file size handling for inserts into log files. In summary, the total size of the log file is compared with the parquet max file size, and if there is room for more inserts, they are added.
2018-09-06 08:52:08 +08:00
Nishith Agarwal
3746ace76a
Fixing Null pointer exception in finally block
2018-08-21 21:07:53 -07:00
Balaji Varadarajan
2e12c86d01
Ensure Compaction Operation compacts the data file as defined in the workload
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
2f8ce93030
Async Compaction Main API changes
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
1b61f04e05
(1) Define CompactionWorkload in avro to allow storing them in instant files.
...
(2) Split APIs in HoodieRealtimeCompactor to separate generating compaction workload from running compaction
2018-08-07 08:19:50 -07:00
Balaji Varadarajan
6d01ae8ca0
FileSystemView and Timeline level changes to support Async Compaction
2018-08-07 08:19:50 -07:00
Nishith Agarwal
34ab54a9d3
Fixing bug introduced in rollback for MOR table type with inserts into log files
2018-07-17 17:20:34 -07:00
Nishith Agarwal
3da063f83b
Adding ability for inserts to be written to log files
2018-06-11 14:08:59 -07:00
Nishith Agarwal
23d53763c4
enabling global index for MOR
2018-05-16 10:36:25 -07:00
Balaji Varadarajan
dfc0c61eb7
Support union mode in HoodieRealtimeRecordReader for pure insert workloads
...
Also replace BufferedIteratorPayload abstraction with function passing
2018-05-10 17:39:56 -07:00
Nishith Agarwal
720e42f52a
Parallelized read-write operations in Hoodie Merge phase
2018-04-12 11:46:42 -07:00
Balaji Varadarajan
788e4f2d2e
CodeStyle formatting to conform to basic Checkstyle rules.
...
The code-style rules follow google style with some changes:
1. Increase line length from 100 to 120
2. Disable JavaDoc related checkstyles as this needs more manual work.
Both source and test code are checked for code-style
2018-03-30 11:09:40 -07:00
Nishith Agarwal
0eaa21111a
Re-factoring Compaction as first level API in WriteClient similar to upsert/insert
2018-03-07 16:16:39 -08:00
Nishith Agarwal
5405a6287b
Introducing HoodieLogFormat V2 with versioning support
...
- HoodieLogFormat V2 has support for LogFormat evolution through versioning
- LogVersion is associated with a LogBlock not a LogFile
- Based on the version of a LogBlock, the appropriate code path is executed
- Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff
- Implemented Reverse pointer to be able to traverse the log in reverse
- Introduce new MAGIC for backwards compatibility with logs without versions
2018-03-06 21:14:11 -08:00
Nishith Agarwal
7076c2e9f0
refactor classes to accept Map passed by RealtimeCompactor to avoid multiple map creations in HoodieMergeHandle
2018-02-07 11:16:01 -08:00
Nishith Agarwal
30049383f5
Small File Size correction handling for MOR table type
2018-02-07 11:01:10 -08:00
Nishith Agarwal
2116815261
Fixing Rollback for compaction/commit operation, added check for null commit
...
- Fall back to the old way of rollback by listing all partitions
- Added null check to ensure only partitions which are to be rolled back are considered
- Added location (committime) to workload stat
- Added checks in CompactedScanner to guard against task retries
- Introduce new logic for rollback (bounded by instant_time and target_instant time)
- Reversed logfiles order
2018-02-06 16:55:23 -08:00
Jian Xu
15e669c60c
Incorporating code review feedback for finalizeWrite for COW #4
2018-02-02 11:38:25 -08:00
Jian Xu
3736243fb3
Rebases with latest upstream
2018-02-02 11:38:25 -08:00
Jian Xu
363e35bb0f
Add finalizeWrite support for HoodieMergeHandle
2018-02-02 11:38:25 -08:00
Jian Xu
acae6586f3
Incorporating code review feedback for finalizeWrite for COW #3
2018-02-02 11:38:25 -08:00
Jian Xu
37f2cdd7e4
Incorporating code review feedback for finalizeWrite for COW #2
2018-02-02 11:38:25 -08:00
Jian Xu
2fe4fef625
Incorporating code review feedback for finalizeWrite for COW
2018-02-02 11:38:25 -08:00
Jian Xu
c874248f23
Add FinalizeWrite in HoodieCreateHandle for COW tables
2018-02-02 11:38:25 -08:00
Vinoth Chandar
0cd186c899
Multi FS Support
...
- Reviving PR 191, to make FileSystem creation off actual path
- Streamline all filesystem access to HoodieTableMetaClient
- Hadoop Conf from Spark Context serialized & passed to executor code too
- Pick up env vars prefixed with HOODIE_ENV_ into Configuration object
- Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS
- Adding s3a to supported schemes & support escaping "." in env vars
- Tests use HoodieTestUtils.getDefaultHadoopConf
2018-01-17 23:34:21 -08:00
Nishith Agarwal
44839b88c6
Removing compaction action type and associated compaction timeline operations, replace with commit action type
2018-01-09 09:56:15 -08:00
Nishith Agarwal
9b610f82c7
Separating out compaction() API
2017-11-14 22:56:29 -08:00
Vinoth Chandar
e45679f5e2
Reformatting code per Google Code Style all over
2017-11-12 23:19:02 -08:00
Nishith Agarwal
c7d63a7622
1) Separated rollback as a table operation 2) Implement rollback for MOR
2017-10-12 07:36:46 -07:00
Vinoth Chandar
f2980052cd
Revert effects of PR #259
2017-09-28 10:29:58 -07:00
Eric Sayle
6230e15191
Update deprecated hash function
...
Guava deprecated hashString(String) in v15, and removed it in v16.
Replace call with hashUnencodedString(String), which replaces it, to
be compatible with newer versions of Guava.
2017-09-18 17:39:19 -07:00
Omkar Joshi
5c639c0b05
Adding support for UserDefinedBulkInsertPartitioner
2017-09-08 20:55:13 -07:00
Omkar Joshi
ec40d04d51
Fixing UpsertPartitioner to ensure that input records are deterministically assigned to output partitions
2017-09-07 17:03:56 -07:00
Nishith Agarwal
5ee4ac40ae
Use CompletedFileSystemView instead of CompactedView considering deltacommits
2017-08-07 12:26:42 -07:00
Vinoth Chandar
754ab88a2d
Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView
...
- Usage now marks code as clearly using either RO or RT views, for future evolution
- Tests on all of FileGroups and FileSlices
2017-06-22 17:16:13 -07:00
Vinoth Chandar
c00f1a9ed9
Refactoring HoodieTableFileSystemView using FileGroups/FileSlices
...
- Merged all filter* and get* methods
- new constructor takes filestatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit
2017-06-22 17:16:13 -07:00
Vinoth Chandar
23e7badd8a
Rename IO Handles & introduce stub for BucketedIndex
...
- UpdateHandle -> MergeHandle, InsertHandle -> CreateHandle
- Also a bunch of code cleanup in different places
2017-06-22 17:16:13 -07:00
Kaushik Devarajaiah
3aa8083913
Correct clean bug that causes clean failure when partitionPaths are empty
2017-06-20 15:45:32 -07:00
Kaushik Devarajaiah
521555c576
Parallelize file version deletes during clean and related tests
2017-06-15 18:20:42 -07:00
Vinoth Chandar
da17c5c607
Introduce getCommitsAndCompactionsTimeline() explicitly & adjust usage across code base
2017-05-01 21:48:27 -07:00
Prasanna Rajaperumal
91b088f29f
Implement Compaction policy abstraction. Implement LogSizeBased Bounded IO Compaction as the default strategy
2017-04-20 16:59:06 -07:00
Prasanna Rajaperumal
aee136777b
Fixes needed to run merge-on-read testing on production scale data
2017-04-02 22:25:47 -07:00
Prasanna Rajaperumal
d83b671ada
Implement Savepoints and required metadata timeline - Part 2
2017-03-13 23:09:29 -07:00