1
0
Commit Graph

59 Commits

Author SHA1 Message Date
Balaji Varadarajan
ae3c02fb3f HUDI-162 : File System view must be built with correct timeline actions 2019-07-14 00:48:09 -07:00
Vinoth Chandar
b791473a6d Introduce HoodieReadHandle abstraction into index
- Generalized BloomIndex to work with file ids instead of paths
 - Abstracted away Bloom filter checking into HoodieLookupHandle
 - Abstracted away range information retrieval into HoodieRangeInfoHandle
2019-06-12 10:46:14 -07:00
Balaji Varadarajan
065173211e HUDI-147 Compaction Inflight Rollback not deleting Marker directory 2019-06-09 11:45:54 -07:00
Balaji Varadarajan
479908fd20 HUDI-125 : Change License for all source files and update RAT configurations 2019-06-09 11:41:55 -07:00
Balaji Varadarajan
145034c5fa Spark Stage retry handling 2019-05-21 14:49:51 -07:00
Balaji Varadarajan
64fec64097 Timeline Service with Incremental View Syncing support 2019-05-16 13:25:33 -07:00
Nishith Agarwal
3d9041e216 Fixing source schema and writer schema distinction in payloads 2019-03-26 19:44:27 -07:00
Nishith Agarwal
9e59da7fd9 Refactor HoodieTable Rollback to write one rollback instant for a batch of commits to rollback 2019-03-19 10:10:16 -07:00
Nishith Agarwal
0dd4a90b03 Enable multi/nested rollbacks for MOR table type 2019-03-19 10:10:16 -07:00
Balaji Varadarajan
8adaca3454 Table rollback for inflight compactions MUST not delete instant files at any time to avoid race conditions 2019-02-11 18:30:21 -08:00
Nishith Agarwal
994d42d307 cleaner should now use commit timeline and not include deltacomits 2019-01-28 10:46:33 -08:00
arukavytsia
6946dd7557 General enhancements 2018-12-18 12:52:39 -08:00
Nishith Agarwal
e83dde3b95 Returning empty Statues for an empty spark partition caused due to incorrect bin packing 2018-12-04 11:41:38 -08:00
Balaji Varadarajan
f999e4960c Avoid WriteStatus collect() call when committing batch 2018-11-28 10:41:49 -08:00
Nishith Agarwal
d0fde47458 Fixing number of insert buckets to be generated by rounding off to the closest greater integer 2018-11-15 10:04:45 -08:00
jiale.tan
98fd97b65f feature(HoodieGlobalBloomIndex): adds a new type of bloom index to allow global record key lookup 2018-09-29 19:55:20 +05:30
vinothchandar
9ca6f91e97 Perform consistency checks during write finalize
- Check to ensure written files are listable on storage
 - Docs reflected to capture how this helps with s3 storage
 - Unit tests added, corrections to existing tests
 - Fix DeltaStreamer to manage archived commits in a separate folder
2018-09-28 08:04:41 +05:30
Vinoth Chandar
eca49a255e Rebasing and fixing conflicts against master 2018-09-11 11:03:30 +05:30
Nishith Agarwal
459e523d9e 1. Small file size handling for inserts into log files. In summary, the total size of the log file is compared with the parquet max file size and if there is scope to add inserts the add it. 2018-09-06 08:52:08 +08:00
Balaji Varadarajan
2e12c86d01 Ensure Compaction Operation compacts the data file as defined in the workload 2018-08-07 08:19:50 -07:00
Balaji Varadarajan
2f8ce93030 Async Compaction Main API changes 2018-08-07 08:19:50 -07:00
Balaji Varadarajan
1b61f04e05 (1) Define CompactionWorkload in avro to allow storing them in instant files.
(2) Split APIs in HoodieRealtimeCompactor to separate generating compaction workload from running compaction
2018-08-07 08:19:50 -07:00
Nishith Agarwal
3da063f83b Adding ability for inserts to be written to log files 2018-06-11 14:08:59 -07:00
Balaji Varadarajan
dfc0c61eb7 Support union mode in HoodieRealtimeRecordReader for pure insert workloads
Also Replace BufferedIteratorPayload abstraction with function passing
2018-05-10 17:39:56 -07:00
Nishith Agarwal
720e42f52a Parallelized read-write operations in Hoodie Merge phase 2018-04-12 11:46:42 -07:00
Balaji Varadarajan
788e4f2d2e CodeStyle formatting to conform to basic Checkstyle rules.
The code-style rules follow google style with some changes:

1. Increase line length from 100 to 120
2. Disable JavaDoc related checkstyles as this needs more manual work.

Both source and test code are checked for code-style
2018-03-30 11:09:40 -07:00
Nishith Agarwal
0eaa21111a Re-factoring Compaction as first level API in WriteClient similar to upsert/insert 2018-03-07 16:16:39 -08:00
Nishith Agarwal
7076c2e9f0 refactor classes to accept Map passed by RealtimeCompactor to avoid multiple map creations in HoodieMergeHandle 2018-02-07 11:16:01 -08:00
Nishith Agarwal
30049383f5 Small File Size correction handling for MOR table type 2018-02-07 11:01:10 -08:00
Nishith Agarwal
2116815261 Fixing Rollback for compaction/commit operation, added check for null commit
- Fallback to old way of rollback by listing all partitions
	- Added null check to ensure only partitions which are to be rolledback are considered
	- Added location (committime) to workload stat
	- Added checks in CompactedScanner to guard against task retries
	- Introduce new logic for rollback (bounded by instant_time and target_instant time)
        - Reversed logfiles order
2018-02-06 16:55:23 -08:00
Jian Xu
15e669c60c Incorporating code review feedback for finalizeWrite for COW #4 2018-02-02 11:38:25 -08:00
Jian Xu
3736243fb3 Rebases with latest upstream 2018-02-02 11:38:25 -08:00
Jian Xu
acae6586f3 Incorporating code review feedback for finalizeWrite for COW #3 2018-02-02 11:38:25 -08:00
Jian Xu
37f2cdd7e4 Incorporating code review feedback for finalizeWrite for COW #2 2018-02-02 11:38:25 -08:00
Jian Xu
2fe4fef625 Incorporating code review feedback for finalizeWrite for COW 2018-02-02 11:38:25 -08:00
Jian Xu
c874248f23 Add FinalizeWrite in HoodieCreateHandle for COW tables 2018-02-02 11:38:25 -08:00
Vinoth Chandar
0cd186c899 Multi FS Support
- Reviving PR 191, to make FileSystem creation off actual path
 - Streamline all filesystem access to HoodieTableMetaClient
 - Hadoop Conf from Spark Context serialized & passed to executor code too
 - Pick up env vars prefixed with HOODIE_ENV_ into Configuration object
 - Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS
 - Adding s3a to supported schemes & support escaping "." in env vars
 - Tests use HoodieTestUtils.getDefaultHadoopConf
2018-01-17 23:34:21 -08:00
Nishith Agarwal
44839b88c6 Removing compaction action type and associated compaction timeline operations, replace with commit action type 2018-01-09 09:56:15 -08:00
Nishith Agarwal
9b610f82c7 Separating out compaction() API 2017-11-14 22:56:29 -08:00
Vinoth Chandar
e45679f5e2 Reformatting code per Google Code Style all over 2017-11-12 23:19:02 -08:00
Nishith Agarwal
c7d63a7622 1) Separated rollback as a table operation 2) Implement rollback for MOR 2017-10-12 07:36:46 -07:00
Vinoth Chandar
f2980052cd Revert effects of PR #259 2017-09-28 10:29:58 -07:00
Eric Sayle
6230e15191 Update deprecated hash function
Guava deprecated hashString(String) in v15, and removed it in v16.
Replace call with hashUnencodedString(String), which replace it, to
be compatible with newer versions of Guava.
2017-09-18 17:39:19 -07:00
Omkar Joshi
ec40d04d51 Fixing UpsertPartitioner to ensure that input records are deterministically assigned to output partitions 2017-09-07 17:03:56 -07:00
Vinoth Chandar
754ab88a2d Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView
- Usage now marks code as clearly using either RO or RT views, for future evolution
  - Tests on all of FileGroups and FileSlices
2017-06-22 17:16:13 -07:00
Vinoth Chandar
c00f1a9ed9 Refactoring HoodieTableFileSystemView using FileGroups/FileSlices
- Merged all filter* and get* methods
 - new constructor takes filestatus[]
 - All existing tests pass
 - FileGroup is all files that belong to a fileID within a partition
 - FileSlice is a generation of data and log files, starting at a base commit
2017-06-22 17:16:13 -07:00
Vinoth Chandar
23e7badd8a Rename IO Handles & introduce stub for BucketedIndex
- UpdateHandle -> MergeHandle, InsertHandle -> CreateHandle
 - Also bunch of code cleanup in different places
2017-06-22 17:16:13 -07:00
Kaushik Devarajaiah
3aa8083913 Correct clean bug that causes clean failure when partitionPaths are empty 2017-06-20 15:45:32 -07:00
Kaushik Devarajaiah
521555c576 Parallelize file version deletes during clean and related tests 2017-06-15 18:20:42 -07:00
Prasanna Rajaperumal
aee136777b Fixes needed to run merge-on-read testing on production scale data 2017-04-02 22:25:47 -07:00