1
0
Commit Graph

36 Commits

Author SHA1 Message Date
Vinoth Chandar
b791473a6d Introduce HoodieReadHandle abstraction into index
- Generalized BloomIndex to work with file ids instead of paths
 - Abstracted away Bloom filter checking into HoodieLookupHandle
 - Abstracted away range information retrieval into HoodieRangeInfoHandle
2019-06-12 10:46:14 -07:00
Balaji Varadarajan
479908fd20 HUDI-125 : Change License for all source files and update RAT configurations 2019-06-09 11:41:55 -07:00
Balaji Varadarajan
145034c5fa Spark Stage retry handling 2019-05-21 14:49:51 -07:00
Balaji Varadarajan
64fec64097 Timeline Service with Incremental View Syncing support 2019-05-16 13:25:33 -07:00
Balaji Varadarajan
9cce9abf4d Fix various errors found by long running delta-streamer tests
1. Parquet Avro schema mismatch errors when ingesting are sometimes silently ignored due to race-condition in BoundedInMemoryExecutor. This was reproducible when running long-running delta-streamer with wrong schema and it caused data-loss
  2. Fix behavior of Delta-Streamer to error out by default if there are any error records
  3. Fix a bug in tracking write errors in WriteStats. Earlier the write errors were tracking sampled errors as opposed to total errors.
  4. Delta Streamer does not commit the changes done as part of inline compaction as auto-commit is force disabled. Fix this behavior to always auto-commit inline compaction as it would not otherwise commit.
2019-05-13 10:47:34 -07:00
Nishith Agarwal
3d9041e216 Fixing source schema and writer schema distinction in payloads 2019-03-26 19:44:27 -07:00
Balaji Varadarajan
3ae6cb4ed5 FileSystem View must treat same fileIds present in different partitions as different file-groups and handle pending compaction correctly 2019-03-01 10:49:04 -08:00
Nishith Agarwal
7985eb72b5 Fixing behavior of Merge/CreateHandle for invalid/wrong schema records 2019-01-28 16:01:03 -08:00
Nishith Agarwal
110df7190b Enabling hard deletes for MergeOnRead table type 2018-12-31 12:49:58 -08:00
arukavytsia
6946dd7557 General enhancements 2018-12-18 12:52:39 -08:00
Balaji Varadarajan
f999e4960c Avoid WriteStatus collect() call when committing batch 2018-11-28 10:41:49 -08:00
Nishith Agarwal
0fe92dee55 Fix a failing test case intermittenly in TestMergeOnRead due to incorrect prev commit time 2018-09-08 09:39:18 +08:00
Nishith Agarwal
459e523d9e 1. Small file size handling for inserts into log files. In summary, the total size of the log file is compared with the parquet max file size and if there is scope to add inserts the add it. 2018-09-06 08:52:08 +08:00
Balaji Varadarajan
6d01ae8ca0 FileSystemView and Timeline level changes to support Async Compaction 2018-08-07 08:19:50 -07:00
Nishith Agarwal
3da063f83b Adding ability for inserts to be written to log files 2018-06-11 14:08:59 -07:00
Nishith Agarwal
04655e9e85 Adding metrics for MOR and COW 2018-04-26 09:32:45 -07:00
Balaji Varadarajan
788e4f2d2e CodeStyle formatting to conform to basic Checkstyle rules.
The code-style rules follow google style with some changes:

1. Increase line length from 100 to 120
2. Disable JavaDoc related checkstyles as this needs more manual work.

Both source and test code are checked for code-style
2018-03-30 11:09:40 -07:00
Omkar Joshi
c5b4cb1b75 Spawning parallel writer thread to separate reading records from spark and writing records to parquet file 2018-03-15 16:58:14 -07:00
Nishith Agarwal
5405a6287b Introducing HoodieLogFormat V2 with versioning support
- HoodieLogFormat V2 has support for LogFormat evolution through versioning
			- LogVersion is associated with a LogBlock not a LogFile
			- Based on a version for a LogBlock, approporiate code path is executed
		- Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff
		- Implemented Reverse pointer to be able to traverse the log in reverse
		- Introduce new MAGIC for backwards compatibility with logs without versions
2018-03-06 21:14:11 -08:00
Nishith Agarwal
6fec9655a8 Added support for Disk Spillable Compaction to prevent OOM issues 2018-02-26 16:00:35 -08:00
Nishith Agarwal
d495484399 Write smaller sized multiple blocks to log file instead of a large one
- Use SizeEstimator to size number of records to write
	- Configurable block size
   	- Configurable log file size
2018-02-23 07:31:39 -08:00
Nishith Agarwal
2116815261 Fixing Rollback for compaction/commit operation, added check for null commit
- Fallback to old way of rollback by listing all partitions
	- Added null check to ensure only partitions which are to be rolledback are considered
	- Added location (committime) to workload stat
	- Added checks in CompactedScanner to guard against task retries
	- Introduce new logic for rollback (bounded by instant_time and target_instant time)
        - Reversed logfiles order
2018-02-06 16:55:23 -08:00
Nishith Agarwal
e10100fe32 Reducing list status calls from listing logfile versions, some associated refactoring 2018-01-29 08:26:39 -08:00
Nishith Agarwal
937ae322ba Reducing memory footprint required in HoodieAvroDataBlock and HoodieAppendHandle 2018-01-29 08:22:29 -08:00
Vinoth Chandar
e45679f5e2 Reformatting code per Google Code Style all over 2017-11-12 23:19:02 -08:00
Nishith Agarwal
c7d63a7622 1) Separated rollback as a table operation 2) Implement rollback for MOR 2017-10-12 07:36:46 -07:00
Kaushik Devarajaiah
c98ee057fc capture record metadata before deflating for record counting 2017-10-02 10:46:06 -07:00
Nishith Agarwal
616c9a68c3 Enabled deletes in merge_on_read 2017-07-26 13:37:27 -07:00
Vinoth Chandar
c00f1a9ed9 Refactoring HoodieTableFileSystemView using FileGroups/FileSlices
- Merged all filter* and get* methods
 - new constructor takes filestatus[]
 - All existing tests pass
 - FileGroup is all files that belong to a fileID within a partition
 - FileSlice is a generation of data and log files, starting at a base commit
2017-06-22 17:16:13 -07:00
Vinoth Chandar
23e7badd8a Rename IO Handles & introduce stub for BucketedIndex
- UpdateHandle -> MergeHandle, InsertHandle -> CreateHandle
 - Also bunch of code cleanup in different places
2017-06-22 17:16:13 -07:00
gekath
52c507f83e Writes relative paths to .commit files
Handle case where path is read in as null from commit file

Merged with updated release
2017-06-16 12:51:19 -07:00
gekath
db7311f85e Writes relative paths to .commit files instead of absolute paths
Clean up code

Removed commented out code

Fixed merge conflict with master
2017-06-16 12:51:19 -07:00
Prasanna Rajaperumal
240c91241b Implement HoodieLogFormat replacing Avro as the default log format 2017-05-23 08:35:11 -07:00
Prasanna Rajaperumal
aee136777b Fixes needed to run merge-on-read testing on production scale data 2017-04-02 22:25:47 -07:00
siddharthagunda
348a48aa80 Add delete support to Hoodie (#85) 2017-03-04 01:33:49 -08:00
prazanna
eb46e7c72b Implement Merge on Read Storage (#76)
1. Create HoodieTable abstraction for commits and fileSystemView
2. HoodieMergeOnReadTable created
3. View is now always obtained from the table and the correct view based on the table type is returned
2017-02-21 16:24:38 -08:00