Omkar Joshi
5c639c0b05
Adding support for UserDefinedBulkInsertPartitioner
2017-09-08 20:55:13 -07:00
Omkar Joshi
ec40d04d51
Fixing UpsertPartitioner to ensure that input records are deterministically assigned to output partitions
2017-09-07 17:03:56 -07:00
Nishith Agarwal
e2d13c6305
Fix build failing issues
2017-09-07 10:54:36 -07:00
Nishith Agarwal
e484e91807
adding new config to separate shuffle and write parallelism
2017-08-18 16:05:25 -07:00
Nishith Agarwal
6a3c94aaa3
suppressing logs (under 4MB) for jenkins
2017-08-15 16:30:51 -07:00
Nishith Agarwal
5ee4ac40ae
Use CompletedFileSystemView instead of CompactedView considering deltacommits
2017-08-07 12:26:42 -07:00
Vinoth Chandar
45dd8980c3
Temporary fix for build break after rebase
2017-08-04 17:36:39 -07:00
Vinoth Chandar
86209640f7
Adding range based pruning to bloom index
...
- keys compared lexicographically using String::compareTo
- Range metadata additionally written into parquet file footers
- Trim fat & few optimizations to speed up indexing
- Add param to control whether input shall be cached, to speed up lookup
- Add param to turn on/off range pruning
- Auto compute of parallelism now simply factors in amount of comparisons done
- More accurate parallelism computation when range pruning is on
- tests added & hardened, docs updated
2017-08-04 13:22:13 -07:00
Nishith Agarwal
0b26b60a5c
fix for cleaning log files (mor)
2017-08-02 11:54:42 -07:00
Nishith Agarwal
19c22b231e
1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema
2017-07-26 14:27:44 -07:00
Nishith Agarwal
616c9a68c3
Enabled deletes in merge_on_read
2017-07-26 13:37:27 -07:00
Prasanna Rajaperumal
5cc071f74e
Savepoint should not create a hole in the commit timeline
2017-06-27 16:36:09 -07:00
Vinoth Chandar
754ab88a2d
Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView
...
- Usage now marks code as clearly using either RO or RT views, for future evolution
- Tests on all of FileGroups and FileSlices
2017-06-22 17:16:13 -07:00
Vinoth Chandar
c00f1a9ed9
Refactoring HoodieTableFileSystemView using FileGroups/FileSlices
...
- Merged all filter* and get* methods
- new constructor takes filestatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit
2017-06-22 17:16:13 -07:00
Vinoth Chandar
23e7badd8a
Rename IO Handles & introduce stub for BucketedIndex
...
- UpdateHandle -> MergeHandle, InsertHandle -> CreateHandle
- Also a bunch of code cleanup in different places
2017-06-22 17:16:13 -07:00
Kaushik Devarajaiah
3aa8083913
Correct clean bug that causes clean failure when partitionPaths are empty
2017-06-20 15:45:32 -07:00
gekath
52c507f83e
Writes relative paths to .commit files
...
Handle case where path is read in as null from commit file
Merged with updated release
2017-06-16 12:51:19 -07:00
gekath
db7311f85e
Writes relative paths to .commit files instead of absolute paths
...
Clean up code
Removed commented out code
Fixed merge conflict with master
2017-06-16 12:51:19 -07:00
Kaushik Devarajaiah
521555c576
Parallelize file version deletes during clean and related tests
2017-06-15 18:20:42 -07:00
Prasanna Rajaperumal
dda28c0b4b
Rollback inflight commits as well when rolling back to savepoint
2017-06-14 11:03:27 -07:00
Prasanna Rajaperumal
db6150c5ef
Refactor hoodie-hive
2017-06-09 13:06:33 -07:00
Prasanna Rajaperumal
bae98efeee
Delete other instant files (.clean) as well during commit archival
2017-05-24 13:51:49 -07:00
Prasanna Rajaperumal
240c91241b
Implement HoodieLogFormat replacing Avro as the default log format
2017-05-23 08:35:11 -07:00
Nishith Agarwal
3c984447da
view scheme added
2017-05-22 12:27:40 -07:00
Prasanna Rajaperumal
70dd7a25ea
Clean should not create a .inflight file
2017-05-22 10:48:35 -07:00
Zeeshan Qureshi
43a55b09fd
Add GCS to supported filesystems
2017-05-18 10:30:34 -07:00
Vinoth Chandar
b4e787ce1d
Update docs
2017-05-01 21:48:27 -07:00
Vinoth Chandar
da17c5c607
Introduce getCommitsAndCompactionsTimeline() explicitly & adjust usage across code base
2017-05-01 21:48:27 -07:00
Vinoth Chandar
bae0528013
Cleanup calls to HoodieTimeline.compareTimeStamps
2017-05-01 21:48:27 -07:00
Vinoth Chandar
7b1446548f
Initial impl of HoodieRealtimeInputFormat
...
- Works end-end for flat schemas
- Schema evolution & hardening remains
- HoodieClientExample can now write mor tables as well
2017-05-01 21:48:27 -07:00
Vinoth Chandar
9f526396a0
Add support for merge_on_read tables to HoodieClientExample
2017-05-01 21:48:27 -07:00
Prasanna Rajaperumal
7bca428a0a
Test to check if properties set are properly propagated
2017-04-28 12:47:14 -07:00
Prasanna Rajaperumal
3f97bdcccf
Test to check if properties set are properly propagated
2017-04-28 12:40:58 -07:00
Prasanna Rajaperumal
8974e11161
Make sure properties set in HoodieWriteConfig are propagated down to individual configs. Fix a race condition that lets InputFormat think the file size is 0 when it is not
2017-04-27 10:52:25 -07:00
Prasanna Rajaperumal
91b088f29f
Implement Compaction policy abstraction. Implement LogSizeBased Bounded IO Compaction as the default strategy
2017-04-20 16:59:06 -07:00
Vinoth Chandar
2b6322318c
CR feedback
2017-04-03 18:28:01 -07:00
Vinoth Chandar
e0fc4ec38e
Documentation update + helper method for WriteConfig builder
2017-04-03 18:28:01 -07:00
Vinoth Chandar
dce35ff0d7
Adding a config to control whether date partitioning can be assumed
...
- false by default
- CAUTION: If you have existing tables without partition metadata, you need to set this to "true"
2017-04-03 18:28:01 -07:00
Vinoth Chandar
f9fd16069d
FSUtils.getAllPartitionsPaths() works based on .hoodie_partition_metadata
...
- clean/rollback/write paths covered by existing tests
- Snapshot copier fixed to copy metadata file also, and test fixed
- Existing tables need to be repaired by addition of metadata, before this can be rolled out
2017-04-03 18:28:01 -07:00
Vinoth Chandar
3129770fd0
Create .hoodie_partition_metadata in each partition, linking back to basepath
...
- Concurrency handled via taskID, failure recovery handled via renames
- Falls back to search 3 levels up
- Cli tool has command to add this to existing tables
2017-04-03 18:28:01 -07:00
Prasanna Rajaperumal
1e802ad4f2
Move HoodieAvroReader to hoodie-common, it will be used for compaction and in the record reader
2017-04-03 13:58:35 -07:00
Prasanna Rajaperumal
aee136777b
Fixes needed to run merge-on-read testing on production scale data
2017-04-02 22:25:47 -07:00
Yash Sharma
bca7e7dae4
improve documentation
2017-03-28 05:08:54 -07:00
Yash Sharma
d6f94b998d
Hoodie operability with S3
2017-03-28 05:08:54 -07:00
prazanna
0e3f635adb
remove hardcoding of autoClean
2017-03-23 15:54:26 -07:00
Zeeshan Qureshi
a94f3a638e
Pass table path as argument to HoodieClientExample
2017-03-23 08:12:20 -07:00
fishie9
b7047ab4fb
Pass in String StorageLevel for WriteStatus ( #113 )
2017-03-23 04:31:30 -07:00
prazanna
f1b7afad21
Add config for index parallelism and make clean public ( #109 )
...
* Add config for index parallelism and make clean public
* Review comments on clean api modification
2017-03-21 17:36:46 -07:00
ovj
21898907c1
tool for importing hive tables (in parquet format) into hoodie dataset ( #89 )
...
* tool for importing hive tables (in parquet format) into hoodie dataset
* review fixes
* review fixes
* review fixes
2017-03-21 14:42:13 -07:00
prazanna
d835710c51
Metadata timeline marks an already complete instant as complete again ( #98 )
2017-03-17 12:42:26 -07:00