Vinoth Chandar
0cd186c899
Multi FS Support
...
- Reviving PR 191, to make FileSystem creation off actual path
- Streamline all filesystem access to HoodieTableMetaClient
- Hadoop Conf from Spark Context serialized & passed to executor code too
- Pick up env vars prefixed with HOODIE_ENV_ into Configuration object
- Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS
- Adding s3a to supported schemes & support escaping "." in env vars
- Tests use HoodieTestUtils.getDefaultHadoopConf
2018-01-17 23:34:21 -08:00
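Note on the HOODIE_ENV_ bullets above: a minimal sketch of how prefixed environment variables could be folded into a configuration map, with an escape token standing in for "." (which is awkward in shell variable names). The prefix comes from the commit message. The `_DOT_` token, the `conf_from_env` helper, and the plain dict used in place of a Hadoop Configuration object are assumptions for illustration; the commit message does not state the actual escape syntax.

```python
import os

ENV_PREFIX = "HOODIE_ENV_"   # prefix named in the commit message
DOT_ESCAPE = "_DOT_"         # assumed escape token; not stated in the commit message


def conf_from_env(environ=None):
    """Collect prefixed environment variables into a dict of config keys.

    A plain dict stands in for the Configuration object here.
    """
    environ = os.environ if environ is None else environ
    conf = {}
    for name, value in environ.items():
        if name.startswith(ENV_PREFIX):
            # Strip the prefix, then turn each escape token back into a dot.
            key = name[len(ENV_PREFIX):].replace(DOT_ESCAPE, ".")
            conf[key] = value
    return conf
```

Under these assumptions, a variable named HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key would surface as the key fs.s3a.access.key, and variables without the prefix are ignored.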
Nishith Agarwal
44839b88c6
Removing compaction action type and associated compaction timeline operations, replacing them with commit action type
2018-01-09 09:56:15 -08:00
Nishith Agarwal
4aed5c7338
Adding a new Partition/Time based compaction strategy
2017-12-05 16:30:38 -08:00
Nishith Agarwal
9b610f82c7
Separating out compaction() API
2017-11-14 22:56:29 -08:00
Vinoth Chandar
e45679f5e2
Reformatting code per Google Code Style all over
2017-11-12 23:19:02 -08:00
Nishith Agarwal
abe964bebd
Implementing custom payload/merge hooks abstractions for application specific merge logic
2017-11-07 18:55:55 -08:00
Nishith Agarwal
c7d63a7622
1) Separated rollback as a table operation 2) Implement rollback for MOR
2017-10-12 07:36:46 -07:00
Vinoth Chandar
274aaf49fe
Incorporating code review feedback for DataSource
2017-10-02 20:44:53 -07:00
Vinoth Chandar
64e0573aca
Adding hoodie-spark to support Spark Datasource for Hoodie
...
- Write with COW/MOR paths work fully
- Read with RO view works on both storages*
- Incremental view supported on COW
- Refactored out HoodieReadClient methods, to just contain key based access
- HoodieDataSourceHelpers class can be now used to construct inputs to datasource
- Tests in hoodie-client using new helpers and mechanisms
- Basic tests around save modes & insert/upserts (more to follow)
- Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest
- Updated documentation to describe usage
- New sample app written using the DataSource API
2017-10-02 20:44:53 -07:00
Kaushik Devarajaiah
c98ee057fc
capture record metadata before deflating for record counting
2017-10-02 10:46:06 -07:00
Vinoth Chandar
f2980052cd
Revert effects of PR #259
2017-09-28 10:29:58 -07:00
Vinoth Chandar
9f98ae643b
Adding canIndexLogFiles(), isImplicitWithStorage(), isGlobal() to HoodieIndex
2017-09-28 10:19:29 -07:00
Eric Sayle
6230e15191
Update deprecated hash function
...
Guava deprecated hashString(String) in v15, and removed it in v16.
Replace call with hashUnencodedString(String), which replaces it, to
be compatible with newer versions of Guava.
2017-09-18 17:39:19 -07:00
Omkar Joshi
5c639c0b05
Adding support for UserDefinedBulkInsertPartitioner
2017-09-08 20:55:13 -07:00
Omkar Joshi
ec40d04d51
Fixing UpsertPartitioner to ensure that input records are deterministically assigned to output partitions
2017-09-07 17:03:56 -07:00
Nishith Agarwal
e2d13c6305
Fix build failing issues
2017-09-07 10:54:36 -07:00
Nishith Agarwal
e484e91807
adding new config to separate shuffle and write parallelism
2017-08-18 16:05:25 -07:00
Nishith Agarwal
6a3c94aaa3
suppressing logs (under 4MB) for jenkins
2017-08-15 16:30:51 -07:00
Nishith Agarwal
5ee4ac40ae
Use CompletedFileSystemView instead of CompactedView considering deltacommits
2017-08-07 12:26:42 -07:00
Vinoth Chandar
45dd8980c3
Temporary fix for build break after rebase
2017-08-04 17:36:39 -07:00
Vinoth Chandar
86209640f7
Adding range based pruning to bloom index
...
- keys compared lexicographically using String::compareTo
- Range metadata additionally written into parquet file footers
- Trim fat & add a few optimizations to speed up indexing
- Add param to control whether input shall be cached, to speed up lookup
- Add param to turn on/off range pruning
- Auto compute of parallelism now simply factors in amount of comparisons done
- More accurate parallelism computation when range pruning is on
- tests added & hardened, docs updated
2017-08-04 13:22:13 -07:00
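Note on range based pruning: the idea in the bullets above can be sketched as follows. Each file carries the minimum and maximum record key it contains, and a lookup key only needs to be checked against files whose range covers it. The `candidate_files` helper and the dict layout are hypothetical. Python compares strings by code point, which agrees with String::compareTo for ASCII keys; that restriction is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


def candidate_files(key: str, ranges: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the files whose [min_key, max_key] range contains `key`.

    `ranges` maps a file name to its (min_key, max_key) pair, compared
    lexicographically. Files outside the range are pruned and never
    reach the more expensive bloom filter check.
    """
    return sorted(
        name for name, (lo, hi) in ranges.items() if lo <= key <= hi
    )
```

A key that falls outside every range yields an empty list, so no file has to be opened for it at all.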
Nishith Agarwal
0b26b60a5c
fix for cleaning log files (mor)
2017-08-02 11:54:42 -07:00
Nishith Agarwal
19c22b231e
1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema
2017-07-26 14:27:44 -07:00
Nishith Agarwal
616c9a68c3
Enabled deletes in merge_on_read
2017-07-26 13:37:27 -07:00
Prasanna Rajaperumal
5cc071f74e
Savepoint should not create a hole in the commit timeline
2017-06-27 16:36:09 -07:00
Vinoth Chandar
754ab88a2d
Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView
...
- Usage now marks code as clearly using either RO or RT views, for future evolution
- Tests on all of FileGroups and FileSlices
2017-06-22 17:16:13 -07:00
Vinoth Chandar
c00f1a9ed9
Refactoring HoodieTableFileSystemView using FileGroups/FileSlices
...
- Merged all filter* and get* methods
- new constructor takes filestatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit
2017-06-22 17:16:13 -07:00
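Note on FileGroups and FileSlices: the two definitions in the bullets above can be sketched as a nested grouping, first by (partition, fileID) and then by base commit. The `FileEntry` tuple, the `build_file_groups` helper, and the file names used with it are hypothetical; the real view is built from filestatus[] and parses these fields out of file names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class FileEntry(NamedTuple):
    partition: str
    file_id: str
    base_commit: str
    name: str  # a data file or a log file


def build_file_groups(
    entries: Iterable[FileEntry],
) -> Dict[Tuple[str, str], Dict[str, List[str]]]:
    """Group files into file groups, then into file slices.

    Outer key: (partition, file_id), one file group.
    Inner key: base_commit, one file slice holding that generation's
    data and log files in input order.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for e in entries:
        groups[(e.partition, e.file_id)][e.base_commit].append(e.name)
    return {key: dict(slices) for key, slices in groups.items()}
```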
Vinoth Chandar
23e7badd8a
Rename IO Handles & introduce stub for BucketedIndex
...
- UpdateHandle -> MergeHandle, InsertHandle -> CreateHandle
- Also a bunch of code cleanup in different places
2017-06-22 17:16:13 -07:00
Kaushik Devarajaiah
3aa8083913
Correct clean bug that causes clean failure when partitionPaths are empty
2017-06-20 15:45:32 -07:00
gekath
52c507f83e
Writes relative paths to .commit files
...
Handle case where path is read in as null from commit file
Merged with updated release
2017-06-16 12:51:19 -07:00
gekath
db7311f85e
Writes relative paths to .commit files instead of absolute paths
...
Clean up code
Removed commented out code
Fixed merge conflict with master
2017-06-16 12:51:19 -07:00
Kaushik Devarajaiah
521555c576
Parallelize file version deletes during clean and related tests
2017-06-15 18:20:42 -07:00
Prasanna Rajaperumal
dda28c0b4b
Rollback inflight commits as well when rolling back to savepoint
2017-06-14 11:03:27 -07:00
Prasanna Rajaperumal
db6150c5ef
Refactor hoodie-hive
2017-06-09 13:06:33 -07:00
Prasanna Rajaperumal
bae98efeee
Delete other instant files (.clean) as well during commit archival
2017-05-24 13:51:49 -07:00
Prasanna Rajaperumal
240c91241b
Implement HoodieLogFormat replacing Avro as the default log format
2017-05-23 08:35:11 -07:00
Nishith Agarwal
3c984447da
view scheme added
2017-05-22 12:27:40 -07:00
Prasanna Rajaperumal
70dd7a25ea
Clean should not create a .inflight file
2017-05-22 10:48:35 -07:00
Zeeshan Qureshi
43a55b09fd
Add GCS to supported filesystems
2017-05-18 10:30:34 -07:00
Vinoth Chandar
b4e787ce1d
Update docs
2017-05-01 21:48:27 -07:00
Vinoth Chandar
da17c5c607
Introduce getCommitsAndCompactionsTimeline() explicitly & adjust usage across code base
2017-05-01 21:48:27 -07:00
Vinoth Chandar
bae0528013
Cleanup calls to HoodieTimeline.compareTimeStamps
2017-05-01 21:48:27 -07:00
Vinoth Chandar
7b1446548f
Initial impl of HoodieRealtimeInputFormat
...
- Works end-end for flat schemas
- Schema evolution & hardening remains
- HoodieClientExample can now write mor tables as well
2017-05-01 21:48:27 -07:00
Vinoth Chandar
9f526396a0
Add support for merge_on_read tables to HoodieClientExample
2017-05-01 21:48:27 -07:00
Prasanna Rajaperumal
7bca428a0a
Test to check if properties set are properly propagated
2017-04-28 12:47:14 -07:00
Prasanna Rajaperumal
3f97bdcccf
Test to check if properties set are properly propagated
2017-04-28 12:40:58 -07:00
Prasanna Rajaperumal
8974e11161
Make sure properties set in HoodieWriteConfig are propagated down to individual configs. Fix a race condition that lets InputFormat think the file size is 0 when it is not
2017-04-27 10:52:25 -07:00
Prasanna Rajaperumal
91b088f29f
Implement Compaction policy abstraction. Implement LogSizeBased Bounded IO Compaction as the default strategy
2017-04-20 16:59:06 -07:00
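Note on the compaction strategy: one plausible reading of "LogSizeBased Bounded IO" is sketched below. Pending compaction operations are ordered by log size, largest first, and accepted until an IO budget would be exceeded. The `CompactionOp` fields, the `select_ops` helper, and the stop-at-first-overflow rule are all assumptions for illustration; the commit message does not describe the actual selection rule.

```python
from typing import Iterable, List, NamedTuple


class CompactionOp(NamedTuple):
    file_id: str
    log_size_mb: int  # total size of the log files to be merged
    io_mb: int        # estimated read + write cost of compacting them


def select_ops(ops: Iterable[CompactionOp], io_budget_mb: int) -> List[CompactionOp]:
    """Pick operations with the largest logs first, within an IO budget."""
    selected, used = [], 0
    for op in sorted(ops, key=lambda o: o.log_size_mb, reverse=True):
        if used + op.io_mb > io_budget_mb:
            break  # assumed rule: stop at the first operation that does not fit
        selected.append(op)
        used += op.io_mb
    return selected
```

Keeping ordering and budget as separate steps mirrors the "policy abstraction" in the commit subject: a different strategy could swap the sort key without touching the budget check.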
Vinoth Chandar
2b6322318c
CR feedback
2017-04-03 18:28:01 -07:00
Vinoth Chandar
e0fc4ec38e
Documentation update + helper method for WriteConfig builder
2017-04-03 18:28:01 -07:00