Vinoth Chandar
64e0573aca
Adding hoodie-spark to support Spark Datasource for Hoodie
...
- Write with COW/MOR paths work fully
- Read with RO view works on both storages*
- Incremental view supported on COW
- Refactored out HoodieReadClient methods, to just contain key based access
- HoodieDataSourceHelpers class can be now used to construct inputs to datasource
- Tests in hoodie-client using new helpers and mechanisms
- Basic tests around save modes & insert/upserts (more to follow)
- Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest
- Updated documentation to describe usage
- New sample app written using the DataSource API
2017-10-02 20:44:53 -07:00
Kaushik Devarajaiah
c98ee057fc
capture record metadata before deflating for record counting
2017-10-02 10:46:06 -07:00
Vinoth Chandar
f2980052cd
Revert effects of PR #259
2017-09-28 10:29:58 -07:00
Vinoth Chandar
9f98ae643b
Adding canIndexLogFiles(), isImplicitWithStorage(), isGlobal() to HoodieIndex
2017-09-28 10:19:29 -07:00
Eric Sayle
6230e15191
Update deprecated hash function
...
Guava deprecated hashString(String) in v15, and removed it in v16.
Replace call with hashUnencodedChars(CharSequence), which replaced it, to
be compatible with newer versions of Guava.
2017-09-18 17:39:19 -07:00
Omkar Joshi
5c639c0b05
Adding support for UserDefinedBulkInsertPartitioner
2017-09-08 20:55:13 -07:00
Omkar Joshi
ec40d04d51
Fixing UpsertPartitioner to ensure that input records are deterministically assigned to output partitions
2017-09-07 17:03:56 -07:00
Nishith Agarwal
e2d13c6305
Fix build failing issues
2017-09-07 10:54:36 -07:00
Nishith Agarwal
e484e91807
adding new config to separate shuffle and write parallelism
2017-08-18 16:05:25 -07:00
Nishith Agarwal
6a3c94aaa3
suppressing logs (under 4MB) for jenkins
2017-08-15 16:30:51 -07:00
Nishith Agarwal
5ee4ac40ae
Use CompletedFileSystemView instead of CompactedView considering deltacommits
2017-08-07 12:26:42 -07:00
Vinoth Chandar
45dd8980c3
Temporary fix for build break after rebase
2017-08-04 17:36:39 -07:00
Vinoth Chandar
86209640f7
Adding range based pruning to bloom index
...
- keys compared lexicographically using String::compareTo
- Range metadata additionally written into parquet file footers
- Trim fat & few optimizations to speed up indexing
- Add param to control whether input shall be cached, to speed up lookup
- Add param to turn on/off range pruning
- Auto compute of parallelism now simply factors in amount of comparisons done
- More accurate parallelism computation when range pruning is on
- tests added & hardened, docs updated
2017-08-04 13:22:13 -07:00
Nishith Agarwal
0b26b60a5c
fix for cleaning log files (mor)
2017-08-02 11:54:42 -07:00
Nishith Agarwal
19c22b231e
1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema
2017-07-26 14:27:44 -07:00
Nishith Agarwal
616c9a68c3
Enabled deletes in merge_on_read
2017-07-26 13:37:27 -07:00
Prasanna Rajaperumal
7d3963b4ab
Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR)
2017-06-30 11:41:23 -07:00
Nishith Agarwal
3eba812a1b
[maven-release-plugin] prepare for next development iteration
2017-06-30 11:17:07 -07:00
Nishith Agarwal
06d44daea3
[maven-release-plugin] prepare release hoodie-0.3.9
2017-06-30 11:16:58 -07:00
Prasanna Rajaperumal
5cc071f74e
Savepoint should not create a hole in the commit timeline
2017-06-27 16:36:09 -07:00
Vinoth Chandar
754ab88a2d
Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView
...
- Usage now marks code as clearly using either RO or RT views, for future evolution
- Tests on all of FileGroups and FileSlices
2017-06-22 17:16:13 -07:00
Vinoth Chandar
c00f1a9ed9
Refactoring HoodieTableFileSystemView using FileGroups/FileSlices
...
- Merged all filter* and get* methods
- new constructor takes filestatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit
2017-06-22 17:16:13 -07:00
Vinoth Chandar
23e7badd8a
Rename IO Handles & introduce stub for BucketedIndex
...
- UpdateHandle -> MergeHandle, InsertHandle -> CreateHandle
- Also bunch of code cleanup in different places
2017-06-22 17:16:13 -07:00
Kaushik Devarajaiah
3aa8083913
Correct clean bug that causes clean failure when partitionPaths are empty
2017-06-20 15:45:32 -07:00
gekath
52c507f83e
Writes relative paths to .commit files
...
Handle case where path is read in as null from commit file
Merged with updated release
2017-06-16 12:51:19 -07:00
gekath
db7311f85e
Writes relative paths to .commit files instead of absolute paths
...
Clean up code
Removed commented out code
Fixed merge conflict with master
2017-06-16 12:51:19 -07:00
Prasanna Rajaperumal
0ed3fac5e3
[maven-release-plugin] prepare for next development iteration
2017-06-16 11:03:17 -07:00
Prasanna Rajaperumal
45732e440c
[maven-release-plugin] prepare release hoodie-0.3.8
2017-06-16 10:59:58 -07:00
Kaushik Devarajaiah
521555c576
Parallelize file version deletes during clean and related tests
2017-06-15 18:20:42 -07:00
Prasanna Rajaperumal
dda28c0b4b
Rollback inflight commits as well when rolling back to savepoint
2017-06-14 11:03:27 -07:00
Prasanna Rajaperumal
db6150c5ef
Refactor hoodie-hive
2017-06-09 13:06:33 -07:00
Prasanna Rajaperumal
933cc8071f
[maven-release-plugin] prepare for next development iteration
2017-05-24 14:02:50 -07:00
Prasanna Rajaperumal
bebae06b5b
[maven-release-plugin] prepare release hoodie-0.3.7
2017-05-24 14:02:41 -07:00
Prasanna Rajaperumal
bae98efeee
Delete other instant files (.clean) as well during commit archival
2017-05-24 13:51:49 -07:00
Prasanna Rajaperumal
240c91241b
Implement HoodieLogFormat replacing Avro as the default log format
2017-05-23 08:35:11 -07:00
Nishith Agarwal
3c984447da
view scheme added
2017-05-22 12:27:40 -07:00
Prasanna Rajaperumal
70dd7a25ea
Clean should not create a .inflight file
2017-05-22 10:48:35 -07:00
Zeeshan Qureshi
43a55b09fd
Add GCS to supported filesystems
2017-05-18 10:30:34 -07:00
Vinoth Chandar
b4e787ce1d
Update docs
2017-05-01 21:48:27 -07:00
Vinoth Chandar
da17c5c607
Introduce getCommitsAndCompactionsTimeline() explicitly & adjust usage across code base
2017-05-01 21:48:27 -07:00
Vinoth Chandar
bae0528013
Cleanup calls to HoodieTimeline.compareTimeStamps
2017-05-01 21:48:27 -07:00
Vinoth Chandar
7b1446548f
Initial impl of HoodieRealtimeInputFormat
...
- Works end-end for flat schemas
- Schema evolution & hardening remains
- HoodieClientExample can now write mor tables as well
2017-05-01 21:48:27 -07:00
Vinoth Chandar
9f526396a0
Add support for merge_on_read tables to HoodieClientExample
2017-05-01 21:48:27 -07:00
Prasanna Rajaperumal
7bca428a0a
Test to check if properties set are properly propagated
2017-04-28 12:47:14 -07:00
Prasanna Rajaperumal
3f97bdcccf
Test to check if properties set are properly propagated
2017-04-28 12:40:58 -07:00
Prasanna Rajaperumal
c3258039f0
[maven-release-plugin] prepare for next development iteration
2017-04-27 11:00:56 -07:00
Prasanna Rajaperumal
de1bdad756
[maven-release-plugin] prepare release hoodie-0.3.6
2017-04-27 11:00:45 -07:00
Prasanna Rajaperumal
8974e11161
Make sure properties set in HoodieWriteConfig are propagated down to individual configs. Fix a race condition which lets InputFormat think file size is 0 when it is actually not
2017-04-27 10:52:25 -07:00
Prasanna Rajaperumal
91b088f29f
Implement Compaction policy abstraction. Implement LogSizeBased Bounded IO Compaction as the default strategy
2017-04-20 16:59:06 -07:00
Vinoth Chandar
2b6322318c
CR feedback
2017-04-03 18:28:01 -07:00