vinoth chandar
a1c0d0dbad
Update README.md
Reflect hudi
2017-12-10 07:50:37 -08:00
Nishith Agarwal
4aed5c7338
Adding a new Partition/Time based compaction strategy
2017-12-05 16:30:38 -08:00
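The commit subject does not spell out how the strategy picks work. One plausible reading of a partition/time based strategy is "compact the most recent date partitions first". The sketch below assumes partition paths formatted as `yyyy/MM/dd` and a cap on how many partitions are compacted per run. `RecentPartitionSelector` is a hypothetical helper, not the class added by this commit.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: choose which partitions to compact based on partition date.
// Assumption: partition paths look like "2017/12/05".
class RecentPartitionSelector {

    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Sort partitions newest-first by their parsed date and keep at most maxPartitions.
    static List<String> selectMostRecent(List<String> partitionPaths, int maxPartitions) {
        return partitionPaths.stream()
                .sorted(Comparator.comparing((String p) -> LocalDate.parse(p, FORMAT)).reversed())
                .limit(maxPartitions)
                .collect(Collectors.toList());
    }
}
```

Bounding each run to a few recent partitions keeps compaction cost predictable. Older partitions would simply wait for a later run under this sketch.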
Nishith Agarwal
051f600b7f
Enable hive sync even if there is no compaction commit
2017-11-30 18:22:58 -08:00
Nishith Agarwal
9b610f82c7
Separating out compaction() API
2017-11-14 22:56:29 -08:00
Vinoth Chandar
e45679f5e2
Reformatting code per Google Code Style all over
2017-11-12 23:19:02 -08:00
Vinoth Chandar
5a62480a92
Update docs on code style setup
2017-11-12 23:19:02 -08:00
Nishith Agarwal
abe964bebd
Implementing custom payload/merge hooks abstractions for application specific merge logic
2017-11-07 18:55:55 -08:00
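To illustrate what an application-specific merge hook can look like, here is a minimal sketch. `MergeHook`, `combineIncoming`, `mergeWithStored` and `LatestTimestampWins` are hypothetical names chosen for illustration. They are not the payload interface introduced by this commit, which works on Avro records. The example policy keeps whichever version of a record carries the larger timestamp.

```java
import java.util.Optional;

// Hypothetical merge-hook abstraction: the application decides how versions of a key merge.
interface MergeHook<T> {
    // Combine two incoming records that share a key, before they are written.
    T combineIncoming(T first, T second);

    // Merge an incoming record with the version already stored; an empty result
    // would signal a delete (assumption made for this sketch).
    Optional<T> mergeWithStored(T stored, T incoming);
}

// Example application logic: the record with the larger timestamp wins.
final class LatestTimestampWins implements MergeHook<LatestTimestampWins.Row> {

    static final class Row {
        final String key;
        final long ts;
        final String value;

        Row(String key, long ts, String value) {
            this.key = key;
            this.ts = ts;
            this.value = value;
        }
    }

    @Override
    public Row combineIncoming(Row first, Row second) {
        return second.ts >= first.ts ? second : first;
    }

    @Override
    public Optional<Row> mergeWithStored(Row stored, Row incoming) {
        return Optional.of(incoming.ts >= stored.ts ? incoming : stored);
    }
}
```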
Nishith Agarwal
c7d63a7622
1) Separated rollback as a table operation; 2) Implemented rollback for MOR
2017-10-12 07:36:46 -07:00
Vinoth Chandar
e1fe3ab937
[maven-release-plugin] prepare for next development iteration
2017-10-02 22:42:54 -07:00
Vinoth Chandar
50139fe904
[maven-release-plugin] prepare release hoodie-0.4.0
2017-10-02 22:42:32 -07:00
Vinoth Chandar
3768ad45fb
Release notes for 0.4.0
2017-10-02 22:26:22 -07:00
Vinoth Chandar
274aaf49fe
Incorporating code review feedback for DataSource
2017-10-02 20:44:53 -07:00
Vinoth Chandar
64e0573aca
Adding hoodie-spark to support Spark Datasource for Hoodie
- Write with COW/MOR paths work fully
- Read with RO view works on both storages*
- Incremental view supported on COW
- Refactored out HoodieReadClient methods, to just contain key based access
- HoodieDataSourceHelpers class can be now used to construct inputs to datasource
- Tests in hoodie-client using new helpers and mechanisms
- Basic tests around save modes & insert/upserts (more to follow)
- Bumped Scala to 2.11, since 2.10 is deprecated and causes problems with scalatest
- Updated documentation to describe usage
- New sample app written using the DataSource API
2017-10-02 20:44:53 -07:00
Kaushik Devarajaiah
c98ee057fc
capture record metadata before deflating for record counting
2017-10-02 10:46:06 -07:00
Vinoth Chandar
f2980052cd
Revert effects of PR #259
2017-09-28 10:29:58 -07:00
Vinoth Chandar
9f98ae643b
Adding canIndexLogFiles(), isImplicitWithStorage(), isGlobal() to HoodieIndex
2017-09-28 10:19:29 -07:00
Eric Sayle
6230e15191
Update deprecated hash function
Guava deprecated hashString(String) in v15 and removed it in v16.
Replace the call with hashUnencodedChars(CharSequence), which supersedes it,
to stay compatible with newer versions of Guava.
2017-09-18 17:39:19 -07:00
Jian Xu
7e9a4a89dd
Use getFileStatus to get single FileStatus for single file
2017-09-11 11:24:44 -07:00
Omkar Joshi
5c639c0b05
Adding support for UserDefinedBulkInsertPartitioner
2017-09-08 20:55:13 -07:00
Omkar Joshi
ec40d04d51
Fixing UpsertPartitioner to ensure that input records are deterministically assigned to output partitions
2017-09-07 17:03:56 -07:00
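Deterministic assignment matters because a retried Spark task must route each record to the same output partition as the first attempt. Random assignment (for example `java.util.Random` without a fixed seed) breaks that. A minimal sketch of the idea, assuming buckets are described by cumulative weights: hash the record key and map the hash onto the weight ranges. JDK `CRC32` is swapped in here as the hash to keep the example dependency-free. `DeterministicBucketAssigner` is hypothetical and not the partitioner's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: map a record key to a bucket so the same key always lands
// in the same bucket, no matter how many times the computation is retried.
class DeterministicBucketAssigner {

    // cumulativeWeights is ascending and ends at 1.0, e.g. {0.5, 0.8, 1.0}
    // means buckets receiving 50%, 30% and 20% of keys.
    static int assignBucket(String recordKey, double[] cumulativeWeights) {
        CRC32 crc = new CRC32();
        crc.update(recordKey.getBytes(StandardCharsets.UTF_8));
        // Reduce the 32-bit checksum to a fraction in [0, 1).
        double r = (crc.getValue() % 10_000) / 10_000.0;
        for (int i = 0; i < cumulativeWeights.length; i++) {
            if (r < cumulativeWeights[i]) {
                return i;
            }
        }
        return cumulativeWeights.length - 1;
    }
}
```

The result depends only on the key and the weights, so re-executing the same input yields the same partition layout.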
Nishith Agarwal
e2d13c6305
Fix build failures
2017-09-07 10:54:36 -07:00
Nishith Agarwal
63f1b12355
adding ability to read archived files written in log format
2017-08-25 14:40:07 -07:00
Nishith Agarwal
e484e91807
adding new config to separate shuffle and write parallelism
2017-08-18 16:05:25 -07:00
Jian Xu
b1cf097b0c
Add nested fields support for MOR tables
2017-08-16 10:35:26 -07:00
Nishith Agarwal
6a3c94aaa3
suppressing logs (under 4MB) for Jenkins
2017-08-15 16:30:51 -07:00
Nishith Agarwal
5ee4ac40ae
Use CompletedFileSystemView instead of CompactedView considering deltacommits
2017-08-07 12:26:42 -07:00
Vinoth Chandar
45dd8980c3
Temporary fix for build break after rebase
2017-08-04 17:36:39 -07:00
Vinoth Chandar
86209640f7
Adding range based pruning to bloom index
- keys compared lexicographically using String::compareTo
- Range metadata additionally written into parquet file footers
- Trim fat & few optimizations to speed up indexing
- Add param to control whether input shall be cached, to speed up lookup
- Add param to turn on/off range pruning
- Auto compute of parallelism now simply factors in amount of comparisons done
- More accurate parallelism computation when range pruning is on
- tests added & hardened, docs updated
2017-08-04 13:22:13 -07:00
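The pruning step can be sketched as follows. It assumes each file carries the min and max record key it contains (the range metadata written into parquet footers). A key only needs a bloom filter check against files whose range covers it, with keys compared lexicographically via `String::compareTo`. `KeyRangePruning` and `FileKeyRange` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of range-based pruning ahead of bloom filter checks.
class KeyRangePruning {

    // Per-file key range: the smallest and largest record keys stored in the file.
    static final class FileKeyRange {
        final String fileId;
        final String minKey;
        final String maxKey;

        FileKeyRange(String fileId, String minKey, String maxKey) {
            this.fileId = fileId;
            this.minKey = minKey;
            this.maxKey = maxKey;
        }

        // Lexicographic comparison, matching String::compareTo.
        boolean mayContain(String key) {
            return minKey.compareTo(key) <= 0 && key.compareTo(maxKey) <= 0;
        }
    }

    // Files whose range covers the key; only these need a bloom filter lookup.
    static List<String> candidateFiles(String key, List<FileKeyRange> ranges) {
        List<String> candidates = new ArrayList<>();
        for (FileKeyRange range : ranges) {
            if (range.mayContain(key)) {
                candidates.add(range.fileId);
            }
        }
        return candidates;
    }
}
```

Files ruled out by range never have their bloom filters consulted, which is also why the number of comparisons is a natural input for the parallelism computation.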
Nishith Agarwal
0b26b60a5c
fix for cleaning log files (MOR)
2017-08-02 11:54:42 -07:00
Nishith Agarwal
19c22b231e
1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema
2017-07-26 14:27:44 -07:00
Nishith Agarwal
616c9a68c3
Enabled deletes in merge_on_read
2017-07-26 13:37:27 -07:00
Vinoth Chandar
cf1dde0323
Add recent talks/presentations to documentation
2017-07-08 22:47:15 -07:00
Vinoth Chandar
e8b3ddd7cb
Add note on community engagement to committership guidelines
2017-07-08 22:47:15 -07:00
Prasanna Rajaperumal
7d3963b4ab
Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR)
2017-06-30 11:41:23 -07:00
Nishith Agarwal
3eba812a1b
[maven-release-plugin] prepare for next development iteration
2017-06-30 11:17:07 -07:00
Nishith Agarwal
06d44daea3
[maven-release-plugin] prepare release hoodie-0.3.9
2017-06-30 11:16:58 -07:00
Nishith Agarwal
348250d960
Using FsUtils instead of Files API to extract file extension
2017-06-29 19:26:31 -07:00
Nishith Agarwal
e5d9b818bc
Sync Tool registers 2 tables, RO and RT Tables
2017-06-28 15:41:36 -07:00
Prasanna Rajaperumal
5cc071f74e
Savepoint should not create a hole in the commit timeline
2017-06-27 16:36:09 -07:00
Jian Xu
29b906b763
Fix TimestampBasedKeyGenerator when DATE_STRING is used for partitionpath.field
2017-06-27 13:02:06 -07:00
Vinoth Chandar
754ab88a2d
Introduce ReadOptimizedView & RealtimeView out of TableFileSystemView
- Usage now marks code as clearly using either RO or RT views, for future evolution
- Tests on all of FileGroups and FileSlices
2017-06-22 17:16:13 -07:00
Vinoth Chandar
c00f1a9ed9
Refactoring HoodieTableFileSystemView using FileGroups/FileSlices
- Merged all filter* and get* methods
- new constructor takes FileStatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit
2017-06-22 17:16:13 -07:00
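A simplified model of the FileGroup/FileSlice definitions above might look like this. A FileGroup holds every file for one fileId in one partition. Each FileSlice is one generation: an optional base data file plus the log files written on top of it, keyed by the base commit time. The class and field names are illustrative, not Hoodie's actual classes. The sketch also assumes commit times are fixed-width timestamps that sort lexicographically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Illustrative model of file groups and file slices.
class FileGroupSketch {

    // One generation of files, starting at a base commit.
    static final class FileSlice {
        final String baseCommitTime;
        String dataFile; // null if the slice has only log files so far
        final List<String> logFiles = new ArrayList<>();

        FileSlice(String baseCommitTime) {
            this.baseCommitTime = baseCommitTime;
        }
    }

    // All files belonging to one fileId within one partition.
    static final class FileGroup {
        final String partitionPath;
        final String fileId;
        // TreeMap keeps slices ordered by base commit time.
        private final TreeMap<String, FileSlice> slices = new TreeMap<>();

        FileGroup(String partitionPath, String fileId) {
            this.partitionPath = partitionPath;
            this.fileId = fileId;
        }

        void addDataFile(String commitTime, String path) {
            slices.computeIfAbsent(commitTime, FileSlice::new).dataFile = path;
        }

        void addLogFile(String baseCommitTime, String path) {
            slices.computeIfAbsent(baseCommitTime, FileSlice::new).logFiles.add(path);
        }

        // The latest slice is the one with the greatest base commit time.
        Optional<FileSlice> latestSlice() {
            Map.Entry<String, FileSlice> last = slices.lastEntry();
            return last == null ? Optional.empty() : Optional.of(last.getValue());
        }
    }
}
```

Under this model, a read-optimized view would read only each slice's data file, while a realtime view would merge the data file with its log files.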
Vinoth Chandar
23e7badd8a
Rename IO Handles & introduce stub for BucketedIndex
- UpdateHandle -> MergeHandle, InsertHandle -> CreateHandle
- Also a bunch of code cleanup in different places
2017-06-22 17:16:13 -07:00
prazanna
b0a2a23372
Adding Nishith to Contributors list
2017-06-20 15:48:43 -07:00
prazanna
649475c5cb
Adding Kaushik to contributors list
2017-06-20 15:47:05 -07:00
Kaushik Devarajaiah
3aa8083913
Fix clean bug that causes clean to fail when partitionPaths are empty
2017-06-20 15:45:32 -07:00
prazanna
7ef76a4de0
Adding Kathy Ge to the contributors list
2017-06-16 12:52:54 -07:00
gekath
52c507f83e
Writes relative paths to .commit files
Handle case where path is read in as null from commit file
Merged with updated release
2017-06-16 12:51:19 -07:00
gekath
db7311f85e
Writes relative paths to .commit files instead of absolute paths
Clean up code
Removed commented out code
Fixed merge conflict with master
2017-06-16 12:51:19 -07:00
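The relative-path change can be sketched as follows. File paths are stored in commit metadata relative to the table base path and resolved back against the base path when read. `java.nio.file` paths stand in for Hadoop paths here (an assumption), and `RelativeCommitPaths` is a hypothetical helper. The commits above mention handling a path read back as null but do not say how, so returning null is likewise an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: relative paths in commit metadata, resolved on read.
class RelativeCommitPaths {

    // e.g. base "/data/table", file "/data/table/2017/06/16/f1.parquet" -> "2017/06/16/f1.parquet"
    static String toRelative(String basePath, String absoluteFilePath) {
        Path base = Paths.get(basePath);
        Path file = Paths.get(absoluteFilePath);
        return base.relativize(file).toString();
    }

    // Resolve a stored relative path against the base path; a null entry
    // (assumed handling) is passed through instead of throwing.
    static String toAbsolute(String basePath, String relativePath) {
        if (relativePath == null) {
            return null;
        }
        return Paths.get(basePath).resolve(relativePath).toString();
    }
}
```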
Prasanna Rajaperumal
0ed3fac5e3
[maven-release-plugin] prepare for next development iteration
2017-06-16 11:03:17 -07:00