125 Commits

Author SHA1 Message Date
shenhong
3387b3841f [HUDI-1005] fix NPE in HoodieWriteClient.clean 2020-06-09 05:57:04 -07:00
Shen Hong
6318e943d1 [HUDI-1016] Code optimization in MergeOnReadRollbackActionExecutor (#1718) 2020-06-09 19:14:26 +08:00
garyli1019
22cd824d99 HUDI-494 fix incorrect record size estimation 2020-06-08 20:29:29 -07:00
Balaji Varadarajan
fb283934a3 [HUDI-990] Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly. Also ensure timeline gets reloaded after we revert committed transactions 2020-06-04 02:52:21 -07:00
Raymond Xu
742c204099 [HUDI-811] Restructure test packages in hudi-client/cli (#1689) 2020-06-02 10:25:42 +08:00
sathyaprakashg
d3edac4612 HUDI-921 Remove inlineCompactionEvery method in HoodieCompactionConfig.Builder (#1654)
Co-authored-by: Sathyaprakash Govindasamy <sathyaprakashg@zillowgroup.com>
2020-05-24 01:09:18 -07:00
Raymond Xu
f34de3fb27 [HUDI-836] Implement datadog metrics reporter (#1572)
- Adds support for emitting metrics to datadog
- Tests, configs
2020-05-22 09:14:21 -07:00
Balaji Varadarajan
74ecc27e92 [HUDI-846][HUDI-848] Enable Incremental cleaning and embedded timeline-server by default (#1634) 2020-05-20 05:29:43 -07:00
Balaji Varadarajan
e6f3bf10cf [HUDI-858] Allow multiple operations to be executed within a single commit (#1633) 2020-05-18 19:27:24 -07:00
Sivabalan Narayanan
29edf4b3b8 [HUDI-407] Adding Simple Index to Hoodie. (#1402)
This index finds the location by joining incoming records with records from base files.
2020-05-17 18:32:24 -07:00
Balaji Varadarajan
3c9da2e5f0 [HUDI-895] Remove unnecessary listing .hoodie folder when using timeline server (#1636) 2020-05-17 18:18:53 -07:00
Mathieu
25a0080b2f [HUDI-714] Add javadoc and comments to hudi write method link (#1409)
* [HUDI-714] Add javadoc and comments to hudi write method link
2020-05-16 08:36:51 -04:00
Shen Hong
e8ffc6f0aa [HUDI-881] Replace part of spark context by hadoop configuration in AbstractHoodieClient and HoodieReadClient (#1620) 2020-05-12 09:33:29 -07:00
Shen Hong
295d00beea [HUDI-880] Replace part of spark context by hadoop configuration in HoodieTable. (#1614) 2020-05-11 23:33:57 -07:00
Shen Hong
6dac10115c [HUDI-870] Remove spark context in ClientUtils and HoodieIndex (#1609) 2020-05-11 19:05:36 +08:00
Carm
fa6aba751d [MINOR] fixed building IndexFileFilter with a wrong condition in HoodieGlobalBloomIndex class (#1537) 2020-05-10 09:45:07 +08:00
Udit Mehrotra
d54b4b8a52 [HUDI-838] Support schema from HoodieCommitMetadata for HiveSync (#1559)
Co-authored-by: Mehrotra <uditme@amazon.com>
2020-05-07 16:33:09 -07:00
Balaji Varadarajan
506447fd4f [HUDI-850] Avoid unnecessary listings in incremental cleaning mode (#1576) 2020-05-01 21:37:21 -07:00
vinoth chandar
c4b71622b9 [MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability (#1575)
- reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc
2020-04-30 09:19:39 -07:00
satishkotha
6de9f5d9e5 [HUDI-819] Fix a bug with MergeOnReadLazyInsertIterable.
The variable declared here [1] masks the protected statuses variable, so although Hudi writes the data, the WriteStatus is not included in the completed section. This can cause duplicates to be written (#1540)
[1] https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53
2020-04-27 12:50:39 -07:00
vinoth chandar
19ca0b5629 [HUDI-785] Refactor compaction/savepoint execution based on ActionExecutor abstraction (#1548)
- Savepoint and compaction classes moved to table.action.* packages
- HoodieWriteClient#savepoint(...) returns void
- Renamed HoodieCommitArchiveLog -> HoodieTimelineArchiveLog
- Fixed tests to take into account the additional validation done
- Moved helper code into CompactHelpers and SavepointHelpers
2020-04-25 18:26:44 -07:00
Alexander Filipchik
aea7c1657e [HUDI-795] Handle auto-deleted empty aux folder (#1515)
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-04-22 09:47:32 -07:00
leesf
26684f5984 [HUDI-816] Fix MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP not working due to HUDI-678 (#1536) 2020-04-22 16:33:18 +08:00
Dongwook
ddd105bb31 [HUDI-772] Make UserDefinedBulkInsertPartitioner configurable for DataSource (#1500) 2020-04-20 08:38:18 -07:00
lw0090
09fd6f64c5 [HUDI-800] Fix Metrics getReporter().close() throws NPE. (#1529) 2020-04-19 21:33:07 +08:00
baobaoyeye
75523657a4 [MINOR] use Option and fix description in toString method (#1527)
* [MINOR] fix some places are not elegant, as a newcomer
2020-04-18 12:51:37 +08:00
Prashant Wason
19d29ac7d0 [HUDI-741] Added checks to validate Hoodie's schema evolution.
HUDI-specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that data written using the old schema can be read using the new schema.

Code changes:

1. Added a new config in HoodieWriteConfig to enable schema validation check (disabled by default)
2. Moved code that reads schema from base/log files into hudi-common from hudi-hive-sync
3. Added writerSchema to the extraMetadata of compaction commits in MOR tables. This is the same as for commits on COW tables.

Testing changes:

4. Extended TestHoodieClientBase to add insertBatch API which allows inserting a new batch of unique records into a HUDI table
5. Added a unit test to verify schema evolution for both COW and MOR tables.
6. Added unit tests for schema compatibility checks.
2020-04-15 23:34:59 -07:00
vinoth chandar
661b0b3bab [HUDI-761] Refactoring rollback and restore actions using the ActionExecutor abstraction (#1492)
- rollback() and restore() table level APIs introduced
- Restore is implemented by wrapping calls to rollback executor
- Existing tests transparently cover this, since it's just a refactor
2020-04-13 08:29:19 -07:00
Balaji Varadarajan
17bf930342 [HUDI-770] Organize upsert/insert API implementation under a single package (#1495) 2020-04-12 23:11:00 -07:00
Pratyaksh Sharma
d610252d6b [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment (#1150)
* [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment
2020-04-07 16:10:26 -07:00
vinoth chandar
eaf6cc2d90 [HUDI-756] Organize Cleaning Action execution into a single package in hudi-client (#1485)
- Introduced a thin abstraction ActionExecutor, that all actions will implement
- Pulled cleaning code from table, writeclient into a single package
- CleanHelper is now CleanPlanner, HoodieCleanClient is no longer around
- Minor refactor of HoodieTable factory method
- HoodieTable.create() methods with and without metaclient passed in
- HoodieTable constructor now does not do a redundant instantiation
- Fixed existing unit tests to work at the HoodieWriteClient level
2020-04-04 00:07:34 -07:00
Shaofeng Shi
78b3194e82 [HUDI-751] Fix some coding issues reported by FindBugs (#1470) 2020-03-31 21:19:32 +08:00
Edwin Guo
9ecf0ccfb2 [HUDI-742] Fix Java Math Exception (#1466) 2020-03-31 12:56:20 +08:00
lamber-ken
dbc9acd23a [HUDI-716] Exception: Not an Avro data file when running HoodieCleanClient.runClean (#1432) 2020-03-30 11:19:17 -07:00
ffcchi
1f5b0c77d6 [HUDI-724] Parallelize getSmallFiles for partitions (#1421)
Co-authored-by: Feichi Feng <feicfeng@amazon.com>
2020-03-30 00:14:38 -07:00
Suneel Marthi
fa36082554 [HUDI-746] Reduce build warnings < 10 (#1465) 2020-03-30 11:46:52 +08:00
vinoth chandar
e057c27603 [HUDI-744] Restructure hudi-common and clean up files under util packages (#1462)
- Brings more order and cohesion to the classes in hudi-common
- Utils classes related to a particular concept (avro, timeline, ...) are placed near the corresponding package
- common.fs package now contains all the filesystem-level classes, including the wrapper filesystem
- bloom.filter package renamed to just bloom
- config package contains classes that help store properties
- common.fs.inline package contains all the inline filesystem classes/impl
- common.table.timeline now consolidates all timeline-related classes
- common.table.view consolidates all the classes related to filesystem view metadata
- common.table.timeline.versioning contains all classes related to versioning of the timeline
- Fixed a few unit tests as a result
- Moved the test packages around to match the source file move
- Renamed AvroUtils to TimelineMetadataUtils & minor fixes/typos
2020-03-29 10:58:49 -07:00
leesf
07c3c5d797 [HUDI-679] Make io package Spark free (#1460)
* [HUDI-679] Make io package Spark free
2020-03-29 16:54:00 +08:00
Suneel Marthi
04449f33fe [HUDI-743]: Remove FileIOUtils.close() (#1461) 2020-03-28 18:03:15 +08:00
Suneel Marthi
8c3001363d HUDI-479: Eliminate or Minimize use of Guava if possible (#1159) 2020-03-28 03:11:32 -04:00
Raymond Xu
1713f686f8 [MINOR] Add error message when checking arguments (#1451) 2020-03-27 10:21:38 +08:00
leesf
8b0a4009a9 [HUDI-678] Make config package spark free (#1418) 2020-03-26 08:30:27 -07:00
Mathieu
5eed6c98a8 [MINOR] Fix javadoc of InsertBucket (#1445) 2020-03-25 22:25:47 +08:00
hongdd
cafc87041b [HUDI-697] Add unit test for ArchivedCommitsCommand (#1424) 2020-03-23 13:46:10 +08:00
Zhiyuan Zhao
0241b21f77 [HUDI-65] commitTime rename to instantTime (#1431) 2020-03-22 18:06:00 -07:00
Zhiyuan Zhao
06652aa935 [MINOR] Add missing param descriptions to method docs and clean up redundant code (#1437) 2020-03-22 21:39:33 +08:00
satishkotha
83fb9651f3 [HUDI-650] Modify handleUpdate path to validate partitionPath (#1368) 2020-03-20 08:37:22 -07:00
ForwardXu
1e321c2fc0 [HUDI-209] Implement JMX metrics reporter (#1106) 2020-03-19 20:10:35 +08:00
leesf
0a4902ecce [HUDI-437] Support user-defined index (#1408)
* [hotfix] set default value for index class config
* class config takes precedence over `hoodie.index.type`
2020-03-17 19:27:40 -07:00
Suneel Marthi
99b7e9eb9e [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java (#1350)
* [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java
2020-03-13 20:28:05 -04:00