Commit Graph

942 Commits

Author SHA1 Message Date
Udit Mehrotra
404c7e82d9 [HUDI-884] Shade avro and parquet-avro in hudi-hive-sync-bundle (#1618)
Co-authored-by: Mehrotra <uditme@amazon.com>
2020-05-12 11:40:31 -07:00
Shen Hong
e8ffc6f0aa [HUDI-881] Replace part of spark context by hadoop configuration in AbstractHoodieClient and HoodieReadClient (#1620) 2020-05-12 09:33:29 -07:00
Shen Hong
b54517aad0 [HUDI-886] Replace jsc.hadoopConfiguration by hadoop configuration in hudi-client testcase (#1621) 2020-05-12 08:51:31 -07:00
Shen Hong
295d00beea [HUDI-880] Replace part of spark context by hadoop configuration in HoodieTable. (#1614) 2020-05-11 23:33:57 -07:00
liujinhui
5d37e66b7e [MINOR] Fix HoodieNotSupportedException description in KafkaOffsetGen (#1615) 2020-05-11 23:14:43 +08:00
Shen Hong
6dac10115c [HUDI-870] Remove spark context in ClientUtils and HoodieIndex (#1609) 2020-05-11 19:05:36 +08:00
Balaji Varadarajan
8d0e23173b [HUDI-820] cleaner repair command should only inspect clean metadata files (#1542) 2020-05-11 09:25:54 +08:00
vinoth chandar
f92b9fdcc4 [MINOR] Fix hardcoding of ports in TestHoodieJmxMetrics (#1606) 2020-05-10 19:23:26 -04:00
Carm
fa6aba751d [MINOR] fixed building IndexFileFilter with a wrong condition in HoodieGlobalBloomIndex class (#1537) 2020-05-10 09:45:07 +08:00
Udit Mehrotra
d54b4b8a52 [HUDI-838] Support schema from HoodieCommitMetadata for HiveSync (#1559)
Co-authored-by: Mehrotra <uditme@amazon.com>
2020-05-07 16:33:09 -07:00
Alexander Filipchik
e783ab1749 [HUDI-784] Addressing issue with log reader on GCS (#1516)
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-05-07 13:05:32 -07:00
hongdd
f921469afc [HUDI-704] Add test for RepairsCommand (#1554) 2020-05-07 23:02:28 +08:00
Raymond Xu
366bb10d8c [HUDI-812] Migrate hudi common tests to JUnit 5 (#1590)
* [HUDI-812] Migrate hudi-common tests to JUnit 5
2020-05-06 19:15:20 +08:00
bschell
e21441ad83 Add changes for presto mor queries (#1578)
Adds the necessary changes to Hudi to support Presto queries on the
realtime view of Hudi merge-on-read tables.

Co-authored-by: Brandon Scheller <bschelle@amazon.com>
2020-05-04 11:27:14 -07:00
AakashPradeep
5e0f5e5521 [HUDI-852] adding check for table name for Append Save mode (#1580)
* adding check for table name for Append Save mode

* adding existing table validation for delete and upsert operation

Co-authored-by: Aakash Pradeep <apradeep@twilio.com>
2020-05-03 23:09:17 -07:00
Raymond Xu
096f7f55b2 [HUDI-813] Migrate hudi-utilities tests to JUnit 5 (#1589) 2020-05-04 12:43:42 +08:00
Balaji Varadarajan
506447fd4f [HUDI-850] Avoid unnecessary listings in incremental cleaning mode (#1576) 2020-05-01 21:37:21 -07:00
vinoth chandar
c4b71622b9 [MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability (#1575)
- reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc
2020-04-30 09:19:39 -07:00
hongdd
9059bce977 [HUDI-702] Add test for HoodieLogFileCommand (#1522) 2020-04-29 18:47:27 +08:00
Raymond Xu
69b16309c8 [HUDI-814] Migrate hudi-client tests to JUnit 5 (#1570) 2020-04-29 13:57:28 +08:00
Raymond Xu
06dae30297 [HUDI-810] Migrate ClientTestHarness to JUnit 5 (#1553) 2020-04-28 23:38:16 +08:00
satishkotha
6de9f5d9e5 [HUDI-819] Fix a bug with MergeOnReadLazyInsertIterable.
The variable declared here [1] shadows the protected statuses variable, so although Hoodie writes the data, the write status is not included in the completed section. This can cause duplicates to be written (#1540)
[1] https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53
2020-04-27 12:50:39 -07:00
vinoth chandar
19ca0b5629 [HUDI-785] Refactor compaction/savepoint execution based on ActionExecutor abstraction (#1548)
- Savepoint and compaction classes moved to table.action.* packages
- HoodieWriteClient#savepoint(...) returns void
- Renamed HoodieCommitArchiveLog -> HoodieTimelineArchiveLog
- Fixed tests to take into account the additional validation done
- Moved helper code into CompactHelpers and SavepointHelpers
2020-04-25 18:26:44 -07:00
dengziming
19cc15c098 [MINOR]: Fix cli docs for DeltaStreamer (#1547) 2020-04-22 11:37:17 -07:00
Alexander Filipchik
aea7c1657e [HUDI-795] Handle auto-deleted empty aux folder (#1515)
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-04-22 09:47:32 -07:00
leesf
26684f5984 [HUDI-816] Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678 (#1536) 2020-04-22 16:33:18 +08:00
Raymond Xu
6e15eebd81 [HUDI-809] Migrate CommonTestHarness to JUnit 5 (#1530) 2020-04-22 14:10:25 +08:00
Alexander Filipchik
2a56f82908 [HUDI-821] Fixing JCommander param parsing in deltastreamer (#1525)
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-04-21 20:12:34 -07:00
Prashant Wason
62bd3e7ded [HUDI-757] Added hudi-cli command to export metadata of Instants.
Example:
hudi:db.table-> export instants --localFolder /tmp/ --limit 5 --actions clean,rollback,commit --desc false
2020-04-21 12:41:19 -07:00
hongdd
84dd9047d3 [HUDI-789] Adjust logic of upsert in HDFSParquetImporter (#1511) 2020-04-21 14:21:30 +08:00
n3nash
332072bc6d [HUDI-371] Supporting hive combine input format for realtime tables (#1503) 2020-04-20 20:40:06 -07:00
Mathieu
2a2f31d919 [MINOR] Remove redundant code and fix typo in HoodieDefaultTimeline (#1535) 2020-04-21 09:40:22 +08:00
Dongwook
ddd105bb31 [HUDI-772] Make UserDefinedBulkInsertPartitioner configurable for DataSource (#1500) 2020-04-20 08:38:18 -07:00
lw0090
09fd6f64c5 [HUDI-800] Fix Metrics getReporter().close() throws NPE. (#1529) 2020-04-19 21:33:07 +08:00
baobaoyeye
75523657a4 [MINOR] use Option and fix description in toString method (#1527)
* [MINOR] fix some places are not elegant, as a newcomer
2020-04-18 12:51:37 +08:00
Alexander Filipchik
acb1ada2f7 [HUDI-799] Use appropriate FS when loading configs (#1517)
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-04-16 13:49:39 -07:00
Raymond Xu
acdc4a8d00 [HUDI-798] Migrate to Mockito Jupiter for JUnit 5 (#1521) 2020-04-16 16:07:32 +08:00
Prashant Wason
19d29ac7d0 [HUDI-741] Added checks to validate Hoodie's schema evolution.
HUDI specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that the data written using the old schema can be read using the new schema.

Code changes:

1. Added a new config in HoodieWriteConfig to enable schema validation check (disabled by default)
2. Moved code that reads schema from base/log files into hudi-common from hudi-hive-sync
3. Added writerSchema to the extraMetadata of compaction commits in MOR tables. This is the same as for commits on COW tables.

Testing changes:

4. Extended TestHoodieClientBase to add insertBatch API which allows inserting a new batch of unique records into a HUDI table
5. Added a unit test to verify schema evolution for both COW and MOR tables.
6. Added unit tests for schema compatibility checks.
2020-04-15 23:34:59 -07:00
Iftach Schonbaum
9ca710cb02 [HUDI-777] Updated description for --target-table parameter (#1519) 2020-04-15 14:56:13 -07:00
Raymond Xu
d65efe659d [HUDI-780] Migrate test cases to Junit 5 (#1504) 2020-04-15 12:35:01 -07:00
Gary Li
14d4fea833 [HUDI-759] Integrate checkpoint provider with delta streamer (#1486) 2020-04-14 14:51:04 -07:00
hongdd
644c1cc8bd [HUDI-698] Add unit test for CleansCommand (#1449) 2020-04-14 17:54:47 +08:00
vinoth chandar
661b0b3bab [HUDI-761] Refactoring rollback and restore actions using the ActionExecutor abstraction (#1492)
- rollback() and restore() table level APIs introduced
- Restore is implemented by wrapping calls to rollback executor
- Existing tests transparently cover this, since it's just a refactor
2020-04-13 08:29:19 -07:00
Balaji Varadarajan
17bf930342 [HUDI-770] Organize upsert/insert API implementation under a single package (#1495) 2020-04-12 23:11:00 -07:00
Sivabalan Narayanan
447ba3bae6 [MINOR] Disabling flaky test in InlineFileSystem (#1510) 2020-04-12 19:38:56 -07:00
Pratyaksh Sharma
6d7ca2cf7e [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema (#1427) 2020-04-12 17:55:26 -07:00
Shen Hong
5d717a28f4 [HUDI-782] Add support of Aliyun object storage service. (#1506) 2020-04-12 10:06:30 +08:00
hongdd
a464a2972e [HUDI-700] Add unit test for FileSystemViewCommand (#1490) 2020-04-11 10:12:21 +08:00
satishkotha
c0f96e0726 [HUDI-687] Stop incremental reader on RO table when there is a pending compaction (#1396) 2020-04-10 10:45:41 -07:00
Bhavani Sudha Saktheeswaran
8c7cef3e50 [HUDI-738] Add validation to DeltaStreamer to fail fast when filterDupes is enabled on UPSERT mode. (#1505)
Summary:
This fix ensures that '--filter-dupes' is disabled for the UPSERT operation and fails fast if it is not. Otherwise, all updates would be dropped silently and only new records would be taken in.
2020-04-10 08:58:55 -07:00