Udit Mehrotra
d54b4b8a52
[HUDI-838] Support schema from HoodieCommitMetadata for HiveSync ( #1559 )
...
Co-authored-by: Mehrotra <uditme@amazon.com >
2020-05-07 16:33:09 -07:00
Alexander Filipchik
e783ab1749
[HUDI-784] Addressing issue with log reader on GCS ( #1516 )
...
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com >
2020-05-07 13:05:32 -07:00
hongdd
f921469afc
[HUDI-704] Add test for RepairsCommand ( #1554 )
2020-05-07 23:02:28 +08:00
Raymond Xu
366bb10d8c
[HUDI-812] Migrate hudi-common tests to JUnit 5 ( #1590 )
2020-05-06 19:15:20 +08:00
bschell
e21441ad83
Add changes for presto mor queries ( #1578 )
...
Adds the necessary changes to Hudi to support Presto queries on the
realtime view of Hudi merge-on-read tables.
Co-authored-by: Brandon Scheller <bschelle@amazon.com >
2020-05-04 11:27:14 -07:00
AakashPradeep
5e0f5e5521
[HUDI-852] adding check for table name for Append Save mode ( #1580 )
...
* adding check for table name for Append Save mode
* adding existing table validation for delete and upsert operation
Co-authored-by: Aakash Pradeep <apradeep@twilio.com >
2020-05-03 23:09:17 -07:00
Raymond Xu
096f7f55b2
[HUDI-813] Migrate hudi-utilities tests to JUnit 5 ( #1589 )
2020-05-04 12:43:42 +08:00
Balaji Varadarajan
506447fd4f
[HUDI-850] Avoid unnecessary listings in incremental cleaning mode ( #1576 )
2020-05-01 21:37:21 -07:00
vinoth chandar
c4b71622b9
[MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability ( #1575 )
...
- reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc
2020-04-30 09:19:39 -07:00
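The argument order described in the commit above can be illustrated with a minimal sketch. The class, the predicate constants, and the method body here are hypothetical stand-ins, not the actual HoodieTimeline code; the sketch also assumes instant times are fixed-width timestamp strings that order lexicographically.

```java
import java.util.function.BiPredicate;

// Hypothetical illustration only; not the real HoodieTimeline API.
class TimestampCompareDemo {
    // Assumed operators over fixed-width timestamp strings (lexicographic order).
    static final BiPredicate<String, String> GREATER_THAN_OR_EQUALS =
        (a, b) -> a.compareTo(b) >= 0;
    static final BiPredicate<String, String> LESSER_THAN =
        (a, b) -> a.compareTo(b) < 0;

    // Operator sits between the operands, so a call reads like "t1 >= t2".
    static boolean compareTimestamps(String instantTime1,
                                     BiPredicate<String, String> op,
                                     String instantTime2) {
        return op.test(instantTime1, instantTime2);
    }

    public static void main(String[] args) {
        String t1 = "20200430091939";
        String t2 = "20200429184727";
        System.out.println(compareTimestamps(t1, GREATER_THAN_OR_EQUALS, t2));
        System.out.println(compareTimestamps(t1, LESSER_THAN, t2));
    }
}
```

With the operator in the middle, each call site mirrors the infix comparison it stands for, which is the readability gain the commit message refers to.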
hongdd
9059bce977
[HUDI-702] Add test for HoodieLogFileCommand ( #1522 )
2020-04-29 18:47:27 +08:00
Raymond Xu
69b16309c8
[HUDI-814] Migrate hudi-client tests to JUnit 5 ( #1570 )
2020-04-29 13:57:28 +08:00
Raymond Xu
06dae30297
[HUDI-810] Migrate ClientTestHarness to JUnit 5 ( #1553 )
2020-04-28 23:38:16 +08:00
satishkotha
6de9f5d9e5
[HUDI-819] Fix a bug with MergeOnReadLazyInsertIterable.
...
The variable declared here [1] masks the protected statuses variable. So although hoodie writes the data, the write status is not included in the completed section. This can cause duplicates to be written (#1540 )
[1] https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53
2020-04-27 12:50:39 -07:00
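The kind of masking described in the commit above can be sketched as follows. This is one plausible form of the bug (a subclass field hiding an inherited protected field); the class and method names are hypothetical and do not reproduce MergeOnReadLazyInsertIterable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of field hiding; not the actual Hudi classes.
class ShadowingDemo {
    static class BaseHandler {
        // The base class reports completed work from this field.
        protected final List<String> statuses = new ArrayList<>();

        List<String> completed() {
            return statuses;
        }
    }

    static class BuggyHandler extends BaseHandler {
        // Redeclaring the field hides BaseHandler.statuses.
        private final List<String> statuses = new ArrayList<>();

        void write(String record) {
            // Adds to the subclass field; the base class never sees it.
            statuses.add("written:" + record);
        }
    }

    static class FixedHandler extends BaseHandler {
        void write(String record) {
            // No redeclaration, so this is the inherited field.
            statuses.add("written:" + record);
        }
    }

    public static void main(String[] args) {
        BuggyHandler buggy = new BuggyHandler();
        buggy.write("r1");
        FixedHandler fixed = new FixedHandler();
        fixed.write("r1");
        System.out.println(buggy.completed().size()); // 0: status lost
        System.out.println(fixed.completed().size()); // 1: status reported
    }
}
```

In the buggy variant the data is "written" but the caller sees no status for it, which matches the symptom in the commit message: a retry could then write the same records again.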
vinoth chandar
19ca0b5629
[HUDI-785] Refactor compaction/savepoint execution based on ActionExecutor abstraction ( #1548 )
...
- Savepoint and compaction classes moved to table.action.* packages
- HoodieWriteClient#savepoint(...) returns void
- Renamed HoodieCommitArchiveLog -> HoodieTimelineArchiveLog
- Fixed tests to take into account the additional validation done
- Moved helper code into CompactHelpers and SavepointHelpers
2020-04-25 18:26:44 -07:00
dengziming
19cc15c098
[MINOR]: Fix cli docs for DeltaStreamer ( #1547 )
2020-04-22 11:37:17 -07:00
Alexander Filipchik
aea7c1657e
[HUDI-795] Handle auto-deleted empty aux folder ( #1515 )
...
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com >
2020-04-22 09:47:32 -07:00
leesf
26684f5984
[HUDI-816] Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP not working due to HUDI-678 ( #1536 )
2020-04-22 16:33:18 +08:00
Raymond Xu
6e15eebd81
[HUDI-809] Migrate CommonTestHarness to JUnit 5 ( #1530 )
2020-04-22 14:10:25 +08:00
Alexander Filipchik
2a56f82908
[HUDI-821] Fixing JCommander param parsing in deltastreamer ( #1525 )
...
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com >
2020-04-21 20:12:34 -07:00
Prashant Wason
62bd3e7ded
[HUDI-757] Added hudi-cli command to export metadata of Instants.
...
Example:
hudi:db.table-> export instants --localFolder /tmp/ --limit 5 --actions clean,rollback,commit --desc false
2020-04-21 12:41:19 -07:00
hongdd
84dd9047d3
[HUDI-789] Adjust logic of upsert in HDFSParquetImporter ( #1511 )
2020-04-21 14:21:30 +08:00
n3nash
332072bc6d
[HUDI-371] Supporting hive combine input format for realtime tables ( #1503 )
2020-04-20 20:40:06 -07:00
Mathieu
2a2f31d919
[MINOR] Remove redundant code and fix typo in HoodieDefaultTimeline ( #1535 )
2020-04-21 09:40:22 +08:00
Dongwook
ddd105bb31
[HUDI-772] Make UserDefinedBulkInsertPartitioner configurable for DataSource ( #1500 )
2020-04-20 08:38:18 -07:00
lw0090
09fd6f64c5
[HUDI-800] Fix Metrics getReporter().close() throws NPE. ( #1529 )
2020-04-19 21:33:07 +08:00
baobaoyeye
75523657a4
[MINOR] use Option and fix description in toString method ( #1527 )
...
* [MINOR] fix some places are not elegant, as a newcomer
2020-04-18 12:51:37 +08:00
Alexander Filipchik
acb1ada2f7
[HUDI-799] Use appropriate FS when loading configs ( #1517 )
...
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com >
2020-04-16 13:49:39 -07:00
Raymond Xu
acdc4a8d00
[HUDI-798] Migrate to Mockito Jupiter for JUnit 5 ( #1521 )
2020-04-16 16:07:32 +08:00
Prashant Wason
19d29ac7d0
[HUDI-741] Added checks to validate Hoodie's schema evolution.
...
HUDI specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that the data written using the old schema can be read using the new schema.
Code changes:
1. Added a new config in HoodieWriteConfig to enable schema validation check (disabled by default)
2. Moved code that reads schema from base/log files into hudi-common from hudi-hive-sync
3. Added writerSchema to the extraMetadata of compaction commits in MOR tables. This is the same as for commits on COW tables.
Testing changes:
4. Extended TestHoodieClientBase to add insertBatch API which allows inserting a new batch of unique records into a HUDI table
5. Added a unit test to verify schema evolution for both COW and MOR tables.
6. Added unit tests for schema compatibility checks.
2020-04-15 23:34:59 -07:00
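The rule stated in the commit above (data written with the old schema must be readable with the new schema) can be sketched with a deliberately simplified model. Everything here is an assumption for illustration: schemas are flat maps of field name to type, a new field is acceptable only if it has a default, and type promotions are ignored. It does not use or reproduce the Avro or Hudi validation code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical compatibility check; not the Hudi implementation.
class SchemaCompatDemo {
    static final class Field {
        final String type;
        final boolean hasDefault;

        Field(String type, boolean hasDefault) {
            this.type = type;
            this.hasDefault = hasDefault;
        }
    }

    // True if data written with writerSchema can be read with readerSchema.
    static boolean canRead(Map<String, Field> readerSchema,
                           Map<String, Field> writerSchema) {
        for (Map.Entry<String, Field> entry : readerSchema.entrySet()) {
            Field written = writerSchema.get(entry.getKey());
            if (written == null) {
                // Field absent from old data: readable only with a default.
                if (!entry.getValue().hasDefault) {
                    return false;
                }
            } else if (!written.type.equals(entry.getValue().type)) {
                // Type changed (promotions are ignored in this sketch).
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Field> oldSchema = new LinkedHashMap<>();
        oldSchema.put("id", new Field("long", false));
        oldSchema.put("name", new Field("string", false));

        Map<String, Field> withDefault = new LinkedHashMap<>(oldSchema);
        withDefault.put("email", new Field("string", true));

        Map<String, Field> withoutDefault = new LinkedHashMap<>(oldSchema);
        withoutDefault.put("email", new Field("string", false));

        System.out.println(canRead(withDefault, oldSchema));
        System.out.println(canRead(withoutDefault, oldSchema));
    }
}
```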
Iftach Schonbaum
9ca710cb02
[HUDI-777] Updated description for --target-table parameter ( #1519 )
2020-04-15 14:56:13 -07:00
Raymond Xu
d65efe659d
[HUDI-780] Migrate test cases to JUnit 5 ( #1504 )
2020-04-15 12:35:01 -07:00
Gary Li
14d4fea833
[HUDI-759] Integrate checkpoint provider with delta streamer ( #1486 )
2020-04-14 14:51:04 -07:00
hongdd
644c1cc8bd
[HUDI-698] Add unit test for CleansCommand ( #1449 )
2020-04-14 17:54:47 +08:00
vinoth chandar
661b0b3bab
[HUDI-761] Refactoring rollback and restore actions using the ActionExecutor abstraction ( #1492 )
...
- rollback() and restore() table level APIs introduced
- Restore is implemented by wrapping calls to rollback executor
- Existing tests transparently cover this, since it's just a refactor
2020-04-13 08:29:19 -07:00
Balaji Varadarajan
17bf930342
[HUDI-770] Organize upsert/insert API implementation under a single package ( #1495 )
2020-04-12 23:11:00 -07:00
Sivabalan Narayanan
447ba3bae6
[MINOR] Disabling flaky test in InlineFileSystem ( #1510 )
2020-04-12 19:38:56 -07:00
Pratyaksh Sharma
6d7ca2cf7e
[HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema ( #1427 )
2020-04-12 17:55:26 -07:00
Shen Hong
5d717a28f4
[HUDI-782] Add support of Aliyun object storage service. ( #1506 )
2020-04-12 10:06:30 +08:00
hongdd
a464a2972e
[HUDI-700] Add unit test for FileSystemViewCommand ( #1490 )
2020-04-11 10:12:21 +08:00
satishkotha
c0f96e0726
[HUDI-687] Stop incremental reader on RO table when there is a pending compaction ( #1396 )
2020-04-10 10:45:41 -07:00
Bhavani Sudha Saktheeswaran
8c7cef3e50
[HUDI-738] Add validation to DeltaStreamer to fail fast when filterDupes is enabled on UPSERT mode. ( #1505 )
...
Summary:
This fix ensures that '--filter-dupes' is disabled for the UPSERT operation and fails fast if it is not. Otherwise it would silently drop all updates and only take in new records.
2020-04-10 08:58:55 -07:00
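The fail-fast check described in the commit above can be sketched as a small guard run before any work starts. The enum, method name, and error message are hypothetical and are not taken from the DeltaStreamer source.

```java
// Hypothetical illustration of a fail-fast configuration check.
class FilterDupesCheckDemo {
    enum Operation { INSERT, UPSERT, BULK_INSERT }

    // Rejects the invalid combination up front instead of dropping updates later.
    static void validate(Operation operation, boolean filterDupes) {
        if (operation == Operation.UPSERT && filterDupes) {
            throw new IllegalArgumentException(
                "'--filter-dupes' must be disabled when the operation is UPSERT");
        }
    }

    public static void main(String[] args) {
        validate(Operation.INSERT, true);   // allowed
        validate(Operation.UPSERT, false);  // allowed
        try {
            validate(Operation.UPSERT, true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing at startup makes the misconfiguration visible immediately; the alternative described in the commit is that updates vanish without any error.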
Ramachandran Madtas Subramaniam
f5f34bb1c1
[HUDI-568] Improve unit test coverage
...
Classes improved:
* HoodieTableMetaClient
* RocksDBDAO
* HoodieRealtimeFileSplit
2020-04-09 10:15:34 -07:00
Abhishek Modi
996f761232
Trying git merge --squash
2020-04-09 08:18:02 -07:00
Satish Kotha
3c803421e0
rename variable per review comments
2020-04-08 21:56:59 -07:00
Satish Kotha
1f6be820f3
[HUDI-758] Modify Integration test to include incremental queries for MOR tables
2020-04-08 21:56:59 -07:00
Jiayi Liao
f7b55afb74
[MINOR] Fix typo in TimelineService ( #1497 )
...
Co-authored-by: Jiayi Liao <bupt_ljy@163.com >
2020-04-08 18:14:50 -07:00
hongdd
4e5c8671ef
[HUDI-740] Fix inability to specify the sparkMaster and clean up code for SparkUtil ( #1452 )
2020-04-08 21:33:15 +08:00
Pratyaksh Sharma
d610252d6b
[HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment ( #1150 )
2020-04-07 16:10:26 -07:00
Zhiyuan Zhao
b5d093a21b
[MINOR] Clear up the redundant comment. ( #1489 )
2020-04-06 16:31:54 +08:00
vinoth chandar
eaf6cc2d90
[HUDI-756] Organize Cleaning Action execution into a single package in hudi-client ( #1485 )
...
- Introduced a thin abstraction ActionExecutor, that all actions will implement
- Pulled cleaning code from table, writeclient into a single package
- CleanHelper is now CleanPlanner, HoodieCleanClient is no longer around
- Minor refactor of HoodieTable factory method
- HoodieTable.create() methods with and without metaclient passed in
- HoodieTable constructor now does not do a redundant instantiation
- Fixed existing unit tests to work at the HoodieWriteClient level
2020-04-04 00:07:34 -07:00