
917 Commits

Author SHA1 Message Date
leesf
26684f5984 [HUDI-816] Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678 (#1536) 2020-04-22 16:33:18 +08:00
Raymond Xu
6e15eebd81 [HUDI-809] Migrate CommonTestHarness to JUnit 5 (#1530) 2020-04-22 14:10:25 +08:00
Alexander Filipchik
2a56f82908 [HUDI-821] Fixing JCommander param parsing in deltastreamer (#1525)
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-04-21 20:12:34 -07:00
Prashant Wason
62bd3e7ded [HUDI-757] Added hudi-cli command to export metadata of Instants.
Example:
hudi:db.table-> export instants --localFolder /tmp/ --limit 5 --actions clean,rollback,commit --desc false
2020-04-21 12:41:19 -07:00
hongdd
84dd9047d3 [HUDI-789]Adjust logic of upsert in HDFSParquetImporter (#1511) 2020-04-21 14:21:30 +08:00
n3nash
332072bc6d [HUDI-371] Supporting hive combine input format for realtime tables (#1503) 2020-04-20 20:40:06 -07:00
Mathieu
2a2f31d919 [MINOR] Remove redundant code and fix typo in HoodieDefaultTimeline (#1535) 2020-04-21 09:40:22 +08:00
Dongwook
ddd105bb31 [HUDI-772] Make UserDefinedBulkInsertPartitioner configurable for DataSource (#1500) 2020-04-20 08:38:18 -07:00
lw0090
09fd6f64c5 [HUDI-800] Fix Metrics getReporter().close() throws NPE. (#1529) 2020-04-19 21:33:07 +08:00
baobaoyeye
75523657a4 [MINOR] use Option and fix description in toString method (#1527)
* [MINOR] fix some places are not elegant, as a newcomer
2020-04-18 12:51:37 +08:00
Alexander Filipchik
acb1ada2f7 [HUDI-799] Use appropriate FS when loading configs (#1517)
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-04-16 13:49:39 -07:00
Raymond Xu
acdc4a8d00 [HUDI-798] Migrate to Mockito Jupiter for JUnit 5 (#1521) 2020-04-16 16:07:32 +08:00
Prashant Wason
19d29ac7d0 [HUDI-741] Added checks to validate Hoodie's schema evolution.
HUDI specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that the data written using the old schema can be read using the new schema.

Code changes:

1. Added a new config in HoodieWriteConfig to enable schema validation check (disabled by default)
2. Moved code that reads schema from base/log files into hudi-common from hudi-hive-sync
3. Added writerSchema to the extraMetadata of compaction commits in MOR tables. This is the same as for commits on COW tables.

Testing changes:

4. Extended TestHoodieClientBase to add insertBatch API which allows inserting a new batch of unique records into a HUDI table
5. Added a unit test to verify schema evolution for both COW and MOR tables.
6. Added unit tests for schema compatibility checks.
2020-04-15 23:34:59 -07:00
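The compatibility rule described in HUDI-741 (data written with the old schema must be readable with the new schema) can be illustrated with a minimal sketch. It assumes a hypothetical, simplified schema model (a list of `(name, type, default)` tuples) and applies only a subset of Avro's reader/writer resolution rules, with no type promotion or nested types. It is not Hudi's implementation; `can_read` and `NO_DEFAULT` are illustrative names.

```python
# Hypothetical, simplified schema model: a schema is a list of
# (field_name, type_name, default) tuples; NO_DEFAULT marks a field without one.
NO_DEFAULT = object()


def can_read(old_schema, new_schema):
    """Return True if records written with old_schema are readable with new_schema.

    Simplified subset of Avro schema resolution (no type promotion, no nesting):
    every field of the new (reader) schema must exist in the old (writer)
    schema with the same type, or declare a default. Fields that exist only
    in the old schema are ignored when reading.
    """
    old_types = {name: typ for name, typ, _ in old_schema}
    for name, typ, default in new_schema:
        if name in old_types:
            if old_types[name] != typ:
                return False
        elif default is NO_DEFAULT:
            return False
    return True


v1 = [("id", "string", NO_DEFAULT), ("ts", "long", NO_DEFAULT)]
v2_with_default = v1 + [("city", "string", "unknown")]
v2_required = v1 + [("city", "string", NO_DEFAULT)]
v2_retyped = [("id", "int", NO_DEFAULT), ("ts", "long", NO_DEFAULT)]

print(can_read(v1, v2_with_default))  # True: the new field has a default
print(can_read(v1, v2_required))      # False: old data has no value for "city"
print(can_read(v1, v2_retyped))       # False: "id" changed type
```

Dropping a field is allowed under this rule, because the reader simply ignores writer fields it does not know about.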
Iftach Schonbaum
9ca710cb02 [HUDI-777] Updated description for --target-table parameter (#1519) 2020-04-15 14:56:13 -07:00
Raymond Xu
d65efe659d [HUDI-780] Migrate test cases to Junit 5 (#1504) 2020-04-15 12:35:01 -07:00
Gary Li
14d4fea833 [HUDI-759] Integrate checkpoint provider with delta streamer (#1486) 2020-04-14 14:51:04 -07:00
hongdd
644c1cc8bd [HUDI-698]Add unit test for CleansCommand (#1449) 2020-04-14 17:54:47 +08:00
vinoth chandar
661b0b3bab [HUDI-761] Refactoring rollback and restore actions using the ActionExecutor abstraction (#1492)
- rollback() and restore() table level APIs introduced
- Restore is implemented by wrapping calls to rollback executor
- Existing tests transparently cover this, since it's just a refactor
2020-04-13 08:29:19 -07:00
Balaji Varadarajan
17bf930342 [HUDI-770] Organize upsert/insert API implementation under a single package (#1495) 2020-04-12 23:11:00 -07:00
Sivabalan Narayanan
447ba3bae6 [MINOR] Disabling flaky test in InlineFileSystem (#1510) 2020-04-12 19:38:56 -07:00
Pratyaksh Sharma
6d7ca2cf7e [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema (#1427) 2020-04-12 17:55:26 -07:00
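As a rough illustration of HUDI-727, the sketch below rewrites an incoming record onto a new schema and copies a field's default value when the record lacks that field. Records are plain dicts and the schema is a hypothetical list of `(name, default)` pairs; `rewrite_record` and `MISSING` are illustrative names, and raising an error for an absent field that has no default is an assumption of this sketch, not necessarily what Hudi does.

```python
# Hypothetical schema model: a list of (field_name, default) pairs.
MISSING = object()  # marks a field that has no default value


def rewrite_record(record, new_schema):
    """Project record onto new_schema, copying defaults for absent fields.

    Fields not in new_schema are dropped; an absent field without a default
    raises ValueError (an assumption of this sketch).
    """
    rewritten = {}
    for name, default in new_schema:
        if name in record:
            rewritten[name] = record[name]
        elif default is not MISSING:
            rewritten[name] = default
        else:
            raise ValueError(f"field {name!r} is absent and has no default")
    return rewritten


schema = [("id", MISSING), ("region", "us")]
print(rewrite_record({"id": 1}, schema))  # {'id': 1, 'region': 'us'}
```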
Shen Hong
5d717a28f4 [HUDI-782] Add support of Aliyun object storage service. (#1506) 2020-04-12 10:06:30 +08:00
hongdd
a464a2972e [HUDI-700]Add unit test for FileSystemViewCommand (#1490) 2020-04-11 10:12:21 +08:00
satishkotha
c0f96e0726 [HUDI-687] Stop incremental reader on RO table when there is a pending compaction (#1396) 2020-04-10 10:45:41 -07:00
Bhavani Sudha Saktheeswaran
8c7cef3e50 [HUDI-738] Add validation to DeltaStreamer to fail fast when filterDupes is enabled on UPSERT mode. (#1505)
Summary:
This fix ensures that, for the UPSERT operation, '--filter-dupes' is disabled, and fails fast if it is not. Otherwise, all updates would be silently dropped and only new records would be ingested.
2020-04-10 08:58:55 -07:00
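The reasoning behind HUDI-738 can be sketched as follows, assuming that filtering duplicates means discarding incoming records whose keys already exist in the table. Under that assumption every update is dropped in UPSERT mode, which is why the combination is rejected up front. `filter_dupes` and `validate_config` are hypothetical helpers, not DeltaStreamer's API.

```python
def filter_dupes(incoming, existing_keys):
    """Keep only records whose key is not already in the table."""
    return [r for r in incoming if r["key"] not in existing_keys]


def validate_config(operation, filter_dupes_enabled):
    """Fail fast instead of silently dropping every update under UPSERT."""
    if operation == "UPSERT" and filter_dupes_enabled:
        raise ValueError("'--filter-dupes' must be disabled for the UPSERT operation")


# An update to existing key "a" plus a brand-new key "b":
existing = {"a"}
batch = [{"key": "a", "v": 2}, {"key": "b", "v": 1}]
kept = filter_dupes(batch, existing)  # the update to "a" is lost
print(kept)  # [{'key': 'b', 'v': 1}]
```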
Ramachandran Madtas Subramaniam
f5f34bb1c1 [HUDI-568] Improve unit test coverage
Classes improved:
* HoodieTableMetaClient
* RocksDBDAO
* HoodieRealtimeFileSplit
2020-04-09 10:15:34 -07:00
Abhishek Modi
996f761232 Trying git merge --squash 2020-04-09 08:18:02 -07:00
Satish Kotha
3c803421e0 rename variable per review comments 2020-04-08 21:56:59 -07:00
Satish Kotha
1f6be820f3 [HUDI-758] Modify Integration test to include incremental queries for MOR tables 2020-04-08 21:56:59 -07:00
Jiayi Liao
f7b55afb74 [MINOR] Fix typo in TimelineService (#1497)
Co-authored-by: Jiayi Liao <bupt_ljy@163.com>
2020-04-08 18:14:50 -07:00
hongdd
4e5c8671ef [HUDI-740]Fix can not specify the sparkMaster and code clean for SparkUtil (#1452) 2020-04-08 21:33:15 +08:00
Pratyaksh Sharma
d610252d6b [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment (#1150)
* [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment
2020-04-07 16:10:26 -07:00
Zhiyuan Zhao
b5d093a21b [MINOR] Clear up the redundant comment. (#1489) 2020-04-06 16:31:54 +08:00
vinoth chandar
eaf6cc2d90 [HUDI-756] Organize Cleaning Action execution into a single package in hudi-client (#1485)
- Introduced a thin abstraction ActionExecutor, that all actions will implement
- Pulled cleaning code from table, writeclient into a single package
- CleanHelper is now CleanPlanner, HoodieCleanClient is no longer around
- Minor refactor of HoodieTable factory method
- HoodieTable.create() methods with and without metaclient passed in
- HoodieTable constructor now does not do a redundant instantiation
- Fixed existing unit tests to work at the HoodieWriteClient level
2020-04-04 00:07:34 -07:00
YanJia-Gary-Li
575d87cf7d HUDI-644 kafka connect checkpoint provider (#1453) 2020-04-03 18:57:34 -07:00
Prashant Wason
deb95ad996 [HUDI-748] Adding .codecov.yml to set exclusions for code coverage reports. (#1468) 2020-04-03 16:25:01 -07:00
Prashant Wason
6808559b01 [HUDI-717] Fixed usage of HiveDriver for DDL statements. (#1416)
When using HiveDriver mode in HudiHiveClient, Hive 2.x DDL operations like ALTER PARTITION may fail. This is because Hive 2.x doesn't like `db`.`table_name` for operations. In this fix, we set the name of the database in the SessionState created for the Driver.
2020-04-03 16:23:05 -07:00
Ramachandran Madtas Subramaniam
639ec20412 [HUDI-562] Enable testing at debug log level
This is to ensure that tests will execute all code paths, even the ones
written under DEBUG log levels. This will improve coverage as well as
ensure there are no surprises when DEBUG log level is enabled in
production.
2020-04-02 11:14:35 -07:00
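To see why HUDI-562 runs tests at DEBUG level, consider a code path that executes only when DEBUG logging is enabled. In the hypothetical function below (written with Python's standard `logging` module rather than Hudi's Java logging), a bug in the debug-only branch goes unnoticed while the logger is at INFO and surfaces as soon as DEBUG is turned on.

```python
import logging

logger = logging.getLogger("debug_path_example")


def summarize(values):
    # Debug-only branch: max() of an empty list raises ValueError, but this
    # line runs only when the logger is enabled for DEBUG.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("summarizing %d values, max=%s", len(values), max(values))
    return sum(values)


logger.setLevel(logging.INFO)
print(summarize([]))  # 0: the buggy branch is skipped

logger.setLevel(logging.DEBUG)
try:
    summarize([])
except ValueError as err:
    print("fails only at DEBUG:", err)
```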
yanghua
bd716ece18 [MINOR] Add license header for .asf.yaml and adjust labels 2020-04-02 16:14:35 +08:00
vinoyang
194e20e661 [MINOR] Fix label issue in .asf.yaml (#1478) 2020-04-02 15:51:51 +08:00
Raymond Xu
5b53b0d85e [HUDI-731] Add ChainedTransformer (#1440)
* [HUDI-731] Add ChainedTransformer
2020-04-01 23:21:31 +08:00
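HUDI-731's ChainedTransformer composes several transformers into one. Here is a minimal sketch of that idea, assuming each transformer is a function from a list of rows (dicts) to a list of rows rather than Hudi's Spark `Dataset<Row>`-based interface; `chain`, `drop_null_ids`, and `tag_source` are hypothetical names.

```python
from functools import reduce
from typing import Callable, Dict, List

Row = Dict[str, object]
Transformer = Callable[[List[Row]], List[Row]]


def chain(transformers: List[Transformer]) -> Transformer:
    """Return one transformer that applies `transformers` left to right."""
    def chained(rows: List[Row]) -> List[Row]:
        return reduce(lambda acc, t: t(acc), transformers, rows)
    return chained


def drop_null_ids(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r.get("id") is not None]


def tag_source(rows: List[Row]) -> List[Row]:
    return [{**r, "source": "kafka"} for r in rows]


pipeline = chain([drop_null_ids, tag_source])
print(pipeline([{"id": 1}, {"id": None}]))  # [{'id': 1, 'source': 'kafka'}]
```

Transformers run left to right, so each one sees the output of the previous one; an empty chain returns its input unchanged.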
Trevor
2a611f4ad3 [HUDI-749] Fix hudi-timeline-server-bundle run_server.sh start error (#1477) 2020-04-01 22:19:54 +08:00
vinoyang
c146ca90fd [HUDI-754] Configure .asf.yaml for Hudi Github repository (#1472)
* [HUDI-754] Configure .asf.yaml for Hudi Github repository
2020-04-01 10:02:47 +08:00
Shaofeng Shi
78b3194e82 [HUDI-751] Fix some coding issues reported by FindBugs (#1470) 2020-03-31 21:19:32 +08:00
Edwin Guo
9ecf0ccfb2 [HUDI-742] Fix Java Math Exception (#1466) 2020-03-31 12:56:20 +08:00
wenningd
ce0a4c64d0 [HUDI-713] Fix conversion of Spark array of struct type to Avro schema (#1406)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-03-30 15:52:15 -07:00
lamber-ken
dbc9acd23a [HUDI-716] Exception: Not an Avro data file when running HoodieCleanClient.runClean (#1432) 2020-03-30 11:19:17 -07:00
Prashant Wason
9f51b99174 [MINOR] Updated HoodieMergeOnReadTestUtils for future testing requirements (#1456)
1. getRecordsUsingInputFormat() can take a custom Configuration which can be used to specify HUDI table properties (e.g. <table>.consume.mode or <table>.consume.start.timestamp)
2. Fixed the return to return an empty List rather than raise an Exception if no records are found
2020-03-30 07:36:12 -07:00
ffcchi
1f5b0c77d6 [HUDI-724] Parallelize getSmallFiles for partitions (#1421)
Co-authored-by: Feichi Feng <feicfeng@amazon.com>
2020-03-30 00:14:38 -07:00
Suneel Marthi
fa36082554 [HUDI-746] Reduce build warnings < 10 (#1465) 2020-03-30 11:46:52 +08:00