903 Commits

Author SHA1 Message Date
Raymond Xu
d65efe659d [HUDI-780] Migrate test cases to Junit 5 (#1504) 2020-04-15 12:35:01 -07:00
Gary Li
14d4fea833 [HUDI-759] Integrate checkpoint provider with delta streamer (#1486) 2020-04-14 14:51:04 -07:00
hongdd
644c1cc8bd [HUDI-698]Add unit test for CleansCommand (#1449) 2020-04-14 17:54:47 +08:00
vinoth chandar
661b0b3bab [HUDI-761] Refactoring rollback and restore actions using the ActionExecutor abstraction (#1492)
- rollback() and restore() table level APIs introduced
- Restore is implemented by wrapping calls to rollback executor
- Existing tests transparently cover this, since it's just a refactor
2020-04-13 08:29:19 -07:00
Balaji Varadarajan
17bf930342 [HUDI-770] Organize upsert/insert API implementation under a single package (#1495) 2020-04-12 23:11:00 -07:00
Sivabalan Narayanan
447ba3bae6 [MINOR] Disabling flaky test in InlineFileSystem (#1510) 2020-04-12 19:38:56 -07:00
Pratyaksh Sharma
6d7ca2cf7e [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema (#1427) 2020-04-12 17:55:26 -07:00
Shen Hong
5d717a28f4 [HUDI-782] Add support of Aliyun object storage service. (#1506) 2020-04-12 10:06:30 +08:00
hongdd
a464a2972e [HUDI-700]Add unit test for FileSystemViewCommand (#1490) 2020-04-11 10:12:21 +08:00
satishkotha
c0f96e0726 [HUDI-687] Stop incremental reader on RO table when there is a pending compaction (#1396) 2020-04-10 10:45:41 -07:00
Bhavani Sudha Saktheeswaran
8c7cef3e50 [HUDI-738] Add validation to DeltaStreamer to fail fast when filterDupes is enabled on UPSERT mode. (#1505)
Summary:
This fix ensures that '--filter-dupes' is disabled for the UPSERT operation and fails fast if it is not. Otherwise, all updates would be dropped silently and only new records would be taken in.
2020-04-10 08:58:55 -07:00
Ramachandran Madtas Subramaniam
f5f34bb1c1 [HUDI-568] Improve unit test coverage
Classes improved:
* HoodieTableMetaClient
* RocksDBDAO
* HoodieRealtimeFileSplit
2020-04-09 10:15:34 -07:00
Abhishek Modi
996f761232 Trying git merge --squash 2020-04-09 08:18:02 -07:00
Satish Kotha
3c803421e0 rename variable per review comments 2020-04-08 21:56:59 -07:00
Satish Kotha
1f6be820f3 [HUDI-758] Modify Integration test to include incremental queries for MOR tables 2020-04-08 21:56:59 -07:00
Jiayi Liao
f7b55afb74 [MINOR] Fix typo in TimelineService (#1497)
Co-authored-by: Jiayi Liao <bupt_ljy@163.com>
2020-04-08 18:14:50 -07:00
hongdd
4e5c8671ef [HUDI-740]Fix can not specify the sparkMaster and code clean for SparkUtil (#1452) 2020-04-08 21:33:15 +08:00
Pratyaksh Sharma
d610252d6b [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment (#1150)
2020-04-07 16:10:26 -07:00
Zhiyuan Zhao
b5d093a21b [MINOR] Clear up the redundant comment. (#1489) 2020-04-06 16:31:54 +08:00
vinoth chandar
eaf6cc2d90 [HUDI-756] Organize Cleaning Action execution into a single package in hudi-client (#1485)
- Introduced a thin abstraction ActionExecutor, that all actions will implement
- Pulled cleaning code from table, writeclient into a single package
- CleanHelper is now CleanPlanner, HoodieCleanClient is no longer around
- Minor refactor of HoodieTable factory method
- HoodieTable.create() methods with and without metaclient passed in
- HoodieTable constructor now does not do a redundant instantiation
- Fixed existing unit tests to work at the HoodieWriteClient level
2020-04-04 00:07:34 -07:00
YanJia-Gary-Li
575d87cf7d HUDI-644 kafka connect checkpoint provider (#1453) 2020-04-03 18:57:34 -07:00
Prashant Wason
deb95ad996 [HUDI-748] Adding .codecov.yml to set exclusions for code coverage reports. (#1468) 2020-04-03 16:25:01 -07:00
Prashant Wason
6808559b01 [HUDI-717] Fixed usage of HiveDriver for DDL statements. (#1416)
When using HiveDriver mode in HudiHiveClient, Hive 2.x DDL operations like ALTER PARTITION may fail. This is because Hive 2.x doesn't like `db`.`table_name` for operations. In this fix, we set the name of the database in the SessionState created for the Driver.
2020-04-03 16:23:05 -07:00
Ramachandran Madtas Subramaniam
639ec20412 [HUDI-562] Enable testing at debug log level
This is to ensure that tests will execute all code paths, even the ones
written under DEBUG log levels. This will improve coverage as well as
ensure there are no surprises when DEBUG log level is enabled in
production.
2020-04-02 11:14:35 -07:00
yanghua
bd716ece18 [MINOR] Add license header for .asf.yaml and adjust labels 2020-04-02 16:14:35 +08:00
vinoyang
194e20e661 [MINOR] Fix label issue in .asf.yaml (#1478) 2020-04-02 15:51:51 +08:00
Raymond Xu
5b53b0d85e [HUDI-731] Add ChainedTransformer (#1440)
2020-04-01 23:21:31 +08:00
Trevor
2a611f4ad3 [HUDI-749] Fix hudi-timeline-server-bundle run_server.sh start error (#1477) 2020-04-01 22:19:54 +08:00
vinoyang
c146ca90fd [HUDI-754] Configure .asf.yaml for Hudi Github repository (#1472)
2020-04-01 10:02:47 +08:00
Shaofeng Shi
78b3194e82 [HUDI-751] Fix some coding issues reported by FindBugs (#1470) 2020-03-31 21:19:32 +08:00
Edwin Guo
9ecf0ccfb2 [HUDI-742] Fix Java Math Exception (#1466) 2020-03-31 12:56:20 +08:00
wenningd
ce0a4c64d0 [HUDI-713] Fix conversion of Spark array of struct type to Avro schema (#1406)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-03-30 15:52:15 -07:00
lamber-ken
dbc9acd23a [HUDI-716] Exception: Not an Avro data file when running HoodieCleanClient.runClean (#1432) 2020-03-30 11:19:17 -07:00
Prashant Wason
9f51b99174 [MINOR] Updated HoodieMergeOnReadTestUtils for future testing requirements (#1456)
1. getRecordsUsingInputFormat() can take a custom Configuration which can be used to specify HUDI table properties (e.g. <table>.consume.mode or <table>.consume.start.timestamp)
2. Fixed the return to return an empty List rather than raise an Exception if no records are found
2020-03-30 07:36:12 -07:00
ffcchi
1f5b0c77d6 [HUDI-724] Parallelize getSmallFiles for partitions (#1421)
Co-authored-by: Feichi Feng <feicfeng@amazon.com>
2020-03-30 00:14:38 -07:00
Suneel Marthi
fa36082554 [HUDI-746] Reduce build warnings < 10 (#1465) 2020-03-30 11:46:52 +08:00
vinoth chandar
fad4bd377b [HUDI-745] CI should fail PRs with unapproved license files (#1464) 2020-03-29 10:59:40 -07:00
vinoth chandar
e057c27603 [HUDI-744] Restructure hudi-common and clean up files under util packages (#1462)
- Brings more order and cohesion to the classes in hudi-common
 - Utils classes related to a particular concept (avro, timeline,...) are placed near to the package
 - common.fs package now contains all the filesystem level classes including wrapper filesystem
 - bloom.filter package renamed to just bloom
 - config package contains classes that help store properties
 - common.fs.inline package contains all the inline filesystem classes/impl
 - common.table.timeline now consolidates all timeline related classes
 - common.table.view consolidates all the classes related to filesystem view metadata
 - common.table.timeline.versioning contains all classes related to versioning of timeline
 - Fix few unit tests as a result
 - Moved the test packages around to match the source file move
 - Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos
2020-03-29 10:58:49 -07:00
leesf
07c3c5d797 [HUDI-679] Make io package Spark free (#1460)
2020-03-29 16:54:00 +08:00
Sivabalan Narayanan
ac73bdcdc3 [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile (#1176)
* Adding InlineFileSystem to support embedding any file format (parquet, hfile, etc). Supports reading the embedded file using respective readers.
2020-03-28 12:13:35 -04:00
Suneel Marthi
04449f33fe [HUDI-743]: Remove FileIOUtils.close() (#1461) 2020-03-28 18:03:15 +08:00
Suneel Marthi
8c3001363d HUDI-479: Eliminate or Minimize use of Guava if possible (#1159) 2020-03-28 03:11:32 -04:00
Raymond Xu
1713f686f8 [MINOR] Add error message when check arguments (#1451) 2020-03-27 10:21:38 +08:00
leesf
8b0a4009a9 [HUDI-678] Make config package spark free (#1418) 2020-03-26 08:30:27 -07:00
Suneel Marthi
e101ea9bd4 [MINOR] Update DOAP with 0.5.2 Release (#1448) 2020-03-25 23:37:32 -04:00
Mathieu
5eed6c98a8 [MINOR] Fix javadoc of InsertBucket (#1445) 2020-03-25 22:25:47 +08:00
Raymond Xu
bc82e2be6c [HUDI-711] Refactor exporter main logic (#1436)
* Refactor exporter main logic
* break main method into multiple readable methods
* fix bug of passing wrong file list
* avoid deleting output path when exists
* throw exception to early abort on multiple cases
* use JavaSparkContext instead of SparkSession
* improve unit test for expected exceptions
2020-03-25 18:02:24 +08:00
hongdd
cafc87041b [HUDI-697]Add unit test for ArchivedCommitsCommand (#1424) 2020-03-23 13:46:10 +08:00
Zhiyuan Zhao
0241b21f77 [HUDI-65] commitTime rename to instantTime (#1431) 2020-03-22 18:06:00 -07:00
lamber-ken
38c3ccc51a [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly (#1377) 2020-03-22 10:31:48 -07:00