1
0
Commit Graph

953 Commits

Author SHA1 Message Date
n3nash
332072bc6d [HUDI-371] Supporting hive combine input format for realtime tables (#1503) 2020-04-20 20:40:06 -07:00
Dongwook
ddd105bb31 [HUDI-772] Make UserDefinedBulkInsertPartitioner configurable for DataSource (#1500) 2020-04-20 08:38:18 -07:00
lw0090
09fd6f64c5 [HUDI-800] Fix Metrics getReporter().close() throws NPE. (#1529) 2020-04-19 21:33:07 +08:00
baobaoyeye
75523657a4 [MINOR] use Option and fix description in toString method (#1527)
* [MINOR] fix some places are not elegant, as a newcomer

* [MINOR] fix some places are not elegant, as a newcomer
2020-04-18 12:51:37 +08:00
Raymond Xu
acdc4a8d00 [HUDI-798] Migrate to Mockito Jupiter for JUnit 5 (#1521) 2020-04-16 16:07:32 +08:00
Prashant Wason
19d29ac7d0 [HUDI-741] Added checks to validate Hoodie's schema evolution.
HUDI specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that the data written using the old schema can be read using the new schema.

Code changes:

1. Added a new config in HoodieWriteConfig to enable schema validation check (disabled by default)
2. Moved code that reads schema from base/log files into hudi-common from hudi-hive-sync
3. Added writerSchema to the extraMetadata of compaction commits in MOR table. This is same as that for commits on COW table.

Testing changes:

4. Extended TestHoodieClientBase to add insertBatch API which allows inserting a new batch of unique records into a HUDI table
5. Added a unit test to verify schema evolution for both COW and MOR tables.
6. Added unit tests for schema compatiblity checks.
2020-04-15 23:34:59 -07:00
Raymond Xu
d65efe659d [HUDI-780] Migrate test cases to Junit 5 (#1504) 2020-04-15 12:35:01 -07:00
vinoth chandar
661b0b3bab [HUDI-761] Refactoring rollback and restore actions using the ActionExecutor abstraction (#1492)
- rollback() and restore() table level APIs introduced
- Restore is implemented by wrapping calls to rollback executor
- Existing tests transparently cover this, since its just a refactor
2020-04-13 08:29:19 -07:00
Balaji Varadarajan
17bf930342 [HUDI-770] Organize upsert/insert API implementation under a single package (#1495) 2020-04-12 23:11:00 -07:00
satishkotha
c0f96e0726 [HUDI-687] Stop incremental reader on RO table when there is a pending compaction (#1396) 2020-04-10 10:45:41 -07:00
Pratyaksh Sharma
d610252d6b [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment (#1150)
* [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment
2020-04-07 16:10:26 -07:00
vinoth chandar
eaf6cc2d90 [HUDI-756] Organize Cleaning Action execution into a single package in hudi-client (#1485)
- Introduced a thin abstraction ActionExecutor, that all actions will implement
- Pulled cleaning code from table, writeclient into a single package
- CleanHelper is now CleanPlanner, HoodieCleanClient is no longer around
- Minor refactor of HoodieTable factory method
- HoodieTable.create() methods with and without metaclient passed in
- HoodieTable constructor now does not do a redundant instantiation
- Fixed existing unit tests to work at the HoodieWriteClient level
2020-04-04 00:07:34 -07:00
Ramachandran Madtas Subramaniam
639ec20412 [HUDI-562] Enable testing at debug log level
This is to ensure that tests will execute all code paths, even the ones
written under DEBUG log levels. This will improve coverage as well as
ensure there are no surprised when DEBUG log level is enabled in
production.
2020-04-02 11:14:35 -07:00
Shaofeng Shi
78b3194e82 [HUDI-751] Fix some coding issues reported by FindBugs (#1470) 2020-03-31 21:19:32 +08:00
Edwin Guo
9ecf0ccfb2 [HUDI-742] Fix Java Math Exception (#1466) 2020-03-31 12:56:20 +08:00
wenningd
ce0a4c64d0 [HUDI-713] Fix conversion of Spark array of struct type to Avro schema (#1406)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-03-30 15:52:15 -07:00
lamber-ken
dbc9acd23a [HUDI-716] Exception: Not an Avro data file when running HoodieCleanClient.runClean (#1432) 2020-03-30 11:19:17 -07:00
Prashant Wason
9f51b99174 [MINOR] Updated HoodieMergeOnReadTestUtils for future testing requirements (#1456)
1. getRecordsUsingInputFormat() can take a custom Configuration which can be used to specify HUDI table properties (e.g. <table>.consume.mode or <table>.consume.start.timestamp)
2. Fixed the return to return an empty List rather than raise an Exception if no records are found
2020-03-30 07:36:12 -07:00
ffcchi
1f5b0c77d6 [HUDI-724] Parallelize getSmallFiles for partitions (#1421)
Co-authored-by: Feichi Feng <feicfeng@amazon.com>
2020-03-30 00:14:38 -07:00
Suneel Marthi
fa36082554 [HUDI-746] Reduce build warnings < 10 (#1465) 2020-03-30 11:46:52 +08:00
vinoth chandar
e057c27603 [HUDI-744] Restructure hudi-common and clean up files under util packages (#1462)
- Brings more order and cohesion to the classes in hudi-common
 - Utils classes related to a particular concept (avro, timeline,...) are placed near to the package
 - common.fs package now contains all the filesystem level classes including wrapper filesystem
 - bloom.filter package renamed to just bloom
 - config package contains classes that help store properties
 - common.fs.inline package contains all the inline filesystem classes/impl
 - common.table.timeline now consolidates all timeline related classes
 - common.table.view consolidates all the classes related to filesystem view metadata
 - common.table.timeline.versioning contains all classes related to versioning of timeline
 - Fix few unit tests as a result
 - Moved the test packages around to match the source file move
 - Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos
2020-03-29 10:58:49 -07:00
leesf
07c3c5d797 [HUDI-679] Make io package Spark free (#1460)
* [HUDI-679] Make io package Spark free
2020-03-29 16:54:00 +08:00
Sivabalan Narayanan
ac73bdcdc3 [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile (#1176)
* Adding InlineFileSystem to support embedding any file format (parquet, hfile, etc). Supports reading the embedded file using respective readers.
2020-03-28 12:13:35 -04:00
Suneel Marthi
04449f33fe [HUDI-743]: Remove FileIOUtils.close() (#1461) 2020-03-28 18:03:15 +08:00
Suneel Marthi
8c3001363d HUDI-479: Eliminate or Minimize use of Guava if possible (#1159) 2020-03-28 03:11:32 -04:00
Raymond Xu
1713f686f8 [MINOR] Add error message when check arguments (#1451) 2020-03-27 10:21:38 +08:00
leesf
8b0a4009a9 [HUDI-678] Make config package spark free (#1418) 2020-03-26 08:30:27 -07:00
Mathieu
5eed6c98a8 [MINOR] Fix javadoc of InsertBucket (#1445) 2020-03-25 22:25:47 +08:00
hongdd
cafc87041b [HUDI-697]Add unit test for ArchivedCommitsCommand (#1424) 2020-03-23 13:46:10 +08:00
Zhiyuan Zhao
0241b21f77 [HUDI-65] commitTime rename to instantTime (#1431) 2020-03-22 18:06:00 -07:00
Zhiyuan Zhao
06652aa935 [MINOR] Add omissive param desc on method doc and cleanup redundant code (#1437) 2020-03-22 21:39:33 +08:00
satishkotha
83fb9651f3 [HUDI-650] Modify handleUpdate path to validate partitionPath (#1368) 2020-03-20 08:37:22 -07:00
Sivabalan Narayanan
a752b7b18c Merge pull request #1165 from yihua/HUDI-76-deltastreamer-csv-source
[HUDI-76] Add CSV Source support for Hudi Delta Streamer
2020-03-19 10:00:53 -04:00
ForwardXu
1e321c2fc0 [HUDI-209] Implement JMX metrics reporter (#1106) 2020-03-19 20:10:35 +08:00
leesf
0a4902ecce [HUDI-437] Support user-defined index (#1408)
* [hotfix] set default value for index class config
* class config takes precedence over `hoodie.index.type`
2020-03-17 19:27:40 -07:00
Y Ethan Guo
cf765df606 [HUDI-76] Add CSV Source support for Hudi Delta Streamer 2020-03-15 19:03:37 -07:00
Suneel Marthi
99b7e9eb9e [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java (#1350)
* [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java
2020-03-13 20:28:05 -04:00
vinoth chandar
fb7fba365f [HUDI-646] fix failing test due to improper filesytem cleanup (#1373) 2020-03-12 23:59:09 -07:00
Prashant Wason
7d66831444 [MINOR] Removing code which is duplicated from the base class HoodieWriteHandle. (#1399) 2020-03-11 16:43:04 -07:00
Sivabalan Narayanan
1ca912af09 [HUDI-667] Fixing delete tests for DeltaStreamer (#1395) 2020-03-11 16:19:23 -07:00
Prashant Wason
77d5b92d88 [HUDI-668] Added additional unit-tests for HUDI metrics. (#1380) 2020-03-09 23:15:42 -04:00
hongdd
f93e64fee4 [HUDI-681]Remove embeddedTimelineService from HoodieReadClient (#1388)
* [HUDI-681]Remove embeddedTimelineService from HoodieReadClient
2020-03-09 18:31:04 +08:00
Prashant Wason
5f8bf97005 [HUDI-671] Added unit-test for HBaseIndex (#1381) 2020-03-07 16:48:43 -08:00
vinoyang
ee5b32f5d4 [HUDI-652] Decouple HoodieReadClient and AbstractHoodieClient to break the inheritance chain (#1372)
* Removed timeline server support
* Removed try-with-resource
2020-03-06 09:59:35 -08:00
hongdd
8306205d7a [HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata (#1157)
[HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata (#1157)
2020-03-03 10:10:29 -08:00
yanghua
0dc8e493aa Moving to 0.6.0-SNAPSHOT on master branch. 2020-03-01 15:08:30 +08:00
vinoth chandar
71170fafe7 [HUDI-554] Cleanup package structure in hudi-client (#1346)
- Just package, class moves and renames with the following intent
 - `client` now has all the various client classes, that do the transaction management
 - `func` renamed to `execution` and some helpers moved to `client/utils`
 - All compaction code under `io` now under `table/compact`
 - Rollback code under `table/rollback` and in general all code for individual operations under `table`
 - `exception` `config`, `metrics` left untouched
 - Moved the tests also accordingly
 - Fixed some flaky tests
2020-02-27 08:05:58 -08:00
Suneel Marthi
078d4825d9 [HUDI-624]: Split some of the code from PR for HUDI-479 (#1344) 2020-02-21 14:22:21 +08:00
Suneel Marthi
f9d2f66dc1 [HUDI-622]: Remove VisibleForTesting annotation and import from code (#1343)
* HUDI:622: Remove VisibleForTesting annotation and import from code
2020-02-20 15:17:53 +08:00
Sivabalan Narayanan
00493235f5 [HUDI-108] Removing 2GB spark partition limitations in HoodieBloomIndex with spark 2.4.4 (#1315) 2020-02-18 11:12:20 -08:00