Raymond Xu
ca36c44cb3
[HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common ( #1873 )
2020-07-27 19:21:45 +08:00
Sivabalan Narayanan
5b6026ba43
[HUDI-802] Fixing deletes for inserts in same batch in write path ( #1792 )
* Fixing deletes for inserts in same batch in write path
* Fixing delta streamer tests
* Adding tests for OverwriteWithLatestAvroPayload
2020-07-22 19:39:57 -07:00
lw0090
1ec89e9a94
[HUDI-839] Introducing support for rollbacks using marker files ( #1756 )
* [HUDI-839] Introducing rollback strategy using marker files
- Adds a new rollback mechanism based on the marker files generated during the write
- Consequently, marker file/dir deletion now happens post commit, instead of during finalize
- Marker files are also generated for AppendHandle, making it consistent throughout the write path
- Until the upgrade-downgrade mechanism can upgrade non-marker-based inflight writes to marker-based ones, this should only be turned on for new datasets.
- Added marker dir deletion after successful commit/rollback, individual files are not deleted during finalize
- Fail safe for deleting marker directories, now during timeline archival process
- Added a check to ensure completed instants are not rolled back using the marker-based strategy, since that would be incorrect
- Reworked tests to rollback inflight instants, instead of completed instants whenever necessary
- Added a unit test for MarkerBasedRollbackStrategy
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-07-20 22:41:42 -07:00
Prashant Wason
b71f25f210
[HUDI-92] Provide reasonable names for Spark DAG stages in HUDI. ( #1289 )
2020-07-19 10:29:25 -07:00
GuoPhilipse
abfebd30f3
[MINOR] Update parameter description ( #1821 )
2020-07-11 22:57:12 +08:00
Pratyaksh Sharma
9627a385fe
[HUDI-916]: Added support for multiple input formats in TimestampBasedKeyGenerator ( #1648 )
2020-07-10 15:28:45 -04:00
Pratyaksh Sharma
c7f1a781ab
[HUDI-728]: Implemented custom key generator ( #1433 )
2020-07-09 07:35:07 -04:00
Trevor
d58644b657
[HUDI-1062] Remove unnecessary maxEvent check and add some logging in KafkaOffsetGen ( #1779 )
2020-07-08 21:07:34 -07:00
Raymond Xu
3b9a30528b
[HUDI-996] Add functional test suite for hudi-utilities ( #1746 )
- Share resources for functional tests
- Add suite for functional test classes from hudi-utilities
2020-07-05 16:44:31 -07:00
Balaji Varadarajan
8919be6a5d
[HUDI-855] Run Cleaner async with writing ( #1577 )
- Cleaner can now run concurrently with write operation
- Configs to turn on/off
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-06-28 02:04:50 -07:00
Prashant Wason
2603cfb33e
[HUDI-684] Introduced abstraction for writing and reading different types of base file formats. ( #1687 )
Notable changes:
1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format
2. HoodieDataBlock abstraction for creating specific data blocks for base file formats (e.g. Parquet has HoodieAvroDataBlock)
3. All hardcoded references to Parquet / Parquet-based classes have been abstracted to call methods which accept a base file format
4. HiveSyncTool accepts the base file format as a CLI parameter
5. HoodieDeltaStreamer accepts the base file format as a CLI parameter
6. HoodieSparkSqlWriter accepts the base file format as a parameter
2020-06-25 23:46:55 -07:00
Shen Hong
89e37d5273
[HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs. ( #1690 )
2020-06-22 08:13:28 -07:00
Raymond Xu
8a9fdd603e
[HUDI-1023] Add validation error messages in delta sync ( #1710 )
- Remove explicitly specifying BLOOM_INDEX since that's the default anyway
2020-06-19 12:12:35 -07:00
Raymond Xu
ab724af5c4
[MINOR] Rename TestSourceConfig to SourceConfigs ( #1749 )
2020-06-19 12:08:19 -07:00
Litianye
ede6c9bda4
[HUDI-1006] DeltaStreamer using KafkaSource with offset reset strategy "latest" can't consume data ( #1719 )
2020-06-14 18:01:44 +08:00
liujinhui
97ab97b726
[HUDI-918] Fix bug where KafkaOffsetGen cannot read Kafka data ( #1652 )
2020-06-08 20:46:47 +08:00
Raymond Xu
742c204099
[HUDI-811] Restructure test packages in hudi-client/cli ( #1689 )
2020-06-02 10:25:42 +08:00
Raymond Xu
03f136361a
[HUDI-811] Restructure test packages in hudi-common ( #1644 )
* [HUDI-811] Restructure test packages in hudi-common
2020-05-27 16:28:17 +08:00
Raymond Xu
6c450957ce
[HUDI-690] Filter out inflight compaction in exporter ( #1667 )
2020-05-26 09:23:34 -07:00
Balaji Varadarajan
74ecc27e92
[HUDI-846][HUDI-848] Enable Incremental cleaning and embedded timeline-server by default ( #1634 )
2020-05-20 05:29:43 -07:00
rolandjohann
244d47494e
[HUDI-888] fix NullPointerException in HoodieCompactor ( #1622 )
2020-05-20 04:22:35 -07:00
wenningd
0dc2fa6172
[MINOR] Fix HoodieCompactor config abbreviation ( #1642 )
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-05-19 21:03:54 -07:00
Joey
2600d2de8d
[MINOR] Fix apache-rat violations ( #1639 )
* MINOR Fix apache-rat violations. Also, enabling RAT for hudi-utilities and hudi-integ-test
2020-05-18 11:16:49 -07:00
Mathieu
25a0080b2f
[HUDI-714] Add javadoc and comments to hudi write method link ( #1409 )
* [HUDI-714] Add javadoc and comments to hudi write method link
2020-05-16 08:36:51 -04:00
Raymond Xu
2ada2ef50f
[HUDI-902] Avoid exception when getSchemaProvider ( #1584 )
* When no new input data, don't throw exception for null SchemaProvider
* Return the newly added NullSchemaProvider instead
2020-05-15 21:33:02 -07:00
Alexander Filipchik
25e0b75b3d
[HUDI-723] Register avro schema if inferred from SQL transformation ( #1518 )
* Register avro schema if inferred from SQL transformation
* Always create HoodieWriteClient lazily. Handle setting schema-provider and avro-schemas correctly when using SQL transformer
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
Co-authored-by: Balaji Varadarajan <varadarb@uber.com>
2020-05-15 12:44:03 -07:00
Alexander Filipchik
f094f42857
[HUDI-843] Add ability to specify time unit for TimestampBasedKeyGenerator ( #1541 )
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-05-14 13:37:59 -07:00
hongdd
3a2fe13fcb
[HUDI-701] Add unit test for HDFSParquetImportCommand ( #1574 )
2020-05-14 19:15:49 +08:00
Raymond Xu
0d4848b68b
[HUDI-811] Restructure test packages ( #1607 )
* restructure hudi-spark tests
* restructure hudi-timeline-service tests
* restructure hudi-hadoop-mr hudi-utilities tests
* restructure hudi-hive-sync tests
2020-05-13 15:37:03 -07:00
liujinhui
5d37e66b7e
[MINOR] Fix HoodieNotSupportedException description in KafkaOffsetGen ( #1615 )
2020-05-11 23:14:43 +08:00
Raymond Xu
366bb10d8c
[HUDI-812] Migrate hudi common tests to JUnit 5 ( #1590 )
* [HUDI-812] Migrate hudi-common tests to JUnit 5
2020-05-06 19:15:20 +08:00
Raymond Xu
096f7f55b2
[HUDI-813] Migrate hudi-utilities tests to JUnit 5 ( #1589 )
2020-05-04 12:43:42 +08:00
Balaji Varadarajan
506447fd4f
[HUDI-850] Avoid unnecessary listings in incremental cleaning mode ( #1576 )
2020-05-01 21:37:21 -07:00
vinoth chandar
c4b71622b9
[MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability ( #1575 )
- reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc
2020-04-30 09:19:39 -07:00
Raymond Xu
06dae30297
[HUDI-810] Migrate ClientTestHarness to JUnit 5 ( #1553 )
2020-04-28 23:38:16 +08:00
dengziming
19cc15c098
[MINOR]: Fix cli docs for DeltaStreamer ( #1547 )
2020-04-22 11:37:17 -07:00
Raymond Xu
6e15eebd81
[HUDI-809] Migrate CommonTestHarness to JUnit 5 ( #1530 )
2020-04-22 14:10:25 +08:00
Alexander Filipchik
2a56f82908
[HUDI-821] Fixing JCommander param parsing in deltastreamer ( #1525 )
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-04-21 20:12:34 -07:00
hongdd
84dd9047d3
[HUDI-789] Adjust logic of upsert in HDFSParquetImporter ( #1511 )
2020-04-21 14:21:30 +08:00
Alexander Filipchik
acb1ada2f7
[HUDI-799] Use appropriate FS when loading configs ( #1517 )
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-04-16 13:49:39 -07:00
Raymond Xu
acdc4a8d00
[HUDI-798] Migrate to Mockito Jupiter for JUnit 5 ( #1521 )
2020-04-16 16:07:32 +08:00
Iftach Schonbaum
9ca710cb02
[HUDI-777] Updated description for --target-table parameter ( #1519 )
2020-04-15 14:56:13 -07:00
Raymond Xu
d65efe659d
[HUDI-780] Migrate test cases to JUnit 5 ( #1504 )
2020-04-15 12:35:01 -07:00
Gary Li
14d4fea833
[HUDI-759] Integrate checkpoint provider with delta streamer ( #1486 )
2020-04-14 14:51:04 -07:00
Bhavani Sudha Saktheeswaran
8c7cef3e50
[HUDI-738] Add validation to DeltaStreamer to fail fast when filterDupes is enabled on UPSERT mode. ( #1505 )
Summary:
This fix ensures that '--filter-dupes' is disabled for the UPSERT operation and fails fast if it is not. Otherwise it would silently drop all updates and only take in new records.
2020-04-10 08:58:55 -07:00
Pratyaksh Sharma
d610252d6b
[HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment ( #1150 )
* [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment
2020-04-07 16:10:26 -07:00
YanJia-Gary-Li
575d87cf7d
HUDI-644 kafka connect checkpoint provider ( #1453 )
2020-04-03 18:57:34 -07:00
Ramachandran Madtas Subramaniam
639ec20412
[HUDI-562] Enable testing at debug log level
This is to ensure that tests will execute all code paths, even the ones
written under DEBUG log levels. This will improve coverage as well as
ensure there are no surprises when DEBUG log level is enabled in
production.
2020-04-02 11:14:35 -07:00
Raymond Xu
5b53b0d85e
[HUDI-731] Add ChainedTransformer ( #1440 )
* [HUDI-731] Add ChainedTransformer
2020-04-01 23:21:31 +08:00
Shaofeng Shi
78b3194e82
[HUDI-751] Fix some coding issues reported by FindBugs ( #1470 )
2020-03-31 21:19:32 +08:00