Shen Hong
433d7d2c98
[HUDI-1058] Make delete marker configurable (#1819)
2020-08-03 11:06:31 -04:00
n3nash
727f1df62c
[MINOR] Suppressing spark logs for hudi-integ and hudi-utilities (#1894)
2020-07-31 19:01:25 -07:00
Nishith Agarwal
2fc2b01d86
[HUDI-394] Provide a basic implementation of test suite
2020-07-30 21:21:15 -07:00
Raymond Xu
ca36c44cb3
[HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873)
2020-07-27 19:21:45 +08:00
Sivabalan Narayanan
5b6026ba43
[HUDI-802] Fixing deletes for inserts in same batch in write path (#1792)
* Fixing deletes for inserts in same batch in write path
* Fixing delta streamer tests
* Adding tests for OverwriteWithLatestAvroPayload
2020-07-22 19:39:57 -07:00
lw0090
1ec89e9a94
[HUDI-839] Introducing support for rollbacks using marker files (#1756)
* [HUDI-839] Introducing rollback strategy using marker files
- Adds a new rollback mechanism based on the marker files generated during the write
- Consequently, marker file/dir deletion now happens post commit, instead of during finalize
- Marker files are also generated for AppendHandle, making it consistent throughout the write path
- Until upgrade-downgrade mechanism can upgrade non-marker based inflight writes to marker based, this should only be turned on for new datasets.
- Added marker dir deletion after successful commit/rollback, individual files are not deleted during finalize
- Added a fail-safe that also deletes marker directories during the timeline archival process
- Added a check to ensure completed instants are not rolled back using the marker based strategy, since doing so would be incorrect
- Reworked tests to rollback inflight instants, instead of completed instants whenever necessary
- Added a unit test for MarkerBasedRollbackStrategy
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-07-20 22:41:42 -07:00
Prashant Wason
b71f25f210
[HUDI-92] Provide reasonable names for Spark DAG stages in HUDI (#1289)
2020-07-19 10:29:25 -07:00
GuoPhilipse
abfebd30f3
[MINOR] Update parameter description (#1821)
2020-07-11 22:57:12 +08:00
Pratyaksh Sharma
9627a385fe
[HUDI-916]: Added support for multiple input formats in TimestampBasedKeyGenerator (#1648)
2020-07-10 15:28:45 -04:00
Pratyaksh Sharma
c7f1a781ab
[HUDI-728]: Implemented custom key generator (#1433)
2020-07-09 07:35:07 -04:00
Trevor
d58644b657
[HUDI-1062] Remove unnecessary maxEvent check and add some logging in KafkaOffsetGen (#1779)
2020-07-08 21:07:34 -07:00
Raymond Xu
3b9a30528b
[HUDI-996] Add functional test suite for hudi-utilities (#1746)
- Share resources for functional tests
- Add suite for functional test classes from hudi-utilities
2020-07-05 16:44:31 -07:00
Balaji Varadarajan
8919be6a5d
[HUDI-855] Run Cleaner async with writing (#1577)
- Cleaner can now run concurrently with write operation
- Configs to turn on/off
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-06-28 02:04:50 -07:00
Prashant Wason
2603cfb33e
[HUDI-684] Introduced abstraction for writing and reading different types of base file formats (#1687)
Notable changes:
1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format
2. HoodieDataBlock abstraction for creating format-specific data blocks for base file formats (e.g. Parquet has HoodieAvroDataBlock)
3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format
4. HiveSyncTool accepts the base file format as a CLI parameter
5. HoodieDeltaStreamer accepts the base file format as a CLI parameter
6. HoodieSparkSqlWriter accepts the base file format as a parameter
2020-06-25 23:46:55 -07:00
Shen Hong
89e37d5273
[HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs (#1690)
2020-06-22 08:13:28 -07:00
Raymond Xu
8a9fdd603e
[HUDI-1023] Add validation error messages in delta sync (#1710)
- Stop explicitly specifying BLOOM_INDEX, since that's the default anyway
2020-06-19 12:12:35 -07:00
Raymond Xu
ab724af5c4
[MINOR] Rename TestSourceConfig to SourceConfigs (#1749)
2020-06-19 12:08:19 -07:00
Litianye
ede6c9bda4
[HUDI-1006] DeltaStreamer using KafkaSource with offset reset strategy "latest" can't consume data (#1719)
2020-06-14 18:01:44 +08:00
liujinhui
97ab97b726
[HUDI-918] Fix bug where KafkaOffsetGen cannot read Kafka data (#1652)
2020-06-08 20:46:47 +08:00
Raymond Xu
742c204099
[HUDI-811] Restructure test packages in hudi-client/cli (#1689)
2020-06-02 10:25:42 +08:00
Raymond Xu
03f136361a
[HUDI-811] Restructure test packages in hudi-common (#1644)
* [HUDI-811] Restructure test packages in hudi-common
2020-05-27 16:28:17 +08:00
Raymond Xu
6c450957ce
[HUDI-690] Filter out inflight compaction in exporter (#1667)
2020-05-26 09:23:34 -07:00
Balaji Varadarajan
74ecc27e92
[HUDI-846][HUDI-848] Enable Incremental cleaning and embedded timeline-server by default (#1634)
2020-05-20 05:29:43 -07:00
rolandjohann
244d47494e
[HUDI-888] Fix NullPointerException in HoodieCompactor (#1622)
2020-05-20 04:22:35 -07:00
wenningd
0dc2fa6172
[MINOR] Fix HoodieCompactor config abbreviation (#1642)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-05-19 21:03:54 -07:00
Joey
2600d2de8d
[MINOR] Fix apache-rat violations (#1639)
* MINOR Fix apache-rat violations. Also enables RAT for hudi-utilities and hudi-integ-test
2020-05-18 11:16:49 -07:00
Mathieu
25a0080b2f
[HUDI-714] Add javadoc and comments to hudi write method link (#1409)
* [HUDI-714] Add javadoc and comments to hudi write method link
2020-05-16 08:36:51 -04:00
Raymond Xu
2ada2ef50f
[HUDI-902] Avoid exception when getSchemaProvider (#1584)
* When no new input data, don't throw exception for null SchemaProvider
* Return the newly added NullSchemaProvider instead
2020-05-15 21:33:02 -07:00
Alexander Filipchik
25e0b75b3d
[HUDI-723] Register avro schema if inferred from SQL transformation (#1518)
* Register avro schema if inferred from SQL transformation
* Always create HoodieWriteClient lazily. Handle setting the schema provider and Avro schemas correctly when using the SQL transformer
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
Co-authored-by: Balaji Varadarajan <varadarb@uber.com>
2020-05-15 12:44:03 -07:00
Alexander Filipchik
f094f42857
[HUDI-843] Add ability to specify time unit for TimestampBasedKeyGenerator (#1541)
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-05-14 13:37:59 -07:00
hongdd
3a2fe13fcb
[HUDI-701] Add unit test for HDFSParquetImportCommand (#1574)
2020-05-14 19:15:49 +08:00
Raymond Xu
0d4848b68b
[HUDI-811] Restructure test packages (#1607)
* restructure hudi-spark tests
* restructure hudi-timeline-service tests
* restructure hudi-hadoop-mr and hudi-utilities tests
* restructure hudi-hive-sync tests
2020-05-13 15:37:03 -07:00
liujinhui
5d37e66b7e
[MINOR] Fix HoodieNotSupportedException description in KafkaOffsetGen (#1615)
2020-05-11 23:14:43 +08:00
Raymond Xu
366bb10d8c
[HUDI-812] Migrate hudi-common tests to JUnit 5 (#1590)
* [HUDI-812] Migrate hudi-common tests to JUnit 5
2020-05-06 19:15:20 +08:00
Raymond Xu
096f7f55b2
[HUDI-813] Migrate hudi-utilities tests to JUnit 5 (#1589)
2020-05-04 12:43:42 +08:00
Balaji Varadarajan
506447fd4f
[HUDI-850] Avoid unnecessary listings in incremental cleaning mode (#1576)
2020-05-01 21:37:21 -07:00
vinoth chandar
c4b71622b9
[MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability (#1575)
- Reads naturally as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2), etc.
2020-04-30 09:19:39 -07:00
Raymond Xu
06dae30297
[HUDI-810] Migrate ClientTestHarness to JUnit 5 (#1553)
2020-04-28 23:38:16 +08:00
dengziming
19cc15c098
[MINOR]: Fix cli docs for DeltaStreamer (#1547)
2020-04-22 11:37:17 -07:00
Raymond Xu
6e15eebd81
[HUDI-809] Migrate CommonTestHarness to JUnit 5 (#1530)
2020-04-22 14:10:25 +08:00
Alexander Filipchik
2a56f82908
[HUDI-821] Fixing JCommander param parsing in deltastreamer (#1525)
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-04-21 20:12:34 -07:00
hongdd
84dd9047d3
[HUDI-789] Adjust logic of upsert in HDFSParquetImporter (#1511)
2020-04-21 14:21:30 +08:00
Alexander Filipchik
acb1ada2f7
[HUDI-799] Use appropriate FS when loading configs (#1517)
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-04-16 13:49:39 -07:00
Raymond Xu
acdc4a8d00
[HUDI-798] Migrate to Mockito Jupiter for JUnit 5 (#1521)
2020-04-16 16:07:32 +08:00
Iftach Schonbaum
9ca710cb02
[HUDI-777] Updated description for --target-table parameter (#1519)
2020-04-15 14:56:13 -07:00
Raymond Xu
d65efe659d
[HUDI-780] Migrate test cases to JUnit 5 (#1504)
2020-04-15 12:35:01 -07:00
Gary Li
14d4fea833
[HUDI-759] Integrate checkpoint provider with delta streamer (#1486)
2020-04-14 14:51:04 -07:00
Bhavani Sudha Saktheeswaran
8c7cef3e50
[HUDI-738] Add validation to DeltaStreamer to fail fast when filterDupes is enabled in UPSERT mode (#1505)
Summary:
This fix ensures that for the UPSERT operation '--filter-dupes' is disabled, failing fast if it is not. Otherwise, all updates would be silently dropped and only new records ingested.
2020-04-10 08:58:55 -07:00
Pratyaksh Sharma
d610252d6b
[HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment (#1150)
* [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment
2020-04-07 16:10:26 -07:00
YanJia-Gary-Li
575d87cf7d
[HUDI-644] Kafka Connect checkpoint provider (#1453)
2020-04-03 18:57:34 -07:00