Prashant Wason
b71f25f210
[HUDI-92] Provide reasonable names for Spark DAG stages in HUDI. ( #1289 )
2020-07-19 10:29:25 -07:00
GuoPhilipse
abfebd30f3
[MINOR] Update parameter description ( #1821 )
2020-07-11 22:57:12 +08:00
Pratyaksh Sharma
9627a385fe
[HUDI-916]: Added support for multiple input formats in TimestampBasedKeyGenerator ( #1648 )
2020-07-10 15:28:45 -04:00
Pratyaksh Sharma
c7f1a781ab
[HUDI-728]: Implemented custom key generator ( #1433 )
2020-07-09 07:35:07 -04:00
Trevor
d58644b657
[HUDI-1062] Remove unnecessary maxEvent check and add some log in KafkaOffsetGen ( #1779 )
2020-07-08 21:07:34 -07:00
Raymond Xu
3b9a30528b
[HUDI-996] Add functional test suite for hudi-utilities ( #1746 )
...
- Share resources for functional tests
- Add suite for functional test classes from hudi-utilities
2020-07-05 16:44:31 -07:00
Balaji Varadarajan
8919be6a5d
[HUDI-855] Run Cleaner async with writing ( #1577 )
...
- Cleaner can now run concurrently with write operation
- Configs to turn on/off
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-06-28 02:04:50 -07:00
Prashant Wason
2603cfb33e
[HUDI-684] Introduced abstraction for writing and reading different types of base file formats. ( #1687 )
...
Notable changes:
1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format
2. HoodieDataBlock abstraction for creating specific data blocks for base file formats (e.g. Parquet has HoodieAvroDataBlock)
3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format
4. HiveSyncTool accepts the base file format as a CLI parameter
5. HoodieDeltaStreamer accepts the base file format as a CLI parameter
6. HoodieSparkSqlWriter accepts the base file format as a parameter
2020-06-25 23:46:55 -07:00
Shen Hong
89e37d5273
[HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs. ( #1690 )
2020-06-22 08:13:28 -07:00
Raymond Xu
8a9fdd603e
[HUDI-1023] Add validation error messages in delta sync ( #1710 )
...
- Remove explicitly specifying BLOOM_INDEX since that's the default anyway
2020-06-19 12:12:35 -07:00
Raymond Xu
ab724af5c4
[MINOR] Rename TestSourceConfig to SourceConfigs ( #1749 )
2020-06-19 12:08:19 -07:00
Litianye
ede6c9bda4
[HUDI-1006] Deltastreamer use kafkaSource with offset reset strategy:latest can't consume data ( #1719 )
2020-06-14 18:01:44 +08:00
liujinhui
97ab97b726
[HUDI-918] Fix kafkaOffsetGen can not read kafka data bug ( #1652 )
2020-06-08 20:46:47 +08:00
Raymond Xu
742c204099
[HUDI-811] Restructure test packages in hudi-client/cli ( #1689 )
2020-06-02 10:25:42 +08:00
Raymond Xu
03f136361a
[HUDI-811] Restructure test packages in hudi-common ( #1644 )
...
* [HUDI-811] Restructure test packages in hudi-common
2020-05-27 16:28:17 +08:00
Raymond Xu
6c450957ce
[HUDI-690] Filter out inflight compaction in exporter ( #1667 )
2020-05-26 09:23:34 -07:00
Balaji Varadarajan
74ecc27e92
[HUDI-846][HUDI-848] Enable Incremental cleaning and embedded timeline-server by default ( #1634 )
2020-05-20 05:29:43 -07:00
rolandjohann
244d47494e
[HUDI-888] fix NullPointerException in HoodieCompactor ( #1622 )
2020-05-20 04:22:35 -07:00
wenningd
0dc2fa6172
[MINOR] Fix HoodieCompactor config abbreviation ( #1642 )
...
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-05-19 21:03:54 -07:00
Joey
2600d2de8d
[MINOR] Fix apache-rat violations ( #1639 )
...
* MINOR Fix apache-rat violations. Also, enabling RAT for hudi-utilities and hudi-integ-test
2020-05-18 11:16:49 -07:00
Mathieu
25a0080b2f
[HUDI-714] Add javadoc and comments to hudi write method link ( #1409 )
...
* [HUDI-714] Add javadoc and comments to hudi write method link
2020-05-16 08:36:51 -04:00
Raymond Xu
2ada2ef50f
[HUDI-902] Avoid exception when getSchemaProvider ( #1584 )
...
* When no new input data, don't throw exception for null SchemaProvider
* Return the newly added NullSchemaProvider instead
2020-05-15 21:33:02 -07:00
Alexander Filipchik
25e0b75b3d
[HUDI-723] Register avro schema if inferred from SQL transformation ( #1518 )
...
* Register avro schema if inferred from SQL transformation
* Make HoodieWriteClient creation done lazily always. Handle setting schema-provider and avro-schemas correctly when using SQL transformer
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
Co-authored-by: Balaji Varadarajan <varadarb@uber.com>
2020-05-15 12:44:03 -07:00
Alexander Filipchik
f094f42857
[HUDI-843] Add ability to specify time unit for TimestampBasedKeyGenerator ( #1541 )
...
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-05-14 13:37:59 -07:00
hongdd
3a2fe13fcb
[HUDI-701] Add unit test for HDFSParquetImportCommand ( #1574 )
2020-05-14 19:15:49 +08:00
Raymond Xu
0d4848b68b
[HUDI-811] Restructure test packages ( #1607 )
...
* restructure hudi-spark tests
* restructure hudi-timeline-service tests
* restructure hudi-hadoop-mr hudi-utilities tests
* restructure hudi-hive-sync tests
2020-05-13 15:37:03 -07:00
liujinhui
5d37e66b7e
[MINOR] Fix HoodieNotSupportedException description in KafkaOffsetGen ( #1615 )
2020-05-11 23:14:43 +08:00
Raymond Xu
366bb10d8c
[HUDI-812] Migrate hudi common tests to JUnit 5 ( #1590 )
...
* [HUDI-812] Migrate hudi-common tests to JUnit 5
2020-05-06 19:15:20 +08:00
Raymond Xu
096f7f55b2
[HUDI-813] Migrate hudi-utilities tests to JUnit 5 ( #1589 )
2020-05-04 12:43:42 +08:00
Balaji Varadarajan
506447fd4f
[HUDI-850] Avoid unnecessary listings in incremental cleaning mode ( #1576 )
2020-05-01 21:37:21 -07:00
vinoth chandar
c4b71622b9
[MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability ( #1575 )
...
- reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc
2020-04-30 09:19:39 -07:00
Raymond Xu
06dae30297
[HUDI-810] Migrate ClientTestHarness to JUnit 5 ( #1553 )
2020-04-28 23:38:16 +08:00
dengziming
19cc15c098
[MINOR]: Fix cli docs for DeltaStreamer ( #1547 )
2020-04-22 11:37:17 -07:00
Raymond Xu
6e15eebd81
[HUDI-809] Migrate CommonTestHarness to JUnit 5 ( #1530 )
2020-04-22 14:10:25 +08:00
Alexander Filipchik
2a56f82908
[HUDI-821] Fixing JCommander param parsing in deltastreamer ( #1525 )
...
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-04-21 20:12:34 -07:00
hongdd
84dd9047d3
[HUDI-789] Adjust logic of upsert in HDFSParquetImporter ( #1511 )
2020-04-21 14:21:30 +08:00
Alexander Filipchik
acb1ada2f7
[HUDI-799] Use appropriate FS when loading configs ( #1517 )
...
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
2020-04-16 13:49:39 -07:00
Raymond Xu
acdc4a8d00
[HUDI-798] Migrate to Mockito Jupiter for JUnit 5 ( #1521 )
2020-04-16 16:07:32 +08:00
Iftach Schonbaum
9ca710cb02
[HUDI-777] Updated description for --target-table parameter ( #1519 )
2020-04-15 14:56:13 -07:00
Raymond Xu
d65efe659d
[HUDI-780] Migrate test cases to JUnit 5 ( #1504 )
2020-04-15 12:35:01 -07:00
Gary Li
14d4fea833
[HUDI-759] Integrate checkpoint provider with delta streamer ( #1486 )
2020-04-14 14:51:04 -07:00
Bhavani Sudha Saktheeswaran
8c7cef3e50
[HUDI-738] Add validation to DeltaStreamer to fail fast when filterDupes is enabled on UPSERT mode. ( #1505 )
...
Summary:
This fix ensures that '--filter-dupes' is disabled for the UPSERT operation, and fails fast if it is not. Otherwise it would silently drop all updates and only take in new records.
2020-04-10 08:58:55 -07:00
Pratyaksh Sharma
d610252d6b
[HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment ( #1150 )
...
* [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment
2020-04-07 16:10:26 -07:00
YanJia-Gary-Li
575d87cf7d
[HUDI-644] Kafka Connect checkpoint provider ( #1453 )
2020-04-03 18:57:34 -07:00
Ramachandran Madtas Subramaniam
639ec20412
[HUDI-562] Enable testing at debug log level
...
This is to ensure that tests will execute all code paths, even the ones
written under DEBUG log levels. This will improve coverage as well as
ensure there are no surprises when DEBUG log level is enabled in
production.
2020-04-02 11:14:35 -07:00
Raymond Xu
5b53b0d85e
[HUDI-731] Add ChainedTransformer ( #1440 )
...
* [HUDI-731] Add ChainedTransformer
2020-04-01 23:21:31 +08:00
Shaofeng Shi
78b3194e82
[HUDI-751] Fix some coding issues reported by FindBugs ( #1470 )
2020-03-31 21:19:32 +08:00
wenningd
ce0a4c64d0
[HUDI-713] Fix conversion of Spark array of struct type to Avro schema ( #1406 )
...
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-03-30 15:52:15 -07:00
Suneel Marthi
fa36082554
[HUDI-746] Reduce build warnings < 10 ( #1465 )
2020-03-30 11:46:52 +08:00
vinoth chandar
e057c27603
[HUDI-744] Restructure hudi-common and clean up files under util packages ( #1462 )
...
- Brings more order and cohesion to the classes in hudi-common
- Utils classes related to a particular concept (avro, timeline,...) are placed near to the package
- common.fs package now contains all the filesystem level classes including wrapper filesystem
- bloom.filter package renamed to just bloom
- config package contains classes that help store properties
- common.fs.inline package contains all the inline filesystem classes/impl
- common.table.timeline now consolidates all timeline related classes
- common.table.view consolidates all the classes related to filesystem view metadata
- common.table.timeline.versioning contains all classes related to versioning of timeline
- Fix a few unit tests as a result
- Moved the test packages around to match the source file move
- Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos
2020-03-29 10:58:49 -07:00