88 Commits

Author SHA1 Message Date
Sivabalan Narayanan
ab11ba43e1 [REVERT] "[HUDI-1058] Make delete marker configurable (#1819)" (#1914)
This reverts commit 433d7d2c98.
2020-08-04 15:20:38 -07:00
vinoth chandar
539621bd33 [HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876)
- [HUDI-418] Bootstrap Index Implementation using HFile with unit-test
 - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests
 - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices
 - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices
 - [HUDI-421] Bootstrap Write Client with tests
 - [HUDI-425] Added HoodieDeltaStreamer support
 - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap
 - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly
 - [HUDI-424] Simplify Record reader implementation
 - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices
 - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables

Co-authored-by: Mehrotra <uditme@amazon.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
Co-authored-by: Balaji Varadarajan <varadarb@uber.com>
2020-08-03 20:19:21 -07:00
Shen Hong
433d7d2c98 [HUDI-1058] Make delete marker configurable (#1819) 2020-08-03 11:06:31 -04:00
n3nash
727f1df62c [MINOR] Suppressing spark logs for hudi-integ and hudi-utilities (#1894) 2020-07-31 19:01:25 -07:00
Nishith Agarwal
2fc2b01d86 [HUDI-394] Provide a basic implementation of test suite 2020-07-30 21:21:15 -07:00
Raymond Xu
ca36c44cb3 [HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873) 2020-07-27 19:21:45 +08:00
Sivabalan Narayanan
5b6026ba43 [HUDI-802] Fixing deletes for inserts in same batch in write path (#1792)
* Fixing deletes for inserts in same batch in write path
* Fixing delta streamer tests
* Adding tests for OverwriteWithLatestAvroPayload
2020-07-22 19:39:57 -07:00
lw0090
1ec89e9a94 [HUDI-839] Introducing support for rollbacks using marker files (#1756)
* [HUDI-839] Introducing rollback strategy using marker files

 - Adds a new rollback mechanism based on the marker files generated during the write
 - Consequently, marker file/dir deletion now happens post commit, instead of during finalize 
 - Marker files are also generated for AppendHandle, making it consistent throughout the write path 
 - Until the upgrade-downgrade mechanism can upgrade non-marker-based inflight writes to marker-based ones, this should only be turned on for new datasets.
 - Added marker dir deletion after successful commit/rollback, individual files are not deleted during finalize
 - Added a fail-safe that deletes marker directories during the timeline archival process
 - Added a check to ensure completed instants are not rolled back using the marker-based strategy, since doing so would be incorrect
 - Reworked tests to rollback inflight instants, instead of completed instants whenever necessary
 - Added a unit test for MarkerBasedRollbackStrategy


Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-07-20 22:41:42 -07:00
Pratyaksh Sharma
c7f1a781ab [HUDI-728]: Implemented custom key generator (#1433) 2020-07-09 07:35:07 -04:00
Trevor
d58644b657 [HUDI-1062] Remove unnecessary maxEvent check and add some logging in KafkaOffsetGen (#1779) 2020-07-08 21:07:34 -07:00
Raymond Xu
3b9a30528b [HUDI-996] Add functional test suite for hudi-utilities (#1746)
- Share resources for functional tests
- Add suite for functional test classes from hudi-utilities
2020-07-05 16:44:31 -07:00
Shen Hong
89e37d5273 [HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs. (#1690) 2020-06-22 08:13:28 -07:00
Raymond Xu
ab724af5c4 [MINOR] Rename TestSourceConfig to SourceConfigs (#1749) 2020-06-19 12:08:19 -07:00
Litianye
ede6c9bda4 [HUDI-1006] DeltaStreamer using KafkaSource with offset reset strategy "latest" can't consume data (#1719) 2020-06-14 18:01:44 +08:00
Raymond Xu
742c204099 [HUDI-811] Restructure test packages in hudi-client/cli (#1689) 2020-06-02 10:25:42 +08:00
Raymond Xu
03f136361a [HUDI-811] Restructure test packages in hudi-common (#1644)
* [HUDI-811] Restructure test packages in hudi-common
2020-05-27 16:28:17 +08:00
Joey
2600d2de8d [MINOR] Fix apache-rat violations (#1639)
* MINOR Fix apache-rat violations. Also, enabling RAT for hudi-utilities and hudi-integ-test
2020-05-18 11:16:49 -07:00
Raymond Xu
2ada2ef50f [HUDI-902] Avoid exception when getSchemaProvider (#1584)
* When no new input data, don't throw exception for null SchemaProvider
* Return the newly added NullSchemaProvider instead
2020-05-15 21:33:02 -07:00
Alexander Filipchik
f094f42857 [HUDI-843] Add ability to specify time unit for TimestampBasedKeyGenerator (#1541)
Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-05-14 13:37:59 -07:00
hongdd
3a2fe13fcb [HUDI-701] Add unit test for HDFSParquetImportCommand (#1574) 2020-05-14 19:15:49 +08:00
Raymond Xu
0d4848b68b [HUDI-811] Restructure test packages (#1607)
* restructure hudi-spark tests
* restructure hudi-timeline-service tests
* restructure hudi-hadoop-mr hudi-utilities tests
* restructure hudi-hive-sync tests
2020-05-13 15:37:03 -07:00
Raymond Xu
366bb10d8c [HUDI-812] Migrate hudi common tests to JUnit 5 (#1590)
* [HUDI-812] Migrate hudi-common tests to JUnit 5
2020-05-06 19:15:20 +08:00
Raymond Xu
096f7f55b2 [HUDI-813] Migrate hudi-utilities tests to JUnit 5 (#1589) 2020-05-04 12:43:42 +08:00
Balaji Varadarajan
506447fd4f [HUDI-850] Avoid unnecessary listings in incremental cleaning mode (#1576) 2020-05-01 21:37:21 -07:00
vinoth chandar
c4b71622b9 [MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability (#1575)
- reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc
2020-04-30 09:19:39 -07:00
Raymond Xu
06dae30297 [HUDI-810] Migrate ClientTestHarness to JUnit 5 (#1553) 2020-04-28 23:38:16 +08:00
Raymond Xu
6e15eebd81 [HUDI-809] Migrate CommonTestHarness to JUnit 5 (#1530) 2020-04-22 14:10:25 +08:00
hongdd
84dd9047d3 [HUDI-789] Adjust logic of upsert in HDFSParquetImporter (#1511) 2020-04-21 14:21:30 +08:00
Raymond Xu
d65efe659d [HUDI-780] Migrate test cases to Junit 5 (#1504) 2020-04-15 12:35:01 -07:00
Gary Li
14d4fea833 [HUDI-759] Integrate checkpoint provider with delta streamer (#1486) 2020-04-14 14:51:04 -07:00
Bhavani Sudha Saktheeswaran
8c7cef3e50 [HUDI-738] Add validation to DeltaStreamer to fail fast when filterDupes is enabled on UPSERT mode. (#1505)
Summary:
This fix ensures that '--filter-dupes' is disabled for the UPSERT operation and fails fast if it is not. Otherwise, all updates would be dropped silently and only new records would be ingested.
2020-04-10 08:58:55 -07:00
Pratyaksh Sharma
d610252d6b [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment (#1150)
* [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment
2020-04-07 16:10:26 -07:00
YanJia-Gary-Li
575d87cf7d HUDI-644 kafka connect checkpoint provider (#1453) 2020-04-03 18:57:34 -07:00
Ramachandran Madtas Subramaniam
639ec20412 [HUDI-562] Enable testing at debug log level
This is to ensure that tests will execute all code paths, even the ones
written under DEBUG log levels. This will improve coverage as well as
ensure there are no surprises when DEBUG log level is enabled in
production.
2020-04-02 11:14:35 -07:00
Raymond Xu
5b53b0d85e [HUDI-731] Add ChainedTransformer (#1440)
* [HUDI-731] Add ChainedTransformer
2020-04-01 23:21:31 +08:00
wenningd
ce0a4c64d0 [HUDI-713] Fix conversion of Spark array of struct type to Avro schema (#1406)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2020-03-30 15:52:15 -07:00
Suneel Marthi
fa36082554 [HUDI-746] Reduce build warnings < 10 (#1465) 2020-03-30 11:46:52 +08:00
vinoth chandar
e057c27603 [HUDI-744] Restructure hudi-common and clean up files under util packages (#1462)
- Brings more order and cohesion to the classes in hudi-common
 - Utils classes related to a particular concept (avro, timeline,...) are placed near to the package
 - common.fs package now contains all the filesystem level classes including wrapper filesystem
 - bloom.filter package renamed to just bloom
 - config package contains classes that help store properties
 - common.fs.inline package contains all the inline filesystem classes/impl
 - common.table.timeline now consolidates all timeline related classes
 - common.table.view consolidates all the classes related to filesystem view metadata
 - common.table.timeline.versioning contains all classes related to versioning of timeline
 - Fixed a few unit tests as a result
 - Moved the test packages around to match the source file move
 - Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos
2020-03-29 10:58:49 -07:00
Sivabalan Narayanan
ac73bdcdc3 [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile (#1176)
* Adding InlineFileSystem to support embedding any file format (parquet, hfile, etc). Supports reading the embedded file using respective readers.
2020-03-28 12:13:35 -04:00
Suneel Marthi
8c3001363d HUDI-479: Eliminate or Minimize use of Guava if possible (#1159) 2020-03-28 03:11:32 -04:00
Raymond Xu
bc82e2be6c [HUDI-711] Refactor exporter main logic (#1436)
* Refactor exporter main logic
* break main method into multiple readable methods
* fix bug of passing wrong file list
* avoid deleting output path when exists
* throw exception to early abort on multiple cases
* use JavaSparkContext instead of SparkSession
* improve unit test for expected exceptions
2020-03-25 18:02:24 +08:00
Zhiyuan Zhao
0241b21f77 [HUDI-65] commitTime rename to instantTime (#1431) 2020-03-22 18:06:00 -07:00
Pratyaksh Sharma
1e1d9e1d34 [HUDI-616] Fixed parquet files getting created on local FS (#1434) 2020-03-22 22:19:47 +08:00
Sivabalan Narayanan
a752b7b18c Merge pull request #1165 from yihua/HUDI-76-deltastreamer-csv-source
[HUDI-76] Add CSV Source support for Hudi Delta Streamer
2020-03-19 10:00:53 -04:00
Raymond Xu
779edc0688 [HUDI-344] Add partitioner param to Exporter (#1405) 2020-03-18 19:24:04 +08:00
Y Ethan Guo
cf765df606 [HUDI-76] Add CSV Source support for Hudi Delta Streamer 2020-03-15 19:03:37 -07:00
Raymond Xu
14323cb100 [HUDI-344] Improve exporter tests (#1404) 2020-03-15 20:24:30 +08:00
Sivabalan Narayanan
1ca912af09 [HUDI-667] Fixing delete tests for DeltaStreamer (#1395) 2020-03-11 16:19:23 -07:00
openopen2
44700d531a [HUDI-344] Hudi Dataset Snapshot Exporter (#1360)
Co-authored-by: jason1993 <261049174@qq.com>
2020-03-10 09:17:51 +08:00
vinoth chandar
71170fafe7 [HUDI-554] Cleanup package structure in hudi-client (#1346)
- Just package, class moves and renames with the following intent
 - `client` now has all the various client classes, that do the transaction management
 - `func` renamed to `execution` and some helpers moved to `client/utils`
 - All compaction code under `io` now under `table/compact`
 - Rollback code under `table/rollback` and in general all code for individual operations under `table`
 - `exception` `config`, `metrics` left untouched
 - Moved the tests also accordingly
 - Fixed some flaky tests
2020-02-27 08:05:58 -08:00