lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
Udit Mehrotra	4e64226844	[HUDI-1450] Use metadata table for listing in HoodieROTablePathFilter (apache#2326) [HUDI-1394] [RFC-15] Use metadata table (if present) to get all partition paths (apache#2351)	2021-01-04 07:59:47 -08:00
lw0090	9e6889a8ce	[HUDI-1481] add structured streaming and delta streamer clustering unit test (#2360 )	2020-12-27 20:27:09 -08:00
Bhavani Sudha Saktheeswaran	14d5d1100c	[HUDI-1406] Add date partition based source input selector for Delta streamer (#2264 ) - Adds ability to list only recent date based partitions from source data. - Parallelizes listing for faster tailing of DFSSources	2020-12-17 03:59:30 -08:00
liujinhui	62b392b49c	[HUDI-1343] Add standard schema postprocessor which would rewrite the schema using spark-avro conversion (#2192 ) Co-authored-by: liujh <liujh@t3go.cn>	2020-12-03 19:28:34 -08:00
wangxianghu	a23230c8c2	[HUDI-1400] Replace Operation enum with WriteOperationType (#2259 )	2020-11-19 13:40:04 +08:00
Ho Tien Vu	af5ef4d49d	[HUDI-1330] handle prefix filtering at directory level (#2157 ) The current DFSPathSelector only ignore prefix(_, .) at the file level while files under subdirectories e.g. (.checkpoint/*) are still considered which result in bad-format exception during reading.	2020-10-20 23:20:19 -07:00
lw0090	2126f13e13	[HUDI-791] Replace null by Option in Delta Streamer (#2171 )	2020-10-11 18:29:57 -07:00
Mathieu	1f7add9291	[HUDI-1089] Refactor hudi-client to support multi-engine (#1827 ) - This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules - Simple usages of Spark using jsc.parallelize() has been redone using EngineContext#map, EngineContext#flatMap etc - Code changes in the PR, break classes into `BaseXYZ` parent classes with no spark dependencies living in `hudi-client-common` - Classes on `hudi-spark-client` are named `SparkXYZ` extending the parent classes with all the Spark dependencies - To simplify/cleanup, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-10-01 14:25:29 -07:00
liujinhui	20b9b399c9	[HUDI-1233] Deltastreamer Kafka consumption delay reporting indicators (#2074 )	2020-09-29 13:44:31 +08:00
Alexander Filipchik	c8e19e2def	[HUDI-801] Adding a way to post process schema after it is fetched (#1524 ) * [HUDI-801] Adding a way to post process schema after it is fetched Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com> Co-authored-by: Balaji Varadarajan <balaji.varadarajan@robinhood.com>	2020-09-19 11:18:36 -07:00
shenh062326	581d54097c	[HUDI-1143] Change timestamp field in HoodieTestDataGenerator from double to long	2020-09-15 20:58:29 -07:00
Abhishek Modi	53d1e55110	Test Suite should work with Docker + Unit Tests	2020-09-08 22:41:14 -07:00
Dongwook	8d19ebfd0f	[HUDI-993] Let delete API use "hoodie.delete.shuffle.parallelism" (#1703 ) For Delete API, "hoodie.delete.shuffle.parallelism" isn't used as opposed to "hoodie.upsert.shuffle.parallelism" is used for upsert, this creates the performance difference between delete by upsert API with "EmptyHoodieRecordPayload" and delete API for certain cases. This patch makes the following fixes in this regard. - Let deduplicateKeys method use "hoodie.delete.shuffle.parallelism" - Repartition inputRDD as "hoodie.delete.shuffle.parallelism" in case "hoodie.combine.before.delete=false"	2020-09-01 12:55:31 -04:00
Udit Mehrotra	e4a2d98f79	[HUDI-426] Bootstrap datasource integration (#1702 )	2020-08-09 14:06:13 -07:00
Sreeram Ramji	217a84192c	[HUDI-1140] Fix Jcommander issue for --hoodie-conf in DeltaStreamer (#1898 )	2020-08-04 21:42:51 -07:00
Sivabalan Narayanan	ab11ba43e1	[REVERT] "[HUDI-1058] Make delete marker configurable (#1819 )" (#1914 ) This reverts commit `433d7d2c98`.	2020-08-04 15:20:38 -07:00
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876 ) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
Shen Hong	433d7d2c98	[HUDI-1058] Make delete marker configurable (#1819 )	2020-08-03 11:06:31 -04:00
n3nash	727f1df62c	[MINOR] Suppressing spark logs for hudi-integ and hudi-utilities (#1894 )	2020-07-31 19:01:25 -07:00
Nishith Agarwal	2fc2b01d86	[HUDI-394] Provide a basic implementation of test suite	2020-07-30 21:21:15 -07:00
Raymond Xu	ca36c44cb3	[HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873 )	2020-07-27 19:21:45 +08:00
Sivabalan Narayanan	5b6026ba43	[HUDI-802] Fixing deletes for inserts in same batch in write path (#1792 ) * Fixing deletes for inserts in same batch in write path * Fixing delta streamer tests * Adding tests for OverwriteWithLatestAvroPayload	2020-07-22 19:39:57 -07:00
lw0090	1ec89e9a94	[HUDI-839] Introducing support for rollbacks using marker files (#1756 ) * [HUDI-839] Introducing rollback strategy using marker files - Adds a new mechanism for rollbacks where it's based on the marker files generated during the write - Consequently, marker file/dir deletion now happens post commit, instead of during finalize - Marker files are also generated for AppendHandle, making it consistent throughout the write path - Until upgrade-downgrade mechanism can upgrade non-marker based inflight writes to marker based, this should only be turned on for new datasets. - Added marker dir deletion after successful commit/rollback, individual files are not deleted during finalize - Fail safe for deleting marker directories, now during timeline archival process - Added check to ensure completed instants are not rolled back using marker based strategy. This will be incorrect - Reworked tests to rollback inflight instants, instead of completed instants whenever necessary - Added an unit test for MarkerBasedRollbackStrategy Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-07-20 22:41:42 -07:00
Pratyaksh Sharma	c7f1a781ab	[HUDI-728]: Implemented custom key generator (#1433 )	2020-07-09 07:35:07 -04:00
Trevor	d58644b657	[HUDI-1062]Remove unnecessary maxEvent check and add some log in KafkaOffsetGen (#1779 )	2020-07-08 21:07:34 -07:00
Raymond Xu	3b9a30528b	[HUDI-996] Add functional test suite for hudi-utilities (#1746 ) - Share resources for functional tests - Add suite for functional test classes from hudi-utilities	2020-07-05 16:44:31 -07:00
Shen Hong	89e37d5273	[HUDI-908] Add some data types to HoodieTestDataGenerator and fix some some bugs. (#1690 )	2020-06-22 08:13:28 -07:00
Raymond Xu	ab724af5c4	[MINOR] Rename `TestSourceConfig` to `SourceConfigs` (#1749 )	2020-06-19 12:08:19 -07:00
Litianye	ede6c9bda4	[HUDI-1006] Deltastreamer use kafkaSource with offset reset strategy:latest can't consume data (#1719 )	2020-06-14 18:01:44 +08:00
Raymond Xu	742c204099	[HUDI-811] Restructure test packages in hudi-client/cli (#1689 )	2020-06-02 10:25:42 +08:00
Raymond Xu	03f136361a	[HUDI-811] Restructure test packages in hudi-common (#1644 ) * [HUDI-811] Restructure test packages in hudi-common	2020-05-27 16:28:17 +08:00
Joey	2600d2de8d	[MINOR] Fix apache-rat violations (#1639 ) * MINOR Fix apache-rat violations. Also, enabling RAT for hudi-utilities and hudi-integ-test	2020-05-18 11:16:49 -07:00
Raymond Xu	2ada2ef50f	[HUDI-902] Avoid exception when getSchemaProvider (#1584 ) * When no new input data, don't throw exception for null SchemaProvider * Return the newly added NullSchemaProvider instead	2020-05-15 21:33:02 -07:00
Alexander Filipchik	f094f42857	[HUDI-843] Add ability to specify time unit for TimestampBasedKeyGenerator (#1541 ) Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-05-14 13:37:59 -07:00
hongdd	3a2fe13fcb	[HUDI-701] Add unit test for HDFSParquetImportCommand (#1574 )	2020-05-14 19:15:49 +08:00
Raymond Xu	0d4848b68b	[HUDI-811] Restructure test packages (#1607 ) * restructure hudi-spark tests * restructure hudi-timeline-service tests * restructure hudi-hadoop-mr hudi-utilities tests * restructure hudi-hive-sync tests	2020-05-13 15:37:03 -07:00
Raymond Xu	366bb10d8c	[HUDI-812] Migrate hudi common tests to JUnit 5 (#1590 ) * [HUDI-812] Migrate hudi-common tests to JUnit 5	2020-05-06 19:15:20 +08:00
Raymond Xu	096f7f55b2	[HUDI-813] Migrate hudi-utilities tests to JUnit 5 (#1589 )	2020-05-04 12:43:42 +08:00
Balaji Varadarajan	506447fd4f	[HUDI-850] Avoid unnecessary listings in incremental cleaning mode (#1576 )	2020-05-01 21:37:21 -07:00
vinoth chandar	c4b71622b9	[MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability (#1575 ) - reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc	2020-04-30 09:19:39 -07:00
Raymond Xu	06dae30297	[HUDI-810] Migrate ClientTestHarness to JUnit 5 (#1553 )	2020-04-28 23:38:16 +08:00
Raymond Xu	6e15eebd81	[HUDI-809] Migrate CommonTestHarness to JUnit 5 (#1530 )	2020-04-22 14:10:25 +08:00
hongdd	84dd9047d3	[HUDI-789]Adjust logic of upsert in HDFSParquetImporter (#1511 )	2020-04-21 14:21:30 +08:00
Raymond Xu	d65efe659d	[HUDI-780] Migrate test cases to Junit 5 (#1504 )	2020-04-15 12:35:01 -07:00
Gary Li	14d4fea833	[HUDI-759] Integrate checkpoint provider with delta streamer (#1486 )	2020-04-14 14:51:04 -07:00
Bhavani Sudha Saktheeswaran	8c7cef3e50	[HUDI - 738] Add validation to DeltaStreamer to fail fast when filterDupes is enabled on UPSERT mode. (#1505 ) Summary: This fix ensures for UPSERT operation, '--filter-dupes' is disabled and fails fast if not. Otherwise it would drop all updates silently and only take in new records.	2020-04-10 08:58:55 -07:00
Pratyaksh Sharma	d610252d6b	[HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment (#1150 ) * [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment	2020-04-07 16:10:26 -07:00
YanJia-Gary-Li	575d87cf7d	HUDI-644 kafka connect checkpoint provider (#1453 )	2020-04-03 18:57:34 -07:00
Ramachandran Madtas Subramaniam	639ec20412	[HUDI-562] Enable testing at debug log level This is to ensure that tests will execute all code paths, even the ones written under DEBUG log levels. This will improve coverage as well as ensure there are no surprised when DEBUG log level is enabled in production.	2020-04-02 11:14:35 -07:00
Raymond Xu	5b53b0d85e	[HUDI-731] Add ChainedTransformer (#1440 ) * [HUDI-731] Add ChainedTransformer	2020-04-01 23:21:31 +08:00

103 Commits