lanyuanxiaoyao/hudi

Author	SHA1	Message	Date
Dongwook	8d19ebfd0f	[HUDI-993] Let delete API use "hoodie.delete.shuffle.parallelism" (#1703) The delete API does not use "hoodie.delete.shuffle.parallelism", whereas upsert uses "hoodie.upsert.shuffle.parallelism"; in certain cases this creates a performance difference between deleting through the upsert API with "EmptyHoodieRecordPayload" and deleting through the delete API. This patch makes the following fixes in this regard. - Let deduplicateKeys method use "hoodie.delete.shuffle.parallelism" - Repartition inputRDD as "hoodie.delete.shuffle.parallelism" in case "hoodie.combine.before.delete=false"	2020-09-01 12:55:31 -04:00
Satish Kotha	ea983ff912	[HUDI-1137] Add option to configure different path selector	2020-08-24 13:26:44 -07:00
Mathieu	b883b6d268	[HUDI-1122] Introduce a kafka implementation of hoodie write commit ca… (#1886)	2020-08-20 23:00:59 +08:00
Bhavani Sudha Saktheeswaran	4226d75144	Moving to 0.6.1-SNAPSHOT on master branch.	2020-08-14 12:54:15 -07:00
vinoth chandar	9bde6d616c	[HUDI-1190] Introduce @PublicAPIClass and @PublicAPIMethod annotations to mark public APIs (#1965) - Maturity levels, one of: evolving, stable, deprecated - Took a pass and marked out most of the existing public API	2020-08-13 23:28:17 -07:00
Udit Mehrotra	e4a2d98f79	[HUDI-426] Bootstrap datasource integration (#1702)	2020-08-09 14:06:13 -07:00
wenningd	9fe2d2b14a	[HUDI-427] [HUDI-971] Implement CLI support for performing bootstrap (#1869) * [HUDI-971] Clean partitions & fileIds returned by HFileBootstrapIndex * [HUDI-427] Implement CLI support for performing bootstrap Co-authored-by: Wenning Ding <wenningd@amazon.com> Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>	2020-08-08 12:37:29 -07:00
Mathieu	b51646dcc7	[HUDI-1151] Fix NPE when no new data in kafka using HoodieDeltaStreamer (#1921)	2020-08-07 00:03:20 +08:00
lw0090	51ea27d665	[HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync, hudi-dla-sync (#1810) - Generalize the hive-sync module for syncing to multiple metastores - Added new options for datasource - Added new command line for delta streamer Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-05 21:34:55 -07:00
Balaji Varadarajan	7a2429f5ba	[HUDI-575] Spark Streaming with async compaction support (#1752)	2020-08-05 07:50:15 -07:00
Sreeram Ramji	217a84192c	[HUDI-1140] Fix Jcommander issue for --hoodie-conf in DeltaStreamer (#1898)	2020-08-04 21:42:51 -07:00
Sivabalan Narayanan	ab11ba43e1	[REVERT] "[HUDI-1058] Make delete marker configurable (#1819)" (#1914) This reverts commit `433d7d2c98`.	2020-08-04 15:20:38 -07:00
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
Shen Hong	433d7d2c98	[HUDI-1058] Make delete marker configurable (#1819)	2020-08-03 11:06:31 -04:00
n3nash	727f1df62c	[MINOR] Suppressing spark logs for hudi-integ and hudi-utilities (#1894)	2020-07-31 19:01:25 -07:00
Nishith Agarwal	2fc2b01d86	[HUDI-394] Provide a basic implementation of test suite	2020-07-30 21:21:15 -07:00
Raymond Xu	ca36c44cb3	[HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873)	2020-07-27 19:21:45 +08:00
Sivabalan Narayanan	5b6026ba43	[HUDI-802] Fixing deletes for inserts in same batch in write path (#1792) * Fixing deletes for inserts in same batch in write path * Fixing delta streamer tests * Adding tests for OverwriteWithLatestAvroPayload	2020-07-22 19:39:57 -07:00
lw0090	1ec89e9a94	[HUDI-839] Introducing support for rollbacks using marker files (#1756) * [HUDI-839] Introducing rollback strategy using marker files - Adds a new mechanism for rollbacks where it's based on the marker files generated during the write - Consequently, marker file/dir deletion now happens post commit, instead of during finalize - Marker files are also generated for AppendHandle, making it consistent throughout the write path - Until upgrade-downgrade mechanism can upgrade non-marker based inflight writes to marker based, this should only be turned on for new datasets. - Added marker dir deletion after successful commit/rollback, individual files are not deleted during finalize - Fail safe for deleting marker directories, now during timeline archival process - Added check to ensure completed instants are not rolled back using marker based strategy. This will be incorrect - Reworked tests to rollback inflight instants, instead of completed instants whenever necessary - Added a unit test for MarkerBasedRollbackStrategy Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-07-20 22:41:42 -07:00
Prashant Wason	b71f25f210	[HUDI-92] Provide reasonable names for Spark DAG stages in HUDI. (#1289)	2020-07-19 10:29:25 -07:00
GuoPhilipse	abfebd30f3	[MINOR] Update parameter description (#1821)	2020-07-11 22:57:12 +08:00
Pratyaksh Sharma	9627a385fe	[HUDI-916]: Added support for multiple input formats in TimestampBasedKeyGenerator (#1648)	2020-07-10 15:28:45 -04:00
Pratyaksh Sharma	c7f1a781ab	[HUDI-728]: Implemented custom key generator (#1433)	2020-07-09 07:35:07 -04:00
Trevor	d58644b657	[HUDI-1062] Remove unnecessary maxEvent check and add some logging in KafkaOffsetGen (#1779)	2020-07-08 21:07:34 -07:00
Raymond Xu	3b9a30528b	[HUDI-996] Add functional test suite for hudi-utilities (#1746) - Share resources for functional tests - Add suite for functional test classes from hudi-utilities	2020-07-05 16:44:31 -07:00
Balaji Varadarajan	8919be6a5d	[HUDI-855] Run Cleaner async with writing (#1577) - Cleaner can now run concurrently with write operation - Configs to turn on/off Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-06-28 02:04:50 -07:00
Prashant Wason	2603cfb33e	[HUDI-684] Introduced abstraction for writing and reading different types of base file formats. (#1687) Notable changes: 1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format 2. HoodieDataBlock abstraction for creating specific data blocks for base file formats. (e.g. Parquet has HoodieAvroDataBlock) 3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format 4. HiveSyncTool accepts the base file format as a CLI parameter 5. HoodieDeltaStreamer accepts the base file format as a CLI parameter 6. HoodieSparkSqlWriter accepts the base file format as a parameter	2020-06-25 23:46:55 -07:00
Shen Hong	89e37d5273	[HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs. (#1690)	2020-06-22 08:13:28 -07:00
Raymond Xu	8a9fdd603e	[HUDI-1023] Add validation error messages in delta sync (#1710) - Remove explicitly specifying BLOOM_INDEX since that's the default anyway	2020-06-19 12:12:35 -07:00
Raymond Xu	ab724af5c4	[MINOR] Rename `TestSourceConfig` to `SourceConfigs` (#1749)	2020-06-19 12:08:19 -07:00
Litianye	ede6c9bda4	[HUDI-1006] Deltastreamer use kafkaSource with offset reset strategy:latest can't consume data (#1719)	2020-06-14 18:01:44 +08:00
liujinhui	97ab97b726	[HUDI-918] Fix kafkaOffsetGen can not read kafka data bug (#1652)	2020-06-08 20:46:47 +08:00
Raymond Xu	742c204099	[HUDI-811] Restructure test packages in hudi-client/cli (#1689)	2020-06-02 10:25:42 +08:00
Raymond Xu	03f136361a	[HUDI-811] Restructure test packages in hudi-common (#1644) * [HUDI-811] Restructure test packages in hudi-common	2020-05-27 16:28:17 +08:00
Raymond Xu	6c450957ce	[HUDI-690] Filter out inflight compaction in exporter (#1667)	2020-05-26 09:23:34 -07:00
Balaji Varadarajan	74ecc27e92	[HUDI-846][HUDI-848] Enable Incremental cleaning and embedded timeline-server by default (#1634)	2020-05-20 05:29:43 -07:00
rolandjohann	244d47494e	[HUDI-888] fix NullPointerException in HoodieCompactor (#1622)	2020-05-20 04:22:35 -07:00
wenningd	0dc2fa6172	[MINOR] Fix HoodieCompactor config abbreviation (#1642) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-05-19 21:03:54 -07:00
Joey	2600d2de8d	[MINOR] Fix apache-rat violations (#1639) * MINOR Fix apache-rat violations. Also, enabling RAT for hudi-utilities and hudi-integ-test	2020-05-18 11:16:49 -07:00
Mathieu	25a0080b2f	[HUDI-714] Add javadoc and comments to hudi write method link (#1409) * [HUDI-714] Add javadoc and comments to hudi write method link	2020-05-16 08:36:51 -04:00
Raymond Xu	2ada2ef50f	[HUDI-902] Avoid exception when getSchemaProvider (#1584) * When no new input data, don't throw exception for null SchemaProvider * Return the newly added NullSchemaProvider instead	2020-05-15 21:33:02 -07:00
Alexander Filipchik	25e0b75b3d	[HUDI-723] Register avro schema if inferred from SQL transformation (#1518) * Register avro schema if inferred from SQL transformation * Make HoodieWriteClient creation done lazily always. Handle setting schema-provider and avro-schemas correctly when using SQL transformer Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-05-15 12:44:03 -07:00
Alexander Filipchik	f094f42857	[HUDI-843] Add ability to specify time unit for TimestampBasedKeyGenerator (#1541) Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-05-14 13:37:59 -07:00
hongdd	3a2fe13fcb	[HUDI-701] Add unit test for HDFSParquetImportCommand (#1574)	2020-05-14 19:15:49 +08:00
Raymond Xu	0d4848b68b	[HUDI-811] Restructure test packages (#1607) * restructure hudi-spark tests * restructure hudi-timeline-service tests * restructure hudi-hadoop-mr hudi-utilities tests * restructure hudi-hive-sync tests	2020-05-13 15:37:03 -07:00
liujinhui	5d37e66b7e	[MINOR] Fix HoodieNotSupportedException description in KafkaOffsetGen (#1615)	2020-05-11 23:14:43 +08:00
Raymond Xu	366bb10d8c	[HUDI-812] Migrate hudi common tests to JUnit 5 (#1590) * [HUDI-812] Migrate hudi-common tests to JUnit 5	2020-05-06 19:15:20 +08:00
Raymond Xu	096f7f55b2	[HUDI-813] Migrate hudi-utilities tests to JUnit 5 (#1589)	2020-05-04 12:43:42 +08:00
Balaji Varadarajan	506447fd4f	[HUDI-850] Avoid unnecessary listings in incremental cleaning mode (#1576)	2020-05-01 21:37:21 -07:00
vinoth chandar	c4b71622b9	[MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability (#1575) - reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc	2020-04-30 09:19:39 -07:00

172 Commits