lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
satishkotha	7fa641ea9a	[HUDI-1302] Add support for timestamp field in HiveSync (#2129 )	2020-10-13 22:58:00 -07:00
lw0090	86db4da33c	[HUDI-1339] delete useless import in hudi-spark module (#2173 )	2020-10-11 17:10:52 -07:00
lw0090	585ce0094d	[HUDI-1301] use spark INCREMENTAL mode query hudi dataset support schema version. (#2125 )	2020-10-10 20:53:41 +08:00
Shen Hong	b335459c80	[HUDI-1208] Ordering Field should be optional when precombine is turned off (#2088 )	2020-10-04 11:34:21 -07:00
Mathieu	1f7add9291	[HUDI-1089] Refactor hudi-client to support multi-engine (#1827 ) - This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules - Simple usages of Spark using jsc.parallelize() has been redone using EngineContext#map, EngineContext#flatMap etc - Code changes in the PR, break classes into `BaseXYZ` parent classes with no spark dependencies living in `hudi-client-common` - Classes on `hudi-spark-client` are named `SparkXYZ` extending the parent classes with all the Spark dependencies - To simplify/cleanup, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-10-01 14:25:29 -07:00
satishkotha	a99e93bed5	[HUDI-1072] Introduce REPLACE top level action. Implement insert_overwrite operation on top of replace action (#2048 )	2020-09-29 17:04:25 -07:00
liujinhui	a86f5574ed	[HUDI-1192] Make create hive database automatically configurable (#1968 )	2020-09-27 14:10:13 +08:00
Mathieu	1dd6635fbb	[MINOR] Fix ClassCastException when use QuickstartUtils generate data (#2105 )	2020-09-25 10:13:39 -07:00
hongdd	2eaba0962a	[HUDI-544] Archived commits command code cleanup (#1242 )	2020-09-25 09:36:41 -07:00
Udit Mehrotra	bf65269f66	[HUDI-1230] Fix for preventing MOR datasource jobs from hanging via spark-submit (#2046 )	2020-09-17 20:03:35 -07:00
shenh062326	581d54097c	[HUDI-1143] Change timestamp field in HoodieTestDataGenerator from double to long	2020-09-15 20:58:29 -07:00
Balaji Varadarajan	5e61454a6c	[HUDI-802] AWSDmsTransformer does not handle insert and delete of a row in a single batch correctly (#2084 )	2020-09-11 16:11:42 -07:00
Abhishek Modi	53d1e55110	Test Suite should work with Docker + Unit Tests	2020-09-08 22:41:14 -07:00
wenningd	2fee087f0f	[HUDI-1181] Fix decimal type display issue for record key field (#1953 ) * [HUDI-1181] Fix decimal type display issue for record key field * Remove getNestedFieldVal method from DataSourceUtils * resolve comments Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-09-08 17:50:54 -07:00
Sreeram Ramji	6537af2676	[HUDI-1153] Spark DataSource and Streaming Write must fail when operation type is misconfigured (#2014 )	2020-09-04 09:08:30 -07:00
Thinking Chen	6b417d1a86	[HUDI-1225] Fix: Avro Date logical type not handled correctly when converting to Spark Row (#2047 )	2020-08-29 01:16:42 -07:00
Satish Kotha	f468c20c6c	[HUDI-1226] Fix ComplexKeyGenerator for non-partitioned tables	2020-08-25 20:55:48 -07:00
Mathieu	35b21855da	[HUDI-1150] Fix unable to parse input partition field :1 exception when using TimestampBasedKeyGenerator(#1920 )	2020-08-23 19:56:50 +08:00
Pratyaksh Sharma	a2312fa1b7	[HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator (#1987 ) Co-authored-by: Bhavani Sudha Saktheeswaran <bhavanisudhas@gmail.com>	2020-08-19 17:43:34 -07:00
Bhavani Sudha Saktheeswaran	824f23bcb8	[HUDI-1197] Fix import issue that fails scala 2.12 build (#1976 )	2020-08-18 08:41:16 -07:00
Balaji Varadarajan	b8f4a30efd	Fix Integration test flakiness in HoodieJavaStreamingApp (#1967 )	2020-08-14 01:42:15 -07:00
vinoth chandar	9bde6d616c	[HUDI-1190] Introduce @PublicAPIClass and @PublicAPIMethod annotations to mark public APIs (#1965 ) - Maturity levels one of : evolving, stable, deprecated - Took a pass and marked out most of the existing public API	2020-08-13 23:28:17 -07:00
Sivabalan Narayanan	379cf0786f	[HUDI-1013] Adding Bulk Insert V2 implementation (#1834 ) - Adding ability to use native spark row writing for bulk_insert - Controlled by `ENABLE_ROW_WRITER_OPT_KEY` datasource write option - Introduced KeyGeneratorInterface in hudi-client, moved KeyGenerator back to hudi-spark - Simplified the new API additions to just two new methods : getRecordKey(row), getPartitionPath(row) - Fixed all built-in key generators with new APIs - Made the field position map lazily created upon the first call to row based apis - Implemented native row based key generators for CustomKeyGenerator - Fixed all the tests, with these new APIs Co-authored-by: Balaji Varadarajan <varadarb@uber.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-13 00:33:39 -07:00
Udit Mehrotra	e4a2d98f79	[HUDI-426] Bootstrap datasource integration (#1702 )	2020-08-09 14:06:13 -07:00
Gary Li	4f74a84607	[HUDI-69] Support Spark Datasource for MOR table - RDD approach (#1848 ) - This PR implements Spark Datasource for MOR table in the RDD approach. - Implemented SnapshotRelation - Implemented HudiMergeOnReadRDD - Implemented separate Iterator to handle merge and unmerge record reader. - Added TestMORDataSource to verify this feature. - Clean up test file name, add tests for mixed query type tests - We can now revert the change made in DefaultSource Co-authored-by: Vinoth Chandar <vchandar@confluent.io>	2020-08-07 00:28:14 -07:00
lw0090	51ea27d665	[HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync, hudi-dla-sync (#1810 ) - Generalize the hive-sync module for syncing to multiple metastores - Added new options for datasource - Added new command line for delta streamer Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-05 21:34:55 -07:00
Balaji Varadarajan	7a2429f5ba	[HUDI-575] Spark Streaming with async compaction support (#1752 )	2020-08-05 07:50:15 -07:00
Sivabalan Narayanan	ab11ba43e1	[REVERT] "[HUDI-1058] Make delete marker configurable (#1819 )" (#1914 ) This reverts commit `433d7d2c98`.	2020-08-04 15:20:38 -07:00
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876 ) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
Shen Hong	433d7d2c98	[HUDI-1058] Make delete marker configurable (#1819 )	2020-08-03 11:06:31 -04:00
Y Ethan Guo	ccd70a7e48	[HUDI-472] Introduce configurations and new modes of sorting for bulk_insert (#1149 ) * [HUDI-472] Introduce the configuration and new modes of record sorting for bulk_insert(#1149). Three sorting modes are implemented: global sort ("global_sort"), local sort inside each RDD partition ("partition_sort") and no sort ("none")	2020-07-31 09:52:42 -04:00
Nishith Agarwal	2fc2b01d86	[HUDI-394] Provide a basic implementation of test suite	2020-07-30 21:21:15 -07:00
Udit Mehrotra	1aae437257	[HUDI-1102] Add common useful Spark related and Table path detection utilities (#1841 ) Co-authored-by: Mehrotra <uditme@amazon.com>	2020-07-18 16:16:32 -07:00
miaomiaomiao	10e457278b	[HUDI-1078]Fix IllegalArgumentException in Delete data demo of Quick-Start Guide (#1808 )	2020-07-13 11:38:06 -04:00
Pratyaksh Sharma	9627a385fe	[HUDI-916]: Added support for multiple input formats in TimestampBasedKeyGenerator (#1648 )	2020-07-10 15:28:45 -04:00
Pratyaksh Sharma	c7f1a781ab	[HUDI-728]: Implemented custom key generator (#1433 )	2020-07-09 07:35:07 -04:00
mabin001	8c4ff185f1	[HUDI-1064]Trim hoodie table name (#1805 )	2020-07-07 19:10:16 +08:00
Prashant Wason	2603cfb33e	[HUDI-684] Introduced abstraction for writing and reading different types of base file formats. (#1687 ) Notable changes: 1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format 2. HoodieDataBlock abstraction for creating specific data blocks for base file formats. (e.g. Parquet has HoodieAvroDataBlock) 3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format 4. HiveSyncTool accepts the base file format as a CLI parameter 5. HoodieDeltaStreamer accepts the base file format as a CLI parameter 6. HoodieSparkSqlWriter accepts the base file format as a parameter	2020-06-25 23:46:55 -07:00
Shen Hong	89e37d5273	[HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs. (#1690 )	2020-06-22 08:13:28 -07:00
Bhavani Sudha Saktheeswaran	9697fbf71e	[HUDI-936] Fix fetch ordering val in HoodieSparkSqlWriter to remove unnecessary conversion to String (#1659 )	2020-05-26 21:09:02 -07:00
rolandjohann	459356e292	[HUDI-863] get decimal properties from derived spark DataType (#1596 )	2020-05-18 04:28:27 -07:00
Mathieu	25a0080b2f	[HUDI-714] Add javadoc and comments to hudi write method link (#1409 )	2020-05-16 08:36:51 -04:00
Gary Li	a64afdfd17	HUDI-528 Handle empty commit in incremental pulling (#1612 )	2020-05-14 22:55:25 -07:00
cxzl25	32bada29dc	[HUDI-889] Writer supports useJdbc configuration when hive synchronization is enabled (#1627 )	2020-05-14 00:20:13 +08:00
Shen Hong	295d00beea	[HUDI-880] Replace part of spark context by hadoop configuration in HoodieTable. (#1614 )	2020-05-11 23:33:57 -07:00
AakashPradeep	5e0f5e5521	[HUDI-852] adding check for table name for Append Save mode (#1580 ) * adding check for table name for Append Save mode * adding existing table validation for delete and upsert operation Co-authored-by: Aakash Pradeep <apradeep@twilio.com>	2020-05-03 23:09:17 -07:00
Dongwook	ddd105bb31	[HUDI-772] Make UserDefinedBulkInsertPartitioner configurable for DataSource (#1500 )	2020-04-20 08:38:18 -07:00
Pratyaksh Sharma	d610252d6b	[HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment (#1150 )	2020-04-07 16:10:26 -07:00
vinoth chandar	eaf6cc2d90	[HUDI-756] Organize Cleaning Action execution into a single package in hudi-client (#1485 ) - Introduced a thin abstraction ActionExecutor, that all actions will implement - Pulled cleaning code from table, writeclient into a single package - CleanHelper is now CleanPlanner, HoodieCleanClient is no longer around - Minor refactor of HoodieTable factory method - HoodieTable.create() methods with and without metaclient passed in - HoodieTable constructor now does not do a redundant instantiation - Fixed existing unit tests to work at the HoodieWriteClient level	2020-04-04 00:07:34 -07:00
wenningd	ce0a4c64d0	[HUDI-713] Fix conversion of Spark array of struct type to Avro schema (#1406 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-03-30 15:52:15 -07:00

103 Commits