lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
dugenkui	ae68b2b355	[MINOR] fix typos (#2116)	2020-09-26 20:40:33 +08:00
Udit Mehrotra	bf65269f66	[HUDI-1230] Fix for preventing MOR datasource jobs from hanging via spark-submit (#2046)	2020-09-17 20:03:35 -07:00
shenh062326	581d54097c	[HUDI-1143] Change timestamp field in HoodieTestDataGenerator from double to long	2020-09-15 20:58:29 -07:00
Balaji Varadarajan	5e61454a6c	[HUDI-802] AWSDmsTransformer does not handle insert and delete of a row in a single batch correctly (#2084)	2020-09-11 16:11:42 -07:00
wenningd	2fee087f0f	[HUDI-1181] Fix decimal type display issue for record key field (#1953) * [HUDI-1181] Fix decimal type display issue for record key field * Remove getNestedFieldVal method from DataSourceUtils * resolve comments Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-09-08 17:50:54 -07:00
Sreeram Ramji	6537af2676	[HUDI-1153] Spark DataSource and Streaming Write must fail when operation type is misconfigured (#2014)	2020-09-04 09:08:30 -07:00
Dongwook	8d19ebfd0f	[HUDI-993] Let delete API use "hoodie.delete.shuffle.parallelism" (#1703) For Delete API, "hoodie.delete.shuffle.parallelism" isn't used as opposed to "hoodie.upsert.shuffle.parallelism" is used for upsert, this creates the performance difference between delete by upsert API with "EmptyHoodieRecordPayload" and delete API for certain cases. This patch makes the following fixes in this regard. - Let deduplicateKeys method use "hoodie.delete.shuffle.parallelism" - Repartition inputRDD as "hoodie.delete.shuffle.parallelism" in case "hoodie.combine.before.delete=false"	2020-09-01 12:55:31 -04:00
Mathieu	6df8f88d86	[HUDI-1252] Remove unused class NoOpBulkInsertPartitioner in DataSourceTestUtils (#2054)	2020-08-31 03:03:10 -07:00
Thinking Chen	6b417d1a86	[HUDI-1225] Fix: Avro Date logical type not handled correctly when converting to Spark Row (#2047)	2020-08-29 01:16:42 -07:00
Satish Kotha	f468c20c6c	[HUDI-1226] Fix ComplexKeyGenerator for non-partitioned tables	2020-08-25 20:55:48 -07:00
Mathieu	35b21855da	[HUDI-1150] Fix unable to parse input partition field :1 exception when using TimestampBasedKeyGenerator (#1920)	2020-08-23 19:56:50 +08:00
Balaji Varadarajan	b8f4a30efd	Fix Integration test flakiness in HoodieJavaStreamingApp (#1967)	2020-08-14 01:42:15 -07:00
Sivabalan Narayanan	379cf0786f	[HUDI-1013] Adding Bulk Insert V2 implementation (#1834) - Adding ability to use native spark row writing for bulk_insert - Controlled by `ENABLE_ROW_WRITER_OPT_KEY` datasource write option - Introduced KeyGeneratorInterface in hudi-client, moved KeyGenerator back to hudi-spark - Simplified the new API additions to just two new methods : getRecordKey(row), getPartitionPath(row) - Fixed all built-in key generators with new APIs - Made the field position map lazily created upon the first call to row based apis - Implemented native row based key generators for CustomKeyGenerator - Fixed all the tests, with these new APIs Co-authored-by: Balaji Varadarajan <varadarb@uber.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-13 00:33:39 -07:00
wenningd	8b928e9bca	[HUDI-808] Support cleaning bootstrap source data (#1870) Co-authored-by: Wenning Ding <wenningd@amazon.com> Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>	2020-08-11 01:43:46 -07:00
Udit Mehrotra	e4a2d98f79	[HUDI-426] Bootstrap datasource integration (#1702)	2020-08-09 14:06:13 -07:00
wenningd	9fe2d2b14a	[HUDI-427] [HUDI-971] Implement CLI support for performing bootstrap (#1869) * [HUDI-971] Clean partitions & fileIds returned by HFileBootstrapIndex * [HUDI-427] Implement CLI support for performing bootstrap Co-authored-by: Wenning Ding <wenningd@amazon.com> Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>	2020-08-08 12:37:29 -07:00
Gary Li	4f74a84607	[HUDI-69] Support Spark Datasource for MOR table - RDD approach (#1848) - This PR implements Spark Datasource for MOR table in the RDD approach. - Implemented SnapshotRelation - Implemented HudiMergeOnReadRDD - Implemented separate Iterator to handle merge and unmerge record reader. - Added TestMORDataSource to verify this feature. - Clean up test file name, add tests for mixed query type tests - We can now revert the change made in DefaultSource Co-authored-by: Vinoth Chandar <vchandar@confluent.io>	2020-08-07 00:28:14 -07:00
Udit Mehrotra	ab453f2623	[HUDI-999] [RFC-12] Parallelize fetching of source data files/partitions (#1924)	2020-08-06 23:44:57 -07:00
Balaji Varadarajan	7a2429f5ba	[HUDI-575] Spark Streaming with async compaction support (#1752)	2020-08-05 07:50:15 -07:00
Sivabalan Narayanan	ab11ba43e1	[REVERT] "[HUDI-1058] Make delete marker configurable (#1819)" (#1914) This reverts commit `433d7d2c98`.	2020-08-04 15:20:38 -07:00
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
Shen Hong	433d7d2c98	[HUDI-1058] Make delete marker configurable (#1819)	2020-08-03 11:06:31 -04:00
Raymond Xu	10e4268792	[HUDI-995] Use Transformations, Assertions and SchemaTestUtil (#1884) - Consolidate transform functions for tests in Transformations.java - Consolidate assertion functions for tests in Assertions.java - Make use of SchemaTestUtil for loading schema from resource	2020-08-01 20:57:18 +08:00
Y Ethan Guo	ccd70a7e48	[HUDI-472] Introduce configurations and new modes of sorting for bulk_insert (#1149) * [HUDI-472] Introduce the configuration and new modes of record sorting for bulk_insert (#1149). Three sorting modes are implemented: global sort ("global_sort"), local sort inside each RDD partition ("partition_sort") and no sort ("none")	2020-07-31 09:52:42 -04:00
Nishith Agarwal	2fc2b01d86	[HUDI-394] Provide a basic implementation of test suite	2020-07-30 21:21:15 -07:00
Udit Mehrotra	5e7931b1f9	[MINOR] Fix master compilation failure (#1881) Co-authored-by: Udit Mehrotra <uditme@amazon.com>	2020-07-27 23:02:58 -07:00
hongdd	fa419213f6	[HUDI-703] Add test for HoodieSyncCommand (#1774)	2020-07-28 08:31:43 +08:00
Raymond Xu	ca36c44cb3	[HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873)	2020-07-27 19:21:45 +08:00
Udit Mehrotra	1aae437257	[HUDI-1102] Add common useful Spark related and Table path detection utilities (#1841) Co-authored-by: Mehrotra <uditme@amazon.com>	2020-07-18 16:16:32 -07:00
Pratyaksh Sharma	9627a385fe	[HUDI-916]: Added support for multiple input formats in TimestampBasedKeyGenerator (#1648)	2020-07-10 15:28:45 -04:00
Pratyaksh Sharma	c7f1a781ab	[HUDI-728]: Implemented custom key generator (#1433)	2020-07-09 07:35:07 -04:00
sathyaprakashg	df2e0c760e	HUDI-942 Increase default value number of delta commits for inline compaction (#1664) Co-authored-by: Sathyaprakash Govindasamy <sathyaprakashg@zillowgroup.com>	2020-06-10 16:16:44 -07:00
Raymond Xu	742c204099	[HUDI-811] Restructure test packages in hudi-client/cli (#1689)	2020-06-02 10:25:42 +08:00
Raymond Xu	03f136361a	[HUDI-811] Restructure test packages in hudi-common (#1644) * [HUDI-811] Restructure test packages in hudi-common	2020-05-27 16:28:17 +08:00
Gary Li	a64afdfd17	HUDI-528 Handle empty commit in incremental pulling (#1612)	2020-05-14 22:55:25 -07:00
Raymond Xu	0d4848b68b	[HUDI-811] Restructure test packages (#1607) * restructure hudi-spark tests * restructure hudi-timeline-service tests * restructure hudi-hadoop-mr hudi-utilities tests * restructure hudi-hive-sync tests	2020-05-13 15:37:03 -07:00
AakashPradeep	5e0f5e5521	[HUDI-852] adding check for table name for Append Save mode (#1580) * adding check for table name for Append Save mode * adding existing table validation for delete and upsert operation Co-authored-by: Aakash Pradeep <apradeep@twilio.com>	2020-05-03 23:09:17 -07:00
Dongwook	ddd105bb31	[HUDI-772] Make UserDefinedBulkInsertPartitioner configurable for DataSource (#1500)	2020-04-20 08:38:18 -07:00
Raymond Xu	d65efe659d	[HUDI-780] Migrate test cases to Junit 5 (#1504)	2020-04-15 12:35:01 -07:00
Ramachandran Madtas Subramaniam	639ec20412	[HUDI-562] Enable testing at debug log level This is to ensure that tests will execute all code paths, even the ones written under DEBUG log levels. This will improve coverage as well as ensure there are no surprises when DEBUG log level is enabled in production.	2020-04-02 11:14:35 -07:00
Suneel Marthi	fa36082554	[HUDI-746] Reduce build warnings < 10 (#1465)	2020-03-30 11:46:52 +08:00
vinoth chandar	e057c27603	[HUDI-744] Restructure hudi-common and clean up files under util packages (#1462) - Brings more order and cohesion to the classes in hudi-common - Utils classes related to a particular concept (avro, timeline,...) are placed near to the package - common.fs package now contains all the filesystem level classes including wrapper filesystem - bloom.filter package renamed to just bloom - config package contains classes that help store properties - common.fs.inline package contains all the inline filesystem classes/impl - common.table.timeline now consolidates all timeline related classes - common.table.view consolidates all the classes related to filesystem view metadata - common.table.timeline.versioning contains all classes related to versioning of timeline - Fix few unit tests as a result - Moved the test packages around to match the source file move - Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos	2020-03-29 10:58:49 -07:00
Udit Mehrotra	2d04014581	[HUDI-607] Fix to allow creation/syncing of Hive tables partitioned by Date type columns (#1330)	2020-03-01 10:42:58 -08:00
YanJia-Gary-Li	4e7fcde4a6	[HUDI-597] Enable incremental pulling from defined partitions (#1348)	2020-02-24 11:46:30 -08:00
Suneel Marthi	5b7bb142dc	[HUDI-583] Code Cleanup, remove redundant code, and other changes (#1237)	2020-02-02 18:03:44 +08:00
leesf	652224edc8	[HUDI-578] Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator (#1281) * [HUDI-578] Trim recordKeyFields and partitionPathFields in ComplexKeyGenerator * add tests	2020-01-29 16:26:26 -08:00
vinoth chandar	c2c0f6b13d	[HUDI-509] Renaming code in sync with cWiki restructuring (#1212) - Storage Type replaced with Table Type (remaining instances) - View types replaced with query types; - ReadOptimized view referred as Snapshot Query - TableFileSystemView sub interfaces renamed to BaseFileOnly and Slice Views - HoodieDataFile renamed to HoodieBaseFile - Hive Sync tool will register RO tables for MOR with a `_ro` suffix - Datasource/Deltastreamer options renamed accordingly - Support fallback to old config values as well, so migration is painless - Config for controlling _ro suffix addition - Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView	2020-01-16 23:58:47 -08:00
Scheller	1daba24065	Add GlobalDeleteKeyGenerator Adds new GlobalDeleteKeyGenerator for record_key deletes with global indices. Also refactors key generators into their own package.	2020-01-15 17:01:29 -08:00
Mehrotra	2bb0c21a3d	Fix conversion of Spark struct type to Avro schema cr https://code.amazon.com/reviews/CR-17184364	2020-01-14 00:27:56 -08:00
Udit Mehrotra	ad50008a59	[HUDI-91][HUDI-12] Migrate to spark 2.4.4, migrate to spark-avro library instead of databricks-avro, add support for Decimal/Date types - Upgrade Spark to 2.4.4, Parquet to 1.10.1, Avro to 1.8.2 - Remove spark-avro from hudi-spark-bundle. Users need to provide --packages org.apache.spark:spark-avro:2.4.4 when running spark-shell or spark-submit - Replace com.databricks:spark-avro with org.apache.spark:spark-avro - Shade avro in hudi-hadoop-mr-bundle to make sure it does not conflict with hive's avro version.	2020-01-12 15:03:11 -08:00


66 Commits