lanyuanxiaoyao/hudi

Author	SHA1	Message	Date
vinoth chandar	5ca0625b27	[HUDI 1308] Harden RFC-15 Implementation based on production testing (#2441) Addresses leaks, perf degradation observed during testing. These were regressions from the original rfc-15 PoC implementation. * Pass a single instance of HoodieTableMetadata everywhere * Fix tests and add config for enabling metrics - Removed special casing of assumeDatePartitioning inside FSUtils#getAllPartitionPaths() - Consequently, IOException is never thrown and many files had to be adjusted - More diligent handling of open file handles in metadata table - Added config for controlling reuse of connections - Added config for turning off fallback to listing, so we can see tests fail - Changed all ipf listing code to cache/amortize the open/close for better performance - Timelineserver also reuses connections, for better performance - Without timelineserver, when metadata table is opened from executors, reuse is not allowed - HoodieMetadataConfig passed into HoodieTableMetadata#create as argument. - Fix TestHoodieBackedTableMetadata#testSync	2021-01-19 21:20:28 -08:00
lw0090	de42adc230	[HUDI-1520] add configure for spark sql overwrite use INSERT_OVERWRITE_TABLE (#2428)	2021-01-11 09:07:47 -08:00
Udit Mehrotra	7ce3ac778e	[HUDI-1479] Use HoodieEngineContext to parallelize fetching of partiton paths (#2417) * [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partition paths * Adding testClass for FileSystemBackedTableMetadata Co-authored-by: Nishith Agarwal <nagarwal@uber.com>	2021-01-10 21:19:52 -08:00
lw0090	368c1a8f5c	[HUDI-1399] support a independent clustering spark job to asynchronously clustering (#2379) * [HUDI-1481] add structured streaming and delta streamer clustering unit test * [HUDI-1399] support a independent clustering spark job to asynchronously clustering * [HUDI-1399] support a independent clustering spark job to asynchronously clustering * [HUDI-1498] Read clustering plan from requested file for inflight instant (#2389) * [HUDI-1399] support a independent clustering spark job with schedule generate instant time Co-authored-by: satishkotha <satishkotha@uber.com>	2021-01-09 17:30:16 -08:00
Gary Li	79ec7b4894	[HUDI-920] Support Incremental query for MOR table (#1938)	2021-01-09 08:02:08 -08:00
Ryan Pifer	4b94529aaf	[HUDI-1325] [RFC-15] Merge updates of unsynced instants to metadata table (apache#2342) [RFC-15] Fix partition key in metadata table when bootstrapping from file system (apache#2387) Co-authored-by: Ryan Pifer <ryanpife@amazon.com>	2021-01-04 07:59:47 -08:00
Udit Mehrotra	4e64226844	[HUDI-1450] Use metadata table for listing in HoodieROTablePathFilter (apache#2326) [HUDI-1394] [RFC-15] Use metadata table (if present) to get all partition paths (apache#2351)	2021-01-04 07:59:47 -08:00
lw0090	9e6889a8ce	[HUDI-1481] add structured streaming and delta streamer clustering unit test (#2360)	2020-12-27 20:27:09 -08:00
lw0090	e807bb895e	[HUDI-1487] fix unit test testCopyOnWriteStorage random failed (#2364)	2020-12-25 09:54:23 -08:00
wenningd	89f482eaf2	[HUDI-1489] Fix null pointer exception when reading updated written bootstrap table (#2370) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-12-23 11:26:24 -08:00
Sivabalan Narayanan	33d338f392	[HUDI-115] Adding DefaultHoodieRecordPayload to honor ordering with combineAndGetUpdateValue (#2311) * Added ability to pass in `properties` to payload methods, so they can perform table/record specific merges * Added default methods so existing payload classes are backwards compatible. * Adding DefaultHoodiePayload to honor ordering while merging two records * Fixing default payload based on feedback	2020-12-19 19:19:42 -08:00
wenningd	26cdc457f6	[HUDI-1376] Drop Hudi metadata cols at the beginning of Spark datasource writing (#2233) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-12-15 16:20:48 -08:00
wenningd	fce1453fa6	[HUDI-1040] Make Hudi support Spark 3 (#2208) * Fix flaky MOR unit test * Update Spark APIs to make it be compatible with both spark2 & spark3 * Refactor bulk insert v2 part to make Hudi be able to compile with Spark3 * Add spark3 profile to handle fasterxml & spark version * Create hudi-spark-common module & refactor hudi-spark related modules Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-12-09 15:52:23 -08:00

13 Commits