lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
lw0090	8b5d6f9430	[HUDI-1437] support more accurate spark JobGroup for better performance tracking (#2322 )	2020-12-17 15:20:13 -08:00
Bhavani Sudha Saktheeswaran	14d5d1100c	[HUDI-1406] Add date partition based source input selector for Delta streamer (#2264 ) - Adds ability to list only recent date based partitions from source data. - Parallelizes listing for faster tailing of DFSSources	2020-12-17 03:59:30 -08:00
wangxianghu	4ddfc61d70	[MINOR] Make QuickstartUtil generate random timestamp instead of 0 (#2340 )	2020-12-17 18:00:23 +08:00
ChangLi	6a6b772c49	[MINOR] Fix error information in exception (#2341 )	2020-12-16 19:37:01 +08:00
wenningd	26cdc457f6	[HUDI-1376] Drop Hudi metadata cols at the beginning of Spark datasource writing (#2233 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-12-15 16:20:48 -08:00
Danny Chan	93d9c25aee	[MINOR] Improve code readability by passing in the fileComparisonsRDD in bloom index (#2319 )	2020-12-14 22:35:24 -08:00
Balaji Varadarajan	069a1dcf24	[HUDI-1435] Fix bug in Marker File Reconciliation for Non-Partitioned datasets (#2301 )	2020-12-14 22:24:12 -08:00
lw0090	facde4c16f	[HUDI-1448] Hudi dla sync support skip rt table syncing (#2324 )	2020-12-14 23:25:10 +08:00
steven zhang	11bc1fe6f4	[HUDI-1428] Clean old fileslice is invalid (#2292 ) Co-authored-by: zhang wen <wen.zhang@dmall.com> Co-authored-by: zhang wen <steven@stevendeMac-mini.local>	2020-12-13 06:28:53 -08:00
Shen Hong	236d1b0dec	[HUDI-1439] Remove scala dependency from hudi-client-common (#2306 )	2020-12-11 00:36:37 -08:00
wangxianghu	6cf25d5c8a	[MINOR] Minor improve in IncrementalRelation (#2314 )	2020-12-10 20:16:00 +08:00
Danny Chan	4bc45a391a	[HUDI-1445] Refactor AbstractHoodieLogRecordScanner to use Builder (#2313 )	2020-12-10 20:02:02 +08:00
Raymond Xu	bd9cceccb5	[HUDI-1395] Fix partition path using FSUtils (#2312 ) Fixed the logic to get partition path in Copier and Exporter utilities.	2020-12-10 10:19:19 +08:00
wangxianghu	007014c1ef	[MINOR] Throw an exception when keyGenerator initialization failed (#2307 )	2020-12-10 09:56:19 +08:00
wenningd	fce1453fa6	[HUDI-1040] Make Hudi support Spark 3 (#2208 ) * Fix flaky MOR unit test * Update Spark APIs to make it be compatible with both spark2 & spark3 * Refactor bulk insert v2 part to make Hudi be able to compile with Spark3 * Add spark3 profile to handle fasterxml & spark version * Create hudi-spark-common module & refactor hudi-spark related modules Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-12-09 15:52:23 -08:00
jshmchenxi	3a91d26d62	fix typo (#2308 ) Co-authored-by: Xi Chen <chenxi07@qiyi.com>	2020-12-08 06:28:20 -08:00
wangxianghu	de2fbeac33	[HUDI-1412] Make HoodieWriteConfig support setting different default … (#2278 ) * [HUDI-1412] Make HoodieWriteConfig support setting different default value according to engine type	2020-12-07 09:29:53 +08:00
pengzhiwei	319b7a58e4	[HUDI-1427] Fix FileAlreadyExistsException when set HOODIE_AUTO_COMMIT_PROP to true (#2295 )	2020-12-05 08:07:25 +08:00
liujinhui	62b392b49c	[HUDI-1343] Add standard schema postprocessor which would rewrite the schema using spark-avro conversion (#2192 ) Co-authored-by: liujh <liujh@t3go.cn>	2020-12-03 19:28:34 -08:00
lw0090	1f0d5c077e	[HUDI-1349] spark sql support overwrite use insert_overwrite_table (#2196 )	2020-12-03 12:26:21 -08:00
rmpifer	78fd122594	[HUDI-1196] Update HoodieKey when deduplicating records with global index (#2248 ) - Works only for overwrite payload (default) - Does not alter current semantics otherwise Co-authored-by: Ryan Pifer <ryanpife@amazon.com>	2020-12-01 13:50:46 -08:00
Prashant Wason	ac23d2587f	[HUDI-1357] Added a check to validate records are not lost during merges. (#2216 ) - Turned off by default	2020-12-01 13:44:57 -08:00
Guy Khazma	b826c53e33	[HUDI-1373] Add Support for OpenJ9 JVM (#2231 ) * add support for OpenJ9 VM * add 32bit openJ9 * Pulled the memory layout specs into their own classes.	2020-12-01 13:19:40 -08:00
pengzhiwei	36ce5bcd92	[HUDI-1424] Write Type changed to BULK_INSERT when set ENABLE_ROW_WRITER_OPT_KEY=true (#2289 )	2020-11-30 23:07:21 +08:00
leesf	3d5e9fee7f	[MINOR] refactor code in HoodieMergeHandle (#2272 )	2020-11-28 21:47:05 +08:00
steven zhang	56866a11fe	[HUDI-1392] lose partition info when using spark parameter basePath (#2243 ) Co-authored-by: zhang wen <wen.zhang@dmall.com>	2020-11-25 11:55:33 +08:00
Balaji Varadarajan	0ebef1c0a0	[HUDI-1358] Fix leaks in DiskBasedMap and LazyFileIterable (#2249 )	2020-11-23 10:56:26 -08:00
wenningd	751e4ee882	[HUDI-1396] Fix for preventing bootstrap datasource jobs from hanging via spark-submit (#2253 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-11-23 10:43:24 -08:00
Shen Hong	d9411c38db	[HUDI-1364] Add HoodieJavaEngineContext to hudi-java-client (#2222 )	2020-11-23 10:06:28 -08:00
hongdd	971f028aaf	[HUDI-1393] Add compaction action in archive command (#2246 )	2020-11-23 16:53:01 +08:00
wangxianghu	537502a8ef	[MINOR] Add apacheflink label (#2268 )	2020-11-22 10:41:11 +08:00
Gary Li	c8d5ea2752	[MINOR] clean up and add comments to flink client (#2261 )	2020-11-19 15:27:52 +08:00
pengzhiwei	d7af8caa45	[HUDI-1384] Decoupling hive jdbc dependency when HIVE_USE_JDBC_OPT_KEY set false (#2241 )	2020-11-19 13:44:03 +08:00
wangxianghu	a23230c8c2	[HUDI-1400] Replace Operation enum with WriteOperationType (#2259 )	2020-11-19 13:40:04 +08:00
wangxianghu	4d05680038	[HUDI-1327] Introduce base implementation of hudi-flink-client (#2176 )	2020-11-18 17:57:11 +08:00
Karl_Wang	430d4b428e	[HUDI-1377] remove duplicate code (#2235 )	2020-11-10 10:08:08 -08:00
Balaji Varadarajan	42b6aeca28	[HUDI-1358] Fix Memory Leak in HoodieLogFormatWriter (#2217 )	2020-11-09 19:26:13 -08:00
wenningd	0364498ae3	[HUDI-1375] Fix bug in HoodieAvroUtils.removeMetadataFields() method (#2232 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-11-05 17:30:17 -08:00
satishkotha	33ec88fc38	[HUDI-1352] Add FileSystemView APIs to query pending clustering operations (#2202 )	2020-11-05 08:49:58 -08:00
lw0090	5f5c15b0d9	[HUDI-892] RealtimeParquetInputFormat skip adding projection columns if there are no log files (#2190 ) * [HUDI-892] RealtimeParquetInputFormat skip adding projection columns if there are no log files * [HUDI-892] for test * [HUDI-892] fix bug generate array from split * [HUDI-892] revert test log	2020-11-02 20:00:12 -08:00
wangxianghu	d160abb437	[HUDI-912] Refactor and relocate KeyGenerator to support more engines (#2200 ) * [HUDI-912] Refactor and relocate KeyGenerator to support more engines * Rename KeyGenerators	2020-11-02 13:12:51 -08:00
Venkatesh Rudraraju	59f995a3f5	Use RateLimiter instead of sleep. Repartition WriteStatus to optimize Hbase index writes (#1484 )	2020-11-02 08:33:27 -08:00
Sivabalan Narayanan	a205dd10fa	[HUDI-1338] Adding Delete support to test suite framework (#2172 ) - Adding Delete support to test suite. Added DeleteNode Added support to generate delete records	2020-11-01 00:15:41 -04:00
Prashant Wason	6310a2307a	[HUDI-1351] Improvements to the hudi test suite for scalability and repeated testing. (#2197 ) 1. Added the --clean-input and --clean-output parameters to clean the input and output directories before starting the job 2. Added the --delete-old-input parameter to delete older batches for data already ingested. This helps keep number of redundant files low. 3. Added the --input-parallelism parameter to restrict the parallelism when generating input data. This helps keeping the number of generated input files low. 4. Added an option start_offset to Dag Nodes. Without ability to specify start offsets, data is generated into existing partitions. With start offset, DAG can control on which partition, the data is to be written. 5. Fixed generation of records for correct number of partitions - In the existing implementation, the partition is chosen as a random long. This does not guarantee exact number of requested partitions to be created. 6. Changed variable blacklistedFields to be a Set as that is faster than List for membership checks. 7. Fixed integer division for Math.ceil. If two integers are divided, the result is not double unless one of the integers is cast to double.	2020-10-29 06:50:37 -07:00
liujinhui	736a940854	[HUDI-1274] Make hive synchronization supports hourly partition (#2122 )	2020-10-29 11:29:50 +08:00
n3nash	e109a61803	1. Fix merge on read DAG to make docker demo pass (#2092 ) 1. Fix merge on read DAG to make docker demo pass (#2092) 2. Fix repeat_count, rollback node	2020-10-28 22:34:26 -04:00
wangxianghu	e206ddd431	[MINOR] Private the NoArgsConstructor of SparkMergeHelper and code clean (#2194 )	2020-10-26 12:22:11 +08:00
lw0090	8545ea3856	[HUDI-1118] Cleanup rollback files residing in .hoodie folder (#2205 )	2020-10-25 21:04:56 -07:00
Prashant Wason	49e855c348	[HUDI-1326] Added an API to force publish metrics and flush them. (#2152 ) * [HUDI-1326] Added an API to force publish metrics and flush them. Using the added API, publish metrics after each level of the DAG completed in hudi-test-suite. * Code cleanups Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-10-24 16:47:24 -07:00
Raymond Xu	14c4611857	[MINOR] Fix caller to SparkBulkInsertCommitActionExecutor (#2195 ) Fixed calling the wrong constructor	2020-10-21 19:50:10 -07:00

1245 Commits