lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
Gary Li	c5e8a024f6	[HUDI-1418] Set up flink client unit test infra (#2281)	2020-12-31 08:57:22 +08:00
Gary Li	605b617cfa	[HUDI-1434] fix incorrect log file path in HoodieWriteStat (#2300) * [HUDI-1434] fix incorrect log file path in HoodieWriteStat * HoodieWriteHandle#close() returns a list of WriteStatus objs * Handle rolled-over log files and return a WriteStatus per log file written - Combined data and delete block logging into a single call - Lazily initialize and manage write status based on returned AppendResult - Use FSUtils.getFileSize() to set final file size, consistent with other handles - Added tests around returned values in AppendResult - Added validation of the file sizes returned in write stat Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-12-30 14:22:15 -08:00
Danny Chan	0ecdec348e	[MINOR] Remove the duplicate code in AbstractHoodieWriteClient.startCommit (#2385)	2020-12-29 10:49:24 +08:00
lw0090	e177466fd2	[HUDI-1350] Support Partition level delete API in HUDI (#2254) * [HUDI-1350] Support Partition level delete API in HUDI * [HUDI-1350] Support Partition level delete API in HUDI base InsertOverwriteCommitAction * [HUDI-1350] Support Partition level delete API in HUDI base InsertOverwriteCommitAction	2020-12-28 15:01:06 -08:00
lw0090	6cdf59d92b	[HUDI-1354] Block updates and replace on file groups in clustering (#2275) * [HUDI-1354] Block updates and replace on file groups in clustering * [HUDI-1354] Block updates and replace on file groups in clustering	2020-12-27 20:30:29 -08:00
wenningd	286055ce34	[HUDI-1451] Support bulk insert v2 with Spark 3.0.0 (#2328) Co-authored-by: Wenning Ding <wenningd@amazon.com> - Added support for bulk insert v2 with datasource v2 api in Spark 3.0.0.	2020-12-25 09:43:34 -05:00
wangxianghu	01ad449ad6	[HUDI-1485] Fix Deletes issued without any prior commits exception (#2361)	2020-12-22 23:10:19 +08:00
satishkotha	959afb8ba4	Merge pull request #2263 from satishkotha/sk/clustering [HUDI-1075] Implement simple clustering strategies to create and run ClusteringPlan	2020-12-21 19:18:18 -08:00
Satish Kotha	6dc03b65bf	[HUDI-1075] Implement simple clustering strategies to create ClusteringPlan and to run the plan	2020-12-21 17:34:15 -08:00
jshmchenxi	0c821fecc2	[MINOR] Pass root exception to HoodieKeyGeneratorException for more information (#2354) Co-authored-by: Xi Chen <chenxi07@qiyi.com>	2020-12-22 09:02:23 +08:00
Shen Hong	e4e2fbc3bb	[HUDI-1419] Add base implementation for hudi java client (#2286)	2020-12-19 19:25:27 -08:00
Sivabalan Narayanan	33d338f392	[HUDI-115] Adding DefaultHoodieRecordPayload to honor ordering with combineAndGetUpdateValue (#2311) * Added ability to pass in `properties` to payload methods, so they can perform table/record specific merges * Added default methods so existing payload classes are backwards compatible. * Adding DefaultHoodiePayload to honor ordering while merging two records * Fixing default payload based on feedback	2020-12-19 19:19:42 -08:00
lw0090	8b5d6f9430	[HUDI-1437] support more accurate spark JobGroup for better performance tracking (#2322)	2020-12-17 15:20:13 -08:00
Balaji Varadarajan	069a1dcf24	[HUDI-1435] Fix bug in Marker File Reconciliation for Non-Partitioned datasets (#2301)	2020-12-14 22:24:12 -08:00
steven zhang	11bc1fe6f4	[HUDI-1428] Clean old fileslice is invalid (#2292) Co-authored-by: zhang wen <wen.zhang@dmall.com> Co-authored-by: zhang wen <steven@stevendeMac-mini.local>	2020-12-13 06:28:53 -08:00
Shen Hong	236d1b0dec	[HUDI-1439] Remove scala dependency from hudi-client-common (#2306)	2020-12-11 00:36:37 -08:00
wangxianghu	de2fbeac33	[HUDI-1412] Make HoodieWriteConfig support setting different default … (#2278) * [HUDI-1412] Make HoodieWriteConfig support setting different default value according to engine type	2020-12-07 09:29:53 +08:00
lw0090	1f0d5c077e	[HUDI-1349] spark sql support overwrite use insert_overwrite_table (#2196)	2020-12-03 12:26:21 -08:00
Prashant Wason	ac23d2587f	[HUDI-1357] Added a check to validate records are not lost during merges. (#2216) - Turned off by default	2020-12-01 13:44:57 -08:00
leesf	3d5e9fee7f	[MINOR] refactor code in HoodieMergeHandle (#2272)	2020-11-28 21:47:05 +08:00
Balaji Varadarajan	0ebef1c0a0	[HUDI-1358] Fix leaks in DiskBasedMap and LazyFileIterable (#2249)	2020-11-23 10:56:26 -08:00
Shen Hong	d9411c38db	[HUDI-1364] Add HoodieJavaEngineContext to hudi-java-client (#2222)	2020-11-23 10:06:28 -08:00
Gary Li	c8d5ea2752	[MINOR] clean up and add comments to flink client (#2261)	2020-11-19 15:27:52 +08:00
wangxianghu	4d05680038	[HUDI-1327] Introduce base implemetation of hudi-flink-client (#2176)	2020-11-18 17:57:11 +08:00
wangxianghu	d160abb437	[HUDI-912] Refactor and relocate KeyGenerator to support more engines (#2200) * [HUDI-912] Refactor and relocate KeyGenerator to support more engines * Rename KeyGenerators	2020-11-02 13:12:51 -08:00
lw0090	8545ea3856	[HUDI-1118] Cleanup rollback files residing in .hoodie folder (#2205)	2020-10-25 21:04:56 -07:00
Prashant Wason	49e855c348	[HUDI-1326] Added an API to force publish metrics and flush them. (#2152) * [HUDI-1326] Added an API to force publish metrics and flush them. Using the added API, publish metrics after each level of the DAG completed in hudi-test-suite. * Code cleanups Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-10-24 16:47:24 -07:00
lw0090	4d80e1e221	[HUDI-284] add more test for UpdateSchemaEvolution (#2127) Unit test different schema evolution scenarios.	2020-10-19 07:38:04 -07:00
hj2016	c0472d3317	[HUDI-1184] Fix the support of hbase index partition path change (#1978) When the hbase index is used, when the record partition is changed to another partition, the path does not change according to the value of the partition column Co-authored-by: huangjing <huangjing@clinbrain.com>	2020-10-11 19:05:57 -07:00
dugenkui	b58daf29ba	[MINOR] remove unused generics type (#2163)	2020-10-11 18:38:42 -07:00
vinoyang	eafd7bf289	[MINOR] Fix wrong javadoc and refactor some naming issues (#2156)	2020-10-09 15:09:26 -07:00
Pratyaksh Sharma	524193eb4b	[HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode (#1566) Co-authored-by: Balaji Varadarajan <balaji.varadarajan@robinhood.com>	2020-10-06 20:34:03 -07:00
lw0090	fdae388626	[HUDI-1203] add port configuration for EmbeddedTimelineService (#2142)	2020-10-05 11:36:54 -07:00
Prashant Wason	6c610b91ef	[HUDI-1305] Added an API to shutdown and remove the metrics reporter. (#2132) This helps in removing reporter once the test has complete. Prevents log pollution from un-necessary metric logs. - Added an API to shutdown the metrics reporter after tests.	2020-10-04 09:30:04 -07:00
Mathieu	1f7add9291	[HUDI-1089] Refactor hudi-client to support multi-engine (#1827) - This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules - Simple usages of Spark using jsc.parallelize() has been redone using EngineContext#map, EngineContext#flatMap etc - Code changes in the PR, break classes into `BaseXYZ` parent classes with no spark dependencies living in `hudi-client-common` - Classes on `hudi-spark-client` are named `SparkXYZ` extending the parent classes with all the Spark dependencies - To simplify/cleanup, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-10-01 14:25:29 -07:00

35 Commits