lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
rmpifer	0709c62a6b	[HUDI-1800] Exclude file slices in pending compaction when performing small file sizing (#2902) Co-authored-by: Ryan Pifer <ryanpife@amazon.com>	2021-05-29 08:06:01 -04:00
Susu Dong	685f77b5dd	[HUDI-1740] Fix insert-overwrite API archival (#2784) - fix problem of archiving replace commits - Fix problem when getting empty replacecommit.requested - Improved the logic of handling empty and non-empty requested/inflight commit files. Added unit tests to cover both empty and non-empty inflight files cases and cleaned up some unused test util methods Co-authored-by: yorkzero831 <yorkzero8312@gmail.com> Co-authored-by: zheren.yu <zheren.yu@paypay-corp.co.jp>	2021-05-21 13:52:13 -07:00
Y Ethan Guo	a96034d38d	[HUDI-1888] Fix NPE when the nested partition path field has null value (#2957)	2021-05-21 08:28:11 -04:00
wangxianghu	ced068e1ee	[MINOR] Remove unused method in BaseSparkCommitActionExecutor (#2965)	2021-05-20 10:18:07 +08:00
lw0090	5a8b2a4f86	[HUDI-1768] add spark datasource unit test for schema validate add column (#2776)	2021-05-11 16:49:18 -04:00
TeRS-K	be9db2c4f5	[HUDI-1055] Remove hardcoded parquet in tests (#2740) * Remove hardcoded parquet in tests * Use DataFileUtils.getInstance * Renaming DataFileUtils to BaseFileUtils Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-05-11 10:01:45 -07:00
satishkotha	386767693d	[HUDI-1833] rollback pending clustering even if there is greater commit (#2863) * [HUDI-1833] rollback pending clustering even if there are greater commits	2021-04-27 14:21:42 -07:00
satishkotha	2999586509	[HUDI-1690] use jsc union instead of rdd union (#2872)	2021-04-26 23:35:01 -07:00
Danny Chan	d047e91d86	[HUDI-1837] Add optional instant range to log record scanner for log (#2870)	2021-04-26 16:53:18 +08:00
Chanh Le	a1e636dc6b	[HUDI-1551] Add support for BigDecimal and Integer when partitioning based on time. (#2851) Co-authored-by: trungchanh.le <trungchanh.le@bybit.com>	2021-04-22 21:56:20 +08:00
jsbali	b31c520c66	[HUDI-1714] Added tests to TestHoodieTimelineArchiveLog for the archival of compl… (#2677) * Added tests to TestHoodieTimelineArchiveLog for the archival of completed clean and rollback actions. * Adding code review changes * [HUDI-1714] Minor Fixes	2021-04-21 10:27:43 -07:00
li36909	6b4b878d08	[HUDI-1744] rollback fails on mor table when the partition path hasn't any files (#2749) Co-authored-by: lrz <lrz@lrzdeMacBook-Pro.local>	2021-04-19 15:44:11 -07:00
Aditya Tiwari	ec2334ceac	[HUDI-1716]: Resolving default values for schema from dataframe (#2765) - Adding default values and setting null as first entry in UNION data types in avro schema. Co-authored-by: Aditya Tiwari <aditya.tiwari@flipkart.com>	2021-04-19 10:05:20 -04:00
hj2016	1da16dfd2e	[HUDI-1784] Added print detailed stack log when hbase connection error (#2799)	2021-04-12 13:46:06 +08:00
hongdd	ecdbd2517f	[HUDI-699] Fix CompactionCommand and add unit test for CompactionCommand (#2325)	2021-04-08 15:35:33 +08:00
pengzhiwei	684622c7c9	[HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning (#2651)	2021-04-01 11:12:28 -07:00
Sebastian Bernauer	aa0da72c59	Preparation for Avro update (#2650)	2021-03-30 21:50:17 -07:00
Gary Li	452f5e2d66	[HOTFIX] close spark session in functional test suite and disable spark3 test for spark2 (#2727)	2021-03-29 06:04:48 -07:00
garyli1019	6e803e08b1	Moving to 0.9.0-SNAPSHOT on master branch.	2021-03-24 21:37:14 +08:00
Jintao Guan	1277c62398	[HUDI-1653] Add support for composite keys in NonpartitionedKeyGenerator (#2627) * [HUDI-1653] Add support for composite keys in NonpartitionedKeyGenerator * update NonpartitionedKeyGenerator to support composite record keys * update NonpartitionedKeyGenerator	2021-03-18 15:33:31 -07:00
n3nash	74241947c1	[HUDI-845] Added locking capability to allow multiple writers (#2374) * [HUDI-845] Added locking capability to allow multiple writers 1. Added LockProvider API for pluggable lock methodologies 2. Added Resolution Strategy API to allow for pluggable conflict resolution 3. Added TableService client API to schedule table services 4. Added Transaction Manager for wrapping actions within transactions	2021-03-16 16:43:53 -07:00
Prashant Wason	3b36cb805d	[HUDI-1552] Improve performance of key lookups from base file in Metadata Table. (#2494) * [HUDI-1552] Improve performance of key lookups from base file in Metadata Table. 1. Cache the KeyScanner across lookups so that the HFile index does not have to be read for each lookup. 2. Enable block caching in KeyScanner. 3. Move the lock to a limited scope of the code to reduce lock contention. 4. Removed reuse configuration * Properly close the readers, when metadata table is accessed from executors - Passing a reuse boolean into HoodieBackedTableMetadata - Preserve the fast return behavior when reusing and opening from multiple threads (no contention) - Handle concurrent close() and open readers, for reuse=false, by always synchronizing Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-03-15 13:42:57 -07:00
satishkotha	c4a66324cd	[HUDI-1651] Fix archival of requested replacecommit (#2622)	2021-03-09 15:56:44 -08:00
Raymond Xu	d3a451611c	[MINOR] HoodieClientTestHarness close resources in AfterAll phase (#2646) Parameterized test case like `org.apache.hudi.table.upgrade.TestUpgradeDowngrade#testUpgrade` incurs flakiness when org.apache.hadoop.fs.FileSystem#closeAll is invoked at BeforeEach; it should be invoked in AfterAll instead.	2021-03-08 17:36:03 +08:00
pengzhiwei	bc883db5de	[HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig (#2596)	2021-03-05 14:10:27 +08:00
Raymond Xu	899ae70fdb	[HUDI-1587] Add latency and freshness support (#2541) Save min and max of event time in each commit and compute the latency and freshness metrics.	2021-03-03 20:13:12 -08:00
Prashant Wason	73fa308ff0	[HUDI-1634] Re-bootstrap metadata table when un-synced instants have been archived. (#2595)	2021-03-01 20:31:55 -08:00
Prashant Wason	022df0d1b1	[HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap. (#2565)	2021-02-25 16:52:28 -08:00
hj2016	77ba561a6b	[HUDI-1347] Fix Hbase index to make rollback synchronous (via config) (#2188) Co-authored-by: huangjing <huangjing@clinbrain.com> Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>	2021-02-23 20:56:58 -05:00
Prashant Wason	d2f360f5dd	[MINOR] Ensure directory exists before listing all marker files. (#2594)	2021-02-23 08:05:59 -08:00
n3nash	ffcfb58bac	[HUDI-1486] Remove inline inflight rollback in hoodie writer (#2359) 1. Refactor rollback and move cleaning failed commits logic into cleaner 2. Introduce hoodie heartbeat to ascertain failed commits 3. Fix test cases	2021-02-19 20:12:22 -08:00
Sivabalan Narayanan	c9fcf964b2	[HUDI-1315] Adding builder for HoodieTableMetaClient initialization (#2534)	2021-02-20 09:54:26 +08:00
Karl_Wang	9431aabfab	[HUDI-1381] Schedule compaction based on time elapsed (#2260) - introduce configs to control how compaction is triggered - Compaction can be triggered using time, number of delta commits and/or combinations - Default behaviour remains the same.	2021-02-17 07:44:53 -08:00
pengzhiwei	37972071ff	[HUDI-1109] Support Spark Structured Streaming read from Hudi table (#2485)	2021-02-17 03:36:29 -08:00
Danny Chan	4c5b6923cc	[HUDI-1557] Make Flink write pipeline write task scalable (#2506) This is the #step 2 of RFC-24: https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal This PR introduce a BucketAssigner that assigns bucket ID (partition path & fileID) for each stream record. There is no need to look up index and partition the records anymore in the following pipeline for these records, we actually decide the write target location before the write and each record computes its location when the BucketAssigner receives it, thus, the indexing is with streaming style. Computing locations for a batch of records all at a time is resource consuming so a pressure to the engine, we should avoid that in streaming system.	2021-02-06 22:03:52 +08:00
Sivabalan Narayanan	eb91e5ba70	[HUDI-1523] Call mkdir(partition) only if not exists (#2501)	2021-02-03 09:02:37 -05:00
satishkotha	9cb6cb8189	[HUDI-1266] Add unit test for validating replacecommit rollback (#2418)	2021-01-29 10:28:08 -08:00
satishkotha	2d2d5c83b1	[HUDI-1555] Remove isEmpty to improve clustering execution performance (#2502)	2021-01-29 10:27:09 -08:00
SteNicholas	2ee1c3fb0c	[HUDI-1234] Insert new records to data files without merging for "Insert" operation. (#2111) * Added HoodieConcatHandle to skip merging for "insert" operation when the corresponding config is set Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>	2021-01-27 13:09:51 -05:00
Shen Hong	c4afd179c1	[HUDI-1476] Introduce unit test infra for java client (#2478)	2021-01-24 11:17:19 -08:00
vinoth chandar	5e30fc1b2b	[MINOR] Disabling problematic tests temporarily to stabilize CI (#2468)	2021-01-20 14:24:34 -08:00
Vinoth Chandar	3719e7b388	Moving to 0.8.0-SNAPSHOT on master branch.	2021-01-20 11:31:22 -08:00
teeyog	c931dc5406	[MINOR] Remove redundant judgments (#2466)	2021-01-20 20:41:09 +08:00
vinoth chandar	5ca0625b27	[HUDI 1308] Harden RFC-15 Implementation based on production testing (#2441) Addresses leaks, perf degradation observed during testing. These were regressions from the original rfc-15 PoC implementation. * Pass a single instance of HoodieTableMetadata everywhere * Fix tests and add config for enabling metrics - Removed special casing of assumeDatePartitioning inside FSUtils#getAllPartitionPaths() - Consequently, IOException is never thrown and many files had to be adjusted - More diligent handling of open file handles in metadata table - Added config for controlling reuse of connections - Added config for turning off fallback to listing, so we can see tests fail - Changed all ipf listing code to cache/amortize the open/close for better performance - Timelineserver also reuses connections, for better performance - Without timelineserver, when metadata table is opened from executors, reuse is not allowed - HoodieMetadataConfig passed into HoodieTableMetadata#create as argument. - Fix TestHoodieBackedTableMetadata#testSync	2021-01-19 21:20:28 -08:00
Sivabalan Narayanan	b9c2856d16	[HUDI-1535] Fix 0.7.0 snapshot (#2456) * Revert "[MINOR] Bumping snapshot version to 0.7.0 (#2435)" This reverts commit `a43e191d6c`. * Fixing 0.7.0 snapshot bump	2021-01-19 12:20:43 -08:00
Udit Mehrotra	684e12e9fc	[HUDI-1529] Add block size to the FileStatus objects returned from metadata table to avoid too many file splits (#2451)	2021-01-18 07:29:53 -08:00
satishkotha	3d1d5d00b0	[HUDI-1533] Make SerializableSchema work for large schemas and add ability to sortBy numeric values (#2453)	2021-01-17 12:36:55 -08:00
Sivabalan Narayanan	a43e191d6c	[MINOR] Bumping snapshot version to 0.7.0 (#2435)	2021-01-16 09:56:28 -05:00
n3nash	749f657856	[HUDI-1509]: Reverting LinkedHashSet changes to combine fields from oldSchema and newSchema in favor of using only new schema for record rewriting (#2424)	2021-01-14 12:47:50 -08:00
n3nash	e926c1a45c	HUDI-1525 fix test hbase index (#2436)	2021-01-12 23:30:21 -08:00

93 Commits