lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
peanut-chenzhong	c0e8b03d93	[HUDI-1977] Fix Hudi CLI tempview query issue (#4626 )	2022-01-29 10:39:08 +08:00
Raymond Xu	0bd38f26ca	[HUDI-2596] Make class names consistent in hudi-client (#4680 )	2022-01-27 17:05:08 -08:00
YueZhang	b2b23f5d3a	[HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter (#4521 ) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2022-01-06 21:16:29 -05:00
Sivabalan Narayanan	2954027b92	[HUDI-52] Enabling savepoint and restore for MOR table (#4507 ) * Enabling restore for MOR table * Fixing savepoint for compaction commits in MOR	2022-01-06 21:26:08 +05:30
Aimiyoo	57f43de1ea	[MINOR] Fix DedupeSparkJob typo (#4418 )	2021-12-22 11:51:26 -08:00
Sivabalan Narayanan	3ce0526924	Adding verbose output for metadata validate files command (#4166 )	2021-12-10 09:38:38 -08:00
yuzhao.cyz	a1d0ff4209	Moving to 0.11.0-SNAPSHOT on master branch.	2021-11-27 17:22:10 +08:00
huleilei	8402cac407	[HUDI-2848] Excluse guava from hudi-cli pom (#4100 )	2021-11-26 16:56:03 -05:00
Manoj Govindassamy	445208a0d2	[HUDI-2845] Metadata CLI - files/partition file listing fix and new validate option (#4092 ) - Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>	2021-11-26 16:44:16 -05:00
Y Ethan Guo	d1e83e4ba0	[HUDI-2767] Enabling timeline-server-based marker as default (#4112 ) - Changes the default config of marker type (HoodieWriteConfig.MARKERS_TYPE or hoodie.write.markers.type) from DIRECT to TIMELINE_SERVER_BASED for Spark Engine. - Adds engine-specific marker type configs: Spark -> TIMELINE_SERVER_BASED, Flink -> DIRECT, Java -> DIRECT. - Uses DIRECT markers as well for Spark structured streaming due to timeline server only available for the first mini-batch. - Fixes the marker creation method for non-partitioned table in TimelineServerBasedWriteMarkers. - Adds the fallback to direct markers even when TIMELINE_SERVER_BASED is configured, in WriteMarkersFactory: when HDFS is used, or embedded timeline server is disabled, the fallback to direct markers happens. - Fixes the closing of timeline service. - Fixes tests that depend on markers, mainly by starting the timeline service for each test.	2021-11-26 16:41:05 -05:00
Manoj Govindassamy	3d75aca40d	[HUDI-2850] Fixing Clustering CLI - schedule and run command fixes to avoid NumberFormatException (#4101 )	2021-11-26 07:17:23 -05:00
Alexey Kudinkin	6f5d8d04cd	[HUDI-2840] Fixed DeltaStreaemer to properly respect configuration passed t/h properties file (#4090 ) * Rebased `DFSPropertiesConfiguration` to access Hadoop config in liue of FS to avoid confusion * Fixed `readConfig` to take Hadoop's `Configuration` instead of FS; Fixing usages * Added test for local FS access * Rebase to use `FSUtils.getFs` * Combine properties provided as a file along w/ overrides provided from the CLI * Added helper utilities to `HoodieClusteringConfig`; Make sure corresponding config methods fallback to defaults; * Fixed DeltaStreamer usage to respect properly combined configuration; Abstracted `HoodieClusteringConfig.from` convenience utility to init Clustering config from `Properties` * Tidying up * `lint` * Reverting changes to `HoodieWriteConfig` * Tdiying up * Fixed incorrect merge of the props * Converted `HoodieConfig` to wrap around `Properties` into `TypedProperties` * Fixed compilation * Fixed compilation	2021-11-25 14:48:22 -08:00
Sivabalan Narayanan	fc9ca6a07a	[HUDI-2559] Converting commit timestamp format to millisecs (#4024 ) - Adds support for generating commit timestamps with millisecs granularity. - Older commit timestamps (in secs granularity) will be suffixed with 999 and parsed with millisecs format.	2021-11-22 11:44:38 -05:00
vinoth chandar	ae0c67d9fc	[HUDI-2795] Add mechanism to safely update,delete and recover table properties (#4038 ) * [HUDI-2795] Add mechanism to safely update,delete and recover table properties - Fail safe mechanism, that lets queries succeed off a backup file - Readers who are not upgraded to this version of code will just fail until recovery is done. - Added unit tests that exercises all these scenarios. - Adding CLI for recovery, updation to table command. - [Pending] Add some hash based verfication to ensure any rare partial writes for HDFS * Fixing upgrade/downgrade infrastructure to use new updation method	2021-11-20 08:07:40 -08:00
wenningd	24def0b30d	[HUDI-2362] Add external config file support (#3416 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2021-11-18 01:59:26 -08:00
Sivabalan Narayanan	ce7d233307	[HUDI-2151] Part3 Enabling marker based rollback as default rollback strategy (#3950 ) * Enabling timeline server based markers * Enabling timeline server based markers and marker based rollback * Removing constraint that timeline server can be enabled only for hdfs * Fixing tests	2021-11-17 11:51:28 +05:30
Prashant Wason	b7ee341e14	[HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe. (#2819 )	2021-11-05 09:31:42 -04:00
董可伦	48a3906ccc	[MINOR] Fix typo,'paritition' corrected to 'partition' (#3764 )	2021-10-11 14:07:34 -04:00
Y Ethan Guo	2e152177fb	[HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module (#3743 )	2021-10-06 20:20:41 -04:00
Sivabalan Narayanan	5f32162a2f	[HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 (#3590 ) * [HUDI-2285] Adding Synchronous updates to metadata before completion of commits in data timelime. - This patch adds synchronous updates to metadata table. In other words, every write is first committed to metadata table followed by data table. While reading metadata table, we ignore any delta commits that are present only in metadata table and not in data table timeline. - Compaction of metadata table is fenced by the condition that we trigger compaction only when there are no inflight requests in datatable. This ensures that all base files in metadata table is always in sync with data table(w/o any holes) and only there could be some extra invalid commits among delta log files in metadata table. - Due to this, archival of data table also fences itself up until compacted instant in metadata table. All writes to metadata table happens within the datatable lock. So, metadata table works in one writer mode only. This might be tough to loosen since all writers write to same FILES partition and so, will result in a conflict anyways. - As part of this, have added acquiring locks in data table for those operations which were not before while committing (rollback, clean, compaction, cluster). To note, we were not doing any conflict resolution. All we are doing here is to commit by taking a lock. So that all writes to metadata table is always a single writer. - Also added building block to add buckets for partitions, which will be leveraged by other indexes like record level index, etc. For now, FILES partition has only one bucket. In general, any number of buckets per partition is allowed and each partition has a fixed fileId prefix with incremental suffix for each bucket within each partition. Have fixed [HUDI-2476]. This fix is about retrying a failed compaction if it succeeded in metadata for first time, but failed w/ data table. - Enabling metadata table by default. - Adding more tests for metadata table Co-authored-by: Prashant Wason <pwason@uber.com>	2021-10-06 00:17:52 -04:00
Carl-Zhou-CN	aa546554ff	[HUDI-2451] On windows client with hdfs server for wrong file separator (#3687 ) Co-authored-by: yao.zhou <yao.zhou@linkflowtech.com>	2021-09-26 21:51:27 +08:00
liujinhui	eb5e7eec0a	MINOR_CHECKSTYLE (#3616 ) Fix checkstyle	2021-09-07 18:19:39 +08:00
Raymond Xu	cf002b6918	[HUDI-2079] Make CLI command tests functional (#3601 ) Make all tests in org.apache.hudi.cli.commands extend org.apache.hudi.cli.functional.CLIFunctionalTestHarness and tag as "functional". This also resolves a blocker where DFS init consistently failed when moving to ubuntu 18.04	2021-09-06 15:53:53 -07:00
Danny Chan	e9bf1c1186	[HUDI-2380] The default archive folder should be 'archived' (#3568 )	2021-09-04 15:53:55 +08:00
Raymond Xu	073c318d9f	[HUDI-1989] Disable HDFSParquetImporter related tests (#3597 ) Also mark HDFSParquetImportCommand and HDFSParquetImporter as deprecated.	2021-09-03 23:08:11 -04:00
Udit Mehrotra	c350d05dd3	Restore 0.8.0 config keys with deprecated annotation (#3506 ) Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-08-19 13:36:40 -07:00
Udit Mehrotra	3e301196bf	Moving to 0.10.0-SNAPSHOT on master branch.	2021-08-14 18:51:09 -07:00
Y Ethan Guo	23dca6c237	[HUDI-2268] Add upgrade and downgrade to and from 0.9.0 (#3470 ) - Added upgrade and downgrade step to and from 0.9.0. Upgrade adds few table properties. Downgrade recreates timeline server based marker files if any.	2021-08-14 20:20:23 -04:00
Sagar Sumit	0544d70d8f	[MINOR] Deprecate older configs (#3464 ) Rename and deprecate props in HoodieWriteConfig Rename and deprecate older props	2021-08-12 20:31:04 -07:00
Sivabalan Narayanan	1df5ded433	[HUDI-2273] Migrating some long running tests to functional test profile (#3398 )	2021-08-04 19:08:50 -04:00
wenningd	91bb0d1318	[HUDI-2255] Refactor Datasource options (#3373 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2021-08-03 17:50:30 -07:00
rmahindra123	8fef50e237	[HUDI-2044] Integrate consumers with rocksDB and compression within External Spillable Map (#3318 )	2021-07-28 01:31:03 -04:00
Sivabalan Narayanan	61148c1c43	[HUDI-2176, 2178, 2179] Adding virtual key support to COW table (#3306 )	2021-07-26 17:21:04 -04:00
Vinay Patil	5a94b6bf54	[HUDI-2192] Clean up Multiple versions of scala libraries detected Warning (#3292 )	2021-07-21 00:33:27 -07:00
Jintao Guan	2debb9b3ed	[HUDI-1828] Update unit tests to support ORC as the base file format (#3237 )	2021-07-15 00:05:42 +08:00
wangxianghu	62a1ad8b3a	[HUDI-1930] Bootstrap support configure KeyGenerator by type (#3170 ) * [HUDI-1930] Bootstrap support configure KeyGenerator by type	2021-07-03 20:27:37 +08:00
wenningd	d412fb2fe6	[HUDI-89] Add configOption & refactor all configs based on that (#2833 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2021-06-30 14:26:30 -07:00
Sivabalan Narayanan	919590988a	[HUDI-1914] Add fetching latest schema to table command in hudi-cli (#2964 )	2021-06-07 16:04:35 -07:00
wangxianghu	e7020748b5	[HUDI-1920] Set archived as the default value of HOODIE_ARCHIVELOG_FOLDER_PROP_NAME (#2978 )	2021-05-25 16:29:55 +08:00
Susu Dong	685f77b5dd	[HUDI-1740] Fix insert-overwrite API archival (#2784 ) - fix problem of archiving replace commits - Fix problem when getting empty replacecommit.requested - Improved the logic of handling empty and non-empty requested/inflight commit files. Added unit tests to cover both empty and non-empty inflight files cases and cleaned up some unused test util methods Co-authored-by: yorkzero831 <yorkzero8312@gmail.com> Co-authored-by: zheren.yu <zheren.yu@paypay-corp.co.jp>	2021-05-21 13:52:13 -07:00
zhangminglei	fe3f5c2d56	[HUDI-1913] Using streams instead of loops for input/output (#2962 )	2021-05-19 09:13:38 +08:00
TeRS-K	be9db2c4f5	[HUDI-1055] Remove hardcoded parquet in tests (#2740 ) * Remove hardcoded parquet in tests * Use DataFileUtils.getInstance * Renaming DataFileUtils to BaseFileUtils Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-05-11 10:01:45 -07:00
Sivabalan Narayanan	0284cdecce	[HUDI-1876] wiring in Hadoop Conf with AvroSchemaConverters instantiation (#2914 )	2021-05-05 21:31:44 -07:00
jsbali	4a3431866d	[HUDI-1746] Added support for replace commits in commit showpartitions, commit show_write_stats, commit showfiles (#2678 ) * Added support for replace commits in commit showpartitions, commit show_write_stats, commit showfiles * Adding CR changes * [HUDI-1746] Code review changes	2021-04-21 10:31:35 -07:00
Jintao Guan	3253079507	[HUDI-1764] Add Hudi-CLI support for clustering (#2773 ) * tmp base * update * update unit test * update * update * update CLI parameters * linting * update doSchedule in HoodieClusteringJob * update * update diff according to comments	2021-04-20 09:46:42 -07:00
hongdd	ecdbd2517f	[HUDI-699] Fix CompactionCommand and add unit test for CompactionCommand (#2325 )	2021-04-08 15:35:33 +08:00
li36909	920537cac8	[HUDI-1749] Clean/Compaction/Rollback command maybe never exit when operation fail (#2752 )	2021-04-05 23:23:15 -07:00
garyli1019	6e803e08b1	Moving to 0.9.0-SNAPSHOT on master branch.	2021-03-24 21:37:14 +08:00
n3nash	74241947c1	[HUDI-845] Added locking capability to allow multiple writers (#2374 ) * [HUDI-845] Added locking capability to allow multiple writers 1. Added LockProvider API for pluggable lock methodologies 2. Added Resolution Strategy API to allow for pluggable conflict resolution 3. Added TableService client API to schedule table services 4. Added Transaction Manager for wrapping actions within transactions	2021-03-16 16:43:53 -07:00
Prashant Wason	3b36cb805d	[HUDI-1552] Improve performance of key lookups from base file in Metadata Table. (#2494 ) * [HUDI-1552] Improve performance of key lookups from base file in Metadata Table. 1. Cache the KeyScanner across lookups so that the HFile index does not have to be read for each lookup. 2. Enable block caching in KeyScanner. 3. Move the lock to a limited scope of the code to reduce lock contention. 4. Removed reuse configuration * Properly close the readers, when metadata table is accessed from executors - Passing a reuse boolean into HoodieBackedTableMetadata - Preserve the fast return behavior when reusing and opening from multiple threads (no contention) - Handle concurrent close() and open readers, for reuse=false, by always synchronizing Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-03-15 13:42:57 -07:00


197 Commits