1
0
Commit Graph

205 Commits

Author SHA1 Message Date
Sagar Sumit
ed106f671e [HUDI-2809] Introduce a checksum mechanism for validating hoodie.properties (#4712)
Fix dependency conflict

Fix repairs command

Implement putIfAbsent for DDB lock provider

Add upgrade step and validate while fetching configs

Validate checksum for latest table version only while fetching config

Move generateChecksum to BinaryUtil

Rebase and resolve conflict

Fix table version check
2022-02-18 10:17:06 +05:30
Y Ethan Guo
9a05940a74 [HUDI-3366] Remove hardcoded logic of disabling metadata table in tests (#4792) 2022-02-15 16:41:47 -05:00
Raymond Xu
27bd7b538e [HUDI-1576] Make archiving an async service (#4795) 2022-02-14 21:15:06 -05:00
Y Ethan Guo
6aba00e84f [MINOR] Fix typos in Spark client related classes (#4781) 2022-02-13 06:41:58 -08:00
satishkotha
89ed6f062e [HUDI-3362] Fix restore to rollback pending clustering operations followed by other rolling back other commits (#4772) 2022-02-11 14:12:45 -05:00
wenningd
1c778590d1 [HUDI-3395] Allow pass rollbackUsingMarkers to Hudi CLI rollback command (#4557)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2022-02-10 09:41:22 -05:00
YueZhang
de206acbae [HUDI-3369] New ScheduleAndExecute mode for HoodieCompactor and hudi-cli (#4750)
Schedule and execute compaction plan in one single mode.
2022-02-07 15:01:34 +05:30
Alexey Kudinkin
a68e1dc2db [HUDI-431] Adding support for Parquet in MOR LogBlocks (#4333)
- Adding support for Parquet in MOR tables Log blocks

Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2022-02-02 14:35:05 -05:00
peanut-chenzhong
c0e8b03d93 [HUDI-1977] Fix Hudi CLI tempview query issue (#4626) 2022-01-29 10:39:08 +08:00
Raymond Xu
0bd38f26ca [HUDI-2596] Make class names consistent in hudi-client (#4680) 2022-01-27 17:05:08 -08:00
YueZhang
b2b23f5d3a [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter (#4521)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-06 21:16:29 -05:00
Sivabalan Narayanan
2954027b92 [HUDI-52] Enabling savepoint and restore for MOR table (#4507)
* Enabling restore for MOR table

* Fixing savepoint for compaction commits in MOR
2022-01-06 21:26:08 +05:30
Aimiyoo
57f43de1ea [MINOR] Fix DedupeSparkJob typo (#4418) 2021-12-22 11:51:26 -08:00
Sivabalan Narayanan
3ce0526924 Adding verbose output for metadata validate files command (#4166) 2021-12-10 09:38:38 -08:00
yuzhao.cyz
a1d0ff4209 Moving to 0.11.0-SNAPSHOT on master branch. 2021-11-27 17:22:10 +08:00
huleilei
8402cac407 [HUDI-2848] Excluse guava from hudi-cli pom (#4100) 2021-11-26 16:56:03 -05:00
Manoj Govindassamy
445208a0d2 [HUDI-2845] Metadata CLI - files/partition file listing fix and new validate option (#4092)
- Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2021-11-26 16:44:16 -05:00
Y Ethan Guo
d1e83e4ba0 [HUDI-2767] Enabling timeline-server-based marker as default (#4112)
- Changes the default config of marker type (HoodieWriteConfig.MARKERS_TYPE or hoodie.write.markers.type) from DIRECT to TIMELINE_SERVER_BASED for Spark Engine.
- Adds engine-specific marker type configs: Spark -> TIMELINE_SERVER_BASED, Flink -> DIRECT, Java -> DIRECT.
- Uses DIRECT markers as well for Spark structured streaming due to timeline server only available for the first mini-batch.
- Fixes the marker creation method for non-partitioned table in TimelineServerBasedWriteMarkers.
- Adds the fallback to direct markers even when TIMELINE_SERVER_BASED is configured, in WriteMarkersFactory: when HDFS is used, or embedded timeline server is disabled, the fallback to direct markers happens.
- Fixes the closing of timeline service.
- Fixes tests that depend on markers, mainly by starting the timeline service for each test.
2021-11-26 16:41:05 -05:00
Manoj Govindassamy
3d75aca40d [HUDI-2850] Fixing Clustering CLI - schedule and run command fixes to avoid NumberFormatException (#4101) 2021-11-26 07:17:23 -05:00
Alexey Kudinkin
6f5d8d04cd [HUDI-2840] Fixed DeltaStreaemer to properly respect configuration passed t/h properties file (#4090)
* Rebased `DFSPropertiesConfiguration` to access Hadoop config in liue of FS to avoid confusion

* Fixed `readConfig` to take Hadoop's `Configuration` instead of FS;
Fixing usages

* Added test for local FS access

* Rebase to use `FSUtils.getFs`

* Combine properties provided as a file along w/ overrides provided from the CLI

* Added helper utilities to `HoodieClusteringConfig`;
Make sure corresponding config methods fallback to defaults;

* Fixed DeltaStreamer usage to respect properly combined configuration;
Abstracted `HoodieClusteringConfig.from` convenience utility to init Clustering config from `Properties`

* Tidying up

* `lint`

* Reverting changes to `HoodieWriteConfig`

* Tdiying up

* Fixed incorrect merge of the props

* Converted `HoodieConfig` to wrap around `Properties` into `TypedProperties`

* Fixed compilation

* Fixed compilation
2021-11-25 14:48:22 -08:00
Sivabalan Narayanan
fc9ca6a07a [HUDI-2559] Converting commit timestamp format to millisecs (#4024)
- Adds support for generating commit timestamps with millisecs granularity. 
- Older commit timestamps (in secs granularity) will be suffixed with 999 and parsed with millisecs format.
2021-11-22 11:44:38 -05:00
vinoth chandar
ae0c67d9fc [HUDI-2795] Add mechanism to safely update,delete and recover table properties (#4038)
* [HUDI-2795] Add mechanism to safely update,delete and recover table properties

  - Fail safe mechanism, that lets queries succeed off a backup file
  - Readers who are not upgraded to this version of code will just fail until recovery is done.
  - Added unit tests that exercises all these scenarios.
  - Adding CLI for recovery, updation to table command.
  - [Pending] Add some hash based verfication to ensure any rare partial writes for HDFS

* Fixing upgrade/downgrade infrastructure to use new updation method
2021-11-20 08:07:40 -08:00
wenningd
24def0b30d [HUDI-2362] Add external config file support (#3416)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-11-18 01:59:26 -08:00
Sivabalan Narayanan
ce7d233307 [HUDI-2151] Part3 Enabling marker based rollback as default rollback strategy (#3950)
* Enabling timeline server based markers

* Enabling timeline server based markers and marker based rollback

* Removing constraint that timeline server can be enabled only for hdfs

* Fixing tests
2021-11-17 11:51:28 +05:30
Prashant Wason
b7ee341e14 [HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe. (#2819) 2021-11-05 09:31:42 -04:00
董可伦
48a3906ccc [MINOR] Fix typo,'paritition' corrected to 'partition' (#3764) 2021-10-11 14:07:34 -04:00
Y Ethan Guo
2e152177fb [HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module (#3743) 2021-10-06 20:20:41 -04:00
Sivabalan Narayanan
5f32162a2f [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426 (#3590)
* [HUDI-2285] Adding Synchronous updates to metadata before completion of commits in data timelime.

- This patch adds synchronous updates to metadata table. In other words, every write is first committed to metadata table followed by data table. While reading metadata table, we ignore any delta commits that are present only in metadata table and not in data table timeline.
- Compaction of metadata table is fenced by the condition that we trigger compaction only when there are no inflight requests in datatable. This ensures that all base files in metadata table is always in sync with data table(w/o any holes) and only there could be some extra invalid commits among delta log files in metadata table.
- Due to this, archival of data table also fences itself up until compacted instant in metadata table.
All writes to metadata table happens within the datatable lock. So, metadata table works in one writer mode only. This might be tough to loosen since all writers write to same FILES partition and so, will result in a conflict anyways.
- As part of this, have added acquiring locks in data table for those operations which were not before while committing (rollback, clean, compaction, cluster). To note, we were not doing any conflict resolution. All we are doing here is to commit by taking a lock. So that all writes to metadata table is always a single writer. 
- Also added building block to add buckets for partitions, which will be leveraged by other indexes like record level index, etc. For now, FILES partition has only one bucket. In general, any number of buckets per partition is allowed and each partition has a fixed fileId prefix with incremental suffix for each bucket within each partition.
Have fixed [HUDI-2476]. This fix is about retrying a failed compaction if it succeeded in metadata for first time, but failed w/ data table.
- Enabling metadata table by default.
- Adding more tests for metadata table

Co-authored-by: Prashant Wason <pwason@uber.com>
2021-10-06 00:17:52 -04:00
Carl-Zhou-CN
aa546554ff [HUDI-2451] On windows client with hdfs server for wrong file separator (#3687)
Co-authored-by: yao.zhou <yao.zhou@linkflowtech.com>
2021-09-26 21:51:27 +08:00
liujinhui
eb5e7eec0a MINOR_CHECKSTYLE (#3616)
Fix checkstyle
2021-09-07 18:19:39 +08:00
Raymond Xu
cf002b6918 [HUDI-2079] Make CLI command tests functional (#3601)
Make all tests in org.apache.hudi.cli.commands extend org.apache.hudi.cli.functional.CLIFunctionalTestHarness and tag as "functional".

This also resolves a blocker where DFS init consistently failed when moving to ubuntu 18.04
2021-09-06 15:53:53 -07:00
Danny Chan
e9bf1c1186 [HUDI-2380] The default archive folder should be 'archived' (#3568) 2021-09-04 15:53:55 +08:00
Raymond Xu
073c318d9f [HUDI-1989] Disable HDFSParquetImporter related tests (#3597)
Also mark HDFSParquetImportCommand and HDFSParquetImporter as deprecated.
2021-09-03 23:08:11 -04:00
Udit Mehrotra
c350d05dd3 Restore 0.8.0 config keys with deprecated annotation (#3506)
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-08-19 13:36:40 -07:00
Udit Mehrotra
3e301196bf Moving to 0.10.0-SNAPSHOT on master branch. 2021-08-14 18:51:09 -07:00
Y Ethan Guo
23dca6c237 [HUDI-2268] Add upgrade and downgrade to and from 0.9.0 (#3470)
- Added upgrade and downgrade step to and from 0.9.0. Upgrade adds few table properties. Downgrade recreates timeline server based marker files if any.
2021-08-14 20:20:23 -04:00
Sagar Sumit
0544d70d8f [MINOR] Deprecate older configs (#3464)
Rename and deprecate props in HoodieWriteConfig

Rename and deprecate older props
2021-08-12 20:31:04 -07:00
Sivabalan Narayanan
1df5ded433 [HUDI-2273] Migrating some long running tests to functional test profile (#3398) 2021-08-04 19:08:50 -04:00
wenningd
91bb0d1318 [HUDI-2255] Refactor Datasource options (#3373)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-08-03 17:50:30 -07:00
rmahindra123
8fef50e237 [HUDI-2044] Integrate consumers with rocksDB and compression within External Spillable Map (#3318) 2021-07-28 01:31:03 -04:00
Sivabalan Narayanan
61148c1c43 [HUDI-2176, 2178, 2179] Adding virtual key support to COW table (#3306) 2021-07-26 17:21:04 -04:00
Vinay Patil
5a94b6bf54 [HUDI-2192] Clean up Multiple versions of scala libraries detected Warning (#3292) 2021-07-21 00:33:27 -07:00
Jintao Guan
2debb9b3ed [HUDI-1828] Update unit tests to support ORC as the base file format (#3237) 2021-07-15 00:05:42 +08:00
wangxianghu
62a1ad8b3a [HUDI-1930] Bootstrap support configure KeyGenerator by type (#3170)
* [HUDI-1930] Bootstrap support configure KeyGenerator by type
2021-07-03 20:27:37 +08:00
wenningd
d412fb2fe6 [HUDI-89] Add configOption & refactor all configs based on that (#2833)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-06-30 14:26:30 -07:00
Sivabalan Narayanan
919590988a [HUDI-1914] Add fetching latest schema to table command in hudi-cli (#2964) 2021-06-07 16:04:35 -07:00
wangxianghu
e7020748b5 [HUDI-1920] Set archived as the default value of HOODIE_ARCHIVELOG_FOLDER_PROP_NAME (#2978) 2021-05-25 16:29:55 +08:00
Susu Dong
685f77b5dd [HUDI-1740] Fix insert-overwrite API archival (#2784)
- fix problem of archiving replace commits
- Fix problem when getting empty replacecommit.requested
- Improved the logic of handling empty and non-empty requested/inflight commit files. Added unit tests to cover both empty and non-empty inflight files cases and cleaned up some unused test util methods

Co-authored-by: yorkzero831 <yorkzero8312@gmail.com>
Co-authored-by: zheren.yu <zheren.yu@paypay-corp.co.jp>
2021-05-21 13:52:13 -07:00
zhangminglei
fe3f5c2d56 [HUDI-1913] Using streams instead of loops for input/output (#2962) 2021-05-19 09:13:38 +08:00
TeRS-K
be9db2c4f5 [HUDI-1055] Remove hardcoded parquet in tests (#2740)
* Remove hardcoded parquet in tests
* Use DataFileUtils.getInstance
* Renaming DataFileUtils to BaseFileUtils

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2021-05-11 10:01:45 -07:00