Commit Graph

3047 Commits

Author SHA1 Message Date
Danny Chan
0faa562b6f [HUDI-4403] Fix the end input metadata for bounded source (#6116) 2022-07-16 12:02:17 +08:00
Shiyan Xu
726e8e3590 [MINOR] Disable TestHiveSyncGlobalCommitTool (#6119) 2022-07-15 10:23:21 -07:00
JerryYue-M
b781b31045 [HUDI-4397] Flink Inline Cluster and Compact plan distribute strategy changed from rebalance to hash to avoid potential multiple threads accessing the same file (#6106)
Co-authored-by: jerryyue <jerryyue@didiglobal.com>
2022-07-15 12:21:50 +08:00
Tim Brown
4898ea52f7 [HUDI-4399][RFC-57] Claim RFC 57 for DeltaStreamer proto support (#6112) 2022-07-14 18:11:45 -07:00
Danny Chan
05606708fa [HUDI-4393] Add marker file for target file when flink merge handle rolls over (#6103) 2022-07-14 16:00:08 +08:00
Yann Byron
aaccc63ad5 [RFC-51] [HUDI-3478] Hudi to support Change-Data-Capture (#5436)
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-07-14 00:36:26 -07:00
Danny Chan
e70a427956 [HUDI-4391] Incremental read from archived commits for flink (#6096) 2022-07-14 15:19:26 +08:00
Luning (Lucas) Wang
ee956b8951 [HUDI-4379] Bump Flink versions to 1.14.5 and 1.15.1 (#6080) 2022-07-12 15:03:24 +08:00
HunterXHunter
994c561488 [HUDI-4298] When reading the mor table with QUERY_TYPE_SNAPSHOT,Unabl… (#5937)
* [HUDI-4298] Add test case for reading mor table

Signed-off-by: LinMingQiang <1356469429@qq.com>
2022-07-12 14:49:44 +08:00
Sagar Sumit
a270eeeef9 [MINOR] Update RFCs status (#6078) 2022-07-11 13:04:25 +05:30
Shiyan Xu
51244eba82 [HUDI-4323] Make database table names optional in sync tool (#6073)
* [HUDI-4323] Make database table names optional in sync tool
* Infer from these properties from the table config
2022-07-11 10:03:31 +05:30
冯健
63f95ab801 [HUDI-3730][RFC-55] Improve hudi-sync classes design and simplify configs (#5695)
* [HUDI-4146] RFC for Improve Hive/Meta sync class design and hierarchies

Co-authored-by: jian.feng <jian.feng@shopee.com>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-07-10 11:42:34 +05:30
Shiyan Xu
046044c83d [HUDI-4324] Remove use_jdbc config from hudi sync (#6072)
* [HUDI-4324] Remove use_jdbc config from hudi sync
* Users should use HIVE_SYNC_MODE instead
2022-07-10 11:16:09 +05:30
dependabot[bot]
10aec07fd2 [MINOR] Bump xalan from 2.7.1 to 2.7.2 (#6062)
Bumps xalan from 2.7.1 to 2.7.2.

---
updated-dependencies:
- dependency-name: xalan:xalan
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-09 20:02:36 +05:30
liujinhui
126b88b48d [HUDI-2150] Rename/Restructure configs for better modularity (#6061)
- Move clean related configuration to HoodieCleanConfig
- Move Archival related configuration to HoodieArchivalConfig
- hoodie.compaction.payload.class move this to HoodiePayloadConfig
2022-07-09 20:00:48 +05:30
superche
6566fc6625 [HUDI-3500] Add call procedure for RepairsCommand (#6053) 2022-07-09 09:29:14 +08:00
xiarixiaoyao
b686c07407 [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields (#6017)
* [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields.

* fix comments

Co-authored-by: public (bdcee5037027) <mengtao0326@qq.com>
2022-07-09 03:08:38 +08:00
Kumud Kumar Srivatsava Tirupati
fc8d96246a [HUDI-4335] Bug fixes in AWSGlueCatalogSyncClient post schema evolution. (#5995)
* fix for updateTableParameters which is not excluding partition columns and updateTableProperties boolean check

* Fix - serde parameters getting overrided on table property update

* removing stale syncConfig
2022-07-08 09:47:49 -05:00
苏承祥
f20acb8dc3 [HUDI-4367] Support copyToTable on call (#6054) 2022-07-08 15:08:11 +08:00
Danny Chan
a998586396 [minor] following 4152, refactor the clazz about plan selection strategy (#6060) 2022-07-08 09:56:10 +08:00
Danny Chan
c744848c59 [HUDI-4366] Synchronous cleaning for flink bounded source (#6051) 2022-07-08 09:55:07 +08:00
KnightChess
5673819736 [HUDI-4309] fix spark32 repartition error (#6033) 2022-07-08 09:38:09 +08:00
e74ad324c3 [HUDI-4152] Flink offline compaction support compacting multi compaction plan at once (#5677)
* [HUDI-4152] Flink offline compaction allow compact multi compaction plan at once

* [HUDI-4152] Fix exception for duplicated uid when multi compaction plan are compacted

* [HUDI-4152] Provider UT & IT for compact multi compaction plan

* [HUDI-4152] Put multi compaction plans into one compaction plan source

* [HUDI-4152] InstantCompactionPlanSelectStrategy allow multi instant by using comma

* [HUDI-4152] Add IT for InstantCompactionPlanSelectStrategy
2022-07-07 14:11:26 +08:00
Danny Chan
7eeaff9ee0 [HUDI-4357] Support flink 1.15.x (#6050) 2022-07-06 13:42:58 +08:00
shenjiayu17
b18c32379f [HUDI-4219] Merge Into when update expression "col=s.col+2" on precombine cause exception (#5828) 2022-07-06 09:10:35 +08:00
董可伦
3670e82af5 [HUDI-4356] Fix the error when sync hive in CTAS (#6029) 2022-07-06 00:08:23 +08:00
ForwardXu
8570c3aab4 [HUDI-4359] Support show_fs_path_detail command on Call Produce Command (#6042) 2022-07-05 23:56:32 +08:00
xi chaomin
23c9c5c296 [HUDI-3836] Improve the way of fetching metadata partitions from table (#5286)
Co-authored-by: xicm <xicm@asiainfo.com>
2022-07-05 07:50:17 -07:00
Y Ethan Guo
fbda4ad5bd [HUDI-4360] Fix HoodieDropPartitionsTool based on refactored meta sync (#6043) 2022-07-04 23:37:21 -07:00
YueZhang
45fdcf68a1 [HUDI-3116]Add a new HoodieDropPartitionsTool to let users drop table partitions through a standalone job. (#4459)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-07-04 19:24:18 -07:00
Shiyan Xu
6187622178 [MINOR] Improve variable names (#6039) 2022-07-04 18:03:50 -07:00
voonhous
c091e4cc30 [HUDI-3730] Add ConfigTool#toMap UT (#6035)
Co-authored-by: voonhou.su <voonhou.su@shopee.com>
2022-07-04 15:07:19 -07:00
superche
e0954040a9 [HUDI-3511] Add call procedure for MetadataCommand (#6018) 2022-07-03 21:44:56 +08:00
Shiyan Xu
c0e1587966 [HUDI-3730] Improve meta sync class design and hierarchies (#5854)
* [HUDI-3730] Improve meta sync class design and hierarchies (#5754)
* Implements class design proposed in RFC-55

Co-authored-by: jian.feng <fengjian428@gmial.com>
Co-authored-by: jian.feng <jian.feng@shopee.com>
2022-07-03 14:47:25 +05:30
superche
c00ea84985 [HUDI-3505] Add call procedure for UpgradeOrDowngradeCommand (#6012)
Co-authored-by: superche <superche@tencent.com>
2022-07-03 08:47:48 +08:00
Danny Chan
47792a3186 [HUDI-4353] Column stats data skipping for flink (#6026) 2022-07-03 08:29:31 +08:00
JerryYue-M
bdf73b2650 [HUDI-3953]Flink Hudi module should support low-level source and sink api (#5445)
Co-authored-by: jerryyue <jerryyue@didiglobal.com>
2022-07-02 08:38:46 +08:00
RexAn
62a0c962ac [HUDI-3634] Could read empty or partial HoodieCommitMetaData in downstream if using HDFS (#5048)
Add the differentiated logic of creating immutable file in HDFS by first creating the file.tmp and then renaming the file
2022-06-30 11:07:40 -07:00
miomiocat
397fd30142 [HUDI-3984] Remove mandatory check of partiton path for cli command (#5458) 2022-06-30 10:00:13 -07:00
komao
8547899a39 [HUDI-4285] add ByteBuffer#rewind after ByteBuffer#get in AvroDeseria… (#5907)
* [HUDI-4285] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer

* add ut

Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com>
2022-06-30 20:48:50 +08:00
RexAn
cdaaa3c4c7 [HUDI-4346] Fix params not update BULKINSERT_ARE_PARTITIONER_RECORDS_SORTED (#5999) 2022-06-29 19:26:00 -07:00
cxzl25
6a01f7029c [MINOR] Following #2070, Fix BindException when running tests on shared machines. (#5951) 2022-06-29 19:20:59 -07:00
luoyajun
3948b8935a [HUDI-4336] Fix records overwritten bug with binary primary key (#5996) 2022-06-30 09:12:00 +08:00
wenningd
03a94d9ff5 [HUDI-4331] Allow loading external config file from class loader (#5987)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2022-06-29 17:04:34 -07:00
YueZhang
e71f04768e [MINOR] Make CLI 'commit rollback' using rollbackUsingMarkers false as default (#5174)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-06-29 10:12:46 -07:00
YueZhang
637660b7aa [HUDI-1575] Claim RFC-56: Early Conflict Detection For Multi-writer (#6002)
Co-authored-by: yuezhang <yuezhang@yuezhang-mac.freewheelmedia.net>
2022-06-29 01:43:31 -07:00
Teng
e3eb14ad2d [HUDI-4334] close SparkRDDWriteClient after usage in Create/Delete/RollbackSavepointsProcedure (#5994) 2022-06-29 06:13:29 +08:00
bschell
fd7d25ab63 [HUDI-1176] Upgrade hudi to log4j2 (#5366)
* Move to log4j2

cr: https://code.amazon.com/reviews/CR-71010705

* Upgrade unit tests to log4j2

* update exclusion

Co-authored-by: Brandon Scheller <bschelle@amazon.com>
2022-06-28 12:54:23 -07:00
Alexey Kudinkin
ed823f1c6f [HUDI-4320] Make sure HoodieStorageConfig.PARQUET_WRITE_LEGACY_FORMAT_ENABLED could be specified by the writer (#5970)
Fixed sequence determining whether Parquet's legacy-format writing property should be overridden to only kick in when it has not been explicitly specified by the caller
2022-06-28 12:27:32 -07:00
BruceLin
efb9719018 [HUDI-4332] The current instant may be wrong under some extreme conditions in AppendWriteFunction. (#5988) 2022-06-28 20:42:26 +08:00