Commit Graph

3119 Commits

Author SHA1 Message Date
v-zhangjc9
32f7e323dc Change version to private 2024-05-24 15:16:38 +08:00
v-zhangjc9
8462d79ead Change hadoop version to 3.1.2 2022-08-02 16:06:39 +08:00
v-zhangjc9
46ce96096d Add private repo 2022-08-02 16:06:39 +08:00
Sivabalan Narayanan
765dd2eae6 [HUDI-4221] Optimizing getAllPartitionPaths (#6234)
- Leveraging spark par for dir processing
2022-07-29 03:49:56 -04:00
Danny Chan
ce4330d62b [HUDI-4499] Tweak default retry times for flink metadata table lock (#6238) 2022-07-29 15:01:29 +08:00
Udit Mehrotra
c39e88dcf0 [HUDI-4495] Fix handling of S3 paths incompatible with java URI standards (#6237) 2022-07-28 20:04:14 -07:00
Alexey Kudinkin
cfd0c1ee34 [HUDI-4081][HUDI-4472] Addressing Spark SQL vs Spark DS performance gap (#6213) 2022-07-28 15:36:03 -07:00
Shawn Chang
70b5cf6dab [MINOR] Minor changes around Spark 3.3 support (#6231)
Co-authored-by: Shawn Chang <yxchang@amazon.com>
2022-07-28 09:32:34 -07:00
Yann Byron
ea1fbc71ec [HUDI-4494] keep the fields' order when data is written out of order (#6233) 2022-07-28 22:15:01 +08:00
Danny Chan
07eedd3ef6 [HUDI-4484] Add default lock config options for flink metadata table (#6222) 2022-07-28 20:57:13 +08:00
Rahil C
0a5ce000bf [HUDI-4490] Make AWSDmsAvroPayload class backwards compatible (#6229)
Co-authored-by: Rahil Chertara <rchertar@amazon.com>
2022-07-27 21:55:06 -05:00
Rahil C
51599af281 [HUDI-4126] Disable file splits for Bootstrap real time queries (via InputFormat) (#6219)
Co-authored-by: Udit Mehrotra <uditme@amazon.com>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-07-27 16:58:29 -05:00
Shawn Chang
cdaec5a8da [HUDI-4186] Support Hudi with Spark 3.3.0 (#5943)
Co-authored-by: Shawn Chang <yxchang@amazon.com>
2022-07-27 14:47:49 -07:00
Y Ethan Guo
924c30c7ea [HUDI-4469] Flip reuse flag to true in HoodieBackedTableMetadata to improve file listing (#6214) 2022-07-27 14:04:59 -07:00
Shiyan Xu
717f159bfd [HUDI-3730] Keep metasync configs backward compatible (#6221) 2022-07-27 16:00:44 +05:30
冯健
e5faf2cc84 [HUDI-4210] Create custom hbase index to solve data skew issue on hbase regions (#5797) 2022-07-26 18:09:17 +08:00
Shiyan Xu
1ea1e659c2 [HUDI-4474] Infer metasync configs (#6217)
- infer repeated sync configs from original configs
  - `META_SYNC_BASE_FILE_FORMAT`
    - infer from `org.apache.hudi.common.table.HoodieTableConfig.BASE_FILE_FORMAT`
  - `META_SYNC_ASSUME_DATE_PARTITION`
    - infer from `org.apache.hudi.common.config.HoodieMetadataConfig.ASSUME_DATE_PARTITIONING`
  - `META_SYNC_DECODE_PARTITION`
    - infer from `org.apache.hudi.common.table.HoodieTableConfig.URL_ENCODE_PARTITIONING`
  - `META_SYNC_USE_FILE_LISTING_FROM_METADATA`
    - infer from `org.apache.hudi.common.config.HoodieMetadataConfig.ENABLE`

As proposed in https://github.com/apache/hudi/blob/master/rfc/rfc-55/rfc-55.md#compatible-changes
2022-07-26 15:28:31 +05:30
Dongwook Kwon
74d7b4d751 [HUDI-4471] Relocate AWSDmsAvroPayload class to hudi-common 2022-07-25 17:51:27 -07:00
Alexey Kudinkin
e7c8df7e8b [HUDI-4250][HUDI-4202] Optimize performance of Column Stats Index reading in Data Skipping (#5746)
We provide an alternative way of fetching the Column Stats Index within the reading process, avoiding the penalty of a heavier-weight execution scheduled through the Spark engine.
2022-07-25 15:36:12 -07:00
Sagar Sumit
6e7ac45735 [HUDI-3884] Support archival beyond savepoint commits (#5837)
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-07-25 13:42:29 -05:00
Shiyan Xu
eee6a02f77 [HUDI-4456] Clean up test resources (#6203) 2022-07-25 10:13:06 -05:00
Shiyan Xu
71c2c3102b [HUDI-4455] Improve test classes for TestHiveSyncTool (#6202)
Improve HiveTestService, HiveTestUtil, and related classes.
2022-07-25 19:05:34 +05:30
superche
1fda9ee9bb [HUDI-4071] Match ROLLBACK_USING_MARKERS_ENABLE in sql as datasource (#6206)
Co-authored-by: superche <superche@tencent.com>
2022-07-25 18:40:23 +08:00
Danny Chan
b513232449 [HUDI-4458] Add a converter cache for flink ColumnStatsIndices (#6205) 2022-07-25 17:49:01 +08:00
Y Ethan Guo
f6e7227ed5 [MINOR] Only log stdout output for non-zero exit from commands in IT (#6199) 2022-07-24 22:08:33 -07:00
Tim Brown
76a28daeb0 [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness (#6201) 2022-07-24 21:42:15 -07:00
Vander
2a08a65f71 [MINOR] Fix typos in Spark client related classes (#6204) 2022-07-24 21:41:42 -07:00
simonsssu
1a910fd473 [HUDI-3510] Add sync validate procedure (#6200)
* [HUDI-3510] Add sync validate procedure

Co-authored-by: simonssu <simonssu@tencent.com>
2022-07-25 09:28:46 +08:00
KnightChess
a54c963543 [HUDI-4348] Fix merge into sql data quality in concurrent scenarios (#6020) 2022-07-24 06:29:47 -07:00
Rahil C
1a5a9f7f03 [HUDI-4439] Fix Amazon CloudWatch reporter for metadata enabled tables (#6164)
Co-authored-by: Udit Mehrotra <uditme@amazon.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
2022-07-23 21:08:21 -07:00
Danny Chan
ba11082282 [HUDI-4450] Revert the checkpoint abort notification (#6181) 2022-07-24 08:44:22 +08:00
Danny Chan
a0ffd05b77 [HUDI-4448] Remove the latest commit refresh for timeline server (#6179) 2022-07-23 16:10:53 -07:00
Alexey Kudinkin
2d745057ea [HUDI-4420] Fixing table schema delineation on partition/data schema for Spark relations (#5708) 2022-07-23 16:59:16 -05:00
Sagar Sumit
da28e38fe3 [HUDI-4071] Make NONE sort mode as default for bulk insert (#6195) 2022-07-23 14:37:04 -05:00
Rahil C
f1f0109ab8 [HUDI-4440] Treat bootstrapped table as non-partitioned in HudiFileIndex if partition column is missing from schema (#6163)
Co-authored-by: Ryan Pifer <rmpifer@umich.edu>
2022-07-23 11:44:40 -07:00
Shiyan Xu
f0e843249c [MINOR] Bump CI timeout to 150m (#6198) 2022-07-23 10:07:51 -05:00
superche
859157ec01 [MINOR] Fix Call Procedure code style (#6186)
* Fix Call Procedure code style.
Co-authored-by: superche <superche@tencent.com>
2022-07-23 17:18:38 +08:00
Rahil C
a5348cc685 [HUDI-4436] Invalidate cached table in Spark after write (#6159)
Co-authored-by: Ryan Pifer <rmpifer@umich.edu>
2022-07-22 22:47:47 -07:00
冯健
340c3dbbe1 [HUDI-4437] Fix test conflicts by clearing file system cache (#6123)
Co-authored-by: jian.feng <fengjian428@gmial.com>
Co-authored-by: jian.feng <jian.feng@shopee.com>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-07-22 17:58:04 -07:00
Rahil C
af10a97e7a [HUDI-4435] Fix Avro field not found issue introduced by Avro 1.10 (#6155)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2022-07-22 17:26:16 -07:00
Shiyan Xu
d5c7c79d87 Revert "[HUDI-4324] Remove use_jdbc config from hudi sync (#6072)" (#6160)
This reverts commit 046044c83d.
2022-07-22 17:18:45 -07:00
Sagar Sumit
a36762a862 [HUDI-4303] Use Hive sentinel value as partition default to avoid type cast issues (#5954) 2022-07-22 17:14:36 -07:00
Alexey Kudinkin
39f2a06c85 [HUDI-3979] Optimize out mandatory columns when no merging is performed (#5430)
For MOR, when no merging is performed there is no point in reading either primary-key or pre-combine-key values (unless the query references them). Skipping these columns can save substantial resources that would otherwise be wasted on reading them out.
2022-07-22 15:32:44 -07:00
Shiyan Xu
6b84384022 Revert "[MINOR] Fix CI issue with TestHiveSyncTool (#6110)" (#6192)
This reverts commit d5c904e10e.
2022-07-22 12:20:39 -07:00
Sagar Sumit
716dd3512b [MINOR] Disable Flink compactor IT test (#6189) 2022-07-22 10:16:55 -07:00
Alexey Kudinkin
eea4a692c0 [HUDI-4039] Make sure all builtin KeyGenerators properly implement Spark specific APIs (#5523)
This set of changes makes sure that all builtin KeyGenerators properly implement Spark-specific APIs in a performant way (minimizing key-generator overhead).
2022-07-22 08:35:07 -07:00
Shiyan Xu
d5c904e10e [MINOR] Fix CI issue with TestHiveSyncTool (#6110) 2022-07-22 10:30:00 -05:00
Alexey Kudinkin
41653fc708 [MINOR] Fallback to default for hive-style partitioning, url-encoding configs (#6175)
- Fixes broken ITTestHoodieDemo#testParquetDemo
2022-07-22 18:55:58 +05:30
ForwardXu
51b5783161 [HUDI-4404] Fix insert into dynamic partition write misalignment (#6124) 2022-07-22 09:40:52 +08:00
superche
8e0b47e360 [MINOR] Fix result missing information issue in commits_compare Procedure (#6165)
Co-authored-by: superche <superche@tencent.com>
2022-07-21 16:25:22 -07:00