Commit Graph

3127 Commits

Author SHA1 Message Date
v-zhangjc9
d9581682a2 Add option to control whether hsync is used 2024-05-24 15:17:37 +08:00
v-zhangjc9
181df2240a Fix bug when scheduling compaction manually 2024-05-24 15:17:37 +08:00
v-zhangjc9
2188b8ed8a Use the hoodie table path as the uid so that tables with the same name can be started in one job 2024-05-24 15:17:37 +08:00
v-zhangjc9
6be03ca56a Lower the reader mem check 2024-05-24 15:17:37 +08:00
v-zhangjc9
215a794fd3 Add VictoriaMetrics reporter 2024-05-24 15:17:37 +08:00
jcxiaozf
eb4b741c38 If there are multiple files under the same partition path and file ID, sort them according to the modification time of the files to avoid reading the files that failed to write before. 2024-05-24 15:17:37 +08:00
v-zhangjc9
5c4908f006 Add closed handler to HoodieFlinkCompactor 2024-05-24 15:16:38 +08:00
v-zhangjc9
0ac43017cb Fix NPE when offline compaction could not find schema from data file 2024-05-24 15:16:38 +08:00
v-zhangjc9
32f7e323dc Change version to private 2024-05-24 15:16:38 +08:00
v-zhangjc9
8462d79ead Change hadoop version to 3.1.2 2022-08-02 16:06:39 +08:00
v-zhangjc9
46ce96096d Add private repo 2022-08-02 16:06:39 +08:00
Sivabalan Narayanan
765dd2eae6 [HUDI-4221] Optimizing getAllPartitionPaths (#6234)
- Leveraging Spark parallelism for dir processing
2022-07-29 03:49:56 -04:00
Danny Chan
ce4330d62b [HUDI-4499] Tweak default retry times for flink metadata table lock (#6238) 2022-07-29 15:01:29 +08:00
Udit Mehrotra
c39e88dcf0 [HUDI-4495] Fix handling of S3 paths incompatible with java URI standards (#6237) 2022-07-28 20:04:14 -07:00
Alexey Kudinkin
cfd0c1ee34 [HUDI-4081][HUDI-4472] Addressing Spark SQL vs Spark DS performance gap (#6213) 2022-07-28 15:36:03 -07:00
Shawn Chang
70b5cf6dab [MINOR] Minor changes around Spark 3.3 support (#6231)
Co-authored-by: Shawn Chang <yxchang@amazon.com>
2022-07-28 09:32:34 -07:00
Yann Byron
ea1fbc71ec [HUDI-4494] keep the fields' order when data is written out of order (#6233) 2022-07-28 22:15:01 +08:00
Danny Chan
07eedd3ef6 [HUDI-4484] Add default lock config options for flink metadata table (#6222) 2022-07-28 20:57:13 +08:00
Rahil C
0a5ce000bf [HUDI-4490] Make AWSDmsAvroPayload class backwards compatible (#6229)
Co-authored-by: Rahil Chertara <rchertar@amazon.com>
2022-07-27 21:55:06 -05:00
Rahil C
51599af281 [HUDI-4126] Disable file splits for Bootstrap real time queries (via InputFormat) (#6219)
Co-authored-by: Udit Mehrotra <uditme@amazon.com>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-07-27 16:58:29 -05:00
Shawn Chang
cdaec5a8da [HUDI-4186] Support Hudi with Spark 3.3.0 (#5943)
Co-authored-by: Shawn Chang <yxchang@amazon.com>
2022-07-27 14:47:49 -07:00
Y Ethan Guo
924c30c7ea [HUDI-4469] Flip reuse flag to true in HoodieBackedTableMetadata to improve file listing (#6214) 2022-07-27 14:04:59 -07:00
Shiyan Xu
717f159bfd [HUDI-3730] Keep metasync configs backward compatible (#6221) 2022-07-27 16:00:44 +05:30
冯健
e5faf2cc84 [HUDI-4210] Create custom hbase index to solve data skew issue on hbase regions (#5797) 2022-07-26 18:09:17 +08:00
Shiyan Xu
1ea1e659c2 [HUDI-4474] Infer metasync configs (#6217)
- infer repeated sync configs from original configs
  - `META_SYNC_BASE_FILE_FORMAT`
    - infer from `org.apache.hudi.common.table.HoodieTableConfig.BASE_FILE_FORMAT`
  - `META_SYNC_ASSUME_DATE_PARTITION`
    - infer from `org.apache.hudi.common.config.HoodieMetadataConfig.ASSUME_DATE_PARTITIONING`
  - `META_SYNC_DECODE_PARTITION`
    - infer from `org.apache.hudi.common.table.HoodieTableConfig.URL_ENCODE_PARTITIONING`
  - `META_SYNC_USE_FILE_LISTING_FROM_METADATA`
    - infer from `org.apache.hudi.common.config.HoodieMetadataConfig.ENABLE`

As proposed in https://github.com/apache/hudi/blob/master/rfc/rfc-55/rfc-55.md#compatible-changes
2022-07-26 15:28:31 +05:30
Dongwook Kwon
74d7b4d751 [HUDI-4471] Relocate AWSDmsAvroPayload class to hudi-common 2022-07-25 17:51:27 -07:00
Alexey Kudinkin
e7c8df7e8b [HUDI-4250][HUDI-4202] Optimize performance of Column Stats Index reading in Data Skipping (#5746)
We provide an alternative way of fetching Column Stats Index within the reading process to avoid the penalty of a more heavy-weight execution scheduled through a Spark engine.
2022-07-25 15:36:12 -07:00
Sagar Sumit
6e7ac45735 [HUDI-3884] Support archival beyond savepoint commits (#5837)
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-07-25 13:42:29 -05:00
Shiyan Xu
eee6a02f77 [HUDI-4456] Clean up test resources (#6203) 2022-07-25 10:13:06 -05:00
Shiyan Xu
71c2c3102b [HUDI-4455] Improve test classes for TestHiveSyncTool (#6202)
Improve HiveTestService, HiveTestUtil, and related classes.
2022-07-25 19:05:34 +05:30
superche
1fda9ee9bb [HUDI-4071] Match ROLLBACK_USING_MARKERS_ENABLE in sql as datasource (#6206)
Co-authored-by: superche <superche@tencent.com>
2022-07-25 18:40:23 +08:00
Danny Chan
b513232449 [HUDI-4458] Add a converter cache for flink ColumnStatsIndices (#6205) 2022-07-25 17:49:01 +08:00
Y Ethan Guo
f6e7227ed5 [MINOR] Only log stdout output for non-zero exit from commands in IT (#6199) 2022-07-24 22:08:33 -07:00
Tim Brown
76a28daeb0 [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness (#6201) 2022-07-24 21:42:15 -07:00
Vander
2a08a65f71 [MINOR] Fix typos in Spark client related classes (#6204) 2022-07-24 21:41:42 -07:00
simonsssu
1a910fd473 [HUDI-3510] Add sync validate procedure (#6200)
Co-authored-by: simonssu <simonssu@tencent.com>
2022-07-25 09:28:46 +08:00
KnightChess
a54c963543 [HUDI-4348] Fix merge into sql data quality in concurrent scenarios (#6020) 2022-07-24 06:29:47 -07:00
Rahil C
1a5a9f7f03 [HUDI-4439] Fix Amazon CloudWatch reporter for metadata enabled tables (#6164)
Co-authored-by: Udit Mehrotra <uditme@amazon.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
2022-07-23 21:08:21 -07:00
Danny Chan
ba11082282 [HUDI-4450] Revert the checkpoint abort notification (#6181) 2022-07-24 08:44:22 +08:00
Danny Chan
a0ffd05b77 [HUDI-4448] Remove the latest commit refresh for timeline server (#6179) 2022-07-23 16:10:53 -07:00
Alexey Kudinkin
2d745057ea [HUDI-4420] Fixing table schema delineation on partition/data schema for Spark relations (#5708) 2022-07-23 16:59:16 -05:00
Sagar Sumit
da28e38fe3 [HUDI-4071] Make NONE sort mode as default for bulk insert (#6195) 2022-07-23 14:37:04 -05:00
Rahil C
f1f0109ab8 [HUDI-4440] Treat bootstrapped table as non-partitioned in HudiFileIndex if partition column is missing from schema (#6163)
Co-authored-by: Ryan Pifer <rmpifer@umich.edu>
2022-07-23 11:44:40 -07:00
Shiyan Xu
f0e843249c [MINOR] Bump CI timeout to 150m (#6198) 2022-07-23 10:07:51 -05:00
superche
859157ec01 [MINOR] Fix Call Procedure code style (#6186)
Co-authored-by: superche <superche@tencent.com>
2022-07-23 17:18:38 +08:00
Rahil C
a5348cc685 [HUDI-4436] Invalidate cached table in Spark after write (#6159)
Co-authored-by: Ryan Pifer <rmpifer@umich.edu>
2022-07-22 22:47:47 -07:00
冯健
340c3dbbe1 [HUDI-4437] Fix test conflicts by clearing file system cache (#6123)
Co-authored-by: jian.feng <fengjian428@gmial.com>
Co-authored-by: jian.feng <jian.feng@shopee.com>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-07-22 17:58:04 -07:00
Rahil C
af10a97e7a [HUDI-4435] Fix Avro field not found issue introduced by Avro 1.10 (#6155)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2022-07-22 17:26:16 -07:00
Shiyan Xu
d5c7c79d87 Revert "[HUDI-4324] Remove use_jdbc config from hudi sync (#6072)" (#6160)
This reverts commit 046044c83d.
2022-07-22 17:18:45 -07:00
Sagar Sumit
a36762a862 [HUDI-4303] Use Hive sentinel value as partition default to avoid type cast issues (#5954) 2022-07-22 17:14:36 -07:00