v-zhangjc9
215a794fd3
Add victoria metrics reporter
2024-05-24 15:17:37 +08:00
jcxiaozf
eb4b741c38
If there are multiple files under the same partition path and file ID, sort them by modification time to avoid reading files left over from earlier failed writes.
2024-05-24 15:17:37 +08:00
v-zhangjc9
5c4908f006
Add closed handler to HoodieFlinkCompactor
2024-05-24 15:16:38 +08:00
v-zhangjc9
0ac43017cb
Fix NPE when offline compaction could not find schema from data file
2024-05-24 15:16:38 +08:00
v-zhangjc9
32f7e323dc
Change version to private
2024-05-24 15:16:38 +08:00
v-zhangjc9
8462d79ead
Change hadoop version to 3.1.2
2022-08-02 16:06:39 +08:00
v-zhangjc9
46ce96096d
Add private repo
2022-08-02 16:06:39 +08:00
Sivabalan Narayanan
765dd2eae6
[HUDI-4221] Optimizing getAllPartitionPaths ( #6234 )
...
- Leveraging Spark parallelism for directory processing
2022-07-29 03:49:56 -04:00
Danny Chan
ce4330d62b
[HUDI-4499] Tweak default retry times for flink metadata table lock ( #6238 )
2022-07-29 15:01:29 +08:00
Udit Mehrotra
c39e88dcf0
[HUDI-4495] Fix handling of S3 paths incompatible with java URI standards ( #6237 )
2022-07-28 20:04:14 -07:00
Alexey Kudinkin
cfd0c1ee34
[HUDI-4081][HUDI-4472] Addressing Spark SQL vs Spark DS performance gap ( #6213 )
2022-07-28 15:36:03 -07:00
Shawn Chang
70b5cf6dab
[MINOR] Minor changes around Spark 3.3 support ( #6231 )
...
Co-authored-by: Shawn Chang <yxchang@amazon.com >
2022-07-28 09:32:34 -07:00
Yann Byron
ea1fbc71ec
[HUDI-4494] keep the fields' order when data is written out of order ( #6233 )
2022-07-28 22:15:01 +08:00
Danny Chan
07eedd3ef6
[HUDI-4484] Add default lock config options for flink metadata table ( #6222 )
2022-07-28 20:57:13 +08:00
Rahil C
0a5ce000bf
[HUDI-4490] Make AWSDmsAvroPayload class backwards compatible ( #6229 )
...
Co-authored-by: Rahil Chertara <rchertar@amazon.com >
2022-07-27 21:55:06 -05:00
Rahil C
51599af281
[HUDI-4126] Disable file splits for Bootstrap real time queries (via InputFormat) ( #6219 )
...
Co-authored-by: Udit Mehrotra <uditme@amazon.com >
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com >
2022-07-27 16:58:29 -05:00
Shawn Chang
cdaec5a8da
[HUDI-4186] Support Hudi with Spark 3.3.0 ( #5943 )
...
Co-authored-by: Shawn Chang <yxchang@amazon.com >
2022-07-27 14:47:49 -07:00
Y Ethan Guo
924c30c7ea
[HUDI-4469] Flip reuse flag to true in HoodieBackedTableMetadata to improve file listing ( #6214 )
2022-07-27 14:04:59 -07:00
Shiyan Xu
717f159bfd
[HUDI-3730] Keep metasync configs backward compatible ( #6221 )
2022-07-27 16:00:44 +05:30
冯健
e5faf2cc84
[HUDI-4210] Create custom hbase index to solve data skew issue on hbase regions ( #5797 )
2022-07-26 18:09:17 +08:00
Shiyan Xu
1ea1e659c2
[HUDI-4474] Infer metasync configs ( #6217 )
...
- infer repeated sync configs from original configs
  - `META_SYNC_BASE_FILE_FORMAT`
    - infer from `org.apache.hudi.common.table.HoodieTableConfig.BASE_FILE_FORMAT`
  - `META_SYNC_ASSUME_DATE_PARTITION`
    - infer from `org.apache.hudi.common.config.HoodieMetadataConfig.ASSUME_DATE_PARTITIONING`
  - `META_SYNC_DECODE_PARTITION`
    - infer from `org.apache.hudi.common.table.HoodieTableConfig.URL_ENCODE_PARTITIONING`
  - `META_SYNC_USE_FILE_LISTING_FROM_METADATA`
    - infer from `org.apache.hudi.common.config.HoodieMetadataConfig.ENABLE`
As proposed in https://github.com/apache/hudi/blob/master/rfc/rfc-55/rfc-55.md#compatible-changes
2022-07-26 15:28:31 +05:30
Dongwook Kwon
74d7b4d751
[HUDI-4471] Relocate AWSDmsAvroPayload class to hudi-common
2022-07-25 17:51:27 -07:00
Alexey Kudinkin
e7c8df7e8b
[HUDI-4250][HUDI-4202] Optimize performance of Column Stats Index reading in Data Skipping ( #5746 )
...
We provide an alternative way of fetching the Column Stats Index within the reading process, avoiding the penalty of a heavier-weight execution scheduled through the Spark engine.
2022-07-25 15:36:12 -07:00
Sagar Sumit
6e7ac45735
[HUDI-3884] Support archival beyond savepoint commits ( #5837 )
...
Co-authored-by: sivabalan <n.siva.b@gmail.com >
2022-07-25 13:42:29 -05:00
Shiyan Xu
eee6a02f77
[HUDI-4456] Clean up test resources ( #6203 )
2022-07-25 10:13:06 -05:00
Shiyan Xu
71c2c3102b
[HUDI-4455] Improve test classes for TestHiveSyncTool ( #6202 )
...
Improve HiveTestService, HiveTestUtil, and related classes.
2022-07-25 19:05:34 +05:30
superche
1fda9ee9bb
[HUDI-4071] Match ROLLBACK_USING_MARKERS_ENABLE in sql as datasource ( #6206 )
...
Co-authored-by: superche <superche@tencent.com >
2022-07-25 18:40:23 +08:00
Danny Chan
b513232449
[HUDI-4458] Add a converter cache for flink ColumnStatsIndices ( #6205 )
2022-07-25 17:49:01 +08:00
Y Ethan Guo
f6e7227ed5
[MINOR] Only log stdout output for non-zero exit from commands in IT ( #6199 )
2022-07-24 22:08:33 -07:00
Tim Brown
76a28daeb0
[HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness ( #6201 )
2022-07-24 21:42:15 -07:00
Vander
2a08a65f71
[MINOR] Fix typos in Spark client related classes ( #6204 )
2022-07-24 21:41:42 -07:00
simonsssu
1a910fd473
[HUDI-3510] Add sync validate procedure ( #6200 )
...
* [HUDI-3510] Add sync validate procedure
Co-authored-by: simonssu <simonssu@tencent.com >
2022-07-25 09:28:46 +08:00
KnightChess
a54c963543
[HUDI-4348] Fix MERGE INTO SQL data quality in concurrent scenarios ( #6020 )
2022-07-24 06:29:47 -07:00
Rahil C
1a5a9f7f03
[HUDI-4439] Fix Amazon CloudWatch reporter for metadata enabled tables ( #6164 )
...
Co-authored-by: Udit Mehrotra <uditme@amazon.com >
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com >
2022-07-23 21:08:21 -07:00
Danny Chan
ba11082282
[HUDI-4450] Revert the checkpoint abort notification ( #6181 )
2022-07-24 08:44:22 +08:00
Danny Chan
a0ffd05b77
[HUDI-4448] Remove the latest commit refresh for timeline server ( #6179 )
2022-07-23 16:10:53 -07:00
Alexey Kudinkin
2d745057ea
[HUDI-4420] Fixing table schema delineation on partition/data schema for Spark relations ( #5708 )
2022-07-23 16:59:16 -05:00
Sagar Sumit
da28e38fe3
[HUDI-4071] Make NONE sort mode as default for bulk insert ( #6195 )
2022-07-23 14:37:04 -05:00
Rahil C
f1f0109ab8
[HUDI-4440] Treat bootstrapped table as non-partitioned in HudiFileIndex if partition column is missing from schema ( #6163 )
...
Co-authored-by: Ryan Pifer <rmpifer@umich.edu >
2022-07-23 11:44:40 -07:00
Shiyan Xu
f0e843249c
[MINOR] Bump CI timeout to 150m ( #6198 )
2022-07-23 10:07:51 -05:00
superche
859157ec01
[MINOR] Fix Call Procedure code style ( #6186 )
...
* Fix Call Procedure code style.
Co-authored-by: superche <superche@tencent.com >
2022-07-23 17:18:38 +08:00
Rahil C
a5348cc685
[HUDI-4436] Invalidate cached table in Spark after write ( #6159 )
...
Co-authored-by: Ryan Pifer <rmpifer@umich.edu >
2022-07-22 22:47:47 -07:00
冯健
340c3dbbe1
[HUDI-4437] Fix test conflicts by clearing file system cache ( #6123 )
...
Co-authored-by: jian.feng <fengjian428@gmial.com >
Co-authored-by: jian.feng <jian.feng@shopee.com >
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com >
2022-07-22 17:58:04 -07:00
Rahil C
af10a97e7a
[HUDI-4435] Fix Avro field not found issue introduced by Avro 1.10 ( #6155 )
...
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2022-07-22 17:26:16 -07:00
Shiyan Xu
d5c7c79d87
Revert "[HUDI-4324] Remove use_jdbc config from hudi sync ( #6072 )" ( #6160 )
...
This reverts commit 046044c83d .
2022-07-22 17:18:45 -07:00
Sagar Sumit
a36762a862
[HUDI-4303] Use Hive sentinel value as partition default to avoid type cast issues ( #5954 )
2022-07-22 17:14:36 -07:00
Alexey Kudinkin
39f2a06c85
[HUDI-3979] Optimize out mandatory columns when no merging is performed ( #5430 )
...
For MOR, when no merging is performed there is no point in reading either primary-key or pre-combine-key values (unless the query references them). Skipping these columns can save substantial resources that would otherwise be wasted reading them.
2022-07-22 15:32:44 -07:00
Shiyan Xu
6b84384022
Revert "[MINOR] Fix CI issue with TestHiveSyncTool ( #6110 )" ( #6192 )
...
This reverts commit d5c904e10e .
2022-07-22 12:20:39 -07:00
Sagar Sumit
716dd3512b
[MINOR] Disable Flink compactor IT test ( #6189 )
2022-07-22 10:16:55 -07:00
Alexey Kudinkin
eea4a692c0
[HUDI-4039] Make sure all builtin KeyGenerators properly implement Spark specific APIs ( #5523 )
...
This set of changes makes sure that all builtin KeyGenerators properly implement Spark-specific APIs in a performant way (minimizing key-generator overhead).
2022-07-22 08:35:07 -07:00