Shawn Chang
70b5cf6dab
[MINOR] Minor changes around Spark 3.3 support ( #6231 )
...
Co-authored-by: Shawn Chang <yxchang@amazon.com >
2022-07-28 09:32:34 -07:00
Yann Byron
ea1fbc71ec
[HUDI-4494] keep the fields' order when data is written out of order ( #6233 )
2022-07-28 22:15:01 +08:00
Danny Chan
07eedd3ef6
[HUDI-4484] Add default lock config options for flink metadata table ( #6222 )
2022-07-28 20:57:13 +08:00
Rahil C
0a5ce000bf
[HUDI-4490] Make AWSDmsAvroPayload class backwards compatible ( #6229 )
...
Co-authored-by: Rahil Chertara <rchertar@amazon.com >
2022-07-27 21:55:06 -05:00
Rahil C
51599af281
[HUDI-4126] Disable file splits for Bootstrap real time queries (via InputFormat) ( #6219 )
...
Co-authored-by: Udit Mehrotra <uditme@amazon.com >
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com >
2022-07-27 16:58:29 -05:00
Shawn Chang
cdaec5a8da
[HUDI-4186] Support Hudi with Spark 3.3.0 ( #5943 )
...
Co-authored-by: Shawn Chang <yxchang@amazon.com >
2022-07-27 14:47:49 -07:00
Y Ethan Guo
924c30c7ea
[HUDI-4469] Flip reuse flag to true in HoodieBackedTableMetadata to improve file listing ( #6214 )
2022-07-27 14:04:59 -07:00
Shiyan Xu
717f159bfd
[HUDI-3730] Keep metasync configs backward compatible ( #6221 )
2022-07-27 16:00:44 +05:30
冯健
e5faf2cc84
[HUDI-4210] Create custom hbase index to solve data skew issue on hbase regions ( #5797 )
2022-07-26 18:09:17 +08:00
Shiyan Xu
1ea1e659c2
[HUDI-4474] Infer metasync configs ( #6217 )
...
- infer repeated sync configs from original configs
  - `META_SYNC_BASE_FILE_FORMAT`
    - infer from `org.apache.hudi.common.table.HoodieTableConfig.BASE_FILE_FORMAT`
  - `META_SYNC_ASSUME_DATE_PARTITION`
    - infer from `org.apache.hudi.common.config.HoodieMetadataConfig.ASSUME_DATE_PARTITIONING`
  - `META_SYNC_DECODE_PARTITION`
    - infer from `org.apache.hudi.common.table.HoodieTableConfig.URL_ENCODE_PARTITIONING`
  - `META_SYNC_USE_FILE_LISTING_FROM_METADATA`
    - infer from `org.apache.hudi.common.config.HoodieMetadataConfig.ENABLE`
As proposed in https://github.com/apache/hudi/blob/master/rfc/rfc-55/rfc-55.md#compatible-changes
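The inference described above could be sketched roughly as follows. This is only an illustration, not the code from the PR: the dictionary keys reuse the constant names from the commit message as plain labels (they are not the real Hudi property strings), `infer_sync_configs` is a hypothetical helper, and the rule that an explicitly set sync option wins over the inferred value is an assumption.

```python
# Hypothetical mapping: sync option -> original option it is inferred from.
# Keys mirror the constant names in the commit message; they are labels only,
# not actual Hudi property keys.
INFER_FROM = {
    "META_SYNC_BASE_FILE_FORMAT": "BASE_FILE_FORMAT",
    "META_SYNC_ASSUME_DATE_PARTITION": "ASSUME_DATE_PARTITIONING",
    "META_SYNC_DECODE_PARTITION": "URL_ENCODE_PARTITIONING",
    "META_SYNC_USE_FILE_LISTING_FROM_METADATA": "ENABLE",
}


def infer_sync_configs(props):
    """Return a copy of props with unset sync options copied from their source option."""
    resolved = dict(props)
    for sync_key, source_key in INFER_FROM.items():
        # Assumption: a sync option the user set explicitly is never overwritten.
        if sync_key not in resolved and source_key in resolved:
            resolved[sync_key] = resolved[source_key]
    return resolved
```

With this sketch, a user who sets only the original option no longer has to repeat it for meta sync, while an explicit sync setting is left untouched.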
2022-07-26 15:28:31 +05:30
Dongwook Kwon
74d7b4d751
[HUDI-4471] Relocate AWSDmsAvroPayload class to hudi-common
2022-07-25 17:51:27 -07:00
Alexey Kudinkin
e7c8df7e8b
[HUDI-4250][HUDI-4202] Optimize performance of Column Stats Index reading in Data Skipping ( #5746 )
...
We provide an alternative way of fetching the Column Stats Index within the reading process, avoiding the penalty of a heavier-weight execution scheduled through the Spark engine.
2022-07-25 15:36:12 -07:00
Sagar Sumit
6e7ac45735
[HUDI-3884] Support archival beyond savepoint commits ( #5837 )
...
Co-authored-by: sivabalan <n.siva.b@gmail.com >
2022-07-25 13:42:29 -05:00
Shiyan Xu
eee6a02f77
[HUDI-4456] Clean up test resources ( #6203 )
2022-07-25 10:13:06 -05:00
Shiyan Xu
71c2c3102b
[HUDI-4455] Improve test classes for TestHiveSyncTool ( #6202 )
...
Improve HiveTestService, HiveTestUtil, and related classes.
2022-07-25 19:05:34 +05:30
superche
1fda9ee9bb
[HUDI-4071] Match ROLLBACK_USING_MARKERS_ENABLE in sql as datasource ( #6206 )
...
Co-authored-by: superche <superche@tencent.com >
2022-07-25 18:40:23 +08:00
Danny Chan
b513232449
[HUDI-4458] Add a converter cache for flink ColumnStatsIndices ( #6205 )
2022-07-25 17:49:01 +08:00
Y Ethan Guo
f6e7227ed5
[MINOR] Only log stdout output for non-zero exit from commands in IT ( #6199 )
2022-07-24 22:08:33 -07:00
Tim Brown
76a28daeb0
[HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness ( #6201 )
2022-07-24 21:42:15 -07:00
Vander
2a08a65f71
[MINOR] Fix typos in Spark client related classes ( #6204 )
2022-07-24 21:41:42 -07:00
simonsssu
1a910fd473
[HUDI-3510] Add sync validate procedure ( #6200 )
...
* [HUDI-3510] Add sync validate procedure
Co-authored-by: simonssu <simonssu@tencent.com >
2022-07-25 09:28:46 +08:00
KnightChess
a54c963543
[HUDI-4348] Fix MERGE INTO SQL data quality in concurrent scenarios ( #6020 )
2022-07-24 06:29:47 -07:00
Rahil C
1a5a9f7f03
[HUDI-4439] Fix Amazon CloudWatch reporter for metadata enabled tables ( #6164 )
...
Co-authored-by: Udit Mehrotra <uditme@amazon.com >
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com >
2022-07-23 21:08:21 -07:00
Danny Chan
ba11082282
[HUDI-4450] Revert the checkpoint abort notification ( #6181 )
2022-07-24 08:44:22 +08:00
Danny Chan
a0ffd05b77
[HUDI-4448] Remove the latest commit refresh for timeline server ( #6179 )
2022-07-23 16:10:53 -07:00
Alexey Kudinkin
2d745057ea
[HUDI-4420] Fixing table schema delineation on partition/data schema for Spark relations ( #5708 )
2022-07-23 16:59:16 -05:00
Sagar Sumit
da28e38fe3
[HUDI-4071] Make NONE sort mode as default for bulk insert ( #6195 )
2022-07-23 14:37:04 -05:00
Rahil C
f1f0109ab8
[HUDI-4440] Treat bootstrapped table as non-partitioned in HudiFileIndex if partition column is missing from schema ( #6163 )
...
Co-authored-by: Ryan Pifer <rmpifer@umich.edu >
2022-07-23 11:44:40 -07:00
Shiyan Xu
f0e843249c
[MINOR] Bump CI timeout to 150m ( #6198 )
2022-07-23 10:07:51 -05:00
superche
859157ec01
[MINOR] Fix Call Procedure code style ( #6186 )
...
* Fix Call Procedure code style.
Co-authored-by: superche <superche@tencent.com >
2022-07-23 17:18:38 +08:00
Rahil C
a5348cc685
[HUDI-4436] Invalidate cached table in Spark after write ( #6159 )
...
Co-authored-by: Ryan Pifer <rmpifer@umich.edu >
2022-07-22 22:47:47 -07:00
冯健
340c3dbbe1
[HUDI-4437] Fix test conflicts by clearing file system cache ( #6123 )
...
Co-authored-by: jian.feng <fengjian428@gmial.com >
Co-authored-by: jian.feng <jian.feng@shopee.com >
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com >
2022-07-22 17:58:04 -07:00
Rahil C
af10a97e7a
[HUDI-4435] Fix Avro field not found issue introduced by Avro 1.10 ( #6155 )
...
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2022-07-22 17:26:16 -07:00
Shiyan Xu
d5c7c79d87
Revert "[HUDI-4324] Remove use_jdbc config from hudi sync ( #6072 )" ( #6160 )
...
This reverts commit 046044c83d .
2022-07-22 17:18:45 -07:00
Sagar Sumit
a36762a862
[HUDI-4303] Use Hive sentinel value as partition default to avoid type cast issues ( #5954 )
2022-07-22 17:14:36 -07:00
Alexey Kudinkin
39f2a06c85
[HUDI-3979] Optimize out mandatory columns when no merging is performed ( #5430 )
...
For MOR, when no merging is performed there is no point in reading either the primary-key or the pre-combine-key values (unless the query references them). Skipping these columns can save the substantial resources otherwise wasted on reading them.
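A minimal sketch of the idea, under stated assumptions: `columns_to_read` and its parameters are hypothetical names for illustration and do not come from the Hudi code base. The key columns are appended to the projection only when merging is actually going to happen.

```python
def columns_to_read(requested, record_key, precombine_key, merging):
    """Return the columns to project for a read.

    requested      -- columns the query references, in order
    record_key     -- name of the primary-key column
    precombine_key -- name of the pre-combine-key column
    merging        -- True when records must be merged during the read
    """
    columns = list(requested)
    if merging:
        # Merging needs both key columns even if the query never mentions them.
        for key in (record_key, precombine_key):
            if key not in columns:
                columns.append(key)
    return columns
```

For example, `columns_to_read(["fare"], "uuid", "ts", merging=False)` keeps the projection at just `fare`, while the same call with `merging=True` also pulls in `uuid` and `ts`.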
2022-07-22 15:32:44 -07:00
Shiyan Xu
6b84384022
Revert "[MINOR] Fix CI issue with TestHiveSyncTool ( #6110 )" ( #6192 )
...
This reverts commit d5c904e10e .
2022-07-22 12:20:39 -07:00
Sagar Sumit
716dd3512b
[MINOR] Disable Flink compactor IT test ( #6189 )
2022-07-22 10:16:55 -07:00
Alexey Kudinkin
eea4a692c0
[HUDI-4039] Make sure all builtin KeyGenerators properly implement Spark specific APIs ( #5523 )
...
This set of changes makes sure that all builtin KeyGenerators properly implement Spark-specific APIs in a performant way (minimizing key-generator overhead).
2022-07-22 08:35:07 -07:00
Shiyan Xu
d5c904e10e
[MINOR] Fix CI issue with TestHiveSyncTool ( #6110 )
2022-07-22 10:30:00 -05:00
Alexey Kudinkin
41653fc708
[MINOR] Fallback to default for hive-style partitioning, url-encoding configs ( #6175 )
...
- Fixes broken ITTestHoodieDemo#testParquetDemo
2022-07-22 18:55:58 +05:30
ForwardXu
51b5783161
[HUDI-4404] Fix insert into dynamic partition write misalignment ( #6124 )
2022-07-22 09:40:52 +08:00
superche
8e0b47e360
[MINOR] Fix result missing information issue in commits_compare Procedure ( #6165 )
...
Co-authored-by: superche <superche@tencent.com >
2022-07-21 16:25:22 -07:00
Sivabalan Narayanan
36e656aa77
[HUDI-4247] Upgrading protocol buffers version for presto bundle ( #5852 )
2022-07-21 15:58:40 -07:00
Sivabalan Narayanan
2e0dd29714
[HUDI-4204] Fixing NPE with row writer path and with OCC ( #5850 )
2022-07-21 15:57:34 -07:00
Y Ethan Guo
50cdb867c7
[HUDI-4400] Fix missing bloom filters in metadata table in non-partitioned table ( #6113 )
...
Fixes missing bloom filters in the metadata table for non-partitioned tables. The record keys were generated incorrectly because wrong file names were used when building the metadata payload for the bloom filter.
2022-07-21 11:38:25 -07:00
wenningd
f52b93fd10
Merge pull request #6154 from rahil-c/rahil-c/disable-emrSpark-properties
...
[HUDI-4434] Disable EmrFS file metadata caching and EMR Spark's data prefetcher feature
2022-07-21 11:35:52 -07:00
Rahil C
2bf7920bd9
[MINOR] Add logger for HoodieCopyOnWriteTableInputFormat ( #6161 )
...
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2022-07-21 22:27:18 +05:30
Alexey Kudinkin
a33bdd32e3
[HUDI-3993] Replacing UDF in Bulk Insert w/ RDD transformation ( #5470 )
2022-07-21 06:20:47 -07:00
wenningd
c7fe3fd01d
[HUDI-3764] Allow loading external configs while querying Hudi tables with Spark ( #4915 )
...
Currently, Hudi queries run through Spark do not load
external configurations. For example, if customers enable
metadata listing in their global config file, their queries
still run without the metadata feature enabled.
This PR fixes the issue by loading global configs
during the Hudi read phase.
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2022-07-21 15:12:17 +05:30