Sagar Sumit
da28e38fe3
[HUDI-4071] Make NONE sort mode as default for bulk insert ( #6195 )
2022-07-23 14:37:04 -05:00
Rahil C
f1f0109ab8
[HUDI-4440] Treat bootstrapped table as non-partitioned in HudiFileIndex if partition column is missing from schema ( #6163 )
...
Co-authored-by: Ryan Pifer <rmpifer@umich.edu >
2022-07-23 11:44:40 -07:00
Shiyan Xu
f0e843249c
[MINOR] Bump CI timeout to 150m ( #6198 )
2022-07-23 10:07:51 -05:00
superche
859157ec01
[MINOR] Fix Call Procedure code style ( #6186 )
...
* Fix Call Procedure code style.
Co-authored-by: superche <superche@tencent.com >
2022-07-23 17:18:38 +08:00
Rahil C
a5348cc685
[HUDI-4436] Invalidate cached table in Spark after write ( #6159 )
...
Co-authored-by: Ryan Pifer <rmpifer@umich.edu >
2022-07-22 22:47:47 -07:00
冯健
340c3dbbe1
[HUDI-4437] Fix test conflicts by clearing file system cache ( #6123 )
...
Co-authored-by: jian.feng <fengjian428@gmial.com >
Co-authored-by: jian.feng <jian.feng@shopee.com >
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com >
2022-07-22 17:58:04 -07:00
Rahil C
af10a97e7a
[HUDI-4435] Fix Avro field not found issue introduced by Avro 1.10 ( #6155 )
...
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2022-07-22 17:26:16 -07:00
Shiyan Xu
d5c7c79d87
Revert "[HUDI-4324] Remove use_jdbc config from hudi sync ( #6072 )" ( #6160 )
...
This reverts commit 046044c83d.
2022-07-22 17:18:45 -07:00
Sagar Sumit
a36762a862
[HUDI-4303] Use Hive sentinel value as partition default to avoid type cast issues ( #5954 )
2022-07-22 17:14:36 -07:00
Alexey Kudinkin
39f2a06c85
[HUDI-3979] Optimize out mandatory columns when no merging is performed ( #5430 )
...
For MOR, when no merging is performed there is no point in reading either primary-key or pre-combine-key values (unless the query references them). Skipping these columns can save substantial resources that would otherwise be wasted reading them.
2022-07-22 15:32:44 -07:00
Shiyan Xu
6b84384022
Revert "[MINOR] Fix CI issue with TestHiveSyncTool ( #6110 )" ( #6192 )
...
This reverts commit d5c904e10e.
2022-07-22 12:20:39 -07:00
Sagar Sumit
716dd3512b
[MINOR] Disable Flink compactor IT test ( #6189 )
2022-07-22 10:16:55 -07:00
Alexey Kudinkin
eea4a692c0
[HUDI-4039] Make sure all builtin KeyGenerators properly implement Spark specific APIs ( #5523 )
...
This set of changes makes sure that all builtin KeyGenerators properly implement Spark-specific APIs in a performant way (minimizing key-generator overhead).
2022-07-22 08:35:07 -07:00
Shiyan Xu
d5c904e10e
[MINOR] Fix CI issue with TestHiveSyncTool ( #6110 )
2022-07-22 10:30:00 -05:00
Alexey Kudinkin
41653fc708
[MINOR] Fallback to default for hive-style partitioning, url-encoding configs ( #6175 )
...
- Fixes broken ITTestHoodieDemo#testParquetDemo
2022-07-22 18:55:58 +05:30
ForwardXu
51b5783161
[HUDI-4404] Fix insert into dynamic partition write misalignment ( #6124 )
2022-07-22 09:40:52 +08:00
superche
8e0b47e360
[MINOR] Fix result missing information issue in commits_compare Procedure ( #6165 )
...
Co-authored-by: superche <superche@tencent.com >
2022-07-21 16:25:22 -07:00
Sivabalan Narayanan
36e656aa77
[HUDI-4247] Upgrading protocol buffers version for presto bundle ( #5852 )
2022-07-21 15:58:40 -07:00
Sivabalan Narayanan
2e0dd29714
[HUDI-4204] Fixing NPE with row writer path and with OCC ( #5850 )
2022-07-21 15:57:34 -07:00
Y Ethan Guo
50cdb867c7
[HUDI-4400] Fix missing bloom filters in metadata table in non-partitioned table ( #6113 )
...
Fixes missing bloom filters in the metadata table for non-partitioned tables. The record keys were generated incorrectly because wrong file names were used when generating the metadata payload for the bloom filter.
2022-07-21 11:38:25 -07:00
wenningd
f52b93fd10
Merge pull request #6154 from rahil-c/rahil-c/disable-emrSpark-properties
...
[HUDI-4434] Disable EmrFS file metadata caching and EMR Spark's data prefetcher feature
2022-07-21 11:35:52 -07:00
Rahil C
2bf7920bd9
[MINOR] Add logger for HoodieCopyOnWriteTableInputFormat ( #6161 )
...
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2022-07-21 22:27:18 +05:30
Alexey Kudinkin
a33bdd32e3
[HUDI-3993] Replacing UDF in Bulk Insert w/ RDD transformation ( #5470 )
2022-07-21 06:20:47 -07:00
wenningd
c7fe3fd01d
[HUDI-3764] Allow loading external configs while querying Hudi tables with Spark ( #4915 )
...
Currently, Hudi queries w/ Spark don't
load the external configurations. For example, if customers enable
metadata listing in their global config file, their queries
still run w/o the metadata feature enabled.
This PR fixes this issue and allows loading global
configs during the Hudi reading phase.
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2022-07-21 15:12:17 +05:30
Alexey Kudinkin
de37774e12
[HUDI-3896] Porting Nested Schema Pruning optimization for Hudi's custom Relations ( #5428 )
...
Currently, all Hudi Relations have a performance gap relative to Spark's HadoopFsRelation.
The reason is that the SchemaPruning optimization rule (pruning nested schemas)
is predicated on usage of HadoopFsRelation, meaning that it's
not applied when any other relation is used.
This change ports this rule to Hudi relations (MOR, Incremental, etc.)
by leveraging the HoodieSparkSessionExtensions mechanism
to inject a modified version of the original SchemaPruning rule
that is adapted to work w/ Hudi's custom relations.
- Added customOptimizerRules to HoodieAnalysis
- Added NestedSchemaPruning Spark's Optimizer rule
- Handle Spark's Optimizer pruned data schema (to effectively prune nested schemas)
- Enable HoodieClientTestHarness to inject HoodieSparkSessionExtensions
- Injecting Spark Session extensions for TestMORDataSource, TestCOWDataSource
- Disabled fallback to HadoopFsRelation
2022-07-21 15:06:06 +05:30
Shiyan Xu
2394c62973
[HUDI-4146][RFC-55] Update config changes proposal ( #6162 )
2022-07-21 12:55:02 +05:30
Danny Chan
348519f3cd
[HUDI-4427] Add a computed column IT test ( #6150 )
2022-07-21 09:38:26 +08:00
Rahil Chertara
473be87aa5
Disable EmrFS file metadata caching and EMR Spark's data prefetcher feature
2022-07-20 17:04:00 -07:00
Y Ethan Guo
2b828ccb98
[HUDI-4401] Skip HBase version check ( #6114 )
2022-07-20 14:09:45 -07:00
Danny Chan
e3675fe9b0
[HUDI-4372] Enable metadata table by default for flink ( #6066 )
2022-07-20 16:10:19 +08:00
Danny Chan
6c3578069e
[HUDI-4416] Default database path for hoodie hive catalog ( #6136 )
2022-07-19 15:38:47 +08:00
冯健
382d19e85b
[HUDI-4065] Add FileBasedLockProvider ( #6071 )
2022-07-19 07:52:47 +08:00
liujinhui
1959b843b7
[HUDI-4409] Improve LockManager wait logic when catching exception ( #6122 )
2022-07-18 22:45:52 +08:00
Bo Cui
9282611bae
[HUDI-4098] Support HMS for flink HudiCatalog ( #6082 )
2022-07-18 11:46:23 +08:00
Sivabalan Narayanan
3964c476e0
Fix file group count issue with metadata partitions ( #5892 )
2022-07-18 07:19:29 +05:30
RexAn
ded197800a
[HUDI-4170] Allow users to use hoodie.datasource.read.paths to read necessary files ( #5722 )
...
* Rebase codes
* Move listFileSlices to HoodieBaseRelation
* Fix review
* Fix style
* Fix bug
2022-07-17 16:11:45 +08:00
Alexey Kudinkin
4bda6afe0b
[HUDI-4249] Fixing in-memory HoodieData implementation to operate lazily ( #5855 )
2022-07-16 18:26:48 -05:00
simonsssu
80368a049d
[HUDI-3503] Add call procedure for CleanCommand ( #6065 )
...
Co-authored-by: simonssu <simonssu@tencent.com >
2022-07-16 22:33:26 +08:00
Danny Chan
6aec9d754f
[HUDI-4408] Reuse old rollover file as base file for flink merge handle ( #6120 )
2022-07-16 20:46:23 +08:00
Danny Chan
0faa562b6f
[HUDI-4403] Fix the end input metadata for bounded source ( #6116 )
2022-07-16 12:02:17 +08:00
Shiyan Xu
726e8e3590
[MINOR] Disable TestHiveSyncGlobalCommitTool ( #6119 )
2022-07-15 10:23:21 -07:00
JerryYue-M
b781b31045
[HUDI-4397] Change Flink inline clustering and compaction plan distribution strategy from rebalance to hash to avoid multiple threads potentially accessing the same file ( #6106 )
...
Co-authored-by: jerryyue <jerryyue@didiglobal.com >
2022-07-15 12:21:50 +08:00
Tim Brown
4898ea52f7
[HUDI-4399][RFC-57] Claim RFC 57 for DeltaStreamer proto support ( #6112 )
2022-07-14 18:11:45 -07:00
Danny Chan
05606708fa
[HUDI-4393] Add marker file for target file when flink merge handle rolls over ( #6103 )
2022-07-14 16:00:08 +08:00
Yann Byron
aaccc63ad5
[RFC-51] [HUDI-3478] Hudi to support Change-Data-Capture ( #5436 )
...
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com >
2022-07-14 00:36:26 -07:00
Danny Chan
e70a427956
[HUDI-4391] Incremental read from archived commits for flink ( #6096 )
2022-07-14 15:19:26 +08:00
Luning (Lucas) Wang
ee956b8951
[HUDI-4379] Bump Flink versions to 1.14.5 and 1.15.1 ( #6080 )
2022-07-12 15:03:24 +08:00
HunterXHunter
994c561488
[HUDI-4298] When reading the mor table with QUERY_TYPE_SNAPSHOT,Unabl… ( #5937 )
...
* [HUDI-4298] Add test case for reading mor table
Signed-off-by: LinMingQiang <1356469429@qq.com >
2022-07-12 14:49:44 +08:00
Sagar Sumit
a270eeeef9
[MINOR] Update RFCs status ( #6078 )
2022-07-11 13:04:25 +05:30
Shiyan Xu
51244eba82
[HUDI-4323] Make database table names optional in sync tool ( #6073 )
...
* [HUDI-4323] Make database table names optional in sync tool
* Infer these properties from the table config
2022-07-11 10:03:31 +05:30