Alexey Kudinkin
eea4a692c0
[HUDI-4039] Make sure all builtin KeyGenerators properly implement Spark specific APIs ( #5523 )
...
This set of changes makes sure that all builtin KeyGenerators properly implement Spark-specific APIs in a performant way (minimizing key-generators overhead)
2022-07-22 08:35:07 -07:00
ForwardXu
51b5783161
[HUDI-4404] Fix insert into dynamic partition write misalignment ( #6124 )
2022-07-22 09:40:52 +08:00
superche
8e0b47e360
[MINOR] Fix result missing information issue in commits_compare Procedure ( #6165 )
...
Co-authored-by: superche <superche@tencent.com >
2022-07-21 16:25:22 -07:00
Sivabalan Narayanan
2e0dd29714
[HUDI-4204] Fixing NPE with row writer path and with OCC ( #5850 )
2022-07-21 15:57:34 -07:00
wenningd
f52b93fd10
Merge pull request #6154 from rahil-c/rahil-c/disable-emrSpark-properties
...
[HUDI-4434] Disable EmrFS file metadata caching and EMR Spark's data prefetcher feature
2022-07-21 11:35:52 -07:00
Alexey Kudinkin
a33bdd32e3
[HUDI-3993] Replacing UDF in Bulk Insert w/ RDD transformation ( #5470 )
2022-07-21 06:20:47 -07:00
wenningd
c7fe3fd01d
[HUDI-3764] Allow loading external configs while querying Hudi tables with Spark ( #4915 )
...
Currently, Hudi queries run through Spark do not load
external configurations. For example, if customers enable
metadata listing in their global config file, their queries
still run without the metadata feature enabled.
This PR fixes the issue by loading global
configs during the Hudi read phase.
Co-authored-by: Wenning Ding <wenningd@amazon.com >
2022-07-21 15:12:17 +05:30
Alexey Kudinkin
de37774e12
[HUDI-3896] Porting Nested Schema Pruning optimization for Hudi's custom Relations ( #5428 )
...
Currently, all Hudi Relations have a performance gap relative to Spark's HadoopFsRelation.
The reason is that the SchemaPruning optimization rule (pruning nested schemas)
is predicated on usage of HadoopFsRelation, meaning that it is
not applied when any other relation is used.
This change ports the rule to Hudi relations (MOR, Incremental, etc.)
by leveraging the HoodieSparkSessionExtensions mechanism
to inject a modified version of the original SchemaPruning rule
that is adapted to work w/ Hudi's custom relations.
- Added customOptimizerRules to HoodieAnalysis
- Added NestedSchemaPruning, a Spark Optimizer rule
- Handle Spark's Optimizer pruned data schema (to effectively prune nested schemas)
- Enable HoodieClientTestHarness to inject HoodieSparkSessionExtensions
- Injecting Spark Session extensions for TestMORDataSource, TestCOWDataSource
- Disabled fallback to HadoopFsRelation
2022-07-21 15:06:06 +05:30
Rahil Chertara
473be87aa5
Disable EmrFS file metadata caching and EMR Spark's data prefetcher feature
2022-07-20 17:04:00 -07:00
RexAn
ded197800a
[HUDI-4170] Allow users to use hoodie.datasource.read.paths to read necessary files ( #5722 )
...
* Rebase codes
* Move listFileSlices to HoodieBaseRelation
* Fix review
* Fix style
* Fix bug
2022-07-17 16:11:45 +08:00
simonsssu
80368a049d
[HUDI-3503] Add call procedure for CleanCommand ( #6065 )
...
* [HUDI-3503] Add call procedure for CleanCommand
Co-authored-by: simonssu <simonssu@tencent.com >
2022-07-16 22:33:26 +08:00
Shiyan Xu
51244eba82
[HUDI-4323] Make database table names optional in sync tool ( #6073 )
...
* [HUDI-4323] Make database table names optional in sync tool
* Infer these properties from the table config
2022-07-11 10:03:31 +05:30
Shiyan Xu
046044c83d
[HUDI-4324] Remove use_jdbc config from hudi sync ( #6072 )
...
* [HUDI-4324] Remove use_jdbc config from hudi sync
* Users should use HIVE_SYNC_MODE instead
2022-07-10 11:16:09 +05:30
liujinhui
126b88b48d
[HUDI-2150] Rename/Restructure configs for better modularity ( #6061 )
...
- Move clean related configuration to HoodieCleanConfig
- Move Archival related configuration to HoodieArchivalConfig
- Move hoodie.compaction.payload.class to HoodiePayloadConfig
2022-07-09 20:00:48 +05:30
superche
6566fc6625
[HUDI-3500] Add call procedure for RepairsCommand ( #6053 )
2022-07-09 09:29:14 +08:00
xiarixiaoyao
b686c07407
[HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields ( #6017 )
...
* [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields.
* fix comments
Co-authored-by: public (bdcee5037027) <mengtao0326@qq.com >
2022-07-09 03:08:38 +08:00
苏承祥
f20acb8dc3
[HUDI-4367] Support copyToTable on call ( #6054 )
2022-07-08 15:08:11 +08:00
KnightChess
5673819736
[HUDI-4309] fix spark32 repartition error ( #6033 )
2022-07-08 09:38:09 +08:00
shenjiayu17
b18c32379f
[HUDI-4219] Merge Into with update expression "col=s.col+2" on precombine field causes exception ( #5828 )
2022-07-06 09:10:35 +08:00
董可伦
3670e82af5
[HUDI-4356] Fix the error when sync hive in CTAS ( #6029 )
2022-07-06 00:08:23 +08:00
ForwardXu
8570c3aab4
[HUDI-4359] Support show_fs_path_detail command on Call Procedure Command ( #6042 )
2022-07-05 23:56:32 +08:00
xi chaomin
23c9c5c296
[HUDI-3836] Improve the way of fetching metadata partitions from table ( #5286 )
...
Co-authored-by: xicm <xicm@asiainfo.com >
2022-07-05 07:50:17 -07:00
superche
e0954040a9
[HUDI-3511] Add call procedure for MetadataCommand ( #6018 )
2022-07-03 21:44:56 +08:00
Shiyan Xu
c0e1587966
[HUDI-3730] Improve meta sync class design and hierarchies ( #5854 )
...
* [HUDI-3730] Improve meta sync class design and hierarchies (#5754 )
* Implements class design proposed in RFC-55
Co-authored-by: jian.feng <fengjian428@gmial.com >
Co-authored-by: jian.feng <jian.feng@shopee.com >
2022-07-03 14:47:25 +05:30
superche
c00ea84985
[HUDI-3505] Add call procedure for UpgradeOrDowngradeCommand ( #6012 )
...
Co-authored-by: superche <superche@tencent.com >
2022-07-03 08:47:48 +08:00
komao
8547899a39
[HUDI-4285] add ByteBuffer#rewind after ByteBuffer#get in AvroDeseria… ( #5907 )
...
* [HUDI-4285] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer
* add ut
Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com >
2022-06-30 20:48:50 +08:00
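The HUDI-4285 entry above relies on a general `java.nio.ByteBuffer` behavior: a relative bulk `get` advances the buffer's position, so a later reader of the same buffer sees no remaining bytes unless the position is restored. The following is a minimal sketch of that behavior using only the JDK; the class and method names are hypothetical and are not taken from Hudi's AvroDeserializer, and it assumes the data starts at position 0 (which is what `rewind` restores).

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; not Hudi code.
public class ByteBufferRewindDemo {

    // Copies the remaining bytes and leaves the position at the limit.
    static byte[] copyWithoutRewind(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.get(out); // relative bulk get: advances position by out.length
        return out;
    }

    // Copies the remaining bytes, then resets the position to 0 so the
    // same buffer can be read again (assumes the data began at position 0).
    static byte[] copyWithRewind(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        buf.rewind(); // position = 0, limit unchanged
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.wrap(new byte[] {1, 2, 3});
        copyWithoutRewind(a);
        // The buffer is now exhausted; a second read yields an empty array.
        System.out.println("without rewind, second read length: "
                + copyWithoutRewind(a).length);

        ByteBuffer b = ByteBuffer.wrap(new byte[] {1, 2, 3});
        copyWithRewind(b);
        // The position was restored, so the second read sees all bytes again.
        System.out.println("with rewind, second read length: "
                + copyWithRewind(b).length);
    }
}
```

When the buffer may not start at position 0, reading from `buf.duplicate()` is a common alternative, since it leaves the original buffer's position untouched.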
RexAn
cdaaa3c4c7
[HUDI-4346] Fix params not updating BULKINSERT_ARE_PARTITIONER_RECORDS_SORTED ( #5999 )
2022-06-29 19:26:00 -07:00
Teng
e3eb14ad2d
[HUDI-4334] close SparkRDDWriteClient after usage in Create/Delete/RollbackSavepointsProcedure ( #5994 )
2022-06-29 06:13:29 +08:00
bschell
fd7d25ab63
[HUDI-1176] Upgrade hudi to log4j2 ( #5366 )
...
* Move to log4j2
cr: https://code.amazon.com/reviews/CR-71010705
* Upgrade unit tests to log4j2
* update exclusion
Co-authored-by: Brandon Scheller <bschelle@amazon.com >
2022-06-28 12:54:23 -07:00
Alexey Kudinkin
ed823f1c6f
[HUDI-4320] Make sure HoodieStorageConfig.PARQUET_WRITE_LEGACY_FORMAT_ENABLED could be specified by the writer ( #5970 )
...
Fixed sequence determining whether Parquet's legacy-format writing property should be overridden to only kick in when it has not been explicitly specified by the caller
2022-06-28 12:27:32 -07:00
ForwardXu
08eba914ed
[HUDI-4333] Fix HoodieFileIndex's listFiles method logging the skipping percent as NaN ( #5990 )
2022-06-28 15:08:48 +08:00
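A NaN percentage like the one named in the HUDI-4333 title typically arises when a floating-point ratio is computed with a zero denominator (0.0 / 0.0 is NaN in Java). The sketch below shows that arithmetic and one possible guard; the helper name, its parameters, and the choice to report 0.0 for an empty listing are assumptions for illustration and do not reproduce the actual HoodieFileIndex code.

```java
// Hypothetical demo class; not Hudi code.
public class SkippingPercentDemo {

    // Percentage of files skipped, given the total number of files and the
    // number of candidate files left after pruning. Returning 0.0 when there
    // are no files at all is an assumed convention for this sketch.
    static double skippingPercent(int totalFiles, int candidateFiles) {
        if (totalFiles == 0) {
            return 0.0; // guard: avoids 0.0 / 0.0, which is NaN
        }
        return (totalFiles - candidateFiles) * 100.0 / totalFiles;
    }

    public static void main(String[] args) {
        int total = 0;
        int candidates = 0;
        // Unguarded computation: floating-point 0.0 / 0 evaluates to NaN.
        double unguarded = (total - candidates) * 100.0 / total;
        System.out.println("unguarded is NaN: " + Double.isNaN(unguarded));
        System.out.println("guarded: " + skippingPercent(total, candidates));
        System.out.println("10 files, 4 candidates: " + skippingPercent(10, 4));
    }
}
```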
KnightChess
09dc001430
[HUDI-4325] fix spark sql procedure cause ParseException with semicolon ( #5982 )
...
* [HUDI-4325] fix spark sql procedure cause ParseException with semicolon
2022-06-28 09:44:41 +08:00
superche
b14ed47f21
[HUDI-3506] Add call procedure for CommitsCommand ( #5974 )
...
* [HUDI-3506] Add call procedure for CommitsCommand
Co-authored-by: superche <superche@tencent.com >
2022-06-28 09:43:36 +08:00
ForwardXu
26c967bac6
[HUDI-3504] Support bootstrap command based on Call Procedure Command ( #5977 )
2022-06-27 13:06:50 +08:00
leesf
8f4e2a189e
[HUDI-4315] Do not throw exception in BaseSpark3Adapter#toTableIdentifier ( #5957 )
2022-06-27 12:50:58 +08:00
cxzl25
72fa19bcc9
[HUDI-4316] Support for spillable diskmap configuration when constructing HoodieMergedLogRecordScanner ( #5959 )
2022-06-27 11:09:30 +08:00
cxzl25
7a6eb0f6e1
[HUDI-4309] Spark3.2 custom parser should not throw exception ( #5947 )
2022-06-27 09:37:23 +08:00
ForwardXu
1c43c590ac
[HUDI-3502] Support hdfs parquet import command based on Call Procedure Command ( #5956 )
2022-06-26 11:27:14 +08:00
xiarixiaoyao
142adf4ccb
[HUDI-4296] Fix the bug that TestHoodieSparkSqlWriter.testSchemaEvolutionForTableType is flaky ( #5973 )
2022-06-25 21:03:19 +08:00
xiarixiaoyao
360df576a9
Revert "[TEST][DO_NOT_MERGE]fix random failed for ci ( #5948 )" ( #5971 )
...
This reverts commit e8fbd4daf4 .
2022-06-25 11:23:17 +08:00
xiarixiaoyao
e8fbd4daf4
[TEST][DO_NOT_MERGE]fix random failed for ci ( #5948 )
2022-06-25 10:15:08 +08:00
jiz
eeafaeacd2
[HUDI-3512] Add call procedure for StatsCommand ( #5955 )
...
Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com>
2022-06-25 09:43:23 +08:00
jiz
af9f09047d
[HUDI-3509] Add call procedure for HoodieLogFileCommand ( #5949 )
...
Co-authored-by: zhanshaoxiong <jiimmyzhan@tencent.com >
2022-06-24 10:16:54 +08:00
jiz
1bb017d396
[HUDI-3508] Add call procedure for FileSystemViewCommand ( #5929 )
...
* [HUDI-3508] Add call procedure for FileSystemView
* minor
Co-authored-by: jiimmyzhan <jiimmyzhan@tencent.com >
2022-06-22 17:50:20 +08:00
RexAn
17ac5a4573
[HUDI-4173] Fix wrong results when the user reads a Hudi table with no base files by glob paths ( #5723 )
2022-06-20 23:02:34 +05:30
ForwardXu
c5c4cfec91
[HUDI-3507] Support export command based on Call Procedure Command ( #5901 )
2022-06-19 18:48:22 +08:00
huberylee
fec49dc12b
[HUDI-4165] Support Create/Drop/Show/Refresh Index Syntax for Spark SQL ( #5761 )
...
* Support Create/Drop/Show/Refresh Index Syntax for Spark SQL
2022-06-17 18:33:58 +08:00
KnightChess
0ff34b6974
[HUDI-4214] Avoid repeated initialization of write schema in ExpressionPayload ( #5820 )
...
* [HUDI-4214] Avoid repeated initialization of write schema in ExpressionPayload
2022-06-16 17:58:37 +08:00
KnightChess
2bf0a1906d
[HUDI-4217] Avoid repeated object initialization in ExpressionPayload ( #5825 )
2022-06-15 20:21:28 +08:00
superche
7b946cf351
[HUDI-3499] Add Call Procedure for show rollbacks ( #5848 )
...
* Add Call Procedure for show rollbacks
* fix
* add ut for show_rollback_detail and exception handle
Co-authored-by: superche <superche@tencent.com >
2022-06-15 16:50:15 +08:00