bschell
fd7d25ab63
[HUDI-1176] Upgrade hudi to log4j2 ( #5366 )
...
* Move to log4j2
cr: https://code.amazon.com/reviews/CR-71010705
* Upgrade unit tests to log4j2
* update exclusion
Co-authored-by: Brandon Scheller <bschelle@amazon.com>
2022-06-28 12:54:23 -07:00
Alexey Kudinkin
ed823f1c6f
[HUDI-4320] Make sure HoodieStorageConfig.PARQUET_WRITE_LEGACY_FORMAT_ENABLED could be specified by the writer ( #5970 )
...
Fixed the sequence that determines whether Parquet's legacy-format writing property should be overridden, so that the override only kicks in when the property has not been explicitly specified by the caller
2022-06-28 12:27:32 -07:00
ForwardXu
08eba914ed
[HUDI-4333] fix HoodieFileIndex's listFiles method log print skipping percent NaN ( #5990 )
2022-06-28 15:08:48 +08:00
KnightChess
09dc001430
[HUDI-4325] fix spark sql procedure cause ParseException with semicolon ( #5982 )
...
* [HUDI-4325] fix spark sql procedure cause ParseException with semicolon
2022-06-28 09:44:41 +08:00
superche
b14ed47f21
[HUDI-3506] Add call procedure for CommitsCommand ( #5974 )
...
* [HUDI-3506] Add call procedure for CommitsCommand
Co-authored-by: superche <superche@tencent.com>
2022-06-28 09:43:36 +08:00
ForwardXu
26c967bac6
[HUDI-3504] Support bootstrap command based on Call Produce Command ( #5977 )
2022-06-27 13:06:50 +08:00
leesf
8f4e2a189e
[HUDI-4315] Do not throw exception in BaseSpark3Adapter#toTableIdentifier ( #5957 )
2022-06-27 12:50:58 +08:00
cxzl25
72fa19bcc9
[HUDI-4316] Support for spillable diskmap configuration when constructing HoodieMergedLogRecordScanner ( #5959 )
2022-06-27 11:09:30 +08:00
cxzl25
7a6eb0f6e1
[HUDI-4309] Spark3.2 custom parser should not throw exception ( #5947 )
2022-06-27 09:37:23 +08:00
ForwardXu
1c43c590ac
[HUDI-3502] Support hdfs parquet import command based on Call Produce Command ( #5956 )
2022-06-26 11:27:14 +08:00
xiarixiaoyao
142adf4ccb
[HUDI-4296] Fix the bug that TestHoodieSparkSqlWriter.testSchemaEvolutionForTableType is flaky ( #5973 )
2022-06-25 21:03:19 +08:00
xiarixiaoyao
360df576a9
Revert "[TEST][DO_NOT_MERGE]fix random failed for ci ( #5948 )" ( #5971 )
...
This reverts commit e8fbd4daf4 .
2022-06-25 11:23:17 +08:00
xiarixiaoyao
e8fbd4daf4
[TEST][DO_NOT_MERGE]fix random failed for ci ( #5948 )
2022-06-25 10:15:08 +08:00
jiz
eeafaeacd2
[HUDI-3512] Add call procedure for StatsCommand ( #5955 )
...
Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com>
2022-06-25 09:43:23 +08:00
jiz
af9f09047d
[HUDI-3509] Add call procedure for HoodieLogFileCommand ( #5949 )
...
Co-authored-by: zhanshaoxiong <jiimmyzhan@tencent.com>
2022-06-24 10:16:54 +08:00
jiz
1bb017d396
[HUDI-3508] Add call procedure for FileSystemViewCommand ( #5929 )
...
* [HUDI-3508] Add call procedure for FileSystemView
* minor
Co-authored-by: jiimmyzhan <jiimmyzhan@tencent.com>
2022-06-22 17:50:20 +08:00
RexAn
17ac5a4573
[HUDI-4173] Fix wrong results if the user read no base files hudi table by glob paths ( #5723 )
2022-06-20 23:02:34 +05:30
ForwardXu
c5c4cfec91
[HUDI-3507] Support export command based on Call Produce Command ( #5901 )
2022-06-19 18:48:22 +08:00
huberylee
fec49dc12b
[HUDI-4165] Support Create/Drop/Show/Refresh Index Syntax for Spark SQL ( #5761 )
...
* Support Create/Drop/Show/Refresh Index Syntax for Spark SQL
2022-06-17 18:33:58 +08:00
KnightChess
0ff34b6974
[HUDI-4214] improve repeat init write schema in ExpressionPayload ( #5820 )
...
* [HUDI-4214] improve repeat init write schema in ExpressionPayload
2022-06-16 17:58:37 +08:00
KnightChess
2bf0a1906d
[HUDI-4217] improve repeat init object in ExpressionPayload ( #5825 )
2022-06-15 20:21:28 +08:00
superche
7b946cf351
[HUDI-3499] Add Call Procedure for show rollbacks ( #5848 )
...
* Add Call Procedure for show rollbacks
* fix
* add ut for show_rollback_detail and exception handle
Co-authored-by: superche <superche@tencent.com>
2022-06-15 16:50:15 +08:00
Shiyan Xu
5aaac21d1d
[HUDI-4224] Fix CI issues ( #5842 )
...
- Upgrade junit to 5.7.2
- Downgrade surefire and failsafe to 2.22.2
- Fix test failures that were previously not reported
- Improve azure pipeline configs
Co-authored-by: liujinhui1994 <965147871@qq.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
2022-06-12 11:44:18 -07:00
Y Ethan Guo
fd8f7c5f6c
[HUDI-4205] Fix NullPointerException in HFile reader creation ( #5841 )
...
Replace SerializableConfiguration with SerializableWritable for broadcasting the Hadoop configuration before initializing HFile readers
2022-06-11 14:46:43 -07:00
Y Ethan Guo
97ccf5dd18
[HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table ( #5840 )
...
When the metadata table path is explicitly specified for reading in Spark, "hoodie.metadata.enable" is overridden to true for proper read behavior.
2022-06-11 13:19:24 -07:00
xi chaomin
2b3a85528a
[HUDI-3889] Do not validate table config if save mode is set to Overwrite ( #5619 )
...
Co-authored-by: xicm <xicm@asiainfo.com>
2022-06-09 19:23:51 -04:00
Danny Chan
c608dbd6c2
[HUDI-4213] Infer keygen clazz for Spark SQL ( #5815 )
2022-06-09 20:37:58 +08:00
Alexey Kudinkin
35afdb4316
[HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration ( #5737 )
...
There are multiple issues with our current DataSource V2 integration: because we advertise Hudi tables as V2, Spark expects them to implement certain APIs that are not implemented at the moment; instead, we're using a custom Resolution rule (in HoodieSpark3Analysis) to manually fall back to V1 APIs. This commit fixes the issue by reverting the DSv2 APIs and making Spark use V1, except for the schema evaluation logic.
2022-06-07 16:30:46 -07:00
Sivabalan Narayanan
f85cd9b16d
[HUDI-4200] Fixing sorting of keys fetched from metadata table ( #5773 )
...
- Keys fetched from the metadata table, especially from the base file reader, are not sorted, which may result in an NPE (key prefix search) or unnecessary seeks to the start of the HFile (full key lookups). This patch fixes that. This is not an issue with log blocks, since sorting is taken care of within HoodieHfileDataBlock.
- Commit where the sorting was mistakenly reverted: [HUDI-3760] Adding capability to fetch Metadata Records by prefix (#5208)
2022-06-07 08:19:52 -04:00
Sivabalan Narayanan
4f6fc726d0
[HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys ( #5664 )
...
The bulk insert row writer code path had a gap with respect to hive style partitioning and the default partition when virtual keys are enabled with SimpleKeyGen. This patch fixes the issue.
2022-06-06 10:21:00 -07:00
Alexey Kudinkin
4f7ea8c79a
[HUDI-4176] Fixing TableSchemaResolver to avoid repeated HoodieCommitMetadata parsing ( #5733 )
...
As outlined in HUDI-4176, we've hit a roadblock while testing Hudi on a large dataset (~1 TB) with pretty fat commits, where Hudi's commit metadata could reach into the hundreds of MB.
Given the size of some of our commit metadata instances, Spark's parsing and resolving phase (when spark.sql(...) is involved, but before the returned Dataset is dereferenced) starts to dominate some of our queries' execution time.
- Rebased onto new APIs to avoid excessive Hadoop's Path allocations
- Eliminated hasOperationField completely to avoid repetitive computations
- Cleaning up duplication in HoodieActiveTimeline
- Added caching for common instances of HoodieCommitMetadata
- Made tableStructSchema lazy
2022-06-06 13:14:26 -04:00
Sagar Sumit
21ab0ff8be
[HUDI-4195] Bulk insert should use right keygen for non-partitioned table ( #5759 )
2022-06-06 07:19:03 -04:00
Saisai Shao
bd26d633d7
[HUDI-4168] Add Call Procedure for marker deletion ( #5738 )
...
* Add Call Procedure for marker deletion
2022-06-05 11:05:38 +08:00
leesf
3759a38b99
[HUDI-4183] Fix using HoodieCatalog to create non-hudi tables ( #5743 )
2022-06-03 17:16:48 +08:00
Jin Xing
918c4f4e0b
[HUDI-4149] Drop-Table fails when underlying table directory is broken ( #5672 )
2022-05-30 19:09:26 +08:00
ForwardXu
8fa8f26031
[MINOR] Fix Hive and meta sync config for sql statement ( #5316 )
2022-05-28 07:56:39 -07:00
RexAn
554caa3421
[MINOR] Fix the issue when handling conf hoodie.datasource.write.operation=bulk_insert in sql mode ( #5679 )
...
Co-authored-by: Rex An <bonean131@gmail.com>
2022-05-27 04:45:09 -07:00
Alexey Kudinkin
1767ff5e7c
[HUDI-4161] Make sure partition values are taken from partition path ( #5699 )
2022-05-27 02:36:30 -07:00
watermelon12138
57dbe57bed
[HUDI-4162] Fixed some constant mapping issues. ( #5700 )
...
Co-authored-by: y00617041 <yangxuan42@huawei.com>
2022-05-27 14:08:54 +08:00
komao
8d2f009048
[HUDI-4124] Add valid check in Spark Datasource configs ( #5637 )
...
Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com>
2022-05-26 05:21:28 -07:00
liujinhui
0caa55ecb4
[HUDI-4135] remove netty and netty-all ( #5663 )
2022-05-24 03:56:28 -07:00
felixYyu
716e995a38
[MINOR] Removing redundant semicolons and line breaks ( #5662 )
2022-05-23 15:26:36 -07:00
Y Ethan Guo
752f956f03
[HUDI-3933] Add UT cases to cover different key gen ( #5638 )
2022-05-23 06:48:09 -07:00
Raymond Xu
271d1a79c0
[HUDI-4051] Allow nested field as primary key and preCombineField in spark sql ( #5517 )
...
* [HUDI-4051] Allow nested field as preCombineField in spark sql
* relax validation for primary key
2022-05-22 00:47:51 -07:00
uday08bce
32a5d268f5
[HUDI-3890] fix rat plugin issue with sql files ( #5644 )
2022-05-21 12:22:55 -04:00
Jin Xing
922f765ead
[HUDI-4100] CTAS failed to clean up when given an illegal MANAGED table definition ( #5588 )
2022-05-21 22:41:18 +08:00
huberylee
85b146d3d5
[HUDI-3985] Refactor DLASyncTool to support read hoodie table as spark datasource table ( #5532 )
2022-05-20 22:25:32 +08:00
huberylee
6573469e73
[HUDI-4116] Unify clustering/compaction related procedures' output type ( #5620 )
...
* Unify clustering/compaction related procedures' output type
* Address review comments
2022-05-19 09:48:03 +08:00
Jin Xing
d422f69a0d
[HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand ( #5564 )
...
* [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand
* Set hoodie.query.as.ro.table in serde properties
2022-05-17 14:12:50 +08:00
董可伦
a7a42e4490
[HUDI-4103] [HUDI-4001] Filter the properties should not be used when create table for Spark SQL
2022-05-16 23:26:23 +08:00