lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
Sagar Sumit	a36762a862	[HUDI-4303] Use Hive sentinel value as partition default to avoid type cast issues (#5954 )	2022-07-22 17:14:36 -07:00
Alexey Kudinkin	39f2a06c85	[HUDI-3979] Optimize out mandatory columns when no merging is performed (#5430 ) For MOR, when no merging is performed there is no point in reading either primary-key or pre-combine-key values (unless the query references them). Skipping these columns can save substantial resources that would otherwise be wasted reading them.	2022-07-22 15:32:44 -07:00
Alexey Kudinkin	eea4a692c0	[HUDI-4039] Make sure all builtin `KeyGenerator`s properly implement Spark specific APIs (#5523 ) This set of changes makes sure that all builtin KeyGenerators properly implement Spark-specific APIs in a performant way (minimizing key-generators overhead)	2022-07-22 08:35:07 -07:00
ForwardXu	51b5783161	[HUDI-4404] Fix insert into dynamic partition write misalignment (#6124 )	2022-07-22 09:40:52 +08:00
superche	8e0b47e360	[MINOR] Fix result missing information issue in commits_compare Procedure (#6165 ) Co-authored-by: superche <superche@tencent.com>	2022-07-21 16:25:22 -07:00
Sivabalan Narayanan	2e0dd29714	[HUDI-4204] Fixing NPE with row writer path and with OCC (#5850 )	2022-07-21 15:57:34 -07:00
Alexey Kudinkin	a33bdd32e3	[HUDI-3993] Replacing UDF in Bulk Insert w/ RDD transformation (#5470 )	2022-07-21 06:20:47 -07:00
wenningd	c7fe3fd01d	[HUDI-3764] Allow loading external configs while querying Hudi tables with Spark (#4915 ) Currently, Hudi queries run through Spark do not load external configurations. For example, if customers enable metadata listing in their global config file, their queries still run without the metadata feature enabled. This PR fixes the issue by loading global configs during the Hudi reading phase. Co-authored-by: Wenning Ding <wenningd@amazon.com>	2022-07-21 15:12:17 +05:30
Alexey Kudinkin	de37774e12	[HUDI-3896] Porting Nested Schema Pruning optimization for Hudi's custom Relations (#5428 ) Currently, all Hudi Relations show a performance gap relative to Spark's HadoopFsRelation. The reason is that the SchemaPruning optimization rule (pruning nested schemas) is predicated on usage of HadoopFsRelation, so it is not applied when any other relation is used. This change ports the rule to Hudi relations (MOR, Incremental, etc) by leveraging the HoodieSparkSessionExtensions mechanism to inject a modified version of the original SchemaPruning rule that is adapted to work w/ Hudi's custom relations. - Added customOptimizerRules to HoodieAnalysis - Added NestedSchemaPrunning Spark's Optimizer rule - Handle Spark's Optimizer pruned data schema (to effectively prune nested schemas) - Enable HoodieClientTestHarness to inject HoodieSparkSessionExtensions - Injecting Spark Session extensions for TestMORDataSource, TestCOWDataSource - Disabled fallback to HadoopFsRelation	2022-07-21 15:06:06 +05:30
RexAn	ded197800a	[HUDI-4170] Make user can use hoodie.datasource.read.paths to read necessary files (#5722 ) * Rebase codes * Move listFileSlices to HoodieBaseRelation * Fix review * Fix style * Fix bug	2022-07-17 16:11:45 +08:00
simonsssu	80368a049d	[HUDI-3503] Add call procedure for CleanCommand (#6065 ) * [HUDI-3503] Add call procedure for CleanCommand Co-authored-by: simonssu <simonssu@tencent.com>	2022-07-16 22:33:26 +08:00
liujinhui	126b88b48d	[HUDI-2150] Rename/Restructure configs for better modularity (#6061 ) - Move clean related configuration to HoodieCleanConfig - Move Archival related configuration to HoodieArchivalConfig - hoodie.compaction.payload.class move this to HoodiePayloadConfig	2022-07-09 20:00:48 +05:30
superche	6566fc6625	[HUDI-3500] Add call procedure for RepairsCommand (#6053 )	2022-07-09 09:29:14 +08:00
xiarixiaoyao	b686c07407	[HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields (#6017 ) * [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields. * fix comments Co-authored-by: public (bdcee5037027) <mengtao0326@qq.com>	2022-07-09 03:08:38 +08:00
苏承祥	f20acb8dc3	[HUDI-4367] Support copyToTable on call (#6054 )	2022-07-08 15:08:11 +08:00
KnightChess	5673819736	[HUDI-4309] fix spark32 repartition error (#6033 )	2022-07-08 09:38:09 +08:00
shenjiayu17	b18c32379f	[HUDI-4219] Merge Into when update expression "col=s.col+2" on precombine cause exception (#5828 )	2022-07-06 09:10:35 +08:00
董可伦	3670e82af5	[HUDI-4356] Fix the error when sync hive in CTAS (#6029 )	2022-07-06 00:08:23 +08:00
ForwardXu	8570c3aab4	[HUDI-4359] Support show_fs_path_detail command on Call Produce Command (#6042 )	2022-07-05 23:56:32 +08:00
superche	e0954040a9	[HUDI-3511] Add call procedure for MetadataCommand (#6018 )	2022-07-03 21:44:56 +08:00
Shiyan Xu	c0e1587966	[HUDI-3730] Improve meta sync class design and hierarchies (#5854 ) * [HUDI-3730] Improve meta sync class design and hierarchies (#5754) * Implements class design proposed in RFC-55 Co-authored-by: jian.feng <fengjian428@gmial.com> Co-authored-by: jian.feng <jian.feng@shopee.com>	2022-07-03 14:47:25 +05:30
superche	c00ea84985	[HUDI-3505] Add call procedure for UpgradeOrDowngradeCommand (#6012 ) Co-authored-by: superche <superche@tencent.com>	2022-07-03 08:47:48 +08:00
komao	8547899a39	[HUDI-4285] add ByteBuffer#rewind after ByteBuffer#get in AvroDeseria… (#5907 ) * [HUDI-4285] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer * add ut Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com>	2022-06-30 20:48:50 +08:00
Teng	e3eb14ad2d	[HUDI-4334] close SparkRDDWriteClient after usage in Create/Delete/RollbackSavepointsProcedure (#5994 )	2022-06-29 06:13:29 +08:00
bschell	fd7d25ab63	[HUDI-1176] Upgrade hudi to log4j2 (#5366 ) * Move to log4j2 cr: https://code.amazon.com/reviews/CR-71010705 * Upgrade unit tests to log4j2 * update exclusion Co-authored-by: Brandon Scheller <bschelle@amazon.com>	2022-06-28 12:54:23 -07:00
Alexey Kudinkin	ed823f1c6f	[HUDI-4320] Make sure `HoodieStorageConfig.PARQUET_WRITE_LEGACY_FORMAT_ENABLED` could be specified by the writer (#5970 ) Fixed sequence determining whether Parquet's legacy-format writing property should be overridden to only kick in when it has not been explicitly specified by the caller	2022-06-28 12:27:32 -07:00
KnightChess	09dc001430	[HUDI-4325] fix spark sql procedure cause ParseException with semicolon (#5982 ) * [HUDI-4325] fix spark sql procedure cause ParseException with semicolon	2022-06-28 09:44:41 +08:00
superche	b14ed47f21	[HUDI-3506] Add call procedure for CommitsCommand (#5974 ) * [HUDI-3506] Add call procedure for CommitsCommand Co-authored-by: superche <superche@tencent.com>	2022-06-28 09:43:36 +08:00
ForwardXu	26c967bac6	[HUDI-3504] Support bootstrap command based on Call Produce Command (#5977 )	2022-06-27 13:06:50 +08:00
leesf	8f4e2a189e	[HUDI-4315] Do not throw exception in BaseSpark3Adapter#toTableIdentifier (#5957 )	2022-06-27 12:50:58 +08:00
cxzl25	7a6eb0f6e1	[HUDI-4309] Spark3.2 custom parser should not throw exception (#5947 )	2022-06-27 09:37:23 +08:00
ForwardXu	1c43c590ac	[HUDI-3502] Support hdfs parquet import command based on Call Produce Command (#5956 )	2022-06-26 11:27:14 +08:00
jiz	eeafaeacd2	[HUDI-3512] Add call procedure for StatsCommand (#5955 ) Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com>	2022-06-25 09:43:23 +08:00
jiz	af9f09047d	[HUDI-3509] Add call procedure for HoodieLogFileCommand (#5949 ) Co-authored-by: zhanshaoxiong <jiimmyzhan@tencent.com>	2022-06-24 10:16:54 +08:00
jiz	1bb017d396	[HUDI-3508] Add call procedure for FileSystemViewCommand (#5929 ) * [HUDI-3508] Add call procedure for FileSystemView * minor Co-authored-by: jiimmyzhan <jiimmyzhan@tencent.com>	2022-06-22 17:50:20 +08:00
RexAn	17ac5a4573	[HUDI-4173] Fix wrong results if the user read no base files hudi table by glob paths (#5723 )	2022-06-20 23:02:34 +05:30
ForwardXu	c5c4cfec91	[HUDI-3507] Support export command based on Call Produce Command (#5901 )	2022-06-19 18:48:22 +08:00
huberylee	fec49dc12b	[HUDI-4165] Support Create/Drop/Show/Refresh Index Syntax for Spark SQL (#5761 ) * Support Create/Drop/Show/Refresh Index Syntax for Spark SQL	2022-06-17 18:33:58 +08:00
KnightChess	0ff34b6974	[HUDI-4214] improve repeat init write schema in ExpressionPayload (#5820 ) * [HUDI-4214] improve repeat init write schema in ExpressionPayload	2022-06-16 17:58:37 +08:00
KnightChess	2bf0a1906d	[HUDI-4217] improve repeat init object in ExpressionPayload (#5825 )	2022-06-15 20:21:28 +08:00
superche	7b946cf351	[HUDI-3499] Add Call Procedure for show rollbacks (#5848 ) * Add Call Procedure for show rollbacks * fix * add ut for show_rollback_detail and exception handle Co-authored-by: superche <superche@tencent.com>	2022-06-15 16:50:15 +08:00
Y Ethan Guo	97ccf5dd18	[HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table (#5840 ) When explicitly specifying the metadata table path for reading in spark, the "hoodie.metadata.enable" is overwritten to true for proper read behavior.	2022-06-11 13:19:24 -07:00
xi chaomin	2b3a85528a	[HUDI-3889] Do not validate table config if save mode is set to Overwrite (#5619 ) Co-authored-by: xicm <xicm@asiainfo.com>	2022-06-09 19:23:51 -04:00
Alexey Kudinkin	35afdb4316	[HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration (#5737 ) There are multiple issues with our current DataSource V2 integrations: b/c we advertise Hudi tables as V2, Spark expects them to implement certain APIs which are not implemented at the moment; instead, we're using a custom Resolution rule (in HoodieSpark3Analysis) to manually fall back to V1 APIs. This commit fixes the issue by reverting DSv2 APIs and making Spark use V1, except for schema evaluation logic.	2022-06-07 16:30:46 -07:00
Sivabalan Narayanan	f85cd9b16d	[HUDI-4200] Fixing sorting of keys fetched from metadata table (#5773 ) - Keys fetched from the metadata table, especially from the base file reader, are not sorted, which may result in an NPE (key prefix search) or unnecessary seeks to the start of the HFile (full key lookups). This patch fixes that. This is not an issue with log blocks, since sorting is taken care of within HoodieHfileDataBlock. - Commit where the sorting was mistakenly reverted: [HUDI-3760] Adding capability to fetch Metadata Records by prefix #5208	2022-06-07 08:19:52 -04:00
Alexey Kudinkin	4f7ea8c79a	[HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing (#5733 ) As has been outlined in HUDI-4176, we've hit a roadblock while testing Hudi on a large dataset (~1Tb) having pretty fat commits where Hudi's commit metadata could reach into 100s of Mbs. Given the size of some of our commit metadata instances, Spark's parsing and resolving phase (when spark.sql(...) is involved, but before the returned Dataset is dereferenced) starts to dominate some of our queries' execution time. - Rebased onto new APIs to avoid excessive Hadoop's Path allocations - Eliminated hasOperationField completely to avoid repetitive computations - Cleaning up duplication in HoodieActiveTimeline - Added caching for common instances of HoodieCommitMetadata - Made tableStructSchema lazy	2022-06-06 13:14:26 -04:00
Saisai Shao	bd26d633d7	[HUDI-4168] Add Call Procedure for marker deletion (#5738 ) * Add Call Procedure for marker deletion	2022-06-05 11:05:38 +08:00
leesf	3759a38b99	[HUDI-4183] Fix using HoodieCatalog to create non-hudi tables (#5743 )	2022-06-03 17:16:48 +08:00
Jin Xing	918c4f4e0b	[HUDI-4149] Drop-Table fails when underlying table directory is broken (#5672 )	2022-05-30 19:09:26 +08:00
ForwardXu	8fa8f26031	[MINOR] Fix Hive and meta sync config for sql statement (#5316 )	2022-05-28 07:56:39 -07:00

359 Commits