lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
ForwardXu	51b5783161	[HUDI-4404] Fix insert into dynamic partition write misalignment (#6124 )	2022-07-22 09:40:52 +08:00
superche	8e0b47e360	[MINOR] Fix result missing information issue in commits_compare Procedure (#6165 ) Co-authored-by: superche <superche@tencent.com>	2022-07-21 16:25:22 -07:00
Alexey Kudinkin	de37774e12	[HUDI-3896] Porting Nested Schema Pruning optimization for Hudi's custom Relations (#5428 ) Currently, all Hudi Relations bear a performance gap relative to Spark's HadoopFsRelation. The reason is that the SchemaPruning optimization rule (pruning nested schemas) is predicated on usage of HadoopFsRelation, meaning that it's not applied when any other relation is used. This change ports this rule to Hudi relations (MOR, Incremental, etc) by leveraging the HoodieSparkSessionExtensions mechanism to inject a modified version of the original SchemaPruning rule that is adapted to work w/ Hudi's custom relations. - Added customOptimizerRules to HoodieAnalysis - Added NestedSchemaPrunning Spark's Optimizer rule - Handle Spark's Optimizer pruned data schema (to effectively prune nested schemas) - Enable HoodieClientTestHarness to inject HoodieSparkSessionExtensions - Injecting Spark Session extensions for TestMORDataSource, TestCOWDataSource - Disabled fallback to HadoopFsRelation	2022-07-21 15:06:06 +05:30
simonsssu	80368a049d	[HUDI-3503] Add call procedure for CleanCommand (#6065 ) Co-authored-by: simonssu <simonssu@tencent.com>	2022-07-16 22:33:26 +08:00
liujinhui	126b88b48d	[HUDI-2150] Rename/Restructure configs for better modularity (#6061 ) - Move clean related configuration to HoodieCleanConfig - Move Archival related configuration to HoodieArchivalConfig - hoodie.compaction.payload.class move this to HoodiePayloadConfig	2022-07-09 20:00:48 +05:30
superche	6566fc6625	[HUDI-3500] Add call procedure for RepairsCommand (#6053 )	2022-07-09 09:29:14 +08:00
苏承祥	f20acb8dc3	[HUDI-4367] Support copyToTable on call (#6054 )	2022-07-08 15:08:11 +08:00
shenjiayu17	b18c32379f	[HUDI-4219] Merge Into when update expression "col=s.col+2" on precombine cause exception (#5828 )	2022-07-06 09:10:35 +08:00
董可伦	3670e82af5	[HUDI-4356] Fix the error when sync hive in CTAS (#6029 )	2022-07-06 00:08:23 +08:00
ForwardXu	8570c3aab4	[HUDI-4359] Support show_fs_path_detail command on Call Produce Command (#6042 )	2022-07-05 23:56:32 +08:00
superche	e0954040a9	[HUDI-3511] Add call procedure for MetadataCommand (#6018 )	2022-07-03 21:44:56 +08:00
Shiyan Xu	c0e1587966	[HUDI-3730] Improve meta sync class design and hierarchies (#5854 ) * [HUDI-3730] Improve meta sync class design and hierarchies (#5754) * Implements class design proposed in RFC-55 Co-authored-by: jian.feng <fengjian428@gmial.com> Co-authored-by: jian.feng <jian.feng@shopee.com>	2022-07-03 14:47:25 +05:30
superche	c00ea84985	[HUDI-3505] Add call procedure for UpgradeOrDowngradeCommand (#6012 ) Co-authored-by: superche <superche@tencent.com>	2022-07-03 08:47:48 +08:00
Teng	e3eb14ad2d	[HUDI-4334] close SparkRDDWriteClient after usage in Create/Delete/RollbackSavepointsProcedure (#5994 )	2022-06-29 06:13:29 +08:00
KnightChess	09dc001430	[HUDI-4325] fix spark sql procedure cause ParseException with semicolon (#5982 )	2022-06-28 09:44:41 +08:00
superche	b14ed47f21	[HUDI-3506] Add call procedure for CommitsCommand (#5974 ) Co-authored-by: superche <superche@tencent.com>	2022-06-28 09:43:36 +08:00
ForwardXu	26c967bac6	[HUDI-3504] Support bootstrap command based on Call Produce Command (#5977 )	2022-06-27 13:06:50 +08:00
ForwardXu	1c43c590ac	[HUDI-3502] Support hdfs parquet import command based on Call Produce Command (#5956 )	2022-06-26 11:27:14 +08:00
jiz	eeafaeacd2	[HUDI-3512] Add call procedure for StatsCommand (#5955 ) Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com>	2022-06-25 09:43:23 +08:00
jiz	af9f09047d	[HUDI-3509] Add call procedure for HoodieLogFileCommand (#5949 ) Co-authored-by: zhanshaoxiong <jiimmyzhan@tencent.com>	2022-06-24 10:16:54 +08:00
jiz	1bb017d396	[HUDI-3508] Add call procedure for FileSystemViewCommand (#5929 ) * [HUDI-3508] Add call procedure for FileSystemView * minor Co-authored-by: jiimmyzhan <jiimmyzhan@tencent.com>	2022-06-22 17:50:20 +08:00
ForwardXu	c5c4cfec91	[HUDI-3507] Support export command based on Call Produce Command (#5901 )	2022-06-19 18:48:22 +08:00
huberylee	fec49dc12b	[HUDI-4165] Support Create/Drop/Show/Refresh Index Syntax for Spark SQL (#5761 )	2022-06-17 18:33:58 +08:00
KnightChess	0ff34b6974	[HUDI-4214] improve repeat init write schema in ExpressionPayload (#5820 )	2022-06-16 17:58:37 +08:00
KnightChess	2bf0a1906d	[HUDI-4217] improve repeat init object in ExpressionPayload (#5825 )	2022-06-15 20:21:28 +08:00
superche	7b946cf351	[HUDI-3499] Add Call Procedure for show rollbacks (#5848 ) * Add Call Procedure for show rollbacks * fix * add ut for show_rollback_detail and exception handle Co-authored-by: superche <superche@tencent.com>	2022-06-15 16:50:15 +08:00
Alexey Kudinkin	35afdb4316	[HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration (#5737 ) There are multiple issues with our current DataSource V2 integration: b/c we advertise Hudi tables as V2, Spark expects them to implement certain APIs which are not implemented at the moment; instead, we're using a custom Resolution rule (in HoodieSpark3Analysis) to manually fall back to V1 APIs. This commit fixes the issue by reverting DSv2 APIs and making Spark use V1, except for schema evaluation logic.	2022-06-07 16:30:46 -07:00
Saisai Shao	bd26d633d7	[HUDI-4168] Add Call Procedure for marker deletion (#5738 )	2022-06-05 11:05:38 +08:00
ForwardXu	8fa8f26031	[MINOR] Fix Hive and meta sync config for sql statement (#5316 )	2022-05-28 07:56:39 -07:00
huberylee	85b146d3d5	[HUDI-3985] Refactor DLASyncTool to support read hoodie table as spark datasource table (#5532 )	2022-05-20 22:25:32 +08:00
huberylee	6573469e73	[HUDI-4116] Unify clustering/compaction related procedures' output type (#5620 ) * Unify clustering/compaction related procedures' output type * Address review comments	2022-05-19 09:48:03 +08:00
董可伦	75f847691f	[HUDI-4001] Filter the properties should not be used when create table for Spark SQL (#5495 )	2022-05-16 09:50:29 +08:00
KnightChess	6ec039ba42	[MINOR] Update alter rename command class type for pattern matching (#5381 )	2022-04-26 19:39:51 -07:00
ForwardXu	44b3630b5d	[HUDI-3826] Make truncate partition use delete_partition operation (#5272 ) Make truncate partition and drop partition behave as drop partition with purge, which deletes all records via Hudi DELETE_PARTITION; the partition is removed from the metastore	2022-04-14 00:53:05 -07:00
ForwardXu	3449e86989	[HUDI-3780] improve drop partitions (#5178 )	2022-04-05 11:52:33 +08:00
xiarixiaoyao	444ff496a4	[RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark (#4910 ) * [HUDI-2560] introduce id_based schema to support full schema evolution. * add test for FileBasedInternalSchemaStorageManger and rebase code * add support for change column type and fix some test case * fix some bugs encountered in the production env and delete useless code * fix test error * rebase code * fixed some nested schema change bugs * [HUDI-2429][Stacked On HUDI-2560]Support full schema evolution for spark * [use dummyInternalSchema instead of null] * add support for spark3.1.x * remove support for spark3.1.x, since some compile fail * support spark3.1.x * rebase and prepare solve all comments * address all comments * rebase code * fixed the count() bug try to get internalSchema by parser commit file/history file directly, not use metaclient which is time cost address some comments * fixed all comments * fix new comments * rebase code, fix UT failed * fixed mistake * rebase code, fixed new comments * rebase code, and prepare for address new comments * address commits * address new comments * fix new issues * control fallback original write logical	2022-04-01 13:20:24 -07:00
Alexey Kudinkin	e5a2baeed0	[HUDI-3549] Removing dependency on "spark-avro" (#4955 ) Hudi will be taking on a promise for its bundles to stay compatible with Spark minor versions (for ex 2.4, 3.1, 3.2): meaning that a single build of Hudi (for ex "hudi-spark3.2-bundle") will be compatible with ALL patch versions in that minor branch (in that case 3.2.1, 3.2.0, etc) To achieve that we'll have to remove (and ban) "spark-avro" as a dependency, which on a few occasions was the root-cause of incompatibility b/w consecutive Spark patch versions (most recently 3.2.1 and 3.2.0, due to this PR). Instead of bundling "spark-avro" as a dependency, we will be copying over some of the classes Hudi depends on and maintaining them along the Hudi code-base to make sure we're able to provide the aforementioned guarantee. To work around arising compatibility issues we will be applying local patches to guarantee compatibility of Hudi bundles w/in the Spark minor version branches. The following mapping of Hudi modules to Spark minor branches is currently maintained: "hudi-spark3" -> 3.2.x "hudi-spark3.1.x" -> 3.1.x "hudi-spark2" -> 2.4.x The following class hierarchies (borrowed from "spark-avro") are maintained w/in these Spark-specific modules to guarantee compatibility with respective minor version branches: AvroSerializer AvroDeserializer AvroUtils Each of these classes has been correspondingly copied from Spark 3.2.1 (for 3.2.x branch), 3.1.2 (for 3.1.x branch), 2.4.4 (for 2.4.x branch) into their respective modules. SchemaConverters class in turn is shared across all those modules given its relative stability (there're only cosmetic changes from 2.4.4 to 3.2.1). All of the aforementioned classes have their corresponding scope of visibility limited to corresponding packages (org.apache.spark.sql.avro, org.apache.spark.sql) to make sure the broader code-base does not become dependent on them and instead relies on facades abstracting them. Additionally, given that Hudi plans on supporting all the patch versions of Spark w/in the aforementioned minor version branches of Spark, additional build steps were added to validate that Hudi could be properly compiled against those versions. Testing, however, is performed against the most recent patch versions of Spark with the help of Azure CI. Brief change log: - Removing spark-avro bundling from Hudi by default - Scaffolded Spark 3.2.x hierarchy - Bootstrapped Spark 3.1.x Avro serializer/deserializer hierarchy - Bootstrapped Spark 2.4.x Avro serializer/deserializer hierarchy - Moved ExpressionCodeGen,ExpressionPayload into hudi-spark module - Fixed AvroDeserializer to stay compatible w/ both Spark 3.2.1 and 3.2.0 - Modified bot.yml to build full matrix of supported Spark versions - Removed "spark-avro" dependency from all modules - Fixed relocation of spark-avro classes in bundles to assist in running integ-tests.	2022-03-29 14:44:47 -04:00
leesf	8f8a8158e2	[HUDI-2520] Fix drop table issue when sync to Hive (#5143 )	2022-03-28 19:34:12 -07:00
huberylee	1d0f4ccfe0	[HUDI-3538] Support Compaction Command Based on Call Procedure Command for Spark SQL (#4945 ) * Support Compaction Command Based on Call Procedure Command for Spark SQL * Addressed review comments	2022-03-28 14:11:35 +08:00
ForwardXu	12cc8e715b	[MINOR] fix QuickstartUtils move (#5133 )	2022-03-25 07:34:35 -07:00
ForwardXu	e5c3f9089b	[HUDI-3563] Make quickstart examples covered by CI tests (#5082 )	2022-03-25 01:37:17 -07:00
Alexey Kudinkin	8b38ddedc2	[HUDI-3594] Supporting Composite Expressions over Data Table Columns in Data Skipping flow (#4996 )	2022-03-24 22:27:15 -07:00
Rajesh Mahindra	5f570ea151	[HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs (#4175 ) Co-authored-by: sivabalan <n.siva.b@gmail.com> Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local> Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-03-21 22:56:31 -04:00
peanut-chenzhong	4b75cb6f23	fix NPE when running schedule using spark-sql if the commits time < hoodie.compact.inline.max.delta.commits (#4976 ) * Update CompactionHoodiePathCommand.scala fix NPE when running schedule using spark-sql if the commits time < hoodie.compact.inline.max.delta.commits * Update CompactionHoodiePathCommand.scala fix IndexOutOfBoundsException when there's no schedule for compaction * Update CompactionHoodiePathCommand.scala fix CI issue	2022-03-14 16:40:38 +08:00
ForwardXu	6c8224cae6	[HUDI-3501] Support savepoints command based on Call Produce Command (#5025 )	2022-03-13 16:58:21 +04:00
huberylee	56cb49485d	[HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable (#4982 )	2022-03-11 13:23:19 -08:00
ForwardXu	08fd80c913	[HUDI-3221] Support querying a table as of a savepoint (#4720 )	2022-03-08 10:02:34 -08:00
shibei	62f534d002	[HUDI-3445] Support Clustering Command Based on Call Procedure Command for Spark SQL (#4901 ) Co-authored-by: shibei <huberylee.li@alibaba-inc.com>	2022-03-04 09:33:16 +08:00
ForwardXu	aa1810d737	[HUDI-3493] Not table to get execution plan (#4894 )	2022-02-24 17:04:44 -08:00
Alexey Kudinkin	85e8a5c4de	[HUDI-1296] Support Metadata Table in Spark Datasource (#4789 ) * Bootstrapping initial support for Metadata Table in Spark Datasource - Consolidated Avro/Row conversion utilities to center around Spark's AvroDeserializer ; removed duplication - Bootstrapped HoodieBaseRelation - Updated HoodieMergeOnReadRDD to be able to handle Metadata Table - Modified MOR relations to be able to read different Base File formats (Parquet, HFile)	2022-02-24 16:23:13 -05:00


215 Commits