lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Sivabalan Narayanan	4f6fc726d0	[HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys (#5664) Bulk insert row writer code path had a gap wrt hive style partitioning and default partition when virtual keys are enabled with SimpleKeyGen. This patch fixes the issue.	2022-06-06 10:21:00 -07:00
Alexey Kudinkin	4f7ea8c79a	[HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing (#5733) As has been outlined in HUDI-4176, we've hit a roadblock while testing Hudi on a large dataset (~1TB) having pretty fat commits where Hudi's commit metadata could reach into 100s of MBs. Given the size of some of our commit metadata instances, Spark's parsing and resolving phase (when spark.sql(...) is involved, but before the returned Dataset is dereferenced) starts to dominate some of our queries' execution time. - Rebased onto new APIs to avoid excessive Hadoop Path allocations - Eliminated hasOperationField completely to avoid repetitive computations - Cleaning up duplication in HoodieActiveTimeline - Added caching for common instances of HoodieCommitMetadata - Made tableStructSchema lazy	2022-06-06 13:14:26 -04:00
Sagar Sumit	21ab0ff8be	[HUDI-4195] Bulk insert should use right keygen for non-partitioned table (#5759)	2022-06-06 07:19:03 -04:00
Saisai Shao	bd26d633d7	[HUDI-4168] Add Call Procedure for marker deletion (#5738) * Add Call Procedure for marker deletion	2022-06-05 11:05:38 +08:00
leesf	3759a38b99	[HUDI-4183] Fix using HoodieCatalog to create non-hudi tables (#5743)	2022-06-03 17:16:48 +08:00
Jin Xing	918c4f4e0b	[HUDI-4149] Drop-Table fails when underlying table directory is broken (#5672)	2022-05-30 19:09:26 +08:00
ForwardXu	8fa8f26031	[MINOR] Fix Hive and meta sync config for sql statement (#5316)	2022-05-28 07:56:39 -07:00
RexAn	554caa3421	[MINOR] Fix the issue when handling conf hoodie.datasource.write.operation=bulk_insert in sql mode (#5679) Co-authored-by: Rex An <bonean131@gmail.com>	2022-05-27 04:45:09 -07:00
Alexey Kudinkin	1767ff5e7c	[HUDI-4161] Make sure partition values are taken from partition path (#5699)	2022-05-27 02:36:30 -07:00
watermelon12138	57dbe57bed	[HUDI-4162] Fixed some constant mapping issues. (#5700) Co-authored-by: y00617041 <yangxuan42@huawei.com>	2022-05-27 14:08:54 +08:00
komao	8d2f009048	[HUDI-4124] Add valid check in Spark Datasource configs (#5637) Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com>	2022-05-26 05:21:28 -07:00
liujinhui	0caa55ecb4	[HUDI-4135] remove netty and netty-all (#5663)	2022-05-24 03:56:28 -07:00
felixYyu	716e995a38	[MINOR] Removing redundant semicolons and line breaks (#5662)	2022-05-23 15:26:36 -07:00
Y Ethan Guo	752f956f03	[HUDI-3933] Add UT cases to cover different key gen (#5638)	2022-05-23 06:48:09 -07:00
Raymond Xu	271d1a79c0	[HUDI-4051] Allow nested field as primary key and preCombineField in spark sql (#5517) * [HUDI-4051] Allow nested field as preCombineField in spark sql * relax validation for primary key	2022-05-22 00:47:51 -07:00
uday08bce	32a5d268f5	[HUDI-3890] fix rat plugin issue with sql files (#5644)	2022-05-21 12:22:55 -04:00
Jin Xing	922f765ead	[HUDI-4100] CTAS failed to clean up when given an illegal MANAGED table definition (#5588)	2022-05-21 22:41:18 +08:00
huberylee	85b146d3d5	[HUDI-3985] Refactor DLASyncTool to support read hoodie table as spark datasource table (#5532)	2022-05-20 22:25:32 +08:00
huberylee	6573469e73	[HUDI-4116] Unify clustering/compaction related procedures' output type (#5620) * Unify clustering/compaction related procedures' output type * Address review comments	2022-05-19 09:48:03 +08:00
Jin Xing	d422f69a0d	[HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand (#5564) * [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand * Set hoodie.query.as.ro.table in serde properties	2022-05-17 14:12:50 +08:00
董可伦	a7a42e4490	[HUDI-4103] [HUDI-4001] Filter the properties should not be used when create table for Spark SQL	2022-05-16 23:26:23 +08:00
Yuwei XIAO	61030d8e7a	[HUDI-3123] consistent hashing index: basic write path (upsert/insert) (#4480) 1. basic write path (insert/upsert) implementation 2. adapt simple bucket index	2022-05-16 11:07:01 +08:00
董可伦	75f847691f	[HUDI-4001] Filter the properties should not be used when create table for Spark SQL (#5495)	2022-05-16 09:50:29 +08:00
Sivabalan Narayanan	0cec955fa2	[HUDI-4018][HUDI-4027] Adding integ test yamls for immutable use-cases. Added delete partition support to integ tests (#5501) - Added pure immutable test yamls to integ test framework. Added SparkBulkInsertNode as part of it. - Added delete_partition support to integ test framework using spark-datasource. - Added a single yaml to test all non core write operations (insert overwrite, insert overwrite table and delete partitions) - Added tests for 4 concurrent spark datasource writers (multi-writer tests). - Fixed readme w/ sample commands for multi-writer.	2022-05-12 21:01:55 -04:00
Jin Xing	7f0c1f3ddf	[HUDI-4079] Supports showing table comment for hudi with spark3 (#5546)	2022-05-11 22:28:58 +08:00
Sivabalan Narayanan	6285a239a3	[HUDI-3995] Making perf optimizations for bulk insert row writer path (#5462) - Avoid using udf for key generator for SimpleKeyGen and NonPartitionedKeyGen. - Fixed NonPartitioned Key generator to directly fetch record key from row rather than involving GenericRecord. - Other minor fixes around using static values instead of looking up hashmap.	2022-05-09 12:40:22 -04:00
cxzl25	9625d16937	[HUDI-3849] AvroDeserializer supports AVRO_REBASE_MODE_IN_READ configuration (#5287)	2022-05-07 15:39:14 +08:00
Raymond Xu	c319ee9cea	[HUDI-4017] Improve spark sql coverage in CI (#5512) Add GitHub actions tasks to run spark sql UTs under spark 3.1 and 3.2.	2022-05-06 05:52:06 -07:00
Jin Xing	248b0591b0	[HUDI-4042] Support truncate-partition for Spark-3.2 (#5506)	2022-05-06 00:29:47 -07:00
KnightChess	6ec039ba42	[MINOR] Update alter rename command class type for pattern matching (#5381)	2022-04-26 19:39:51 -07:00
Sivabalan Narayanan	762623a15c	[HUDI-3972] Fixing hoodie.properties/tableConfig for no preCombine field with writes (#5424) Fixed instantiation of new table to set preCombine to null if not explicitly set by the user.	2022-04-25 23:03:10 -04:00
miomiocat	5e5c177e4b	[HUDI-3923] Fix cast exception while reading boolean type of partitioned field (#5373)	2022-04-23 20:12:54 +08:00
Sivabalan Narayanan	7523542c1d	[HUDI-3947] Fixing Hive conf usage in HoodieSparkSqlWriter (#5401)	2022-04-22 22:20:05 -04:00
Alexey Kudinkin	c05a4e7b6f	[HUDI-3934] Fix `Spark32HoodieParquetFileFormat` not being compatible w/ Spark 3.2.0 (#5378) - Due to the fact that Spark 3.2.1 is non-BWC w/ 3.2.0, we have to handle all these incompatibilities in Spark32HoodieParquetFileFormat. This PR is addressing that. Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-04-21 21:00:38 -04:00
Y Ethan Guo	c4bc2deea0	[HUDI-3936] Fix projection for a nested field as pre-combined key (#5379) This PR fixes the projection logic around a nested field which is used as the pre-combined key field. The fix is to only check and append the root level field for projection, i.e., "a", for a nested field "a.b.c" in the mandatory columns. - Changes the logic to check and append the root level field for a required nested field in the mandatory columns in HoodieBaseRelation.appendMandatoryColumns	2022-04-21 20:17:57 -04:00
xiarixiaoyao	037f89ee7c	[HUDI-3921] Fixed schema evolution cannot work with HUDI-3855 (#5376) - When column names are renamed (schema evolution enabled), renamed columns weren't handled well while copying records from the old data file with HoodieMergeHandle.	2022-04-21 18:27:54 -04:00
Alexey Kudinkin	4b296f79cc	[HUDI-3935] Adding config to fallback to enabled Partition Values extraction from Partition path (#5377)	2022-04-21 01:36:19 -07:00
Alexey Kudinkin	f7544e23ac	[HUDI-3204] Fixing partition-values being derived from partition-path instead of source columns (#5364) - Scaffolded `Spark24HoodieParquetFileFormat` extending `ParquetFileFormat` and overriding the behavior of adding partition columns to every row - Amended `SparkAdapter`'s `createHoodieParquetFileFormat` API to be able to configure whether to append partition values or not - Fallback to append partition values in cases when the source columns are not persisted in data-file - Fixing HoodieBaseRelation incorrectly handling mandatory columns	2022-04-20 19:30:27 +08:00
Alexey Kudinkin	81bf771e56	[HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution (#5352) Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-04-19 10:40:20 -07:00
Alexey Kudinkin	7ecb47cd21	[HUDI-3895] Fixing file-partitioning seq for base-file only views to make sure we bucket the files efficiently (#5337)	2022-04-18 16:06:52 -04:00
董可伦	b8e465fdfc	[MINOR] Fix typos in log4j-surefire.properties (#5212)	2022-04-15 13:33:37 -07:00
Sivabalan Narayanan	e8ab915aff	[MINOR] Removing invalid code to close parquet reader iterator (#5182)	2022-04-15 14:50:07 -04:00
ForwardXu	6621f3cdbb	[HUDI-3845] Fix delete mor table's partition with urlencode's error (#5282)	2022-04-14 01:49:00 -07:00
ForwardXu	44b3630b5d	[HUDI-3826] Make truncate partition use delete_partition operation (#5272) Make truncate partition and drop partition behave as drop partition with purge, which deletes all records via Hudi DELETE_PARTITION; the partition is removed from the metastore	2022-04-14 00:53:05 -07:00
Alexey Kudinkin	434e782b7d	[HUDI-3867] Disable Data Skipping by default (#5306)	2022-04-13 11:21:12 +05:30
Alexey Kudinkin	458fdd5611	[HUDI-3841] Fixing Column Stats in the presence of Schema Evolution (#5275) Currently, Data Skipping does not correctly handle the case when column-stats are not aligned and, e.g., some of the (column, file) combinations are missing from the CSI. This could occur in different scenarios (schema evolution, CSI config changes), and has to be handled properly when we're composing CSI projection for Data Skipping. This PR addresses that. - Added appropriate aligning for the transposed CSI projection	2022-04-11 15:45:53 -04:00
Alexey Kudinkin	976840e8eb	[HUDI-3812] Fixing Data Skipping configuration to respect Metadata Table configs (#5244) Addressing the problem of Data Skipping not respecting Metadata Table configs which might differ b/w write/read paths. More details could be found in HUDI-3812. - Fixing Data Skipping configuration to respect MT configs (on the Read path) - Tightening up DS handling of cases when no top-level columns are in the target query - Enhancing tests to cover all possible cases	2022-04-10 13:43:47 -04:00
Raymond Xu	5e65aefc61	[HUDI-3837] Fix license and rat check settings (#5273) - add missing licenses - fix CI setting to run rat plugin - fix deploy script to include integ test modules	2022-04-09 11:01:18 -07:00
Alexey Kudinkin	81b25c543a	[HUDI-3825] Fixing Column Stats Index updating sequence (#5267)	2022-04-08 23:14:08 -07:00
KnightChess	7a6272fba1	[HUDI-3781] fix spark delete sql can not delete record (#5215)	2022-04-08 14:26:40 +08:00

375 Commits