lanyuanxiaoyao/hudi - hudi - Gitea

Author	SHA1	Message	Date
RexAn	17ac5a4573	[HUDI-4173] Fix wrong results if the user read no base files hudi table by glob paths (#5723 )	2022-06-20 23:02:34 +05:30
Y Ethan Guo	7601e9e4c7	[MINOR] Update DOAP with 0.11.1 Release (#5908 )	2022-06-20 09:27:35 -07:00
Alexander Trushev	f1103281d2	[HUDI-4258] Fix when HoodieTable removes data file before the end of Flink job (#5876 ) * [HUDI-4258] Fix when HoodieTable removes data file before the end of Flink job	2022-06-20 17:07:49 +08:00
luokey	7c6bedff25	[HUDI-4259] Flink create avro schema not conformance to standards (#5878 ) * flink create avro schema not conformance to standards Co-authored-by: 854194341@qq.com <loukey_7821>	2022-06-20 15:41:23 +08:00
felixYyu	d7facb8cb8	fix remove redundant Variable (#5806 )	2022-06-20 15:21:49 +08:00
Shizhi Chen	7481eacf23	[HUDI-4277] supoort flink table source with computed column (#5897 ) Co-authored-by: chenshizhi <chenshizhi@bilibili.com>	2022-06-20 15:19:32 +08:00
5herhom	efafb79eeb	[MINOR] Add "spillable_map_path" in FlinkCompactionConfig. To avoid the disk space of "/tmp" full when compacting offline. (#5905 )	2022-06-20 15:15:23 +08:00
huberylee	d4f0326b4b	[HUDI-4275] Refactor rollback inflight instant for clustering/compaction to reuse some code (#5894 )	2022-06-20 14:29:21 +08:00
ForwardXu	c5c4cfec91	[HUDI-3507] Support export command based on Call Produce Command (#5901 )	2022-06-19 18:48:22 +08:00
huberylee	fec49dc12b	[HUDI-4165] Support Create/Drop/Show/Refresh Index Syntax for Spark SQL (#5761 ) * Support Create/Drop/Show/Refresh Index Syntax for Spark SQL	2022-06-17 18:33:58 +08:00
董可伦	7689e62cd9	[HUDI-4265] Deprecate useless targetTableName parameter in HoodieMultiTableDeltaStreamer (#5883 )	2022-06-17 16:57:14 +08:00
KnightChess	0ff34b6974	[HUDI-4214] improve repeat init write schema in ExpressionPayload (#5820 ) * [HUDI-4214] improve repeat init write schema in ExpressionPayload	2022-06-16 17:58:37 +08:00
KnightChess	2bf0a1906d	[HUDI-4217] improve repeat init object in ExpressionPayload (#5825 )	2022-06-15 20:21:28 +08:00
董可伦	c291b05699	[HUDI-4218] [HUDI-4218] Expose the real exception information when an exception occurs in the tableExists method (#5827 )	2022-06-15 18:10:35 +08:00
superche	7b946cf351	[HUDI-3499] Add Call Procedure for show rollbacks (#5848 ) * Add Call Procedure for show rollbacks * fix * add ut for show_rollback_detail and exception handle Co-authored-by: superche <superche@tencent.com>	2022-06-15 16:50:15 +08:00
Danny Chan	0811bb38fb	[HUDI-4255] Make the flink merge and replace handle intermediate file visible (#5866 )	2022-06-15 14:23:23 +08:00
Danny Chan	25bbff64cf	[minor] Following HUDI-4207, remote the new wrapper #init method (#5865 )	2022-06-15 08:48:13 +08:00
felixYyu	f16b1e8982	[MINOR] Fix typo of DisruptorExecutor in RFC 53 (#5860 )	2022-06-13 23:30:17 -07:00
HunterXHunter	264b15df87	[HUDI-4207] HoodieFlinkWriteClient.getOrCreateWriteHandle throws an e… (#5788 ) Adding more logs to assist in debugging with HoodieFlinkWriteClient.getOrCreateWriteHandle throwing exception	2022-06-13 10:36:06 -04:00
Qi Ji	4774c4248f	[HUDI-4006] failOnDataLoss on delta-streamer kafka sources (#5718 ) add new config key hoodie.deltastreamer.source.kafka.enable.failOnDataLoss when failOnDataLoss=false (current behaviour, the default), log a warning instead of seeking to earliest silently when failOnDataLoss is set, fail explicitly	2022-06-13 10:31:57 -04:00
luoyajun	0d859fe58b	[HUDI-3863] Add UT for drop partition column in deltastreamer testsuite (#5727 )	2022-06-13 10:29:32 -04:00
xi chaomin	e89f5627e4	[HUDI-3682] testReaderFilterRowKeys fails in TestHoodieOrcReaderWriter (#5790 ) TestReaderFilterRowKeys needs to get the key from RECORD_KEY_METADATA_FIELD, but the writer in current UT does not populate the meta field and the schema does not contains meta fields. This fix writes data with schema which contains meta fields and calls writeAvroWithMetadata for writing. Co-authored-by: xicm <xicm@asiainfo.com>	2022-06-13 10:22:12 -04:00
superche	14d8735a1c	Strip extra spaces when creating new configuration (#5849 ) Co-authored-by: superche <superche@tencent.com>	2022-06-13 19:10:38 +08:00
sandyfog	c82e3462e3	[MINOR] fix AvroSchemaConverter duplicate branch in 'switch' (#5813 )	2022-06-13 10:55:24 +08:00
Shiyan Xu	5aaac21d1d	[HUDI-4224] Fix CI issues (#5842 ) - Upgrade junit to 5.7.2 - Downgrade surefire and failsafe to 2.22.2 - Fix test failures that were previously not reported - Improve azure pipeline configs Co-authored-by: liujinhui1994 <965147871@qq.com> Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>	2022-06-12 11:44:18 -07:00
Y Ethan Guo	fd8f7c5f6c	[HUDI-4205] Fix NullPointerException in HFile reader creation (#5841 ) Replace SerializableConfiguration with SerializableWritable for broadcasting the hadoop configuration before initializing HFile readers	2022-06-11 14:46:43 -07:00
Y Ethan Guo	97ccf5dd18	[HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table (#5840 ) When explicitly specifying the metadata table path for reading in spark, the "hoodie.metadata.enable" is overwritten to true for proper read behavior.	2022-06-11 13:19:24 -07:00
Sivabalan Narayanan	08fe281091	[HUDI-4221] Fixing getAllPartitionPaths perf hit w/ FileSystemBackedMetadata (#5829 )	2022-06-11 13:17:42 -07:00
xi chaomin	2b3a85528a	[HUDI-3889] Do not validate table config if save mode is set to Overwrite (#5619 ) Co-authored-by: xicm <xicm@asiainfo.com>	2022-06-09 19:23:51 -04:00
yanenze	ba47904fa2	[HUDI-4139]improvement for flink write operator name to identify tables easily (#5744 ) Co-authored-by: yanenze <yanenze@keytop.com.cn>	2022-06-09 17:48:20 -04:00
Danny Chan	c608dbd6c2	[HUDI-4213] Infer keygen clazz for Spark SQL (#5815 )	2022-06-09 20:37:58 +08:00
sandyfog	8ff17b0470	[MINOR] FlinkStateBackendConverter add more exception message (#5809 ) * [MINOR] FlinkStateBackendConverter add more exception message	2022-06-09 15:13:27 +08:00
liuzhuang2017	f5ab921300	[MINOR][DOCS] Update the README.md file in hudi-examples (#5803 )	2022-06-08 17:45:00 -07:00
Alexey Kudinkin	35afdb4316	[HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration (#5737 ) There are multiple issues with our current DataSource V2 integrations: b/c we advertise Hudi tables as V2, Spark expects it to implement certain APIs which are not implemented at the moment, instead we're using custom Resolution rule (in HoodieSpark3Analysis) to instead manually fallback to V1 APIs. This commit fixes the issue by reverting DSv2 APIs and making Spark use V1, except for schema evaluation logic.	2022-06-07 16:30:46 -07:00
Raymond Xu	1349b596a1	[HUDI-4198] Fix hive config for AWSGlueClientFactory (#5768 ) * HiveConf needs to load fs conf to allow instantiation via AWSGlueClientFactory * Resolve metastore uri config before loading fs conf * Skip hiveql due to CI issue Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>	2022-06-07 20:21:31 +05:30
Sivabalan Narayanan	f85cd9b16d	[HUDI-4200] Fixing sorting of keys fetched from metadata table (#5773 ) - Key fetched from metadata table especially from base file reader is not sorted. and hence may result in throwing NPE (key prefix search) or unnecessary seeks to starting of Hfile (full key look ups). Fixing the same in this patch. This is not an issue with log blocks, since sorting is taking care within HoodieHfileDataBlock. - Commit where the sorting was mistakenly reverted [HUDI-3760] Adding capability to fetch Metadata Records by prefix #5208	2022-06-07 08:19:52 -04:00
YueZhang	4f5cad8029	[MINOR][RFC-53] Fix typos (#5764 ) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2022-06-07 08:28:28 +08:00
Raymond Xu	e5710a8e7c	[MINOR] Mark AWSGlueCatalogSyncClient experimental (#5775 )	2022-06-07 08:25:59 +08:00
Sivabalan Narayanan	7da97c8096	[HUDI-4171] Fixing Non partitioned with virtual keys in read path (#5747 ) - When Non partitioned key gen is used with virtual keys, read path could break since partition path may not exist.	2022-06-06 15:48:21 -04:00
Sivabalan Narayanan	21b903fddb	[HUDI-4197] Fix Async indexer to support building FILES partition (#5766 ) - When async indexer is invoked only with "FILES" partition, it fails. Fixing it to work with Async indexer. Also, if metadata table itself is not initialized, and if someone is looking to build indexes via AsyncIndexer, first they are expected to index "FILES" partition followed by other partitions. In general, we have a limitation of building only one index at a time w/ AsyncIndexer and hence. Have added guards to ensure these conditions are met.	2022-06-06 15:47:11 -04:00
Sivabalan Narayanan	4f6fc726d0	[HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys (#5664 ) Bulk insert row writer code path had a gap wrt hive style partitioning and default partition when virtual keys are enabled with SimpleKeyGen. This patch fixes the issue.	2022-06-06 10:21:00 -07:00
Alexey Kudinkin	4f7ea8c79a	[HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing (#5733 ) As has been outlined in HUDI-4176, we've hit a roadblock while testing Hudi on a large dataset (~1Tb) having pretty fat commits where Hudi's commit metadata could reach into 100s of Mbs. Given the size some of ours commit metadata instances Spark's parsing and resolving phase (when spark.sql(...) is involved, but before returned Dataset is dereferenced) starts to dominate some of our queries' execution time. - Rebased onto new APIs to avoid excessive Hadoop's Path allocations - Eliminated hasOperationField completely to avoid repeatitive computations - Cleaning up duplication in HoodieActiveTimeline - Added caching for common instances of HoodieCommitMetadata - Made tableStructSchema lazy;	2022-06-06 13:14:26 -04:00
HunterXHunter	132c0aa8c7	[HUDI-4101] When BucketIndexPartitioner take partition path for dispersion may cause the fileID of the task to not be loaded correctly (#5763 ) Co-authored-by: john.wick <john.wick@vipshop.com>	2022-06-06 21:53:55 +08:00
Sagar Sumit	21ab0ff8be	[HUDI-4195] Bulk insert should use right keygen for non-partitioned table (#5759 )	2022-06-06 07:19:03 -04:00
Danny Chan	22c45a7704	[HUDI-4188] Fix flaky ITTestDataSTreamWrite.testWriteCopyOnWrite (#5749 )	2022-06-06 12:12:48 +08:00
marchpure	73b0be3c96	[HUDI-4192] HoodieHFileReader scan top cells after bottom cells throw NullPointerException (#5755 ) SeekTo top cells avoid NullPointerException	2022-06-06 12:07:26 +08:00
Y Ethan Guo	5d18b80343	[HUDI-4190] Include hbase-protocol for shading in the bundles (#5750 )	2022-06-05 17:42:16 -07:00
Saisai Shao	bd26d633d7	[HUDI-4168] Add Call Procedure for marker deletion (#5738 ) * Add Call Procedure for marker deletion	2022-06-05 11:05:38 +08:00
Nicolas Paris	80783c27f5	[HUDI-4187] Fix partition order in aws glue sync (#5731 )	2022-06-04 02:16:52 -07:00
leesf	3759a38b99	[HUDI-4183] Fix using HoodieCatalog to create non-hudi tables (#5743 )	2022-06-03 17:16:48 +08:00

2965 Commits