lanyuanxiaoyao/hudi

Author	SHA1	Message	Date
v-zhangjc9	215a794fd3	Add victoria metrics reporter	2024-05-24 15:17:37 +08:00
jcxiaozf	eb4b741c38	If there are multiple files under the same partition path and file ID, sort them according to the modification time of the files to avoid reading the files that failed to write before.	2024-05-24 15:17:37 +08:00
v-zhangjc9	5c4908f006	Add closed handler to HoodieFlinkCompactor	2024-05-24 15:16:38 +08:00
v-zhangjc9	0ac43017cb	Fix NPE when offline compaction could not find schema from data file	2024-05-24 15:16:38 +08:00
v-zhangjc9	32f7e323dc	Change version to private	2024-05-24 15:16:38 +08:00
v-zhangjc9	8462d79ead	Change hadoop version to 3.1.2	2022-08-02 16:06:39 +08:00
v-zhangjc9	46ce96096d	Add private repo	2022-08-02 16:06:39 +08:00
Sivabalan Narayanan	765dd2eae6	[HUDI-4221] Optimizing getAllPartitionPaths (#6234) - Leveraging Spark parallelism for directory processing	2022-07-29 03:49:56 -04:00
Danny Chan	ce4330d62b	[HUDI-4499] Tweak default retry times for flink metadata table lock (#6238)	2022-07-29 15:01:29 +08:00
Udit Mehrotra	c39e88dcf0	[HUDI-4495] Fix handling of S3 paths incompatible with java URI standards (#6237)	2022-07-28 20:04:14 -07:00
Alexey Kudinkin	cfd0c1ee34	[HUDI-4081][HUDI-4472] Addressing Spark SQL vs Spark DS performance gap (#6213)	2022-07-28 15:36:03 -07:00
Shawn Chang	70b5cf6dab	[MINOR] Minor changes around Spark 3.3 support (#6231) Co-authored-by: Shawn Chang <yxchang@amazon.com>	2022-07-28 09:32:34 -07:00
Yann Byron	ea1fbc71ec	[HUDI-4494] keep the fields' order when data is written out of order (#6233)	2022-07-28 22:15:01 +08:00
Danny Chan	07eedd3ef6	[HUDI-4484] Add default lock config options for flink metadata table (#6222)	2022-07-28 20:57:13 +08:00
Rahil C	0a5ce000bf	[HUDI-4490] Make AWSDmsAvroPayload class backwards compatible (#6229) Co-authored-by: Rahil Chertara <rchertar@amazon.com>	2022-07-27 21:55:06 -05:00
Rahil C	51599af281	[HUDI-4126] Disable file splits for Bootstrap real time queries (via InputFormat) (#6219) Co-authored-by: Udit Mehrotra <uditme@amazon.com> Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-07-27 16:58:29 -05:00
Shawn Chang	cdaec5a8da	[HUDI-4186] Support Hudi with Spark 3.3.0 (#5943) Co-authored-by: Shawn Chang <yxchang@amazon.com>	2022-07-27 14:47:49 -07:00
Y Ethan Guo	924c30c7ea	[HUDI-4469] Flip reuse flag to true in HoodieBackedTableMetadata to improve file listing (#6214)	2022-07-27 14:04:59 -07:00
Shiyan Xu	717f159bfd	[HUDI-3730] Keep metasync configs backward compatible (#6221)	2022-07-27 16:00:44 +05:30
冯健	e5faf2cc84	[HUDI-4210] Create custom hbase index to solve data skew issue on hbase regions (#5797)	2022-07-26 18:09:17 +08:00
Shiyan Xu	1ea1e659c2	[HUDI-4474] Infer metasync configs (#6217). Infer repeated sync configs from original configs: `META_SYNC_BASE_FILE_FORMAT` from `org.apache.hudi.common.table.HoodieTableConfig.BASE_FILE_FORMAT`; `META_SYNC_ASSUME_DATE_PARTITION` from `org.apache.hudi.common.config.HoodieMetadataConfig.ASSUME_DATE_PARTITIONING`; `META_SYNC_DECODE_PARTITION` from `org.apache.hudi.common.table.HoodieTableConfig.URL_ENCODE_PARTITIONING`; `META_SYNC_USE_FILE_LISTING_FROM_METADATA` from `org.apache.hudi.common.config.HoodieMetadataConfig.ENABLE`. As proposed in https://github.com/apache/hudi/blob/master/rfc/rfc-55/rfc-55.md#compatible-changes	2022-07-26 15:28:31 +05:30
Dongwook Kwon	74d7b4d751	[HUDI-4471] Relocate AWSDmsAvroPayload class to hudi-common	2022-07-25 17:51:27 -07:00
Alexey Kudinkin	e7c8df7e8b	[HUDI-4250][HUDI-4202] Optimize performance of Column Stats Index reading in Data Skipping (#5746) We provide an alternative way of fetching Column Stats Index within the reading process to avoid the penalty of a more heavy-weight execution scheduled through a Spark engine.	2022-07-25 15:36:12 -07:00
Sagar Sumit	6e7ac45735	[HUDI-3884] Support archival beyond savepoint commits (#5837) Co-authored-by: sivabalan <n.siva.b@gmail.com>	2022-07-25 13:42:29 -05:00
Shiyan Xu	eee6a02f77	[HUDI-4456] Clean up test resources (#6203)	2022-07-25 10:13:06 -05:00
Shiyan Xu	71c2c3102b	[HUDI-4455] Improve test classes for TestHiveSyncTool (#6202) Improve HiveTestService, HiveTestUtil, and related classes.	2022-07-25 19:05:34 +05:30
superche	1fda9ee9bb	[HUDI-4071] Match ROLLBACK_USING_MARKERS_ENABLE in sql as datasource (#6206) Co-authored-by: superche <superche@tencent.com>	2022-07-25 18:40:23 +08:00
Danny Chan	b513232449	[HUDI-4458] Add a converter cache for flink ColumnStatsIndices (#6205)	2022-07-25 17:49:01 +08:00
Y Ethan Guo	f6e7227ed5	[MINOR] Only log stdout output for non-zero exit from commands in IT (#6199)	2022-07-24 22:08:33 -07:00
Tim Brown	76a28daeb0	[HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness (#6201)	2022-07-24 21:42:15 -07:00
Vander	2a08a65f71	[MINOR] Fix typos in Spark client related classes (#6204)	2022-07-24 21:41:42 -07:00
simonsssu	1a910fd473	[HUDI-3510] Add sync validate procedure (#6200) Co-authored-by: simonssu <simonssu@tencent.com>	2022-07-25 09:28:46 +08:00
KnightChess	a54c963543	[HUDI-4348] fix merge into sql data quality in concurrent scene (#6020)	2022-07-24 06:29:47 -07:00
Rahil C	1a5a9f7f03	[HUDI-4439] Fix Amazon CloudWatch reporter for metadata enabled tables (#6164) Co-authored-by: Udit Mehrotra <uditme@amazon.com> Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>	2022-07-23 21:08:21 -07:00
Danny Chan	ba11082282	[HUDI-4450] Revert the checkpoint abort notification (#6181)	2022-07-24 08:44:22 +08:00
Danny Chan	a0ffd05b77	[HUDI-4448] Remove the latest commit refresh for timeline server (#6179)	2022-07-23 16:10:53 -07:00
Alexey Kudinkin	2d745057ea	[HUDI-4420] Fixing table schema delineation on partition/data schema for Spark relations (#5708)	2022-07-23 16:59:16 -05:00
Sagar Sumit	da28e38fe3	[HUDI-4071] Make NONE sort mode as default for bulk insert (#6195)	2022-07-23 14:37:04 -05:00
Rahil C	f1f0109ab8	[HUDI-4440] Treat bootstrapped table as non-partitioned in HudiFileIndex if partition column is missing from schema (#6163) Co-authored-by: Ryan Pifer <rmpifer@umich.edu>	2022-07-23 11:44:40 -07:00
Shiyan Xu	f0e843249c	[MINOR] Bump CI timeout to 150m (#6198)	2022-07-23 10:07:51 -05:00
superche	859157ec01	[MINOR] Fix Call Procedure code style (#6186) Co-authored-by: superche <superche@tencent.com>	2022-07-23 17:18:38 +08:00
Rahil C	a5348cc685	[HUDI-4436] Invalidate cached table in Spark after write (#6159) Co-authored-by: Ryan Pifer <rmpifer@umich.edu>	2022-07-22 22:47:47 -07:00
冯健	340c3dbbe1	[HUDI-4437] Fix test conflicts by clearing file system cache (#6123) Co-authored-by: jian.feng <fengjian428@gmial.com> Co-authored-by: jian.feng <jian.feng@shopee.com> Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-07-22 17:58:04 -07:00
Rahil C	af10a97e7a	[HUDI-4435] Fix Avro field not found issue introduced by Avro 1.10 (#6155) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2022-07-22 17:26:16 -07:00
Shiyan Xu	d5c7c79d87	Revert "[HUDI-4324] Remove use_jdbc config from hudi sync (#6072)" (#6160) This reverts commit `046044c83d`.	2022-07-22 17:18:45 -07:00
Sagar Sumit	a36762a862	[HUDI-4303] Use Hive sentinel value as partition default to avoid type cast issues (#5954)	2022-07-22 17:14:36 -07:00
Alexey Kudinkin	39f2a06c85	[HUDI-3979] Optimize out mandatory columns when no merging is performed (#5430) For MOR, when no merging is performed there is no point in reading either primary-key or pre-combine-key values (unless the query references them). Skipping these reads can save substantial resources that would otherwise be wasted reading them out.	2022-07-22 15:32:44 -07:00
Shiyan Xu	6b84384022	Revert "[MINOR] Fix CI issue with TestHiveSyncTool (#6110)" (#6192) This reverts commit `d5c904e10e`.	2022-07-22 12:20:39 -07:00
Sagar Sumit	716dd3512b	[MINOR] Disable Flink compactor IT test (#6189)	2022-07-22 10:16:55 -07:00
Alexey Kudinkin	eea4a692c0	[HUDI-4039] Make sure all builtin `KeyGenerator`s properly implement Spark specific APIs (#5523) This set of changes makes sure that all builtin KeyGenerators properly implement Spark-specific APIs in a performant way (minimizing key-generators overhead)	2022-07-22 08:35:07 -07:00

3123 Commits