lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Shawn Chang	70b5cf6dab	[MINOR] Minor changes around Spark 3.3 support (#6231 ) Co-authored-by: Shawn Chang <yxchang@amazon.com>	2022-07-28 09:32:34 -07:00
Yann Byron	ea1fbc71ec	[HUDI-4494] keep the fields' order when data is written out of order (#6233 )	2022-07-28 22:15:01 +08:00
Danny Chan	07eedd3ef6	[HUDI-4484] Add default lock config options for flink metadata table (#6222 )	2022-07-28 20:57:13 +08:00
Rahil C	0a5ce000bf	[HUDI-4490] Make AWSDmsAvroPayload class backwards compatible (#6229 ) Co-authored-by: Rahil Chertara <rchertar@amazon.com>	2022-07-27 21:55:06 -05:00
Rahil C	51599af281	[HUDI-4126] Disable file splits for Bootstrap real time queries (via InputFormat) (#6219 ) Co-authored-by: Udit Mehrotra <uditme@amazon.com> Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-07-27 16:58:29 -05:00
Shawn Chang	cdaec5a8da	[HUDI-4186] Support Hudi with Spark 3.3.0 (#5943 ) Co-authored-by: Shawn Chang <yxchang@amazon.com>	2022-07-27 14:47:49 -07:00
Y Ethan Guo	924c30c7ea	[HUDI-4469] Flip reuse flag to true in HoodieBackedTableMetadata to improve file listing (#6214 )	2022-07-27 14:04:59 -07:00
Shiyan Xu	717f159bfd	[HUDI-3730] Keep metasync configs backward compatible (#6221 )	2022-07-27 16:00:44 +05:30
冯健	e5faf2cc84	[HUDI-4210] Create custom hbase index to solve data skew issue on hbase regions (#5797 )	2022-07-26 18:09:17 +08:00
Shiyan Xu	1ea1e659c2	[HUDI-4474] Infer metasync configs (#6217 ) Infer repeated sync configs from the original configs: `META_SYNC_BASE_FILE_FORMAT` from `org.apache.hudi.common.table.HoodieTableConfig.BASE_FILE_FORMAT`; `META_SYNC_ASSUME_DATE_PARTITION` from `org.apache.hudi.common.config.HoodieMetadataConfig.ASSUME_DATE_PARTITIONING`; `META_SYNC_DECODE_PARTITION` from `org.apache.hudi.common.table.HoodieTableConfig.URL_ENCODE_PARTITIONING`; `META_SYNC_USE_FILE_LISTING_FROM_METADATA` from `org.apache.hudi.common.config.HoodieMetadataConfig.ENABLE`. As proposed in https://github.com/apache/hudi/blob/master/rfc/rfc-55/rfc-55.md#compatible-changes	2022-07-26 15:28:31 +05:30
Dongwook Kwon	74d7b4d751	[HUDI-4471] Relocate AWSDmsAvroPayload class to hudi-common	2022-07-25 17:51:27 -07:00
Alexey Kudinkin	e7c8df7e8b	[HUDI-4250][HUDI-4202] Optimize performance of Column Stats Index reading in Data Skipping (#5746 ) Provides an alternative way of fetching the Column Stats Index within the reading process, avoiding the penalty of the heavier-weight execution scheduled through the Spark engine.	2022-07-25 15:36:12 -07:00
Sagar Sumit	6e7ac45735	[HUDI-3884] Support archival beyond savepoint commits (#5837 ) Co-authored-by: sivabalan <n.siva.b@gmail.com>	2022-07-25 13:42:29 -05:00
Shiyan Xu	eee6a02f77	[HUDI-4456] Clean up test resources (#6203 )	2022-07-25 10:13:06 -05:00
Shiyan Xu	71c2c3102b	[HUDI-4455] Improve test classes for TestHiveSyncTool (#6202 ) Improve HiveTestService, HiveTestUtil, and related classes.	2022-07-25 19:05:34 +05:30
superche	1fda9ee9bb	[HUDI-4071] Match ROLLBACK_USING_MARKERS_ENABLE in sql as datasource (#6206 ) Co-authored-by: superche <superche@tencent.com>	2022-07-25 18:40:23 +08:00
Danny Chan	b513232449	[HUDI-4458] Add a converter cache for flink ColumnStatsIndices (#6205 )	2022-07-25 17:49:01 +08:00
Y Ethan Guo	f6e7227ed5	[MINOR] Only log stdout output for non-zero exit from commands in IT (#6199 )	2022-07-24 22:08:33 -07:00
Tim Brown	76a28daeb0	[HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness (#6201 )	2022-07-24 21:42:15 -07:00
Vander	2a08a65f71	[MINOR] Fix typos in Spark client related classes (#6204 )	2022-07-24 21:41:42 -07:00
simonsssu	1a910fd473	[HUDI-3510] Add sync validate procedure (#6200 ) Co-authored-by: simonssu <simonssu@tencent.com>	2022-07-25 09:28:46 +08:00
KnightChess	a54c963543	[HUDI-4348] fix merge into sql data quality in concurrent scene (#6020 )	2022-07-24 06:29:47 -07:00
Rahil C	1a5a9f7f03	[HUDI-4439] Fix Amazon CloudWatch reporter for metadata enabled tables (#6164 ) Co-authored-by: Udit Mehrotra <uditme@amazon.com> Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>	2022-07-23 21:08:21 -07:00
Danny Chan	ba11082282	[HUDI-4450] Revert the checkpoint abort notification (#6181 )	2022-07-24 08:44:22 +08:00
Danny Chan	a0ffd05b77	[HUDI-4448] Remove the latest commit refresh for timeline server (#6179 )	2022-07-23 16:10:53 -07:00
Alexey Kudinkin	2d745057ea	[HUDI-4420] Fixing table schema delineation on partition/data schema for Spark relations (#5708 )	2022-07-23 16:59:16 -05:00
Sagar Sumit	da28e38fe3	[HUDI-4071] Make NONE sort mode as default for bulk insert (#6195 )	2022-07-23 14:37:04 -05:00
Rahil C	f1f0109ab8	[HUDI-4440] Treat bootstrapped table as non-partitioned in HudiFileIndex if partition column is missing from schema (#6163 ) Co-authored-by: Ryan Pifer <rmpifer@umich.edu>	2022-07-23 11:44:40 -07:00
Shiyan Xu	f0e843249c	[MINOR] Bump CI timeout to 150m (#6198 )	2022-07-23 10:07:51 -05:00
superche	859157ec01	[MINOR] Fix Call Procedure code style (#6186 ) Co-authored-by: superche <superche@tencent.com>	2022-07-23 17:18:38 +08:00
Rahil C	a5348cc685	[HUDI-4436] Invalidate cached table in Spark after write (#6159 ) Co-authored-by: Ryan Pifer <rmpifer@umich.edu>	2022-07-22 22:47:47 -07:00
冯健	340c3dbbe1	[HUDI-4437] Fix test conflicts by clearing file system cache (#6123 ) Co-authored-by: jian.feng <fengjian428@gmial.com> Co-authored-by: jian.feng <jian.feng@shopee.com> Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-07-22 17:58:04 -07:00
Rahil C	af10a97e7a	[HUDI-4435] Fix Avro field not found issue introduced by Avro 1.10 (#6155 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2022-07-22 17:26:16 -07:00
Shiyan Xu	d5c7c79d87	Revert "[HUDI-4324] Remove use_jdbc config from hudi sync (#6072 )" (#6160 ) This reverts commit `046044c83d`.	2022-07-22 17:18:45 -07:00
Sagar Sumit	a36762a862	[HUDI-4303] Use Hive sentinel value as partition default to avoid type cast issues (#5954 )	2022-07-22 17:14:36 -07:00
Alexey Kudinkin	39f2a06c85	[HUDI-3979] Optimize out mandatory columns when no merging is performed (#5430 ) For MOR, when no merging is performed there is no point in reading either primary-key or pre-combine-key values (unless the query references them). Skipping these columns can save substantial resources that would otherwise be wasted reading them.	2022-07-22 15:32:44 -07:00
Shiyan Xu	6b84384022	Revert "[MINOR] Fix CI issue with TestHiveSyncTool (#6110 )" (#6192 ) This reverts commit `d5c904e10e`.	2022-07-22 12:20:39 -07:00
Sagar Sumit	716dd3512b	[MINOR] Disable Flink compactor IT test (#6189 )	2022-07-22 10:16:55 -07:00
Alexey Kudinkin	eea4a692c0	[HUDI-4039] Make sure all builtin `KeyGenerator`s properly implement Spark specific APIs (#5523 ) This set of changes makes sure that all builtin KeyGenerators properly implement Spark-specific APIs in a performant way (minimizing key-generators overhead)	2022-07-22 08:35:07 -07:00
Shiyan Xu	d5c904e10e	[MINOR] Fix CI issue with TestHiveSyncTool (#6110 )	2022-07-22 10:30:00 -05:00
Alexey Kudinkin	41653fc708	[MINOR] Fallback to default for hive-style partitioning, url-encoding configs (#6175 ) - Fixes broken ITTestHoodieDemo#testParquetDemo	2022-07-22 18:55:58 +05:30
ForwardXu	51b5783161	[HUDI-4404] Fix insert into dynamic partition write misalignment (#6124 )	2022-07-22 09:40:52 +08:00
superche	8e0b47e360	[MINOR] Fix result missing information issue in commits_compare Procedure (#6165 ) Co-authored-by: superche <superche@tencent.com>	2022-07-21 16:25:22 -07:00
Sivabalan Narayanan	36e656aa77	[HUDI-4247] Upgrading protocol buffers version for presto bundle (#5852 )	2022-07-21 15:58:40 -07:00
Sivabalan Narayanan	2e0dd29714	[HUDI-4204] Fixing NPE with row writer path and with OCC (#5850 )	2022-07-21 15:57:34 -07:00
Y Ethan Guo	50cdb867c7	[HUDI-4400] Fix missing bloom filters in metadata table in non-partitioned table (#6113 ) Fixes bloom filters missing from the metadata table for non-partitioned tables. The cause was incorrect record key generation, which resulted from wrong file names being used when generating the metadata payload for the bloom filter.	2022-07-21 11:38:25 -07:00
wenningd	f52b93fd10	Merge pull request #6154 from rahil-c/rahil-c/disable-emrSpark-properties [HUDI-4434] Disable EmrFS file metadata caching and EMR Spark's data prefetcher feature	2022-07-21 11:35:52 -07:00
Rahil C	2bf7920bd9	[MINOR] Add logger for HoodieCopyOnWriteTableInputFormat (#6161 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2022-07-21 22:27:18 +05:30
Alexey Kudinkin	a33bdd32e3	[HUDI-3993] Replacing UDF in Bulk Insert w/ RDD transformation (#5470 )	2022-07-21 06:20:47 -07:00
wenningd	c7fe3fd01d	[HUDI-3764] Allow loading external configs while querying Hudi tables with Spark (#4915 ) Currently, Hudi queries run through Spark do not load external configurations. For example, if customers enable metadata listing in their global config file, their queries still run without the metadata feature enabled. This PR fixes the issue by loading global configs during the Hudi read phase. Co-authored-by: Wenning Ding <wenningd@amazon.com>	2022-07-21 15:12:17 +05:30

3112 Commits