lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
Y Ethan Guo	924c30c7ea	[HUDI-4469] Flip reuse flag to true in HoodieBackedTableMetadata to improve file listing (#6214 )	2022-07-27 14:04:59 -07:00
Shiyan Xu	717f159bfd	[HUDI-3730] Keep metasync configs backward compatible (#6221 )	2022-07-27 16:00:44 +05:30
冯健	e5faf2cc84	[HUDI-4210] Create custom hbase index to solve data skew issue on hbase regions (#5797 )	2022-07-26 18:09:17 +08:00
Shiyan Xu	1ea1e659c2	[HUDI-4474] Infer metasync configs (#6217 ) - infer repeated sync configs from original configs - `META_SYNC_BASE_FILE_FORMAT` - infer from `org.apache.hudi.common.table.HoodieTableConfig.BASE_FILE_FORMAT` - `META_SYNC_ASSUME_DATE_PARTITION` - infer from `org.apache.hudi.common.config.HoodieMetadataConfig.ASSUME_DATE_PARTITIONING` - `META_SYNC_DECODE_PARTITION` - infer from `org.apache.hudi.common.table.HoodieTableConfig.URL_ENCODE_PARTITIONING` - `META_SYNC_USE_FILE_LISTING_FROM_METADATA` - infer from `org.apache.hudi.common.config.HoodieMetadataConfig.ENABLE` As proposed in https://github.com/apache/hudi/blob/master/rfc/rfc-55/rfc-55.md#compatible-changes	2022-07-26 15:28:31 +05:30
Dongwook Kwon	74d7b4d751	[HUDI-4471] Relocate AWSDmsAvroPayload class to hudi-common	2022-07-25 17:51:27 -07:00
Alexey Kudinkin	e7c8df7e8b	[HUDI-4250][HUDI-4202] Optimize performance of Column Stats Index reading in Data Skipping (#5746 ) We provide an alternative way of fetching Column Stats Index within the reading process to avoid the penalty of a more heavy-weight execution scheduled through a Spark engine.	2022-07-25 15:36:12 -07:00
Sagar Sumit	6e7ac45735	[HUDI-3884] Support archival beyond savepoint commits (#5837 ) Co-authored-by: sivabalan <n.siva.b@gmail.com>	2022-07-25 13:42:29 -05:00
Shiyan Xu	eee6a02f77	[HUDI-4456] Clean up test resources (#6203 )	2022-07-25 10:13:06 -05:00
Shiyan Xu	71c2c3102b	[HUDI-4455] Improve test classes for TestHiveSyncTool (#6202 ) Improve HiveTestService, HiveTestUtil, and related classes.	2022-07-25 19:05:34 +05:30
superche	1fda9ee9bb	[HUDI-4071] Match ROLLBACK_USING_MARKERS_ENABLE in sql as datasource (#6206 ) Co-authored-by: superche <superche@tencent.com>	2022-07-25 18:40:23 +08:00
Danny Chan	b513232449	[HUDI-4458] Add a converter cache for flink ColumnStatsIndices (#6205 )	2022-07-25 17:49:01 +08:00
Y Ethan Guo	f6e7227ed5	[MINOR] Only log stdout output for non-zero exit from commands in IT (#6199 )	2022-07-24 22:08:33 -07:00
Tim Brown	76a28daeb0	[HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness (#6201 )	2022-07-24 21:42:15 -07:00
Vander	2a08a65f71	[MINOR] Fix typos in Spark client related classes (#6204 )	2022-07-24 21:41:42 -07:00
simonsssu	1a910fd473	[HUDI-3510] Add sync validate procedure (#6200 ) * [HUDI-3510] Add sync validate procedure Co-authored-by: simonssu <simonssu@tencent.com>	2022-07-25 09:28:46 +08:00
KnightChess	a54c963543	[HUDI-4348] fix merge into sql data quality in concurrent scene (#6020 )	2022-07-24 06:29:47 -07:00
Rahil C	1a5a9f7f03	[HUDI-4439] Fix Amazon CloudWatch reporter for metadata enabled tables (#6164 ) Co-authored-by: Udit Mehrotra <uditme@amazon.com> Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>	2022-07-23 21:08:21 -07:00
Danny Chan	ba11082282	[HUDI-4450] Revert the checkpoint abort notification (#6181 )	2022-07-24 08:44:22 +08:00
Danny Chan	a0ffd05b77	[HUDI-4448] Remove the latest commit refresh for timeline server (#6179 )	2022-07-23 16:10:53 -07:00
Alexey Kudinkin	2d745057ea	[HUDI-4420] Fixing table schema delineation on partition/data schema for Spark relations (#5708 )	2022-07-23 16:59:16 -05:00
Sagar Sumit	da28e38fe3	[HUDI-4071] Make NONE sort mode as default for bulk insert (#6195 )	2022-07-23 14:37:04 -05:00
Rahil C	f1f0109ab8	[HUDI-4440] Treat bootstrapped table as non-partitioned in HudiFileIndex if partition column is missing from schema (#6163 ) Co-authored-by: Ryan Pifer <rmpifer@umich.edu>	2022-07-23 11:44:40 -07:00
Shiyan Xu	f0e843249c	[MINOR] Bump CI timeout to 150m (#6198 )	2022-07-23 10:07:51 -05:00
superche	859157ec01	[MINOR] Fix Call Procedure code style (#6186 ) * Fix Call Procedure code style. Co-authored-by: superche <superche@tencent.com>	2022-07-23 17:18:38 +08:00
Rahil C	a5348cc685	[HUDI-4436] Invalidate cached table in Spark after write (#6159 ) Co-authored-by: Ryan Pifer <rmpifer@umich.edu>	2022-07-22 22:47:47 -07:00
冯健	340c3dbbe1	[HUDI-4437] Fix test conflicts by clearing file system cache (#6123 ) Co-authored-by: jian.feng <fengjian428@gmial.com> Co-authored-by: jian.feng <jian.feng@shopee.com> Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>	2022-07-22 17:58:04 -07:00
Rahil C	af10a97e7a	[HUDI-4435] Fix Avro field not found issue introduced by Avro 1.10 (#6155 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2022-07-22 17:26:16 -07:00
Shiyan Xu	d5c7c79d87	Revert "[HUDI-4324] Remove use_jdbc config from hudi sync (#6072 )" (#6160 ) This reverts commit `046044c83d`.	2022-07-22 17:18:45 -07:00
Sagar Sumit	a36762a862	[HUDI-4303] Use Hive sentinel value as partition default to avoid type cast issues (#5954 )	2022-07-22 17:14:36 -07:00
Alexey Kudinkin	39f2a06c85	[HUDI-3979] Optimize out mandatory columns when no merging is performed (#5430 ) For MOR, when no merging is performed there is no point in reading either primary-key or pre-combine-key values (unless the query references them). Skipping them can save substantial resources that would otherwise be spent reading them.	2022-07-22 15:32:44 -07:00
Shiyan Xu	6b84384022	Revert "[MINOR] Fix CI issue with TestHiveSyncTool (#6110 )" (#6192 ) This reverts commit `d5c904e10e`.	2022-07-22 12:20:39 -07:00
Sagar Sumit	716dd3512b	[MINOR] Disable Flink compactor IT test (#6189 )	2022-07-22 10:16:55 -07:00
Alexey Kudinkin	eea4a692c0	[HUDI-4039] Make sure all builtin `KeyGenerator`s properly implement Spark specific APIs (#5523 ) This set of changes makes sure that all builtin KeyGenerators properly implement Spark-specific APIs in a performant way (minimizing key-generators overhead)	2022-07-22 08:35:07 -07:00
Shiyan Xu	d5c904e10e	[MINOR] Fix CI issue with TestHiveSyncTool (#6110 )	2022-07-22 10:30:00 -05:00
Alexey Kudinkin	41653fc708	[MINOR] Fallback to default for hive-style partitioning, url-encoding configs (#6175 ) - Fixes broken ITTestHoodieDemo#testParquetDemo	2022-07-22 18:55:58 +05:30
ForwardXu	51b5783161	[HUDI-4404] Fix insert into dynamic partition write misalignment (#6124 )	2022-07-22 09:40:52 +08:00
superche	8e0b47e360	[MINOR] Fix result missing information issue in commits_compare Procedure (#6165 ) Co-authored-by: superche <superche@tencent.com>	2022-07-21 16:25:22 -07:00
Sivabalan Narayanan	36e656aa77	[HUDI-4247] Upgrading protocol buffers version for presto bundle (#5852 )	2022-07-21 15:58:40 -07:00
Sivabalan Narayanan	2e0dd29714	[HUDI-4204] Fixing NPE with row writer path and with OCC (#5850 )	2022-07-21 15:57:34 -07:00
Y Ethan Guo	50cdb867c7	[HUDI-4400] Fix missing bloom filters in metadata table in non-partitioned table (#6113 ) Fixes the missing bloom filters in metadata table in the non-partitioned table due to incorrect record key generation, because of wrong file names when generating the metadata payload for the bloom filter.	2022-07-21 11:38:25 -07:00
wenningd	f52b93fd10	Merge pull request #6154 from rahil-c/rahil-c/disable-emrSpark-properties [HUDI-4434] Disable EmrFS file metadata caching and EMR Spark's data prefetcher feature	2022-07-21 11:35:52 -07:00
Rahil C	2bf7920bd9	[MINOR] Add logger for HoodieCopyOnWriteTableInputFormat (#6161 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2022-07-21 22:27:18 +05:30
Alexey Kudinkin	a33bdd32e3	[HUDI-3993] Replacing UDF in Bulk Insert w/ RDD transformation (#5470 )	2022-07-21 06:20:47 -07:00
wenningd	c7fe3fd01d	[HUDI-3764] Allow loading external configs while querying Hudi tables with Spark (#4915 ) Currently when doing Hudi queries w/ Spark, it won't load the external configurations. Say if customers enabled metadata listing in their global config file, then this would let them actually query w/o metadata feature enabled. This PR fixes this issue and allows loading global configs during the Hudi reading phase. Co-authored-by: Wenning Ding <wenningd@amazon.com>	2022-07-21 15:12:17 +05:30
Alexey Kudinkin	de37774e12	[HUDI-3896] Porting Nested Schema Pruning optimization for Hudi's custom Relations (#5428 ) Currently, all Hudi Relations bear a performance gap relative to Spark's HadoopFsRelation. The reason is that the SchemaPruning optimization rule (pruning nested schemas) is unfortunately predicated on usage of HadoopFsRelation, meaning that it is not applied when any other relation is used. This change ports the rule to Hudi relations (MOR, Incremental, etc) by leveraging the HoodieSparkSessionExtensions mechanism to inject a modified version of the original SchemaPruning rule that is adapted to work w/ Hudi's custom relations. - Added customOptimizerRules to HoodieAnalysis - Added NestedSchemaPrunning Spark's Optimizer rule - Handle Spark's Optimizer pruned data schema (to effectively prune nested schemas) - Enable HoodieClientTestHarness to inject HoodieSparkSessionExtensions - Injecting Spark Session extensions for TestMORDataSource, TestCOWDataSource - Disabled fallback to HadoopFsRelation	2022-07-21 15:06:06 +05:30
Shiyan Xu	2394c62973	[HUDI-4146][RFC-55] Update config changes proposal (#6162 )	2022-07-21 12:55:02 +05:30
Danny Chan	348519f3cd	[HUDI-4427] Add a computed column IT test (#6150 )	2022-07-21 09:38:26 +08:00
Rahil Chertara	473be87aa5	Disable EmrFS file metadata caching and EMR Spark's data prefetcher feature	2022-07-20 17:04:00 -07:00
Y Ethan Guo	2b828ccb98	[HUDI-4401] Skip HBase version check (#6114 )	2022-07-20 14:09:45 -07:00
Danny Chan	e3675fe9b0	[HUDI-4372] Enable metadata table by default for flink (#6066 )	2022-07-20 16:10:19 +08:00

3106 Commits