Refactoring Spark DataSource Relations to avoid code duplication.
The following Relations were in scope:
- BaseFileOnlyViewRelation
- MergeOnReadSnapshotRelation
- MergeOnReadIncrementalRelation
- Adopt HoodieData in Spark action commit executors
- Make DeleteHelper, WriteHelper, and MergeHelper in hudi-client-common independent of Spark
- Make HoodieTable in WriteClient APIs use a raw type, to decouple it from the Client's generic types
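The HoodieData abstraction adopted above can be illustrated with a minimal sketch. The `HoodieData`/`HoodieListData` names follow Hudi's naming, but the bodies here are simplified assumptions, not the actual implementation: an engine-agnostic collection type with a local in-memory implementation, so the same commit-executor code can run against a Spark RDD or a plain Java list.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Engine-agnostic collection abstraction (simplified sketch of Hudi's HoodieData).
interface HoodieData<T> {
    <R> HoodieData<R> map(Function<T, R> fn);
    List<T> collectAsList();
}

// Local in-memory implementation backed by a plain Java list; a Spark-backed
// implementation would wrap a JavaRDD instead, with the same interface.
final class HoodieListData<T> implements HoodieData<T> {
    private final List<T> data;

    HoodieListData(List<T> data) {
        this.data = data;
    }

    @Override
    public <R> HoodieData<R> map(Function<T, R> fn) {
        return new HoodieListData<>(data.stream().map(fn).collect(Collectors.toList()));
    }

    @Override
    public List<T> collectAsList() {
        return data;
    }
}

public class HoodieDataSketch {
    public static void main(String[] args) {
        HoodieData<Integer> writeStats = new HoodieListData<>(List.of(1, 2, 3));
        // Executor code written against HoodieData is oblivious to the engine.
        List<Integer> doubled = writeStats.map(x -> x * 2).collectAsList();
        System.out.println(doubled); // [2, 4, 6]
    }
}
```

The point of the interface is that commit executors in hudi-client-common can be written once against `HoodieData`, with the engine module choosing the concrete implementation.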
* [HUDI-3607] Support backend switch in HoodieFlinkStreamer
1. checkstyle fix
2. change the message
* Update CompactionHoodiePathCommand.scala
Fix NPE when running compaction scheduling via spark-sql if the number of commits is less than hoodie.compact.inline.max.delta.commits.
Fix IndexOutOfBoundsException when there's no compaction scheduled.
Fix CI issue.
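A guard of the shape used by such fixes can be sketched as follows. The method and parameter names here are illustrative, not the actual CompactionHoodiePathCommand code: return an empty result instead of dereferencing a missing schedule when fewer delta commits have accumulated than the configured threshold.

```java
import java.util.List;
import java.util.Optional;

public class CompactionScheduleGuard {
    // Hypothetical helper: only schedule compaction when enough delta commits
    // have accumulated; otherwise return empty instead of risking an NPE or
    // IndexOutOfBoundsException downstream when the schedule list is empty.
    static Optional<String> scheduleCompaction(List<String> deltaCommits, int maxDeltaCommits) {
        if (deltaCommits == null || deltaCommits.size() < maxDeltaCommits) {
            return Optional.empty(); // nothing to schedule yet
        }
        // Use the latest commit time as the (illustrative) compaction instant.
        return Optional.of(deltaCommits.get(deltaCommits.size() - 1));
    }

    public static void main(String[] args) {
        System.out.println(scheduleCompaction(List.of("001", "002"), 5));        // Optional.empty
        System.out.println(scheduleCompaction(List.of("001", "002", "003"), 3)); // Optional[003]
    }
}
```

Callers then branch on the `Optional` rather than indexing into a possibly-empty schedule list.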
Create new TypedProperties while performing clustering
Add OrderedProperties and minor refactoring
Add javadoc and remove getters from OrderedProperties
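An OrderedProperties of the kind described can be sketched as a `java.util.Properties` subclass that preserves insertion order. This is a common pattern and a sketch only; the actual Hudi class may differ in detail.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

// Properties subclass that remembers the insertion order of keys, so that
// iteration (and serialization) is deterministic rather than hash-ordered.
class OrderedProperties extends Properties {
    private final Set<Object> orderedKeys = new LinkedHashSet<>();

    @Override
    public synchronized Object put(Object key, Object value) {
        orderedKeys.add(key);
        return super.put(key, value);
    }

    @Override
    public synchronized Object remove(Object key) {
        orderedKeys.remove(key);
        return super.remove(key);
    }

    @Override
    public synchronized Enumeration<Object> keys() {
        return Collections.enumeration(orderedKeys);
    }
}

public class OrderedPropertiesDemo {
    public static void main(String[] args) {
        OrderedProperties props = new OrderedProperties();
        props.put("hoodie.table.name", "trips");
        props.put("hoodie.base.path", "/tmp/trips");
        props.put("hoodie.compact.inline", "true");
        // Keys come back in the order they were inserted.
        Enumeration<Object> keys = props.keys();
        while (keys.hasMoreElements()) {
            System.out.println(keys.nextElement());
        }
    }
}
```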
NOTE: This change is the first part of a series to clean up Hudi's Spark DataSource related implementations, making sure there is minimal code duplication among them and that the implementations are consistent and performant.
This PR makes sure that BaseFileOnlyViewRelation reads only the projected columns, and avoids unnecessary serde from Row to InternalRow.
Brief change log
- Introduced HoodieBaseRDD as a base for all custom RDD implementations
- Extracted common fields/methods into HoodieBaseRelation
- Cleaned up and streamlined BaseFileOnlyViewRelation
- Fixed all of the Relations to avoid superfluous Row <-> InternalRow conversions
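The column-projection idea behind the change can be sketched engine-agnostically. The schema and record types below are plain Java stand-ins, not Spark's: resolve the required columns to ordinals once, then read only those fields per record instead of materializing the full row.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnProjectionSketch {
    // Resolve required column names to ordinals in the full schema once,
    // so per-record reads touch only the projected fields.
    static int[] projectedOrdinals(List<String> fullSchema, List<String> requiredColumns) {
        return requiredColumns.stream().mapToInt(fullSchema::indexOf).toArray();
    }

    // Copy only the projected fields out of a full record.
    static Object[] project(Object[] fullRow, int[] ordinals) {
        return Arrays.stream(ordinals).mapToObj(i -> fullRow[i]).toArray();
    }

    public static void main(String[] args) {
        List<String> schema = List.of("key", "partition", "rider", "fare");
        int[] ordinals = projectedOrdinals(schema, List.of("key", "fare"));
        Object[] row = {"id-1", "2022/03/01", "rider-42", 19.5};
        System.out.println(Arrays.toString(project(row, ordinals))); // [id-1, 19.5]
    }
}
```

In the Spark case the same principle applies one layer down: the file reader is handed the pruned schema, so unprojected columns are never decoded at all.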
Desc: Add a Hive sync config (hoodie.datasource.hive_sync.sync_comment), which defaults to false.
When syncing a data source to Hudi, column comments are added to the source Avro schema; when sync_comment is true, those column comments are synced to the Hive table.
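The gating logic described can be sketched as follows. The flag mirrors the config above, but the method and types are illustrative stand-ins, not Hudi's HiveSyncTool API: comments travel with the schema, and are only propagated to the Hive table when the flag is on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SyncCommentSketch {
    // Illustrative: propagate column comments from the source schema to the
    // Hive table metadata only when sync_comment is enabled.
    static Map<String, String> commentsToSync(Map<String, String> schemaComments, boolean syncComment) {
        if (!syncComment) {
            return Map.of(); // default (false): Hive columns get no comments
        }
        return new LinkedHashMap<>(schemaComments);
    }

    public static void main(String[] args) {
        Map<String, String> comments = Map.of("fare", "trip fare in USD");
        System.out.println(commentsToSync(comments, false)); // {}
        System.out.println(commentsToSync(comments, true));
    }
}
```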
Rework of #4761
This diff introduces the following changes:
- Write stats are converted to metadata index records during the commit; they now use the HoodieData type so that record generation scales with the workload.
- Metadata index initialization support for the bloom filter and column stats partitions.
- When building the BloomFilter from the index records, use the type parameter stored in the payload instead of a hardcoded type.
- Delta writes can change column ranges, and the column stats index needs to be updated with the new ranges to stay consistent with the table dataset. This fix adds column stats index update support for delta writes.
Co-authored-by: Manoj Govindassamy <manoj.govindassamy@gmail.com>
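The third bullet above — reading the bloom filter type from the payload instead of hardcoding it — can be sketched like this. The enum values echo Hudi's bloom filter type codes, but the payload layout and method are illustrative assumptions:

```java
import java.util.Map;

public class BloomFilterTypeSketch {
    // Illustrative stand-ins for Hudi's bloom filter variants.
    enum BloomFilterTypeCode { SIMPLE, DYNAMIC_V0 }

    // Instead of hardcoding SIMPLE, read the type code persisted alongside
    // the serialized filter in the metadata index payload, falling back to
    // SIMPLE only when no type was stored.
    static BloomFilterTypeCode filterTypeFromPayload(Map<String, String> payload) {
        return BloomFilterTypeCode.valueOf(
            payload.getOrDefault("type", BloomFilterTypeCode.SIMPLE.name()));
    }

    public static void main(String[] args) {
        Map<String, String> payload = Map.of("type", "DYNAMIC_V0", "filter", "...serialized bits...");
        System.out.println(filterTypeFromPayload(payload)); // DYNAMIC_V0
    }
}
```

The deserializer then dispatches on the returned code to rebuild the correct filter variant, so tables written with a dynamic filter are no longer read back as if they used the simple one.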