lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
stayrascal	f15125c0cd	[HUDI-3389] fix ColumnarArrayData ClassCastException issue (#4842) * [HUDI-3389] fix ColumnarArrayData ClassCastException issue * [HUDI-3389] remove MapColumnVector.java, RowColumnVector.java, and add test case for array<int> field	2022-02-19 10:56:41 +08:00
RexAn	5009138d04	[HUDI-3438] Avoid getSmallFiles if hoodie.parquet.small.file.limit is 0 (#4823) Co-authored-by: Hui An <hui.an@shopee.com>	2022-02-18 08:57:04 -05:00
Y Ethan Guo	fba5822ee3	[HUDI-3430] Fix Deltastreamer to properly shut down the services upon failure (#4824)	2022-02-18 08:44:56 -05:00
luokey	de8161ae96	HoodieSortedMergeHandle#close write data disorder (#4841) Co-authored-by: 854194341@qq.com <loukey_7821>	2022-02-18 13:31:38 +04:00
Sagar Sumit	ed106f671e	[HUDI-2809] Introduce a checksum mechanism for validating hoodie.properties (#4712) Fix dependency conflict Fix repairs command Implement putIfAbsent for DDB lock provider Add upgrade step and validate while fetching configs Validate checksum for latest table version only while fetching config Move generateChecksum to BinaryUtil Rebase and resolve conflict Fix table version check	2022-02-18 10:17:06 +05:30
Danny Chan	2844a77b43	[HUDI-3439] Remove the hive shade pattern for flink bundle jar (#4833)	2022-02-17 22:42:39 +08:00
zhangxiang17	433c2573ef	[HUDI-3442]Duplicate code calls for 'FlinkOptions.flatOptions' (#4832)	2022-02-17 11:04:09 +08:00
Sagar Sumit	ba0afe1426	[HUDI-3426] Sync datasource clustering config (#4828)	2022-02-16 19:02:49 -05:00
Alexey Kudinkin	aaddaf524a	[HUDI-3280] Cleaning up Hive-related hierarchies after refactoring (#4743)	2022-02-16 15:36:37 -08:00
YueZhang	3363c66468	[HUDI-3394] Check isWriteLockedByCurrentThread before unlock for InProcessLockProvider (#4819) Co-authored-by: yuezhang <yuezhang@freewheel.tv> Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>	2022-02-15 22:41:25 -08:00
Y Ethan Guo	9a05940a74	[HUDI-3366] Remove hardcoded logic of disabling metadata table in tests (#4792)	2022-02-15 16:41:47 -05:00
Raymond Xu	538ec44fa8	[HUDI-2931] Add config to disable table services (#4777)	2022-02-15 09:49:53 -05:00
Yann Byron	fe02c64fea	fix build & ci (#4822)	2022-02-15 03:40:40 -08:00
Yann Byron	cb6ca7f0d1	[HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re… (#4714)	2022-02-14 23:38:38 -05:00
Raymond Xu	27bd7b538e	[HUDI-1576] Make archiving an async service (#4795)	2022-02-14 21:15:06 -05:00
Yann Byron	3b401d839c	[HUDI-3200] deprecate hoodie.file.index.enable and unify to use BaseFileOnlyViewRelation to handle (#4798)	2022-02-14 17:38:01 -08:00
YueZhang	0a97a9893a	[HUDI-3398] Fix TableSchemaResolver for all file formats and metadata table (#4782) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2022-02-14 16:02:47 -08:00
Yuqi Gu	e639d99387	[HUDI-1657] Fix the build on aarch64, Fedora 33 (#4617)	2022-02-14 15:10:18 -08:00
Raymond Xu	bcfd8efe66	[MINOR] Prevent async service from starting twice (#4801)	2022-02-14 11:06:31 -08:00
leesf	0db1e978c6	[HUDI-3254] Introduce HoodieCatalog to manage tables for Spark Datasource V2 (#4611)	2022-02-14 06:26:58 -08:00
yuzhaojing	5ca4480a38	[HUDI-3417] Switch AbstractTableFileSystemView#filterBaseFileAfterPendingCompaction log level to debug (#4805) Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>	2022-02-14 16:18:34 +08:00
董可伦	94806d5cf7	[HUDI-3272] If `mode==ignore && tableExists`, do not execute write logic and sync hive (#4632)	2022-02-14 09:22:00 +05:30
RexAn	93ee09fee8	[HUDI-3412] TypedProperties no need to create new set when check key exist or not (#4791) Co-authored-by: Hui An <hui.an@shopee.com>	2022-02-14 11:33:29 +08:00
YueZhang	76e2faa28d	[HUDI-3370] The files recorded in the commit may not match the actual ones for MOR Compaction (#4753) * use HoodieCommitMetadata to replace writeStatuses computation Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2022-02-14 11:12:52 +08:00
冯健	55777fec05	[HUDI-2413] fix Sql source's checkpoint issue (#3648) * [HUDI-2413] fix Sql source's checkpoint * Fixing sql source checkpoint handling * Fixing docs Co-authored-by: jian.feng <fengjian428@gmial.com> Co-authored-by: sivabalan <n.siva.b@gmail.com>	2022-02-14 08:07:48 +05:30
Y Ethan Guo	6aba00e84f	[MINOR] Fix typos in Spark client related classes (#4781)	2022-02-13 06:41:58 -08:00
wangxianghu	ce9762d588	[MINOR] unused import (#4799)	2022-02-12 13:11:37 +04:00
zhangxiang17	9518f78610	[HUDI-3413]fix jackson parse error when empty message from JsonKafkaSource Using HoodieDeltaStreamer (#4794)	2022-02-12 11:37:29 +04:00
satishkotha	89ed6f062e	[HUDI-3362] Fix restore to rollback pending clustering operations followed by other rolling back other commits (#4772)	2022-02-11 14:12:45 -05:00
Yann Byron	b431246710	[HUDI-3338] Custom relation instead of HadoopFsRelation (#4709) Currently, HadoopFsRelation will use the value of the real partition path as the value of the partition field. However, different from the normal table, Hudi will persist the partition value in the parquet file. And in some cases, it's different between the value of the real partition path and the value of the partition field. So here we implement BaseFileOnlyViewRelation which lets Hudi manage its own relation.	2022-02-11 10:48:44 -08:00
Yann Byron	10474e0962	[HUDI-3402] Set TIMESTAMP_MICROS as the default value for hoodie.parquet.outputtimestamptype (#4749)	2022-02-11 12:23:55 -05:00
Sivabalan Narayanan	ba4e732ba7	[HUDI-2987] Update all deprecated calls to new apis in HoodieRecordPayload (#4681)	2022-02-10 19:19:33 -05:00
Yann Byron	2fe7a3a41f	[HUDI-2610] pass the spark version when sync the table created by spark (#4758) * [HUDI-2610] pass the spark version when sync the table created by spark * [MINOR] sync spark version in DataSourceUtils#buildHiveSyncConfig	2022-02-10 21:05:28 +05:30
wenningd	1c778590d1	[HUDI-3395] Allow pass rollbackUsingMarkers to Hudi CLI rollback command (#4557) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2022-02-10 09:41:22 -05:00
Yann Byron	d971974063	[HUDI-3333] fix that getNestedFieldVal breaks with Spark 3.2 (#4783)	2022-02-10 06:12:16 -08:00
Sivabalan Narayanan	e7ec3a82dc	[HUDI-2432] Adding restore.requested instant and restore plan for restore action (#4605) - This adds a restore plan and serializes it to restore.requested meta file in timeline. This also means that we are introducing schedule and execution phases for restore which was not present before.	2022-02-10 08:06:23 -05:00
Sivabalan Narayanan	0ababcfaa7	[HUDI-1847] Adding inline scheduling support for spark datasource path for compaction and clustering (#4420) - This adds support in spark-datasource to just schedule table services inline so that users can leverage async execution w/o the need for lock service providers.	2022-02-10 08:04:55 -05:00
Danny Chan	b3b44236fe	[HUDI-3389] Bump flink version to 1.14.3 (#4776)	2022-02-10 11:32:01 +08:00
Alexey Kudinkin	464027ec37	[HUDI-3239] Convert `BaseHoodieTableFileIndex` to Java (#4669) Converting BaseHoodieTableFileIndex to Java, removing Scala as a dependency from "hudi-common"	2022-02-09 18:42:08 -05:00
Alexey Kudinkin	973087f385	[HUDI-3276] Rebased Parquet-based `FileInputFormat` impls to inherit from `MapredParquetInputFormat` (#4667) Rebased Parquet-based FileInputFormat impls to inherit from MapredParquetInputFormat, to make sure that Hive is appropriately recognizing those impls and applying corresponding optimizations. - Converted HoodieRealtimeFileInputFormatBase and HoodieFileInputFormatBase into standalone implementations that could be instantiated as standalone objects (which could be used for delegation) - Renamed HoodieFileInputFormatBase > HoodieCopyOnWriteTableInputFormat, HoodieRealtimeFileInputFormatBase > HoodieMergeOnReadTableInputFormat - Scaffolded HoodieParquetFileInputFormatBase for all Parquet impls to inherit from - Rebased Parquet impls onto HoodieParquetFileInputFormatBase	2022-02-08 15:21:45 -05:00
Sivabalan Narayanan	60831d6906	[HUDI-3361] Fixing missing begin checkpoint in HoodieIncremental pull (#4755)	2022-02-08 12:03:07 -05:00
Sivabalan Narayanan	6a32cfe020	[HUDI-3091] Making SIMPLE index as the default index type (#4659) * [HUDI-3091] Making SIMPLE index as the default index type * Fixing tests * Traiging timeouts * disable SIMPLE index for bootstrap tests * removing test run start and end log statements * Fixing simple index parallellism for some tests * Disabling failing test for now * reverting previous disable * Reverting all changes * fixing azure pipeline script	2022-02-08 15:02:18 +05:30
Sivabalan Narayanan	ab73047958	Adding support for custom scheduler configs with streaming sink (#4762)	2022-02-08 14:44:10 +05:30
YueZhang	1636876e8a	[HUDI-3320] Hoodie metadata table validator (#4721) Co-authored-by: yuezhang <yuezhang@freewheel.tv> Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>	2022-02-08 00:29:44 -08:00
Sivabalan Narayanan	0ab1a8ec80	[HUDI-3312] Fixing spark yaml and adding hive validation to integ test suite (#4731)	2022-02-08 00:40:36 -05:00
Vinish Reddy	8ab6f17149	[HUDI-3373] Add zero value metrics for empty data source and PROMETHEUS_PUSHGATEWAY reporter (#4760)	2022-02-07 15:17:46 -05:00
satishkotha	3bd8fc1c3e	[HUDI-3058] Simplify Precommit file system view (#4570)	2022-02-07 12:16:50 -08:00
Alexey Kudinkin	3f263b82ce	[HUDI-3206] Unify Hive's MOR implementations to avoid duplication (#4559) Unify Hive's MOR implementations to avoid duplication to avoid duplication across implementations for different file-formats (Parquet, HFile, etc) - Extracted HoodieRealtimeFileInputFormatBase (extending COW HoodieFileInputFormatBase base) - Rebased Parquet, HFile implementations onto HoodieRealtimeFileInputFormatBase - Tidying up	2022-02-07 14:06:28 -05:00
ForwardXu	773b317983	[HUDI-2941] Show _hoodie_operation in spark sql results (#4649)	2022-02-07 06:28:13 -08:00
Sivabalan Narayanan	24f738fe68	[HUDI-3360] Adding retries to deltastreamer for source errors (#4744)	2022-02-07 08:10:06 -05:00

2478 Commits