YueZhang
359fbfde79
[HUDI-2648] Retry FileSystem action instead of failed directly. ( #3887 )
...
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-20 15:31:31 -05:00
Raymond Xu
0938f55a2b
[HUDI-3458] Fix BulkInsertPartitioner generic type ( #4854 )
2022-02-20 13:51:58 -05:00
Sivabalan Narayanan
66ac1446dd
[MINOR] Moving spark scheduling configs out of DataSourceOptions ( #4843 )
2022-02-20 13:49:18 -05:00
Bo Cui
83279971a1
[HUDI-3446] Supports batch reader in BootstrapOperator#loadRecords ( #4837 )
...
* [HUDI-3446] Supports batch Reader in BootstrapOperator#loadRecords
2022-02-19 21:21:48 +08:00
stayrascal
f15125c0cd
[HUDI-3389] fix ColumnarArrayData ClassCastException issue ( #4842 )
...
* [HUDI-3389] fix ColumnarArrayData ClassCastException issue
* [HUDI-3389] remove MapColumnVector.java, RowColumnVector.java, and add test case for array<int> field
2022-02-19 10:56:41 +08:00
RexAn
5009138d04
[HUDI-3438] Avoid getSmallFiles if hoodie.parquet.small.file.limit is 0 ( #4823 )
...
Co-authored-by: Hui An <hui.an@shopee.com>
2022-02-18 08:57:04 -05:00
Y Ethan Guo
fba5822ee3
[HUDI-3430] Fix Deltastreamer to properly shut down the services upon failure ( #4824 )
2022-02-18 08:44:56 -05:00
luokey
de8161ae96
Fix out-of-order data written by HoodieSortedMergeHandle#close ( #4841 )
...
Co-authored-by: 854194341@qq.com <loukey_7821>
2022-02-18 13:31:38 +04:00
Sagar Sumit
ed106f671e
[HUDI-2809] Introduce a checksum mechanism for validating hoodie.properties ( #4712 )
...
Fix dependency conflict
Fix repairs command
Implement putIfAbsent for DDB lock provider
Add upgrade step and validate while fetching configs
Validate checksum for latest table version only while fetching config
Move generateChecksum to BinaryUtil
Rebase and resolve conflict
Fix table version check
2022-02-18 10:17:06 +05:30
Danny Chan
2844a77b43
[HUDI-3439] Remove the hive shade pattern for flink bundle jar ( #4833 )
2022-02-17 22:42:39 +08:00
zhangxiang17
433c2573ef
[HUDI-3442] Duplicate code calls for 'FlinkOptions.flatOptions' ( #4832 )
2022-02-17 11:04:09 +08:00
Sagar Sumit
ba0afe1426
[HUDI-3426] Sync datasource clustering config ( #4828 )
2022-02-16 19:02:49 -05:00
Alexey Kudinkin
aaddaf524a
[HUDI-3280] Cleaning up Hive-related hierarchies after refactoring ( #4743 )
2022-02-16 15:36:37 -08:00
YueZhang
3363c66468
[HUDI-3394] Check isWriteLockedByCurrentThread before unlock for InProcessLockProvider ( #4819 )
...
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
2022-02-15 22:41:25 -08:00
Y Ethan Guo
9a05940a74
[HUDI-3366] Remove hardcoded logic of disabling metadata table in tests ( #4792 )
2022-02-15 16:41:47 -05:00
Raymond Xu
538ec44fa8
[HUDI-2931] Add config to disable table services ( #4777 )
2022-02-15 09:49:53 -05:00
Yann Byron
fe02c64fea
fix build & ci ( #4822 )
2022-02-15 03:40:40 -08:00
Yann Byron
cb6ca7f0d1
[HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re… ( #4714 )
2022-02-14 23:38:38 -05:00
Raymond Xu
27bd7b538e
[HUDI-1576] Make archiving an async service ( #4795 )
2022-02-14 21:15:06 -05:00
Yann Byron
3b401d839c
[HUDI-3200] deprecate hoodie.file.index.enable and unify to use BaseFileOnlyViewRelation to handle ( #4798 )
2022-02-14 17:38:01 -08:00
YueZhang
0a97a9893a
[HUDI-3398] Fix TableSchemaResolver for all file formats and metadata table ( #4782 )
...
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-14 16:02:47 -08:00
Yuqi Gu
e639d99387
[HUDI-1657] Fix the build on aarch64, Fedora 33 ( #4617 )
2022-02-14 15:10:18 -08:00
Raymond Xu
bcfd8efe66
[MINOR] Prevent async service from starting twice ( #4801 )
2022-02-14 11:06:31 -08:00
leesf
0db1e978c6
[HUDI-3254] Introduce HoodieCatalog to manage tables for Spark Datasource V2 ( #4611 )
2022-02-14 06:26:58 -08:00
yuzhaojing
5ca4480a38
[HUDI-3417] Switch AbstractTableFileSystemView#filterBaseFileAfterPendingCompaction log level to debug ( #4805 )
...
Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
2022-02-14 16:18:34 +08:00
董可伦
94806d5cf7
[HUDI-3272] If mode==ignore && tableExists, do not execute write logic and sync hive ( #4632 )
2022-02-14 09:22:00 +05:30
RexAn
93ee09fee8
[HUDI-3412] TypedProperties does not need to create a new set when checking whether a key exists ( #4791 )
...
Co-authored-by: Hui An <hui.an@shopee.com>
2022-02-14 11:33:29 +08:00
YueZhang
76e2faa28d
[HUDI-3370] The files recorded in the commit may not match the actual ones for MOR Compaction ( #4753 )
...
* use HoodieCommitMetadata to replace writeStatuses computation
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-02-14 11:12:52 +08:00
冯健
55777fec05
[HUDI-2413] fix Sql source's checkpoint issue ( #3648 )
...
* [HUDI-2413] fix Sql source's checkpoint
* Fixing sql source checkpoint handling
* Fixing docs
Co-authored-by: jian.feng <fengjian428@gmial.com>
Co-authored-by: sivabalan <n.siva.b@gmail.com>
2022-02-14 08:07:48 +05:30
Y Ethan Guo
6aba00e84f
[MINOR] Fix typos in Spark client related classes ( #4781 )
2022-02-13 06:41:58 -08:00
wangxianghu
ce9762d588
[MINOR] unused import ( #4799 )
2022-02-12 13:11:37 +04:00
zhangxiang17
9518f78610
[HUDI-3413] Fix Jackson parse error on empty messages from JsonKafkaSource when using HoodieDeltaStreamer ( #4794 )
2022-02-12 11:37:29 +04:00
satishkotha
89ed6f062e
[HUDI-3362] Fix restore to roll back pending clustering operations followed by rolling back other commits ( #4772 )
2022-02-11 14:12:45 -05:00
Yann Byron
b431246710
[HUDI-3338] Custom relation instead of HadoopFsRelation ( #4709 )
...
HadoopFsRelation currently takes the partition field's value from the physical partition path. Unlike a regular table, however, Hudi also persists the partition value in the parquet file, and in some cases the value derived from the path differs from the value stored in the partition field.
To handle this, we implement BaseFileOnlyViewRelation, which lets Hudi manage its own relation.
2022-02-11 10:48:44 -08:00
Yann Byron
10474e0962
[HUDI-3402] Set TIMESTAMP_MICROS as the default value for hoodie.parquet.outputtimestamptype ( #4749 )
2022-02-11 12:23:55 -05:00
Sivabalan Narayanan
ba4e732ba7
[HUDI-2987] Update all deprecated calls to new apis in HoodieRecordPayload ( #4681 )
2022-02-10 19:19:33 -05:00
Yann Byron
2fe7a3a41f
[HUDI-2610] pass the spark version when sync the table created by spark ( #4758 )
...
* [HUDI-2610] pass the spark version when sync the table created by spark
* [MINOR] sync spark version in DataSourceUtils#buildHiveSyncConfig
2022-02-10 21:05:28 +05:30
wenningd
1c778590d1
[HUDI-3395] Allow pass rollbackUsingMarkers to Hudi CLI rollback command ( #4557 )
...
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2022-02-10 09:41:22 -05:00
Yann Byron
d971974063
[HUDI-3333] fix that getNestedFieldVal breaks with Spark 3.2 ( #4783 )
2022-02-10 06:12:16 -08:00
Sivabalan Narayanan
e7ec3a82dc
[HUDI-2432] Adding restore.requested instant and restore plan for restore action ( #4605 )
...
- This adds a restore plan and serializes it to the restore.requested meta file in the timeline. It also introduces schedule and execution phases for restore, which were not present before.
2022-02-10 08:06:23 -05:00
Sivabalan Narayanan
0ababcfaa7
[HUDI-1847] Adding inline scheduling support for spark datasource path for compaction and clustering ( #4420 )
...
- This adds support in spark-datasource to just schedule table services inline so that users can leverage async execution w/o the need for lock service providers.
2022-02-10 08:04:55 -05:00
Danny Chan
b3b44236fe
[HUDI-3389] Bump flink version to 1.14.3 ( #4776 )
2022-02-10 11:32:01 +08:00
Alexey Kudinkin
464027ec37
[HUDI-3239] Convert BaseHoodieTableFileIndex to Java ( #4669 )
...
Converting BaseHoodieTableFileIndex to Java, removing Scala as a dependency from "hudi-common"
2022-02-09 18:42:08 -05:00
Alexey Kudinkin
973087f385
[HUDI-3276] Rebased Parquet-based FileInputFormat impls to inherit from MapredParquetInputFormat ( #4667 )
...
Rebased Parquet-based FileInputFormat impls to inherit from MapredParquetInputFormat, to make sure that Hive is appropriately recognizing those impls and applying corresponding optimizations.
- Converted HoodieRealtimeFileInputFormatBase and HoodieFileInputFormatBase into standalone implementations that could be instantiated as standalone objects (which could be used for delegation)
- Renamed HoodieFileInputFormatBase > HoodieCopyOnWriteTableInputFormat, HoodieRealtimeFileInputFormatBase > HoodieMergeOnReadTableInputFormat
- Scaffolded HoodieParquetFileInputFormatBase for all Parquet impls to inherit from
- Rebased Parquet impls onto HoodieParquetFileInputFormatBase
2022-02-08 15:21:45 -05:00
Sivabalan Narayanan
60831d6906
[HUDI-3361] Fixing missing begin checkpoint in HoodieIncremental pull ( #4755 )
2022-02-08 12:03:07 -05:00
Sivabalan Narayanan
6a32cfe020
[HUDI-3091] Making SIMPLE index as the default index type ( #4659 )
...
* [HUDI-3091] Making SIMPLE index as the default index type
* Fixing tests
* Triaging timeouts
* disable SIMPLE index for bootstrap tests
* removing test run start and end log statements
* Fixing simple index parallelism for some tests
* Disabling failing test for now
* reverting previous disable
* Reverting all changes
* fixing azure pipeline script
2022-02-08 15:02:18 +05:30
Sivabalan Narayanan
ab73047958
Adding support for custom scheduler configs with streaming sink ( #4762 )
2022-02-08 14:44:10 +05:30
YueZhang
1636876e8a
[HUDI-3320] Hoodie metadata table validator ( #4721 )
...
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
2022-02-08 00:29:44 -08:00
Sivabalan Narayanan
0ab1a8ec80
[HUDI-3312] Fixing spark yaml and adding hive validation to integ test suite ( #4731 )
2022-02-08 00:40:36 -05:00
Vinish Reddy
8ab6f17149
[HUDI-3373] Add zero value metrics for empty data source and PROMETHEUS_PUSHGATEWAY reporter ( #4760 )
2022-02-07 15:17:46 -05:00