Danny Chan
329da34ee0
[HUDI-4163] Catch general exception instead of IOException while fetching rollback plan during rollback ( #5703 )
...
If the Avro file is corrupted, an InvalidAvroMagicException is thrown.
2022-05-30 13:08:02 +08:00
苏承祥
7e86884604
[HUDI-4086] Use CustomizedThreadFactory in async compaction and clustering ( #5563 )
...
Co-authored-by: 苏承祥 <sucx@tuya.com >
2022-05-28 22:35:47 -07:00
komao
8d2f009048
[HUDI-4124] Add valid check in Spark Datasource configs ( #5637 )
...
Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com >
2022-05-26 05:21:28 -07:00
RexAn
98c5c6c654
[HUDI-4040] Bulk insert Support CustomColumnsSortPartitioner with Row ( #5502 )
...
* Along the lines of RDDCustomColumnsSortPartitioner but for Row
2022-05-26 10:39:04 +05:30
Danny Chan
4e42ed5eae
[HUDI-4145] Archives the metadata file in HoodieInstant.State sequence (part2) ( #5676 )
2022-05-26 11:21:39 +08:00
Sagar Sumit
cf837b4900
[HUDI-3193] Decouple hudi-aws from hudi-client-common ( #5666 )
...
Move HoodieMetricsCloudWatchConfig to hudi-client-common
2022-05-25 19:38:56 +05:30
喻兆靖
c20db99a7b
[HUDI-2207] Support independent flink hudi clustering function
2022-05-24 20:16:48 +08:00
Danny Chan
eb219010d2
[HUDI-4145] Archives the metadata file in HoodieInstant.State sequence ( #5669 )
2022-05-24 17:33:30 +08:00
Sivabalan Narayanan
c05ebf2417
[HUDI-2473] Fixing compaction write operation in commit metadata ( #5203 )
2022-05-24 13:03:21 +05:30
Danny Chan
676d5cefe0
[HUDI-4138] Fix the concurrency modification of hoodie table config for flink ( #5660 )
...
* Remove the metadata cleaning strategy for Flink, which means the multi-modal index may be affected
* Improve HoodieTable#clearMetadataTablePartitionsConfig to update the table config only when necessary
* Remove the modification from the read code path in HoodieTableConfig
2022-05-24 13:07:55 +08:00
Heap
47b764ec33
[HUDI-4134] Fix Method naming consistency issues in FSUtils ( #5655 )
2022-05-23 15:28:48 -07:00
Danny Chan
c7576f7613
[HUDI-4130] Remove the upgrade/downgrade for flink #initTable ( #5642 )
2022-05-20 21:31:23 +08:00
Danny Chan
6f37863ba8
[HUDI-4114] Remove the unnecessary fs view sync for BaseWriteClient#initTable ( #5617 )
...
No need to #sync actively because the table instance is freshly instantiated and
its view manager has empty view instances; the fs view is synced lazily when
it is requested.
2022-05-19 10:59:05 +08:00
Danny Chan
f1f8a1abb7
[HUDI-4109] Copy the old record directly when it is chosen for merging ( #5603 )
2022-05-18 10:17:00 +08:00
Danny Chan
ebbe56e862
[minor] Some code refactoring for LogFileComparator and Instant instantiation ( #5600 )
2022-05-18 09:30:09 +08:00
BruceLin
99555c897a
[HUDI-4110] Clean the marker files for flink compaction ( #5604 )
2022-05-17 21:09:27 +08:00
Danny Chan
d52d13302d
[HUDI-4101] BucketIndexPartitioner should take partition path for better dispersion ( #5590 )
2022-05-17 10:34:57 +08:00
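The dispersion issue named in HUDI-4101 can be illustrated with a small Python sketch. This is not the Hudi code: both function names, the `_` key separator, and the use of a Java-style string hash are assumptions chosen for illustration. When the target task is derived from the bucket id alone, every partition path sends bucket `i` to the same task, so at most as many tasks as there are buckets receive data. Mixing the partition path into the hash lets the same bucket id land on different tasks for different partitions.

```python
def java_string_hash(s):
    """Non-negative 32-bit hash in the style of Java's String.hashCode()."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h & 0x7FFFFFFF


def task_by_bucket_only(bucket_id, num_tasks):
    # Hypothetical "before": the partition path is ignored entirely.
    return bucket_id % num_tasks


def task_by_path_and_bucket(partition_path, bucket_id, num_tasks):
    # Hypothetical "after": the partition path takes part in the hash.
    return java_string_hash(f"{partition_path}_{bucket_id}") % num_tasks


def tasks_used(assign, partition_paths, num_buckets, num_tasks):
    """Set of task ids that receive at least one (partition, bucket) pair."""
    return {
        assign(path, bucket, num_tasks)
        for path in partition_paths
        for bucket in range(num_buckets)
    }
```

With 4 buckets and 8 tasks, the bucket-only scheme can never use more than 4 tasks, however many partitions are written; the path-aware scheme can reach all of them.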
Shawy Geng
ad773b3d96
[HUDI-3654] Preparations for hudi metastore. ( #5572 )
...
* [HUDI-3654] Preparations for hudi metastore.
Co-authored-by: gengxiaoyu <gengxiaoyu@bytedance.com >
2022-05-17 09:47:10 +08:00
Danny Chan
43e08193ef
[HUDI-4098] Metadata table heartbeat for instant has expired, last heartbeat 0 ( #5583 )
2022-05-16 17:40:08 +08:00
Yuwei XIAO
61030d8e7a
[HUDI-3123] consistent hashing index: basic write path (upsert/insert) ( #4480 )
...
1. basic write path(insert/upsert) implementation
2. adapt simple bucket index
2022-05-16 11:07:01 +08:00
xi chaomin
6e16e719cd
[HUDI-3980] Support kerberos hbase index ( #5464 )
...
- Add configurations in HoodieHBaseIndexConfig.java to support Kerberos-secured HBase connections.
Co-authored-by: xicm <xicm@asiainfo.com >
2022-05-14 07:37:31 -04:00
wqwl611
52e63b39d6
[HUDI-4097] add table info to jobStatus ( #5529 )
...
Co-authored-by: wqwl611 <wqwl611@gmail.com >
2022-05-13 21:01:15 -04:00
Alexey Kudinkin
4a8589f222
[HUDI-4038] Avoid calling getDataSize after every record written ( #5497 )
...
- getDataSize has non-trivial overhead in the current ParquetWriter impl, requiring traversal of already composed Column Groups in memory. Instead we can sample these calls to getDataSize to amortize its cost.
Co-authored-by: sivabalan <n.siva.b@gmail.com >
2022-05-11 08:08:31 -04:00
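The sampling idea described in HUDI-4038 can be sketched in a few lines of Python. This is not Hudi's implementation: the class name `SampledSizeChecker`, its parameters, and the heuristic of re-checking roughly halfway to the estimated limit are all assumptions made for illustration. The point is only that the expensive size call runs at sampled record counts rather than after every record.

```python
class SampledSizeChecker:
    """Decides when a writer should re-check its (expensive) data size."""

    def __init__(self, get_data_size, max_size, min_interval=100, max_interval=10_000):
        self._get_data_size = get_data_size  # expensive callable returning bytes written
        self._max_size = max_size
        self._min_interval = min_interval
        self._max_interval = max_interval
        self._records_written = 0
        self._next_check_at = min_interval
        self.size_checks = 0  # number of times the expensive call was made

    def record_written(self):
        """Call once after each record is written."""
        self._records_written += 1

    def can_write(self):
        """Return False once a sampled size check sees max_size reached."""
        if self._records_written < self._next_check_at:
            return True  # between samples: skip the expensive call
        self.size_checks += 1
        size = self._get_data_size()
        if size >= self._max_size:
            return False
        # Estimate how many more records fit, then check again about halfway
        # there, clamped to [min_interval, max_interval].
        avg_record_size = max(size / max(self._records_written, 1), 1e-9)
        remaining_records = (self._max_size - size) / avg_record_size
        step = min(max(int(remaining_records / 2), self._min_interval), self._max_interval)
        self._next_check_at = self._records_written + step
        return True
```

The trade-off is that the writer may overshoot the size limit by up to one sampling interval, in exchange for calling the size function only a handful of times per file.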
Sivabalan Narayanan
6285a239a3
[HUDI-3995] Making perf optimizations for bulk insert row writer path ( #5462 )
...
- Avoid using a UDF for the key generator for SimpleKeyGen and NonPartitionedKeyGen.
- Fixed the NonPartitioned key generator to fetch the record key directly from the row rather than going through GenericRecord.
- Other minor fixes around using static values instead of hashmap lookups.
2022-05-09 12:40:22 -04:00
guanziyue
abb4893b25
[HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor exit gracefully ( #4264 )
2022-05-05 13:49:34 -07:00
Sagar Sumit
1562bb658f
[HUDI-4031] Avoid clustering update handling when no pending replacecommit ( #5487 )
2022-05-04 10:17:11 -04:00
xicm
f492c52ee4
[HUDI-3862] Fix default configurations of HoodieHBaseIndexConfig ( #5308 )
...
Co-authored-by: xicm <xicm@asiainfo.com >
2022-04-29 16:21:52 -07:00
LiChuang
4e928a6fe1
[HUDI-3943] Some description fixes for 0.10.1 docs ( #5447 )
2022-04-28 15:18:56 -07:00
Danny Chan
e1ccf2e00b
[HUDI-3977] Flink hudi table with date type partition path throws HoodieNotSupportedException ( #5432 )
2022-04-27 13:19:55 +08:00
Yuwei XIAO
f2ba0fead2
[HUDI-3085] Improve bulk insert partitioner abstraction ( #4441 )
2022-04-25 18:42:17 +08:00
Alexey Kudinkin
c05a4e7b6f
[HUDI-3934] Fix Spark32HoodieParquetFileFormat not being compatible w/ Spark 3.2.0 ( #5378 )
...
- Because Spark 3.2.1 is not backward-compatible with 3.2.0, all of these incompatibilities have to be handled in Spark32HoodieParquetFileFormat. This PR addresses that.
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com >
2022-04-21 21:00:38 -04:00
xiarixiaoyao
037f89ee7c
[HUDI-3921] Fixed schema evolution cannot work with HUDI-3855 ( #5376 )
...
- When columns are renamed (schema evolution enabled), the renamed columns were not handled correctly while copying records from the old data file with HoodieMergeHandle.
2022-04-21 18:27:54 -04:00
Sagar Sumit
de5fa1fe03
[HUDI-3940] Fix retry count increment in lock manager ( #5387 )
2022-04-21 16:52:05 -04:00
Alexey Kudinkin
4b296f79cc
[HUDI-3935] Adding config to fallback to enabled Partition Values extraction from Partition path ( #5377 )
2022-04-21 01:36:19 -07:00
Sivabalan Narayanan
a9506aa545
[HUDI-3938] Fix default value for num retries to acquire lock ( #5380 )
2022-04-21 01:08:43 -07:00
Alexey Kudinkin
f7544e23ac
[HUDI-3204] Fixing partition-values being derived from partition-path instead of source columns ( #5364 )
...
- Scaffolded `Spark24HoodieParquetFileFormat` extending `ParquetFileFormat` and overriding the behavior of adding partition columns to every row
- Amended `SparkAdapter`'s `createHoodieParquetFileFormat` API to make it configurable whether partition values are appended
- Fall back to appending partition values when the source columns are not persisted in the data file
- Fixed HoodieBaseRelation incorrectly handling mandatory columns
2022-04-20 19:30:27 +08:00
Sagar Sumit
4f44e6aeb5
[HUDI-3899] Drop index to delete pending index instants from timeline if applicable ( #5342 )
...
Co-authored-by: sivabalan <n.siva.b@gmail.com >
2022-04-18 22:28:46 -04:00
Sagar Sumit
1718bcab84
[HUDI-3707] Fix target schema handling in HoodieSparkUtils while creating RDD ( #5347 )
2022-04-18 13:34:04 -04:00
董可伦
b8e465fdfc
[MINOR] Fix typos in log4j-surefire.properties ( #5212 )
2022-04-15 13:33:37 -07:00
董可伦
99dd1cb6e6
[HUDI-3835] Add UT for delete in java client ( #5270 )
2022-04-15 15:03:48 -04:00
Sivabalan Narayanan
57612c5c32
[HUDI-3848] Fixing restore with cleaned up commits ( #5288 )
2022-04-15 14:47:53 -04:00
Y Ethan Guo
bab691692e
[HUDI-3686] Fix inline and async table service check in HoodieWriteConfig ( #5307 )
2022-04-13 17:33:26 -04:00
Alexey Kudinkin
7b78dff45f
[HUDI-3855] Fixing FILENAME_METADATA_FIELD not being correctly updated in HoodieMergeHandle ( #5296 )
...
Fixing FILENAME_METADATA_FIELD not being correctly updated in HoodieMergeHandle in cases when the old record is carried over from the existing file as is.
- Revisited HoodieFileWriter API to accept HoodieKey instead of HoodieRecord
- Fixed FILENAME_METADATA_FIELD not being overridden in cases when the old record is simply carried over
- Exposing standard JVM's debugger ports in Docker setup
2022-04-12 20:42:15 -04:00
Alexey Kudinkin
101b82a679
[HUDI-3839] Fixing incorrect selection of MT partitions to be updated ( #5274 )
...
* Fixing incorrect selection of MT partitions to be updated
* Ensure that metadata partitions table config is inherited correctly
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com >
2022-04-12 13:37:52 +05:30
Sivabalan Narayanan
f91e9e63e1
[HUDI-3799] Fixing not deleting empty instants w/o archiving ( #5261 )
2022-04-11 21:02:43 -07:00
Sagar Sumit
3d8fc78c66
[HUDI-3844] Update props in indexer based on table config ( #5293 )
2022-04-11 18:16:06 -04:00
Sivabalan Narayanan
2245a9515f
[HUDI-3798] Fixing ending of a transaction by different owner and removing some extraneous methods in trxn manager ( #5255 )
2022-04-11 10:16:07 +05:30
董可伦
15c264535f
[MINOR] Fix typos in the comments of HoodieMergeHandle ( #5271 )
2022-04-09 17:51:58 -07:00
Y Ethan Guo
3e97c88c4f
[HUDI-3807] Add a new config to control the use of metadata index in HoodieBloomIndex ( #5268 )
2022-04-09 15:30:11 -04:00
Alexey Kudinkin
81b25c543a
[HUDI-3825] Fixing Column Stats Index updating sequence ( #5267 )
2022-04-08 23:14:08 -07:00