wangxianghu
b7a79aa943
[HUDI-3283] Bootstrap support overwrite existing table ( #4647 )
2022-01-20 14:42:52 +04:00
Yann Byron
31b57a256f
[HUDI-3236] Use fields' comments persisted in catalog to fill in schema ( #4587 )
2022-01-19 21:44:35 -08:00
Y Ethan Guo
a08a2b7306
[MINOR] Add instructions to build and upload Docker Demo images ( #4612 )
...
* [MINOR] Add instructions to build and upload Docker Demo images
* Add local test instruction
2022-01-20 09:55:28 +05:30
wangxianghu
db93ad2f4b
[HUDI-3277] Filter non-parquet files in bootstrap procedure ( #4639 )
2022-01-19 21:13:51 +04:00
YueZhang
7647562dad
[HUDI-2833][Design] Merge small archive files instead of expanding indefinitely. ( #4078 )
...
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-18 22:42:35 -08:00
Alexey Kudinkin
4bea758738
[HUDI-3191] Rebasing Hive's FileInputFormat onto AbstractHoodieTableFileIndex ( #4531 )
2022-01-18 14:54:51 -08:00
Thinking Chen
caeea946fb
[HUDI-3245] Convert uppercase letters to lowercase in storage configs ( #4602 )
2022-01-18 14:51:09 -05:00
Yann Byron
a09c231911
[HUDI-2903] get table schema from the last commit with data written ( #4180 )
2022-01-18 10:50:30 -05:00
Danny Chan
45f054ffde
[HUDI-3263] Do not nullify members in HoodieTableFileSystemView#resetViewState to avoid NPE ( #4625 )
2022-01-18 17:46:40 +08:00
EchoLee5
3b56320bd8
[HUDI-3261] Fix NoSuchMethodError thrown when reading rt table via Hive CLI ( #4624 )
2022-01-18 16:58:08 +08:00
wangxianghu
3d93e857cc
[MINOR] Minor improvement in JsonKafkaSource ( #4620 )
2022-01-18 11:13:05 +04:00
RexAn
f18447406d
[HUDI-1558] Structured Streaming source support for Spark 3 ( #4586 )
...
Co-authored-by: Hui An <hui.an@shopee.com>
2022-01-18 11:08:33 +08:00
董可伦
20e7983866
[HUDI-3252] Avoid creating empty requestedReplaceCommit in the startCommit method ( #4515 )
2022-01-17 17:28:18 -05:00
Yuwei XIAO
d36533735f
[HUDI-3194] fix MOR snapshot query during compaction ( #4540 )
2022-01-17 17:24:24 -05:00
Danny Chan
36a9f63e45
[HUDI-3257] Excluding clustering instants from pending rollback info ( #4616 )
2022-01-17 18:18:45 +08:00
Alexey Kudinkin
75caa7d3d8
[HUDI-3179] Extracted common AbstractHoodieTableFileIndex to be shared across engines ( #4520 )
2022-01-16 22:46:20 -08:00
xiaotianzhang01
ed92c217ed
[MINOR] Delete unused parameter in TablePathUtils ( #4595 )
...
Co-authored-by: zhangxiaotian13 <zhangxiaotian13@jd.com>
2022-01-16 22:24:43 -08:00
Yann Byron
d2dda55794
[HUDI-2968] add UT for update/delete on non-pk condition ( #4568 )
2022-01-16 12:02:12 -08:00
0x574C
28b3b6ad8f
[MINOR] Remove org.apache.directory.api.util.Strings import ( #4601 )
2022-01-16 16:58:18 +08:00
董可伦
822230d9ea
[MINOR] Optimize variable names and logs ( #4581 )
2022-01-16 16:09:22 +08:00
Yann Byron
5e0171a5ee
[HUDI-3198] Improve Spark SQL create table from existing hudi table ( #4584 )
...
Simplifies the SQL statement for creating a Hudi table from an existing Hudi path: table properties and partitioning no longer need to be specified.
From:
```sql
create table hudi_tbl using hudi tblproperties (primaryKey='id', preCombineField='ts', type='cow') partitioned by (pt) location '/path/to/hudi'
```
To:
```sql
create table hudi_tbl using hudi location '/path/to/hudi'
```
2022-01-14 10:15:29 -08:00
Y Ethan Guo
53f75f84b8
[HUDI-2785] Add Trino setup in Docker Demo ( #4300 )
...
* [HUDI-2785] Add Trino setup in Docker Demo
* Update docker account and remove unnecessary configs
* Adjust sparkadhoc Dockerfile
2022-01-14 22:08:55 +05:30
Y Ethan Guo
7d163ee3de
[MINOR] Fix local flaky test in TestFSUtils ( #4596 )
...
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-01-13 22:48:57 -08:00
leesf
5ce45c440b
[HUDI-3172] Refactor existing hudi modules for more code reuse in V2 implementation ( #4514 )
...
* Introduce hudi-spark3-common and hudi-spark2-common modules to place classes that would be reused in different spark versions, also introduce hudi-spark3.1.x to support spark 3.1.x.
* Introduce hudi format under hudi-spark2, hudi-spark3, hudi-spark3.1.x modules and change the hudi format in original hudi-spark module to hudi_v1 format.
* Manually tested on Spark 3.1.2 and Spark 3.2.0 SQL.
* Added a README.md file under hudi-spark-datasource module.
2022-01-14 13:42:35 +08:00
Sagar Sumit
195dac90fa
[MINOR] Disable flaky tests to unlock CI ( #4592 )
2022-01-13 19:43:27 -08:00
Sagar Sumit
209f91cb33
[HUDI-3010] Unbundle parquet-avro and shade other dependencies in presto bundle ( #4551 )
2022-01-12 20:00:24 -08:00
Y Ethan Guo
397795c7d0
[HUDI-3007] Fix issues in HoodieRepairTool ( #4564 )
2022-01-12 09:03:27 -08:00
Sagar Sumit
12e95771ee
[HUDI-3235] Fix ClassNotFoundException due to log4j-core dependency ( #4574 )
...
- Move log4j-core to top level pom
2022-01-12 11:53:43 -05:00
Sagar Sumit
8a40d95506
[HUDI-3225] Claim RFC-45 for async metadata indexing ( #4569 )
2022-01-12 11:53:01 -05:00
todd5167
2969fb3835
[HUDI-3233] Make metadata commit synchronous for flink batch
...
close apache/hudi#4561
2022-01-12 20:22:53 +08:00
YueZhang
9fe28e56b4
[HUDI-3045] New clustering regex match config to choose partitions when building clustering plan ( #4346 )
...
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-11 23:23:55 -08:00
董可伦
017ddbbfac
[MINOR] Fix typos ( #4567 )
2022-01-11 23:17:10 -08:00
Town
4b0111974f
[HUDI-3184] hudi-flink support timestamp-micros ( #4548 )
...
* support both avro and parquet code path
* string rowdata conversion is also supported
2022-01-12 10:53:51 +08:00
Pratyaksh Sharma
a392e9ba46
[HUDI-485] Corrected the check for incremental sql ( #2768 )
...
* [HUDI-485]: corrected the check for incremental sql
* [HUDI-485]: added tests
* code review comments addressed
* [HUDI-485]: added happy flow test case
2022-01-12 08:22:07 +05:30
Alexey Kudinkin
6cdcd89afa
[HUDI-3094] Unify Hive's InputFormat implementations to avoid duplication ( #4417 )
2022-01-11 15:02:13 -08:00
xuzifu666
4b2fd37fb4
[MINOR] Remove unused static var in HoodieAvroWriteSupport ( #4543 )
2022-01-11 11:53:45 -08:00
Todd Gao
c9bc626299
[HUDI-3211] Claim RFC number for RFC for Hudi Connector for Presto ( #4562 )
2022-01-11 14:08:27 +05:30
Raymond Xu
f74cd57320
[HUDI-3195] Fix spark 3 pom ( #4554 )
...
- drop 3.0.x profile
- update readme
- update build CI bot.yml
- fix spark 3 bundle name
2022-01-10 19:11:22 -08:00
Sivabalan Narayanan
67ad4992e1
Removing extraneous warn logs in ClusteringUtils ( #4553 )
2022-01-11 08:20:14 +05:30
Alexey Kudinkin
f1e3762a94
[HUDI-2950] Addressing performance traps in Bulk Insert/Layout Optimization ( #4234 )
...
* Cleaned up Z-curve/Hilbert ordering seqs:
- Streamlined flow
- Removed unnecessary operations (double-mapping, boxing, etc)
- Updated `CollectionUtils::combine` to avoid ArrayList resizing
* Tidying up
* Reducing small objects churn due to Scala/Java conversions by re-using `RowFactory`, passing `Object[]`
* Fixing name resolution (disambiguation overloads)
* `lint`
* Replaced `OverwriteAvroPayloadRecord` w/ `RewriteRecordPayload` to avoid unnecessary Avro ser/de loop
* Added `PathCachingFileName` to avoid fetching substrings every time file-name is fetched;
Inject `PathCachingFileName` into `HoodieWrapperFileSystem.convertPathWithScheme`
* Drastically reducing size of the `ArrayDeque` allocated by `ObjectSizeCalculator`
* XXX
* Missing license
* Fixed refs (after rebase)
* Fixing compilation failure in Scala 2.11
* Renamed `PathCachingFileName` to `FileNameCachingPath`
* Tidying up
2022-01-10 18:23:22 -08:00
t0il3ts0ap
c8df9b09d7
[HUDI-3148] Create pushgateway client based on port ( #4497 )
...
Co-authored-by: anoop narang <anoop.narang@navi.com>
Co-authored-by: sivabalan narayanan <n.siva.b@gmail.com>
2022-01-10 18:09:47 -05:00
Y Ethan Guo
f230eca9b5
[MINOR] Fix port number in setupKafka.sh ( #4546 )
2022-01-10 16:07:52 -05:00
Sivabalan Narayanan
7a8b94c82d
[HUDI-3180] Include files from completed commits while bootstrapping metadata table ( #4519 )
2022-01-10 15:33:15 -05:00
Y Ethan Guo
bc95571caa
[HUDI-2735] Allow empty commits in Kafka Connect Sink for Hudi ( #4544 )
2022-01-10 15:31:25 -05:00
Manoj Govindassamy
251d4eb3b6
[HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override ( #4406 )
...
* [HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override
- Making InProcessLockProvider the default lock provider when
any async services are enabled and when no lock provider is
explicitly set.
- This is the workaround for metadata table updates racing with
async table service operations
* [HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override
- Renaming isAnyTableServicesInline/Async() to areAnyTableServicesInline/Async()
* [HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override
- Additionally checking for write config properties when verifying
the lock provider override. Updated the unit test for this case.
2022-01-10 08:40:24 +05:30
Sivabalan Narayanan
56f93f4ebd
Removing rollbacks instants from timeline for restore operation ( #4518 )
2022-01-10 07:44:28 +05:30
Thinking Chen
e9a7f49f55
[HUDI-3112] Fix Kafka Connect failing to sync to Hive ( #4458 )
2022-01-09 15:31:57 -08:00
Sivabalan Narayanan
604d9885f1
[HUDI-3009] making some fixes to S3 incremental source ( #4517 )
2022-01-09 12:46:52 -05:00
RexAn
977d3c6dad
[HUDI-3157] Remove aws jars from hudi bundles ( #4542 )
...
Co-authored-by: Hui An <hui.an@shopee.com>
2022-01-09 02:23:46 -08:00
YueZhang
cf362fb2d5
[MINOR] Fix some code style issues based on check-style plugin ( #4532 )
...
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-09 01:14:56 -08:00