
2377 Commits

Author SHA1 Message Date
Y Ethan Guo
a08a2b7306 [MINOR] Add instructions to build and upload Docker Demo images (#4612)
* [MINOR] Add instructions to build and upload Docker Demo images

* Add local test instruction
2022-01-20 09:55:28 +05:30
wangxianghu
db93ad2f4b [HUDI-3277] Filter non-parquet files in bootstrap procedure (#4639) 2022-01-19 21:13:51 +04:00
YueZhang
7647562dad [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely. (#4078)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-18 22:42:35 -08:00
Alexey Kudinkin
4bea758738 [HUDI-3191] Rebasing Hive's FileInputFormat onto AbstractHoodieTableFileIndex (#4531) 2022-01-18 14:54:51 -08:00
Thinking Chen
caeea946fb [HUDI-3245] Convert uppercase letters to lowercase in storage configs (#4602) 2022-01-18 14:51:09 -05:00
Yann Byron
a09c231911 [HUDI-2903] get table schema from the last commit with data written (#4180) 2022-01-18 10:50:30 -05:00
Danny Chan
45f054ffde [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#resetViewState to avoid NPE (#4625) 2022-01-18 17:46:40 +08:00
EchoLee5
3b56320bd8 [HUDI-3261] Read rt table by hive cli throw NoSuchMethodError (#4624) 2022-01-18 16:58:08 +08:00
wangxianghu
3d93e857cc [MINOR] Minor improvement in JsonKafkaSource (#4620) 2022-01-18 11:13:05 +04:00
RexAn
f18447406d [HUDI-1558] Struct Stream Source Support Spark3 (#4586)
Co-authored-by: Hui An <hui.an@shopee.com>
2022-01-18 11:08:33 +08:00
董可伦
20e7983866 [HUDI-3252] Avoid creating empty requestedReplaceCommit in the startCommit method (#4515) 2022-01-17 17:28:18 -05:00
Yuwei XIAO
d36533735f [HUDI-3194] fix MOR snapshot query during compaction (#4540) 2022-01-17 17:24:24 -05:00
Danny Chan
36a9f63e45 [HUDI-3257] Excluding clustering instants from pending rollback info (#4616) 2022-01-17 18:18:45 +08:00
Alexey Kudinkin
75caa7d3d8 [HUDI-3179] Extracted common AbstractHoodieTableFileIndex to be shared across engines (#4520) 2022-01-16 22:46:20 -08:00
xiaotianzhang01
ed92c217ed [MINOR] Delete unused parameter in TablePathUtils (#4595)
Co-authored-by: zhangxiaotian13 <zhangxiaotian13@jd.com>
2022-01-16 22:24:43 -08:00
Yann Byron
d2dda55794 [HUDI-2968] add UT for update/delete on non-pk condition (#4568) 2022-01-16 12:02:12 -08:00
0x574C
28b3b6ad8f [MINOR] Remove org.apache.directory.api.util.Strings import (#4601) 2022-01-16 16:58:18 +08:00
董可伦
822230d9ea [MINOR] Optimize variable names and logs (#4581) 2022-01-16 16:09:22 +08:00
Yann Byron
5e0171a5ee [HUDI-3198] Improve Spark SQL create table from existing hudi table (#4584)
Modify the SQL statement for creating a hudi table based on an existing hudi path.

From:

```sql
create table hudi_tbl using hudi tblproperties (primaryKey='id', preCombineField='ts', type='cow') partitioned by (pt) location '/path/to/hudi'
```

To:
```sql
create table hudi_tbl using hudi location '/path/to/hudi'
```
2022-01-14 10:15:29 -08:00
Y Ethan Guo
53f75f84b8 [HUDI-2785] Add Trino setup in Docker Demo (#4300)
* [HUDI-2785] Add Trino setup in Docker Demo

* Update docker account and remove unnecessary configs

* Adjust sparkadhoc Dockerfile
2022-01-14 22:08:55 +05:30
Y Ethan Guo
7d163ee3de [MINOR] Fix local flaky test in TestFSUtils (#4596)
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-01-13 22:48:57 -08:00
leesf
5ce45c440b [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation (#4514)
* Introduce hudi-spark3-common and hudi-spark2-common modules to place classes that would be reused in different spark versions, also introduce hudi-spark3.1.x to support spark 3.1.x.
* Introduce hudi format under hudi-spark2, hudi-spark3, hudi-spark3.1.x modules and change the hudi format in original hudi-spark module to hudi_v1 format.
* Manually tested on Spark 3.1.2 and Spark 3.2.0 SQL.
* Added a README.md file under hudi-spark-datasource module.
2022-01-14 13:42:35 +08:00
Sagar Sumit
195dac90fa [MINOR] Disable flaky tests to unlock CI (#4592) 2022-01-13 19:43:27 -08:00
Sagar Sumit
209f91cb33 [HUDI-3010] Unbundle parquet-avro and shade other dependencies in presto bundle (#4551) 2022-01-12 20:00:24 -08:00
Y Ethan Guo
397795c7d0 [HUDI-3007] Fix issues in HoodieRepairTool (#4564) 2022-01-12 09:03:27 -08:00
Sagar Sumit
12e95771ee [HUDI-3235] Fix ClassNotFoundException due to log4j-core dependency (#4574)
- Move log4j-core to top level pom
2022-01-12 11:53:43 -05:00
Sagar Sumit
8a40d95506 [HUDI-3225] Claim RFC-45 for async metadata indexing (#4569) 2022-01-12 11:53:01 -05:00
todd5167
2969fb3835 [HUDI-3233] Make metadata commit synchronous for flink batch
close apache/hudi#4561
2022-01-12 20:22:53 +08:00
YueZhang
9fe28e56b4 [HUDI-3045] New clustering regex match config to choose partitions when building clustering plan (#4346)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-11 23:23:55 -08:00
董可伦
017ddbbfac [MINOR] Fix typos (#4567) 2022-01-11 23:17:10 -08:00
Town
4b0111974f [HUDI-3184] hudi-flink support timestamp-micros (#4548)
* support both avro and parquet code path
* string rowdata conversion is also supported
2022-01-12 10:53:51 +08:00
Pratyaksh Sharma
a392e9ba46 [HUDI-485] Corrected the check for incremental sql (#2768)
* [HUDI-485]: corrected the check for incremental sql

* [HUDI-485]: added tests

* code review comments addressed

* [HUDI-485]: added happy flow test case
2022-01-12 08:22:07 +05:30
Alexey Kudinkin
6cdcd89afa [HUDI-3094] Unify Hive's InputFormat implementations to avoid duplication (#4417) 2022-01-11 15:02:13 -08:00
xuzifu666
4b2fd37fb4 [MINOR] Remove unused static var in HoodieAvroWriteSupport (#4543) 2022-01-11 11:53:45 -08:00
Todd Gao
c9bc626299 [HUDI-3211] Claim RFC number for RFC for Hudi Connector for Presto (#4562) 2022-01-11 14:08:27 +05:30
Raymond Xu
f74cd57320 [HUDI-3195] Fix spark 3 pom (#4554)
- drop 3.0.x profile
- update readme
- update build CI bot.yml
- fix spark 3 bundle name
2022-01-10 19:11:22 -08:00
Sivabalan Narayanan
67ad4992e1 Removing extraneous warn logs in ClusteringUtils (#4553) 2022-01-11 08:20:14 +05:30
Alexey Kudinkin
f1e3762a94 [HUDI-2950] Addressing performance traps in Bulk Insert/Layout Optimization (#4234)
* Cleaned up Z-curve/Hilbert ordering seqs:
  - Streamlined flow
  - Removed unnecessary operations (double-mapping, boxing, etc)
Updated `CollectionUtils::combine` to avoid AL resizing

* Tidying up

* Reducing small objects churn due to Scala/Java conversions by re-using `RowFactory`, passing `Object[]`

* Fixing name resolution (disambiguation overloads)

* `lint`

* Replaced `OverwriteAvroPayloadRecord` w/ `RewriteRecordPayload` to avoid unnecessary Avro ser/de loop

* Added `PathCachingFileName` to avoid fetching substrings every time file-name is fetched;
Inject `PathCachingFileName` into `HoodieWrapperFileSystem.convertPathWithScheme`

* Drastically reducing size of the `ArrayDeque` allocated by `ObjectSizeCalculator`

* XXX

* Missing license

* Fixed refs (after rebase)

* Fixing compilation failure in Scala 2.11

* `PathCachingFileName` > `FileNameCachingPath`

* Tidying up
2022-01-10 18:23:22 -08:00
t0il3ts0ap
c8df9b09d7 [HUDI-3148] Create pushgateway client based on port (#4497)
Co-authored-by: anoop narang <anoop.narang@navi.com>
Co-authored-by: sivabalan narayanan <n.siva.b@gmail.com>
2022-01-10 18:09:47 -05:00
Y Ethan Guo
f230eca9b5 [MINOR] Fix port number in setupKafka.sh (#4546) 2022-01-10 16:07:52 -05:00
Sivabalan Narayanan
7a8b94c82d [HUDI-3180] Include files from completed commits while bootstrapping metadata table (#4519) 2022-01-10 15:33:15 -05:00
Y Ethan Guo
bc95571caa [HUDI-2735] Allow empty commits in Kafka Connect Sink for Hudi (#4544) 2022-01-10 15:31:25 -05:00
Manoj Govindassamy
251d4eb3b6 [HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override (#4406)
* [HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override

 - Making InProcessLockProvider as the default lock provider when
   any async services are enabled and when no lock provider is
   explicitly set.

 - This is the workaround for metadata table updates racing with
   async table service operations

* [HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override

 - Renaming isAnyTableServicesInline/Async() to areAnyTableServicesInline/Async()

* [HUDI-3030] InProcessLockProvider as default when any async services enabled with no lock provider override

 - Additionally checking for write config properties when verifying
   the lock provider override. Updated the unit test for this case.
2022-01-10 08:40:24 +05:30
Sivabalan Narayanan
56f93f4ebd Removing rollbacks instants from timeline for restore operation (#4518) 2022-01-10 07:44:28 +05:30
Thinking Chen
e9a7f49f55 [HUDI-3112] Fix KafkaConnect cannot sync to Hive Problem (#4458) 2022-01-09 15:31:57 -08:00
Sivabalan Narayanan
604d9885f1 [HUDI-3009] making some fixes to S3 incremental source (#4517) 2022-01-09 12:46:52 -05:00
RexAn
977d3c6dad [HUDI-3157] Remove aws jars from hudi bundles (#4542)
Co-authored-by: Hui An <hui.an@shopee.com>
2022-01-09 02:23:46 -08:00
YueZhang
cf362fb2d5 [MINOR] Fix some code style issues based on check-style plugin (#4532)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2022-01-09 01:14:56 -08:00
Yann Byron
36790709f7 [HUDI-3125] spark-sql write timestamp directly (#4471) 2022-01-08 23:43:25 -08:00
Thinking Chen
0d8ca8da4e [HUDI-3104] Kafka-connect support of hadoop config environments and properties (#4451) 2022-01-08 23:10:17 -08:00