Commit Graph

1459 Commits

Author SHA1 Message Date
wangxianghu
040756d8c0 [HUDI-1785] Move OperationConverter to hudi-client-common for code reuse (#2798) 2021-04-12 16:22:33 +08:00
hj2016
1da16dfd2e [HUDI-1784] Added print detailed stack log when hbase connection error (#2799) 2021-04-12 13:46:06 +08:00
wangxianghu
f3777f44fe [MINOR] Remove unused imports and some other checkstyle issues (#2800) 2021-04-11 21:42:34 +08:00
Roc Marshal
b554835053 [MINOR] fix typo. (#2804) 2021-04-11 10:31:07 +08:00
xiarixiaoyao
8d4a7fe33e [HUDI-1783] Support Huawei Cloud Object Storage (#2796) 2021-04-10 13:02:11 +08:00
Danny Chan
6786581c48 [HUDI-1775] Add option for compaction parallelism (#2785) 2021-04-09 13:46:19 +08:00
Vinoth Govindarajan
08e82c469c [HUDI-1762] Added HiveStylePartitionExtractor to support Hive style partitions (#2769) 2021-04-09 01:00:11 -04:00
Gary Li
cf3d2e21eb [MINOR] Update doap with 0.8.0 release (#2772) 2021-04-08 11:06:13 -04:00
hiscat
5b3608f149 [HUDI-1778] Add setter to CompactionPlanEvent and CompactionCommitEvent to have better SE/DE performance for Flink (#2789) 2021-04-08 19:40:37 +08:00
hongdd
ecdbd2517f [HUDI-699] Fix CompactionCommand and add unit test for CompactionCommand (#2325) 2021-04-08 15:35:33 +08:00
Simon
18459d4045 [MINOR] Some unit test code optimize (#2782)
* Optimized code

* Optimized code
2021-04-08 13:35:03 +08:00
hiscat
3a926aacf6 [HUDI-1773] HoodieFileGroup code optimize (#2781) 2021-04-07 18:16:03 +08:00
hiscat
f4f9dd9d83 [HUDI-1772] HoodieFileGroupId compareTo logical error(fileId self compare) (#2780) 2021-04-07 18:10:38 +08:00
li36909
dadd081d45 [HUDI-1751] DeltaStreamer print many unnecessary warn log (#2754) 2021-04-07 00:47:03 -07:00
hiscat
d035fcbb3c [HUDI-1767] Add setter to HoodieKey and HoodieRecordLocation to have better SE/DE performance for Flink (#2779) 2021-04-07 14:13:31 +08:00
li36909
8527590772 [HUDI-1750] Fail to load user's class if user move hudi-spark-bundle jar into spark classpath (#2753) 2021-04-06 22:33:32 -04:00
Harshit Mittal
e692c704da [MINOR] Fix deprecated build link for travis (#2778) 2021-04-07 08:57:10 +08:00
Danny Chan
9c369c607d [HUDI-1757] Assigns the buckets by record key for Flink writer (#2757)
Currently we assign the buckets by record partition path, which could
cause hotspots if the partition field is a datetime type. This change
assigns buckets by grouping the records by their key first; the assignment
is valid only if there is no conflict (two tasks writing to the same bucket).

This patch also changes the coordinator execution to be asynchronous.
2021-04-06 19:06:41 +08:00
li36909
920537cac8 [HUDI-1749] Clean/Compaction/Rollback command maybe never exit when operation fail (#2752) 2021-04-05 23:23:15 -07:00
Harshit Mittal
e970e1f483 [HUDI-1696] add apache commons-codec dependency to flink-bundle explicitly (#2758) 2021-04-01 23:07:30 -07:00
Roc Marshal
94a5e72f16 [HUDI-1737][hudi-client] Code Cleanup: Extract common method in HoodieCreateHandle & FlinkCreateHandle (#2745) 2021-04-02 11:39:05 +08:00
pengzhiwei
684622c7c9 [HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning (#2651) 2021-04-01 11:12:28 -07:00
Danny Chan
9804662bc8 [HUDI-1738] Emit deletes for flink MOR table streaming read (#2742)
Currently we do a soft delete for DELETE row data when writing into a hoodie
table. For streaming reads of a MOR table, the Flink reader detects the
delete records and still emits them if the record key semantics are still
kept.

This is useful and actually a must for streaming ETL pipeline
incremental computation.
2021-04-01 15:25:31 +08:00
vinoyang
fe16d0de7c [MINOR] Delete useless UpsertPartitioner for flink integration (#2746) 2021-03-31 16:36:42 +08:00
Sebastian Bernauer
aa0da72c59 Preparation for Avro update (#2650) 2021-03-30 21:50:17 -07:00
leo-Iamok
8bc65b9318 [HUDI-1731] Rename UpsertPartitioner in hudi-java-client (#2734)
Co-authored-by: lei.zhu <lei.zhu@envisioncn.com>
2021-03-31 11:06:04 +08:00
vinoyang
3cab928b50 [HUDI-1735] Add hive-exec dependency for hudi-examples (#2737) 2021-03-30 21:35:16 +08:00
Gary Li
050626ad6c [MINOR] Add Missing Apache License to test files (#2736) 2021-03-29 07:17:23 -07:00
garyli1019
e069b64e10 [HOTFIX] fix deploy staging jars script 2021-03-29 06:04:48 -07:00
Gary Li
4db970dc8a [HOTFIX] Disable ITs for Spark3 and scala2.12 (#2733) 2021-03-29 06:04:48 -07:00
Gary Li
452f5e2d66 [HOTFIX] close spark session in functional test suite and disable spark3 test for spark2 (#2727) 2021-03-29 06:04:48 -07:00
Danny Chan
d415d45416 [HUDI-1729] Asynchronous Hive sync and commits cleaning for Flink writer (#2732) 2021-03-29 10:47:29 +08:00
Shen Hong
ecbd389a3f [HUDI-1478] Introduce HoodieBloomIndex to hudi-java-client (#2608) 2021-03-28 20:28:40 +08:00
n3nash
bec70413c0 [HUDI-1728] Fix MethodNotFound for HiveMetastore Locks (#2731) 2021-03-27 10:07:10 -07:00
Danny Chan
8b774fe331 [HUDI-1495] Bump Flink version to 1.12.2 (#2718) 2021-03-26 14:25:57 +08:00
garyli1019
6e803e08b1 Moving to 0.9.0-SNAPSHOT on master branch. 2021-03-24 21:37:14 +08:00
Danny Chan
29b79c99b0 [hotfix] Log the error message for creating table source first (#2711) 2021-03-24 18:25:37 +08:00
n3nash
01a1d7997b [HUDI-1712] Rename & standardize config to match other configs (#2708) 2021-03-24 17:24:02 +08:00
Danny Chan
03668dbaf1 [HUDI-1710] Read optimized query type for Flink batch reader (#2702)
Read optimized query returns the records from:

* COW table: the latest parquet files
* MOR table: parquet file records from the latest compaction committed
2021-03-23 18:41:30 -07:00
legendtkl
0e6909d3e2 [MINOR][DOCUMENT] Update README doc for integ test (#2703) 2021-03-23 20:21:56 +08:00
n3nash
d7b18783bd [HUDI-1709] Improving config names and adding hive metastore uri config (#2699) 2021-03-22 01:22:06 -07:00
Liulietong
ce3e8ec870 [HUDI-1667]: Fix a null value related bug for spark vectorized reader. (#2636) 2021-03-20 07:54:20 -07:00
Volodymyr Burenin
900de34e45 [HUDI-1650] Custom avro kafka deserializer. (#2619)
* Custom avro kafka deserializer

Co-authored-by: volodymyr.burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>
2021-03-20 00:51:08 -07:00
Sivabalan Narayanan
161d530f93 Fixing kafka auto.reset.offsets config param key (#2691) 2021-03-19 12:54:29 -07:00
Sivabalan Narayanan
55a489c769 [1568] Fixing spark3 bundles (#2625)
- [1568] Fixing spark3 bundles
2021-03-19 14:21:36 -04:00
Danny Chan
f74828fca1 [HUDI-1705] Flush as per data bucket for mini-batch write (#2695)
Detects the buffer size for each data bucket before flushing, so that we
avoid flushing data buckets with few records.
2021-03-19 16:30:54 +08:00
Jintao Guan
1277c62398 [HUDI-1653] Add support for composite keys in NonpartitionedKeyGenerator (#2627)
* [HUDI-1653] Add support for composite keys in NonpartitionedKeyGenerator

* update NonpartitionedKeyGenerator to support composite record keys

* update NonpartitionedKeyGenerator
2021-03-18 15:33:31 -07:00
wangxianghu
e602e5dfb9 [MINOR] Remove unused var in AbstractHoodieWriteClient (#2693) 2021-03-18 14:56:02 -07:00
xiarixiaoyao
d429169ff7 [HUDI-1688] hudi write should uncache rdd when the write operation is finished (#2673) 2021-03-18 10:19:18 -07:00
Danny Chan
f1e0018f12 [HUDI-1704] Use PRIMARY KEY syntax to define record keys for Flink Hudi table (#2694)
The SQL PRIMARY KEY semantics are the same as those of the Hoodie record key,
so using PRIMARY KEY is more straightforward than using a table option:
hoodie.datasource.write.recordkey.field.

After this change, both PRIMARY KEY and table option can define hoodie
record key, while the PRIMARY KEY has higher priority if both are
defined.

Note: a column with PRIMARY KEY constraint is forced to be non-nullable.
2021-03-18 20:21:52 +08:00