178 Commits

Author SHA1 Message Date
Raymond Xu
2e6e302efe [HUDI-3859] Fix spark profiles and utilities-slim dep (#5297) 2022-04-12 15:33:08 -07:00
RexXiong
5c41e30ac5 [HUDI-3817] shade parquet dependency for hudi-hadoop-mr-bundle (#5250)
Co-authored-by: lvshuang.xjs <lvshuang.xjs@alibaba-inc.com>
2022-04-11 05:44:46 -07:00
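Many commits in this log (HUDI-3817 above, and the Flink, Presto and HBase shading commits further down) "shade" a dependency. In these bundles that generally means the maven-shade-plugin copies the dependency's classes into the bundle jar under a relocated package prefix, so they cannot clash with another version of the same library on the user's classpath. The Java sketch below models only the class-name rewrite implied by a relocation rule. It is a simplified illustration: `RelocationSketch` is a hypothetical class, and the `org.apache.hudi.` prefix used in the example is an assumption, not necessarily what hudi-hadoop-mr-bundle configures. The real plugin also rewrites references inside bytecode and supports includes/excludes, none of which is modeled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Simplified model of prefix-based package relocation, in the spirit of
 * maven-shade-plugin's pattern/shadedPattern rules. Only class-name rewriting
 * is modeled; bytecode rewriting and includes/excludes are not.
 */
final class RelocationSketch {
    // pattern -> shadedPattern, checked in insertion order
    private final Map<String, String> rules = new LinkedHashMap<>();

    RelocationSketch addRule(String pattern, String shadedPattern) {
        rules.put(pattern, shadedPattern);
        return this;
    }

    /** Returns the relocated class name, or the input unchanged if no rule matches. */
    String relocate(String className) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            String pattern = rule.getKey();
            // Match whole package segments only, so "org.apache.parquet" does not
            // capture a hypothetical "org.apache.parquetx" package.
            if (className.equals(pattern) || className.startsWith(pattern + ".")) {
                return rule.getValue() + className.substring(pattern.length());
            }
        }
        return className;
    }
}
```

With the single rule `org.apache.parquet` -> `org.apache.hudi.org.apache.parquet`, the name `org.apache.parquet.hadoop.ParquetReader` becomes `org.apache.hudi.org.apache.parquet.hadoop.ParquetReader`, while `org.apache.avro.Schema` is left untouched.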
Y Ethan Guo
672974c412 [HUDI-3823] Fix hudi-hive-sync-bundle to include HBase dependencies and shading (#5257) 2022-04-07 17:30:33 -07:00
Danny Chan
e33149be9a [HUDI-3808] Flink bulk_insert timestamp(3) can not be read by Spark (#5236) 2022-04-07 15:17:39 +08:00
Raymond Xu
e96f08f355 Moving to 0.12.0-SNAPSHOT on master branch. 2022-04-06 15:24:10 +08:00
Vinoth Govindarajan
20964df770 [HUDI-3357] MVP implementation of BigQuerySyncTool (#5125)
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
2022-04-02 13:18:06 -07:00
xiarixiaoyao
444ff496a4 [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark (#4910)
* [HUDI-2560] introduce id_based schema to support full schema evolution.

* add test for FileBasedInternalSchemaStorageManger and rebase code

* add support for change column type and fix some test case

* fix some bugs encountered in the production env and delete useless code

* fix test error

* rebase code

* fixed some nested schema change bugs

* [HUDI-2429][Stacked On HUDI-2560]Support full schema evolution for spark

* [use dummyInternalSchema instead of null]

* add support for spark3.1.x

* remove support for spark3.1.x, since some compilation fails

* support spark3.1.x

* rebase and prepare solve all comments

* address all comments

* rebase code

* fixed the count(*) bug

* try to get internalSchema by parsing the commit file/history file directly, instead of using the metaclient, which is time-consuming
address some comments

* fixed all comments

* fix new comments

* rebase code, fix failed UTs

* fixed mistake

* rebase code, fixed new comments

* rebase code, and prepare for address new comments

* address commits

* address new comments

* fix new issues

* control fallback to original write logic
2022-04-01 13:20:24 -07:00
Y Ethan Guo
f6ff95f97c [MINOR][DOCS] Update hudi-utilities-slim-bundle docs (#5184) 2022-03-30 21:48:54 -07:00
Y Ethan Guo
4fb1a590b1 [HUDI-3700] Add hudi-utilities-slim-bundle excluding hudi-spark-datasource modules (#5176) 2022-03-30 18:08:35 -07:00
Raymond Xu
31d4a16deb [HUDI-3536] Add hudi-datahub-sync implementation (#5155) 2022-03-30 14:38:02 -07:00
Alexey Kudinkin
e5a2baeed0 [HUDI-3549] Removing dependency on "spark-avro" (#4955)
Hudi is taking on a promise that its bundles will stay compatible with Spark minor versions (e.g. 2.4, 3.1, 3.2): a single build of Hudi (e.g. "hudi-spark3.2-bundle") will be compatible with ALL patch versions in that minor branch (in this case 3.2.1, 3.2.0, etc.).

To achieve that, we have to remove (and ban) "spark-avro" as a dependency, which on several occasions was the root cause of incompatibility between consecutive Spark patch versions (most recently 3.2.1 and 3.2.0, due to this PR).

Instead of bundling "spark-avro" as a dependency, we will copy over the classes Hudi depends on and maintain them alongside the Hudi code base, so that we can provide the aforementioned guarantee. To work around compatibility issues as they arise, we will apply local patches that keep Hudi bundles compatible within each Spark minor version branch.

The following mapping of Hudi modules to Spark minor branches is currently maintained:

- "hudi-spark3" -> 3.2.x
- "hudi-spark3.1.x" -> 3.1.x
- "hudi-spark2" -> 2.4.x

The following class hierarchies (borrowed from "spark-avro") are maintained within these Spark-specific modules to guarantee compatibility with the respective minor version branches:

- AvroSerializer
- AvroDeserializer
- AvroUtils

Each of these classes has been copied from Spark 3.2.1 (for the 3.2.x branch), 3.1.2 (for the 3.1.x branch) and 2.4.4 (for the 2.4.x branch) into its respective module.

The SchemaConverters class, in turn, is shared across all of these modules given its relative stability (there are only cosmetic changes from 2.4.4 to 3.2.1).
All of the aforementioned classes have their visibility limited to their respective packages (org.apache.spark.sql.avro, org.apache.spark.sql), so that the broader code base does not become dependent on them and instead relies on facades abstracting them.

Additionally, given that Hudi plans on supporting all patch versions of Spark within the aforementioned minor version branches, additional build steps were added to validate that Hudi compiles properly against those versions. Testing, however, is performed against the most recent patch versions of Spark with the help of Azure CI.

Brief change log:
- Removing spark-avro bundling from Hudi by default
- Scaffolded Spark 3.2.x hierarchy
- Bootstrapped Spark 3.1.x Avro serializer/deserializer hierarchy
- Bootstrapped Spark 2.4.x Avro serializer/deserializer hierarchy
- Moved ExpressionCodeGen, ExpressionPayload into hudi-spark module
- Fixed AvroDeserializer to stay compatible with both Spark 3.2.1 and 3.2.0
- Modified bot.yml to build the full matrix of supported Spark versions
- Removed "spark-avro" dependency from all modules
- Fixed relocation of spark-avro classes in bundles to assist in running integ-tests.
2022-03-29 14:44:47 -04:00
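The module mapping in HUDI-3549 above can be read as a lookup keyed only on Spark's major.minor version, which is what makes the "one build per minor branch" promise work: any 3.2.x patch release resolves to the same module. The Java sketch below illustrates that lookup using only the three mappings listed in the commit message. `SparkModuleResolver` and `moduleFor` are hypothetical names for illustration; Hudi does not necessarily resolve modules this way at runtime.

```java
/**
 * Hypothetical helper (not part of Hudi) mapping a Spark version string to the
 * Hudi module listed for its minor branch in HUDI-3549.
 */
final class SparkModuleResolver {
    private SparkModuleResolver() {}

    static String moduleFor(String sparkVersion) {
        String[] parts = sparkVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected major.minor[.patch], got: " + sparkVersion);
        }
        // Only major.minor is consulted: every patch release of a minor branch
        // resolves to the same module.
        String minor = parts[0] + "." + parts[1];
        switch (minor) {
            case "3.2":
                return "hudi-spark3";
            case "3.1":
                return "hudi-spark3.1.x";
            case "2.4":
                return "hudi-spark2";
            default:
                throw new IllegalArgumentException("No module listed for Spark " + minor + ".x");
        }
    }
}
```

The sketch rejects unlisted minor versions instead of falling back to a "closest" module, because the commit only promises compatibility within the listed branches.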
Y Ethan Guo
eaa4c4f2e2 [HUDI-1180] Upgrade HBase to 2.4.9 (#5004)
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
2022-03-24 19:04:53 -07:00
Danny Chan
799c78e688 [HUDI-3665] Support flink multiple versions (#5072) 2022-03-21 10:34:50 +08:00
Thinking Chen
6ed7106e59 [HUDI-3606] Add org.objenesis:objenesis to hudi-timeline-server-bundle pom (#5017) 2022-03-15 15:06:50 +04:00
wangxianghu
003c6ee73e [MINOR] Remove repeated kafka-clients dependencies (#5034) 2022-03-14 18:24:06 +04:00
从大数据到人工智能
01cbddef78 Add hive-standalone-metastore dependency to hudi-flink-bundle module (#4870) 2022-02-23 09:16:21 +08:00
Danny Chan
b87e95d621 [HUDI-3476] Remove the shade pattern for parquet for flink bundle jar (#4869) 2022-02-22 19:21:57 +08:00
Danny Chan
2844a77b43 [HUDI-3439] Remove the hive shade pattern for flink bundle jar (#4833) 2022-02-17 22:42:39 +08:00
Alexey Kudinkin
464027ec37 [HUDI-3239] Convert BaseHoodieTableFileIndex to Java (#4669)
Converting BaseHoodieTableFileIndex to Java, removing Scala as a dependency from "hudi-common"
2022-02-09 18:42:08 -05:00
Sivabalan Narayanan
e72553accf [HUDI-3262] Fixing utilities and integ test suite bundle to include hudi spark datasource (#4670) 2022-01-23 08:46:37 -05:00
Danny Chan
64b1426005 [minor] Fix hive-exec scope of flink bundle jar (#4664) 2022-01-23 10:28:41 +08:00
Alexey Kudinkin
4bea758738 [HUDI-3191] Rebasing Hive's FileInputFormat onto AbstractHoodieTableFileIndex (#4531) 2022-01-18 14:54:51 -08:00
EchoLee5
3b56320bd8 [HUDI-3261] Read rt table by hive cli throw NoSuchMethodError (#4624) 2022-01-18 16:58:08 +08:00
leesf
5ce45c440b [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation (#4514)
* Introduce hudi-spark3-common and hudi-spark2-common modules to place classes that would be reused in different spark versions, also introduce hudi-spark3.1.x to support spark 3.1.x.
* Introduce hudi format under hudi-spark2, hudi-spark3, hudi-spark3.1.x modules and change the hudi format in original hudi-spark module to hudi_v1 format.
* Manually tested on Spark 3.1.2 and Spark 3.2.0 SQL.
* Added a README.md file under hudi-spark-datasource module.
2022-01-14 13:42:35 +08:00
Sagar Sumit
209f91cb33 [HUDI-3010] Unbundle parquet-avro and shade other dependencies in presto bundle (#4551) 2022-01-12 20:00:24 -08:00
RexAn
977d3c6dad [HUDI-3157] Remove aws jars from hudi bundles (#4542)
Co-authored-by: Hui An <hui.an@shopee.com>
2022-01-09 02:23:46 -08:00
Sagar Sumit
46bb00e4df [HUDI-3139] Shade htrace and parquet-avro in presto bundle (#4495)
Filter out unnecessary classes
2022-01-08 10:29:36 -05:00
Udit Mehrotra
9412281cb1 [HUDI-2983] Remove Log4j2 transitive dependencies (#4281) 2021-12-28 07:15:05 -08:00
Danny Chan
2dcb3f0062 [HUDI-2985] Shade jackson for hudi flink bundle jar (#4284) 2021-12-11 14:40:57 +08:00
Y Ethan Guo
72901a33a1 [HUDI-2784] Add a hudi-trino-bundle for Trino (#4279) 2021-12-10 14:27:22 -08:00
Danny Chan
bd08470421 [HUDI-2957] Shade kryo jar for flink bundle jar (#4251) 2021-12-09 10:16:42 +08:00
wenningd
4a437f25d3 [MINOR] Use maven-shade-plugin version for hudi-timeline-server-bundle from main pom.xml (#4209)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-12-06 12:29:18 -08:00
Sivabalan Narayanan
52aae36b53 [MINOR] Fixing integ test suite for hudi-aws and archival validation (#4142) 2021-11-28 20:11:50 -05:00
yuzhao.cyz
a1d0ff4209 Moving to 0.11.0-SNAPSHOT on master branch. 2021-11-27 17:22:10 +08:00
xiarixiaoyao
780a2ac5b2 [HUDI-2102] Support hilbert curve for hudi (#3952)
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
2021-11-26 23:20:19 -08:00
rmahindra123
9028e6e1e4 [HUDI-2864] Fix README and scripts with current limitations of hive sync (#4129)
* Fix README with current limitations of hive sync

* Fix README with current limitations of hive sync

* Fix dep issue

* Fix Copy on Write flow

Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
2021-11-26 15:09:32 -08:00
Danny Chan
f5da9b50fa [MINOR] Include hudi-aws in flink bundle jar (#4127)
HUDI-2801 makes this jar required.
2021-11-26 14:36:44 +08:00
Ron
38585e4e57 [HUDI-2851] Shade org.apache.hadoop.hive.ql.optimizer package for flink bundle jar (#4104) 2021-11-26 11:27:21 +08:00
rmahindra123
7286b56d30 [HUDI-2853] Add JMX deps in hudi utilities and kafka connect bundles (#4108)
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
2021-11-24 19:03:01 -05:00
rmahindra123
fbff0799b9 [HUDI-2325] Add hive sync support to kafka connect (#3660)
Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
2021-11-23 15:48:06 -08:00
zhangyue19921010
9ed28b1570 [HUDI-2409] Using HBase shaded jars in Hudi presto bundle (#3623)
* using hbase-shaded-jars-in-hudi-presto-bundle

* test

* add hudi-common-bundle

* code review

* code review

* code review

* code review

* test

* test

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-11-23 11:25:12 +05:30
Ron
6cc97cc0c9 Remove the aws packages from hudi flink bundle jar (#4050) 2021-11-20 11:55:12 +08:00
wenningd
1ee12cfa6f [HUDI-2314] Add support for DynamoDb based lock provider (#3486)
- Co-authored-by: Wenning Ding <wenningd@amazon.com>
- Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
2021-11-17 12:09:31 -05:00
Danny Chan
689020f303 [HUDI-2684] Use DefaultHoodieRecordPayload when precombine field is specified specifically (#3922) 2021-11-04 16:23:36 +08:00
Alexey Kudinkin
b12a25b0b1 [MINOR] Fixed RAT config for "hudi-utilities-bundle" to ignore transient build-bound artifacts (#3909) 2021-11-02 23:06:26 -04:00
vinoyang
13b637ddc3 [HUDI-2643] Remove duplicated hbase-common with tests classifier exists in bundles (#3886) 2021-11-01 20:11:00 +08:00
vinoyang
b1c4acf0ae [HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier exists in bundles (#3864) 2021-10-26 22:36:10 +08:00
rmahindra123
3686c25fae [HUDI-2469] [Kafka Connect] Replace json based payload with protobuf for Transaction protocol. (#3694)
* Substitute Control Event with protobuf

* Fix tests

* Fix unit tests

* Add javadocs

* Add javadocs

* Address reviewer comments

Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
2021-10-19 14:29:48 -07:00
Danny Chan
588a34aa95 [HUDI-2571] Remove include-flink-sql-connector-hive profile from flink bundle (#3818) 2021-10-18 17:34:49 +08:00
yiduwangkai
dfdfbbedae HUDI-2569 shaded hive (#3816)
Co-authored-by: wangkai9 <wangkai9@tuhu.cn>
2021-10-18 17:12:13 +08:00