lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
Y Ethan Guo	4fb1a590b1	[HUDI-3700] Add hudi-utilities-slim-bundle excluding hudi-spark-datasource modules (#5176 )	2022-03-30 18:08:35 -07:00
Y Ethan Guo	9830005e9b	[HUDI-3681] Provision additional hudi-spark-bundle with different versions (#5171 )	2022-03-30 17:35:56 -07:00
Raymond Xu	31d4a16deb	[HUDI-3536] Add hudi-datahub-sync implementation (#5155 )	2022-03-30 14:38:02 -07:00
Alexey Kudinkin	e5a2baeed0	[HUDI-3549] Removing dependency on "spark-avro" (#4955 ) Hudi is committing to keep its bundles compatible with Spark minor versions (e.g. 2.4, 3.1, 3.2): a single build of Hudi (e.g. "hudi-spark3.2-bundle") will be compatible with ALL patch versions in that minor branch (in that case 3.2.1, 3.2.0, etc.). To achieve that, we have to remove (and ban) "spark-avro" as a dependency, which on a few occasions was the root cause of incompatibility between consecutive Spark patch versions (most recently 3.2.1 and 3.2.0, due to this PR). Instead of bundling "spark-avro" as a dependency, we will copy over some of the classes Hudi depends on and maintain them alongside the Hudi code base so that we can provide the aforementioned guarantee. To work around arising compatibility issues, we will apply local patches to guarantee compatibility of Hudi bundles within the Spark minor version branches. The following mapping of Hudi modules to Spark minor branches is currently maintained: "hudi-spark3" -> 3.2.x; "hudi-spark3.1.x" -> 3.1.x; "hudi-spark2" -> 2.4.x. The following class hierarchies (borrowed from "spark-avro") are maintained within these Spark-specific modules to guarantee compatibility with the respective minor version branches: AvroSerializer, AvroDeserializer, AvroUtils. Each of these classes has been copied from Spark 3.2.1 (for the 3.2.x branch), 3.1.2 (for the 3.1.x branch), and 2.4.4 (for the 2.4.x branch) into its respective module. The SchemaConverters class, in turn, is shared across all those modules given its relative stability (there are only cosmetic changes from 2.4.4 to 3.2.1). All of the aforementioned classes have their visibility limited to the corresponding packages (org.apache.spark.sql.avro, org.apache.spark.sql) so that the broader code base does not become dependent on them and instead relies on facades abstracting them.
Additionally, given that Hudi plans to support all patch versions of Spark within the aforementioned minor version branches, additional build steps were added to validate that Hudi compiles properly against those versions. Testing, however, is performed against the most recent patch versions of Spark with the help of Azure CI. Brief change log: - Removing spark-avro bundling from Hudi by default - Scaffolded Spark 3.2.x hierarchy - Bootstrapped Spark 3.1.x Avro serializer/deserializer hierarchy - Bootstrapped Spark 2.4.x Avro serializer/deserializer hierarchy - Moved ExpressionCodeGen, ExpressionPayload into hudi-spark module - Fixed AvroDeserializer to stay compatible with both Spark 3.2.1 and 3.2.0 - Modified bot.yml to build full matrix of supported Spark versions - Removed "spark-avro" dependency from all modules - Fixed relocation of spark-avro classes in bundles to assist in running integ-tests.	2022-03-29 14:44:47 -04:00
Y Ethan Guo	eaa4c4f2e2	[HUDI-1180] Upgrade HBase to 2.4.9 (#5004 ) Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>	2022-03-24 19:04:53 -07:00
Y Ethan Guo	44ab3b73ed	[HUDI-3706] Downgrade maven surefire and failsafe version (#5123 )	2022-03-24 09:31:46 -07:00
Raymond Xu	686da41696	[HUDI-3689] Fix UT failures in TestHoodieDeltaStreamer (#5120 )	2022-03-24 09:10:33 -07:00
Danny Chan	799c78e688	[HUDI-3665] Support flink multiple versions (#5072 )	2022-03-21 10:34:50 +08:00
Sivabalan Narayanan	d40adfa2d7	[HUDI-3620] Adding spark3.2.0 profile (#5038 )	2022-03-14 19:14:00 -04:00
Raymond Xu	ed26c5265c	[HUDI-3584] Skip integ test modules by default (#4986 )	2022-03-08 06:32:04 -08:00
ForwardXu	25385805aa	[HUDI-3574] Improve maven module configs for different spark profiles (#4970 )	2022-03-08 01:01:05 -08:00
Alexey Kudinkin	85e8a5c4de	[HUDI-1296] Support Metadata Table in Spark Datasource (#4789 ) * Bootstrapping initial support for Metadata Table in Spark Datasource - Consolidated Avro/Row conversion utilities to center around Spark's AvroDeserializer ; removed duplication - Bootstrapped HoodieBaseRelation - Updated HoodieMergeOnReadRDD to be able to handle Metadata Table - Modified MOR relations to be able to read different Base File formats (Parquet, HFile)	2022-02-24 16:23:13 -05:00
Yann Byron	0c950181aa	[HUDI-3423] upgrade spark to 3.2.1 (#4815 )	2022-02-21 16:52:21 -08:00
Sagar Sumit	ed106f671e	[HUDI-2809] Introduce a checksum mechanism for validating hoodie.properties (#4712 ) - Fix dependency conflict - Fix repairs command - Implement putIfAbsent for DDB lock provider - Add upgrade step and validate while fetching configs - Validate checksum for latest table version only while fetching config - Move generateChecksum to BinaryUtil - Rebase and resolve conflict - Fix table version check	2022-02-18 10:17:06 +05:30
Yuqi Gu	e639d99387	[HUDI-1657] Fix the build on aarch64, Fedora 33 (#4617 )	2022-02-14 15:10:18 -08:00
Yann Byron	d971974063	[HUDI-3333] fix that getNestedFieldVal breaks with Spark 3.2 (#4783 )	2022-02-10 06:12:16 -08:00
Danny Chan	b3b44236fe	[HUDI-3389] Bump flink version to 1.14.3 (#4776 )	2022-02-10 11:32:01 +08:00
Sivabalan Narayanan	16138db4f2	[HUDI-3368] Revert "[HUDI-3306] Upgrade rocksdb version (#4663 )" (#4733 ) This reverts commit `6f10107998`.	2022-02-01 14:18:38 -05:00
Satyam Raj	6f10107998	[HUDI-3306] Upgrade rocksdb version (#4663 ) Co-authored-by: Satyam Raj <satyam.raj@olacabs.com>	2022-01-24 14:53:20 -05:00
leesf	5ce45c440b	[HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation (#4514 ) * Introduce hudi-spark3-common and hudi-spark2-common modules to place classes that would be reused in different spark versions, also introduce hudi-spark3.1.x to support spark 3.1.x. * Introduce hudi format under hudi-spark2, hudi-spark3, hudi-spark3.1.x modules and change the hudi format in original hudi-spark module to hudi_v1 format. * Manually tested on Spark 3.1.2 and Spark 3.2.0 SQL. * Added a README.md file under hudi-spark-datasource module.	2022-01-14 13:42:35 +08:00
Sagar Sumit	12e95771ee	[HUDI-3235] Fix ClassNotFoundException due to log4j-core dependency (#4574 ) - Move log4j-core to top level pom	2022-01-12 11:53:43 -05:00
Raymond Xu	f74cd57320	[HUDI-3195] Fix spark 3 pom (#4554 ) - drop 3.0.x profile - update readme - update build CI bot.yml - fix spark 3 bundle name	2022-01-10 19:11:22 -08:00
Yann Byron	03a83ffeb5	[HUDI-3195] optimize spark3 pom and modify build command (#4538 )	2022-01-07 23:21:39 -08:00
leesf	188d0338c4	[HUDI-3134] Fix insert error after adding columns on Spark 3.2.0 (#4488 )	2022-01-01 17:38:14 -08:00
Udit Mehrotra	9412281cb1	[HUDI-2983] Remove Log4j2 transitive dependencies (#4281 )	2021-12-28 07:15:05 -08:00
Yann Byron	05942e018c	[HUDI-2811] Support Spark 3.2 (#4270 )	2021-12-28 00:12:44 -08:00
zhangyue19921010	f3f6112b75	[HUDI-3070] Add rerunFailingTestsCount for flaky tests (#4398 ) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-12-20 19:59:50 -08:00
wenningd	15444c951f	[HUDI-2946] Upgrade maven plugins to be compatible with higher Java versions (#4232 ) Co-authored-by: Wenning Ding <wenningd@amazon.com>	2021-12-11 20:18:39 -08:00
Y Ethan Guo	72901a33a1	[HUDI-2784] Add a hudi-trino-bundle for Trino (#4279 )	2021-12-10 14:27:22 -08:00
ForwardXu	63b15607ff	[HUDI-2937] Introduce a pulsar implementation of hoodie write commit … (#4217 ) * [HUDI-2937] Introduce a pulsar implementation of hoodie write commit callback	2021-12-05 11:51:06 +04:00
yuzhao.cyz	a1d0ff4209	Moving to 0.11.0-SNAPSHOT on master branch.	2021-11-27 17:22:10 +08:00
wenningd	1ee12cfa6f	[HUDI-2314] Add support for DynamoDb based lock provider (#3486 ) - Co-authored-by: Wenning Ding <wenningd@amazon.com> - Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>	2021-11-17 12:09:31 -05:00
Alexey Kudinkin	cbcbec4d38	[MINOR] Fixed checkstyle config to be based off Maven root-dir (requires Maven >=3.3.1 to work properly); (#4009 ) Updated README	2021-11-16 21:30:16 -05:00
Yann Byron	1f17467f73	[HUDI-1869] Upgrading Spark3 To 3.1 (#3844 ) Co-authored-by: pengzhiwei <pengzhiwei2015@icloud.com>	2021-11-02 18:25:12 -07:00
Sivabalan Narayanan	f9bc3e03e5	[MINOR] Adding a deprecated constructor to AbstractSyncHoodieClient (#3902 )	2021-11-02 12:16:38 -04:00
Sagar Sumit	5302b9a4ef	[HUDI-2662] Downloads from Nexus Pentaho repo taking too long (#3901 ) Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>	2021-11-01 19:14:48 -04:00
vinoyang	b1c4acf0ae	[HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier exists in bundles (#3864 )	2021-10-26 22:36:10 +08:00
rmahindra123	3686c25fae	[HUDI-2469] [Kafka Connect] Replace json based payload with protobuf for Transaction protocol. (#3694 ) * Substitute Control Event with protobuf * Fix tests * Fix unit tests * Add javadocs * Address reviewer comments Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>	2021-10-19 14:29:48 -07:00
rmahindra123	e528dd798a	[HUDI-2394] Implement Kafka Sink Protocol for Hudi for Ingesting Immutable Data (#3592 ) - Fixing packaging, naming of classes - Use of log4j over slf4j for uniformity - More follow-on fixes - Added a version to control/coordinator events. - Eliminated the config added to write config - Fixed fetching of checkpoints based on table type - Clean up of naming, code placement Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-09-10 18:20:26 -07:00
Raymond Xu	38c9b85aa8	[HUDI-2280] Use GitHub Actions to build different scala spark versions (#3556 )	2021-09-01 08:51:00 -07:00
Danny Chan	66f951322a	[HUDI-2191] Bump flink version to 1.13.1 (#3291 )	2021-08-16 18:14:05 +08:00
Udit Mehrotra	3e301196bf	Moving to 0.10.0-SNAPSHOT on master branch.	2021-08-14 18:51:09 -07:00
Sagar Sumit	5cc96e85c1	[HUDI-1897] Deltastreamer source for AWS S3 (#3433 ) - Added two sources for two stage pipeline. a. S3EventsSource that fetches events from SQS and ingests to a meta hoodie table. b. S3EventsHoodieIncrSource reads S3 events from this meta hoodie table, fetches actual objects from S3 and ingests to sink hoodie table. - Added selectors to assist in S3EventsSource. Co-authored-by: Satish M <84978833+satishmittal1111@users.noreply.github.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-08-14 08:25:10 -04:00
pengzhiwei	3f8ca1a355	[HUDI-2182] Support Compaction Command For Spark Sql (#3277 )	2021-08-06 15:12:10 +08:00
pengzhiwei	0dcd6a8fca	[HUDI-2233] Use HMS To Sync Hive Meta For Spark Sql (#3387 )	2021-08-05 09:57:22 -04:00
pengzhiwei	151f22e43a	[HUDI-2195] Sync Hive Failed When Execute CTAS In Spark2 And Spark3 (#3299 )	2021-07-22 15:33:38 +08:00
Vinay Patil	5a94b6bf54	[HUDI-2192] Clean up Multiple versions of scala libraries detected Warning (#3292 )	2021-07-21 00:33:27 -07:00
Randal Boyle	60e0254e67	[HUDI-1996] Adding functionality to allow the providing of basic auth creds for confluent cloud schema registry (#3097 ) * adding support for basic auth with confluent cloud schema registry	2021-07-05 23:40:23 -07:00
Jintao Guan	b8fe5b91d5	[HUDI-764] [HUDI-765] ORC reader writer Implementation (#2999 ) Co-authored-by: Qingyun (Teresa) Kang <kteresa@uber.com>	2021-06-15 15:21:43 -07:00
Raymond Xu	f922837064	[HUDI-1950] Fix Azure CI failure in TestParquetUtils (#2984 ) * fix azure pipeline configs * add pentaho.org in maven repositories * Make sure file paths with scheme in TestParquetUtils * add azure build status to README	2021-06-15 03:45:17 -07:00

266 Commits