- Added two sources for a two-stage pipeline:
  a. S3EventsSource fetches events from SQS and ingests them into a meta Hoodie table.
  b. S3EventsHoodieIncrSource reads S3 events from the meta Hoodie table, fetches the actual objects from S3, and ingests them into the sink Hoodie table (a spark-shell sketch of this stage follows below).
- Added selectors to assist S3EventsSource.
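As an illustration of stage two, here is a hedged spark-shell sketch of what S3EventsHoodieIncrSource effectively does. The paths, the checkpoint value, the source file format, and the nested `s3.bucket.name`/`s3.object.key` columns (taken from the standard S3 event notification schema) are assumptions, not the exact implementation:

```scala
import org.apache.spark.sql.functions.{col, concat, lit}
import spark.implicits._

val metaTablePath  = "s3a://bucket/meta_table" // assumed meta table location
val lastCheckpoint = "00000000000000"          // assumed last consumed instant

// Incrementally read only the new S3 events from the meta Hoodie table.
val metaEvents = spark.read.format("hudi")
  .option("hoodie.datasource.query.type", "incremental")
  .option("hoodie.datasource.read.begin.instanttime", lastCheckpoint)
  .load(metaTablePath)

// Resolve each event to a full object path, then load the actual data for the sink table.
val objectPaths = metaEvents
  .select(concat(lit("s3a://"), col("s3.bucket.name"), lit("/"), col("s3.object.key")).as("path"))
  .as[String]
  .collect()

val incoming = spark.read.format("parquet").load(objectPaths: _*) // assumed source format
```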
Co-authored-by: Satish M <84978833+satishmittal1111@users.noreply.github.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
Main functions:
Support CREATE TABLE for Hudi tables.
Support CTAS.
Support INSERT for Hudi tables, including dynamic-partition and static-partition inserts.
Support MERGE INTO for Hudi tables.
Support DELETE.
Support UPDATE.
All of the above are supported on both Spark 2 & Spark 3, based on DataSourceV1 (see the usage sketch below).
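For illustration, a minimal usage sketch of the new SQL support, run via spark.sql in spark-shell. The table names, columns, and values are made up, and the option keys primaryKey/preCombineField follow the Hudi docs rather than being quoted from this PR:

```scala
// Hypothetical table, shown only to exercise the supported statements.
spark.sql(
  """create table h0 (id int, name string, price double, ts long)
    |using hudi
    |options (primaryKey = 'id', preCombineField = 'ts')
    |""".stripMargin)

// CTAS
spark.sql(
  """create table h1 using hudi
    |options (primaryKey = 'id', preCombineField = 'ts')
    |as select id, name, price, ts from h0
    |""".stripMargin)

// INSERT (dynamic and static partition inserts are supported on partitioned tables)
spark.sql("insert into h0 values (1, 'a1', 10.0, 1000)")

// UPDATE and DELETE
spark.sql("update h0 set price = price * 2 where id = 1")
spark.sql("delete from h0 where id = 1")

// MERGE INTO
spark.sql(
  """merge into h0 as target
    |using (select 1 as id, 'a2' as name, 12.0 as price, 1001 as ts) as source
    |on target.id = source.id
    |when matched then update set *
    |when not matched then insert *
    |""".stripMargin)
```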
Main changes:
Add a SQL parser for Spark 2.
Add HoodieAnalysis for SQL resolution and logical-plan rewriting.
Add command implementations for CREATE TABLE, INSERT, MERGE INTO & CTAS.
In order to push the update & insert logic down to the HoodieRecordPayload for MERGE INTO, I made some changes to the HoodieWriteHandler and other related classes:
1. Add an inputSchema for parsing the incoming record. The inputSchema for MERGE INTO differs from the writeSchema, because the update & insert expressions may apply transforms.
2. Add WRITE_SCHEMA to HoodieWriteConfig to pass the write schema for MERGE INTO.
3. Pass properties to HoodieRecordPayload#getInsertValue to carry the insert expression and table schema (a payload sketch follows below).
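To make change 3 concrete, here is a hedged sketch of a payload consuming the passed-in properties. It deliberately does not reproduce the full HoodieRecordPayload contract, and the property key shown is a hypothetical placeholder, not the key the MERGE INTO command actually uses:

```scala
import java.util.Properties
import org.apache.avro.Schema
import org.apache.avro.generic.IndexedRecord
import org.apache.hudi.common.util.{Option => HOption}

// Sketch only: a payload whose insert value depends on properties pushed down
// by the writer, e.g. serialized insert expressions and the table schema.
class ExpressionPayloadSketch(record: IndexedRecord) extends Serializable {

  // Mirrors the getInsertValue(schema, properties) overload added by this change.
  def getInsertValue(schema: Schema, properties: Properties): HOption[IndexedRecord] = {
    // "hoodie.payload.insert.expression" is a made-up key, for illustration only.
    val insertExpr = properties.getProperty("hoodie.payload.insert.expression")
    if (insertExpr == null) {
      HOption.of(record) // nothing to evaluate; write the record as-is
    } else {
      // A real implementation would evaluate insertExpr against `record` and
      // return the transformed Avro record; the evaluation is elided here.
      HOption.of(record)
    }
  }
}
```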
Verify this pull request
Add TestCreateTable to test creating Hudi tables and CTAS.
Add TestInsertTable to test inserting into Hudi tables.
Add TestMergeIntoTable to test merging into Hudi tables.
Add TestUpdateTable to test updating Hudi tables.
Add TestDeleteTable to test deleting from Hudi tables.
Add TestSqlStatement to test the currently supported DDL/DML statements.
* Fix flaky MOR unit test
* Update Spark APIs to be compatible with both Spark 2 & Spark 3
* Refactor the bulk insert v2 part so that Hudi can compile with Spark 3
* Add a spark3 profile to handle the fasterxml & Spark versions
* Create a hudi-spark-common module & refactor the hudi-spark related modules
Co-authored-by: Wenning Ding <wenningd@amazon.com>
- This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules
- Simple usages of Spark via jsc.parallelize() have been redone using EngineContext#map, EngineContext#flatMap, etc. (see the sketch after this list)
- Code changes in the PR break classes into `BaseXYZ` parent classes with no Spark dependencies, living in `hudi-client-common`
- Classes in `hudi-spark-client` are named `SparkXYZ` and extend the parent classes with all the Spark dependencies
- To simplify/clean up, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives
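As a minimal sketch of the abstraction, assuming the HoodieEngineContext#map(List, SerializableFunction, parallelism) shape from the Hudi codebase, engine-agnostic code in `hudi-client-common` can parallelize work without touching a Spark type; the per-path workload below is a made-up stand-in:

```scala
import scala.collection.JavaConverters._
import org.apache.hudi.common.engine.HoodieEngineContext

// Engine-agnostic parallel map: on Spark the context dispatches to
// jsc.parallelize() under the hood, but this caller never sees Spark.
def fileSizes(context: HoodieEngineContext, paths: Seq[String]): Seq[Long] =
  context.map[String, java.lang.Long](
    paths.asJava,
    (path: String) => java.lang.Long.valueOf(path.length.toLong), // stand-in per-path work
    paths.size                                                    // parallelism hint
  ).asScala.map(_.longValue())
```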
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
- Upgrade Spark to 2.4.4, Parquet to 1.10.1, Avro to 1.8.2
- Remove spark-avro from hudi-spark-bundle. Users need to provide --packages org.apache.spark:spark-avro:2.4.4 when running spark-shell or spark-submit
- Replace com.databricks:spark-avro with org.apache.spark:spark-avro (see the snippet below)
- Shade Avro in hudi-hadoop-mr-bundle to make sure it does not conflict with Hive's Avro version.
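For user code, the migration is mainly a format-name change. A one-line sketch (the path is made up), assuming the spark-avro package was supplied via --packages at launch:

```scala
// Before (com.databricks:spark-avro):
//   spark.read.format("com.databricks.spark.avro").load(path)
// After (org.apache.spark:spark-avro), the built-in short name is "avro":
val df = spark.read.format("avro").load("/data/events/*.avro") // hypothetical path
```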
- Add spotless format fixing to the project
- One-time reformatting for conformity
- The build fails on formatting violations, and mvn spotless:apply auto-fixes them
- Documented principles applied for redesign at packaging/README.md
- No longer depends on commons-codec, commons-io, commons-pool, commons-dbcp, commons-lang, commons-logging, avro-mapred, among others
- Introduce a new FileIOUtils & add a checkstyle rule flagging illegal imports of the above
- Parquet and Avro dependencies moved to provided scope, so they are picked up from Hive/Spark/Presto instead
- Pick up Jackson jars for the Hive sync tool from HIVE_HOME & unbundle Jackson everywhere
- Stop bundling the hive-jdbc standalone jar in the Spark/Hive/Utilities bundles
- 6.5x reduction in the number of classes across bundles
- Redo all classes based on org.apache.parquet only
- Remove unused dependencies like parquet-hadoop and commons-configuration2
- timeline-service no longer builds a fat jar
- Fix the utilities and hadoop-mr bundles based on the above
- [HUDI-172] Cleanup Maven POM/Classpath
- Fix ordering of dependencies in POMs, to enable better resolution
- The idea is to place more specific ones at the top
- and to place the dependencies which use them below
- [HUDI-68]: Automate demo steps on Docker setup
- Move Hive queries from the Hive CLI to Beeline
- Standardize on taking query input from text command files
- DeltaStreamer ingest also does Hive sync in a single step
- Spark incremental query materialized as a derived Hive table using the datasource
- Fix flakiness in HDFS spin-up and output comparison
- Code cleanup around streamlining and LOC reduction
- Also fixed the POM to not shade some Hive classes in Spark, to enable Hive sync