lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
leesf	6e59c1c777	Moving to 0.5.2-SNAPSHOT on master branch.	2020-01-20 10:51:33 -08:00
wenningd	292c1e2ff4	[HUDI-238] Make Hudi support Scala 2.12 (#1226) * [HUDI-238] Rename scala related artifactId & add maven profile to support Scala 2.12	2020-01-17 14:02:21 -08:00
Udit Mehrotra	ad50008a59	[HUDI-91][HUDI-12] Migrate to spark 2.4.4, migrate to spark-avro library instead of databricks-avro, add support for Decimal/Date types - Upgrade Spark to 2.4.4, Parquet to 1.10.1, Avro to 1.8.2 - Remove spark-avro from hudi-spark-bundle. Users need to provide --packages org.apache.spark:spark-avro:2.4.4 when running spark-shell or spark-submit - Replace com.databricks:spark-avro with org.apache.spark:spark-avro - Shade avro in hudi-hadoop-mr-bundle to make sure it does not conflict with hive's avro version.	2020-01-12 15:03:11 -08:00
lamber-ken	d9675c4ec0	[HUDI-522] Use the same version jcommander uniformly (#1214)	2020-01-12 10:48:52 -08:00
Udit Mehrotra	0bb5999f79	[HUDI-306] Support Glue catalog and other hive metastore implementations (#961) - Support Glue catalog and other metastore implementations - Remove shading from hudi utilities bundle - Add maven profile to optionally shade hive in utilities bundle	2019-11-11 17:27:31 -08:00
Gurudatt Kulkarni	031b067a3a	[MINOR] Move all repository declarations to parent pom (#966)	2019-10-22 20:17:13 -07:00
Mehrotra	8c13340062	Shade and relocate Avro dependency in hadoop-mr-bundle	2019-10-16 02:08:12 -07:00
leesf	b19bed442d	[HUDI-296] Explore use of spotless to auto fix formatting errors (#945) - Add spotless format fixing to project - One time reformatting for conformity - Build fails for formatting changes and mvn spotless:apply autofixes them	2019-10-10 05:19:40 -07:00
Balaji Varadarajan	9b66ea41fd	[HUDI-121] Remove leftover notice file and replace com.uber.hoodie with org.apache.hudi in log4j properties	2019-10-04 09:18:57 -07:00
Balaji Varadarajan	6da2f9ac7c	[HUDI-287] Address comments during review of release candidate 1. Remove LICENSE and NOTICE files in hoodie child modules. 2. Remove developers and contributor section from pom 3. Also ensure any failures in validation script is reported appropriately 4. Make hoodie parent pom consistent with that of its parent apache-21 (https://github.com/apache/maven-apache-parent/blob/apache-21/pom.xml)	2019-10-03 09:00:07 -07:00
Balaji Varadarajan	6e8a28bcae	HUDI-121 : Address comments during RC2 voting 1. Remove dnl utils jar from git 2. Add LICENSE Headers in missing files 3. Fix NOTICE and LICENSE in all HUDI packages and in top-level 4. Fix License wording in certain HUDI source files 5. Include non java/scala code in RAT licensing check 6. Use whitelist to include dependencies as part of timeline-server bundling	2019-09-30 15:42:15 -07:00
Vinoth Chandar	e217db56ab	[HUDI-254]: Bundle and shade databricks/avro with spark bundle - spark 2.4 onwards, spark has built in support. shading to avoid conflicts - spark 2.3 still needs this bundled, so that dropping bundle into jars folder would work	2019-09-17 12:38:51 -07:00
Balaji Varadarajan	c1e7d0e5a6	[HUDI-121] Update Release notes and fix master version	2019-09-17 09:50:30 -07:00
Balaji Varadarajan	7190c022bb	[HUDI-249] Updating Notice files	2019-09-13 13:50:58 -07:00
Balaji Varadarajan	d2525c31b7	Moving to 0.6.0-SNAPSHOT on master branch.	2019-09-13 09:58:29 -07:00
Vinoth Chandar	d0b9b56b7d	[HUDI-143] Excluding javax.* from utilities and spark bundles - Plus minor code review comments	2019-09-11 11:08:27 -07:00
vinoth chandar	7a973a6944	[HUDI-159] Redesigning bundles for lighter-weight integrations - Documented principles applied for redesign at packaging/README.md - No longer depends on incl commons-codec, commons-io, commons-pool, commons-dbcp, commons-lang, commons-logging, avro-mapred - Introduce new FileIOUtils & added checkstyle rule for illegal import of above - Parquet, Avro dependencies moved to provided scope to enable being picked up from Hive/Spark/Presto instead - Pickup jackson jars for Hive sync tool from HIVE_HOME & unbundling jackson everywhere - Remove hive-jdbc standalone jar from being bundled in Spark/Hive/Utilities bundles - 6.5x reduced number of classes across bundles	2019-09-11 11:08:27 -07:00
leesf	5c2da6051e	[HUDI-225] Create Hudi Timeline Server Fat Jar	2019-08-29 20:03:06 -07:00
Balaji Varadarajan	5f9fa82f47	HUDI-124 : Exclude jdk.tools from hadoop-common and update Notice files (#858)	2019-08-28 16:20:47 -07:00
vinoth chandar	cd090871a1	[HUDI-159]: Pom cleanup and removal of com.twitter.parquet - Redo all classes based on org.parquet only - remove unused dependencies like parquet-hadoop, common-configuration2 - timeline-service does not build a fat jar anymore - Fix utilities and hadoop-mr bundles based on above	2019-08-25 16:01:14 -07:00
vinoth chandar	6edf0b9def	[HUDI-68] Pom cleanup & demo automation (#846) - [HUDI-172] Cleanup Maven POM/Classpath - Fix ordering of dependencies in poms, to enable better resolution - Idea is to place more specific ones at the top - And place dependencies which use them below them - [HUDI-68] : Automate demo steps on docker setup - Move hive queries from hive cli to beeline - Standardize on taking query input from text command files - Deltastreamer ingest, also does hive sync in a single step - Spark Incremental Query materialized as a derived Hive table using datasource - Fix flakiness in HDFS spin up and output comparison - Code cleanup around streamlining and loc reduction - Also fixed pom to not shade some hive classes in spark, to enable hive sync	2019-08-22 20:18:50 -07:00
Balaji Varadarajan	a4f9d7575f	HUDI-123 Rename code packages/constants to org.apache.hudi (#830) - Rename com.uber.hoodie to org.apache.hudi - Flag to pass com.uber.hoodie Input formats for hoodie-sync - Works with HUDI demo. - Also tested for backwards compatibility with datasets built by com.uber.hoodie packages - Migration guide : https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi	2019-08-11 17:48:17 -07:00
Balaji Varadarajan	ec965892b0	HUDI-149 - Remove platform dependencies and update NOTICE plugin	2019-08-05 08:57:15 -07:00
Luke Zhu	171901a9d0	Fix typo in hoodie-presto-bundle (#818)	2019-08-01 08:51:57 -07:00
Balaji Varadarajan	6e0ff3a235	Generate Source Jars for bundle packages (#810)	2019-07-30 18:17:14 -07:00
Balaji Varadarajan	a0d7ab2384	HUDI-70 : Making DeltaStreamer run in continuous mode with concurrent compaction	2019-06-18 17:48:14 -07:00
Balaji Varadarajan	479908fd20	HUDI-125 : Change License for all source files and update RAT configurations	2019-06-09 11:41:55 -07:00
Balaji Varadarajan	30b0f2636f	Changes related to Licensing work 1. Go through dependencies list one round to ensure compliance. Generated current NOTICE list in all submodules (other apache projects like flink does this). To be on conservative side regarding licensing, NOTICE.txt lists all dependencies including transitive. Pending Compliance questions reported in https://issues.apache.org/jira/browse/LEGAL-461 2. Automate generating NOTICE.txt files to allow future package compliance issues be identified early as part of code-review process. 3. Added NOTICE.txt and LICENSE.txt to all HUDI jars	2019-06-07 17:58:57 -07:00
guanjianhui	173e0b6be4	exclude fasterxml and parquet from presto bundle	2019-06-07 11:33:43 -07:00
Thinking	66893bfef2	fix spark-shell add jar problem jira link https://issues.apache.org/jira/browse/HUDI-101 issue link https://github.com/apache/incubator-hudi/issues/516#issue-386048519 when using spark-shell with hoodie to save data like : ``` ./spark-shell --master yarn --jars /home/hdfs/software/spark/hoodie/hoodie-spark-bundle-0.4.8-SNAPSHOT.jar --conf spark.sql.hive.convertMetastoreParquet=false --packages com.databricks:spark-avro_2.11:4.0.0 ``` and ``` inputDF.write.format("com.uber.hoodie") .option("hoodie.insert.shuffle.parallelism", "1") // any hoodie client config can be passed like this .option("hoodie.upsert.shuffle.parallelism", "1") // full list in HoodieWriteConfig & its package .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, HoodieTableType.COPY_ON_WRITE.name()) .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL) // insert .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "_row_key") .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partition") .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "extend_deal_date") .option(HoodieWriteConfig.TABLE_NAME, "c_upload_code") .mode(SaveMode.Overwrite) .save("/tmp/test/hoodie") ``` It also reports the error `Invalid signature file digest for Manifest main attributes`. Need to scan all affected dependencies.	2019-06-03 15:01:43 -07:00
Vinoth Chandar	7b4a28ecf8	Move dependency repos to https urls	2019-05-31 20:37:03 -07:00
Vinoth Chandar	acd74129cd	Create hoodie-utilities-bundle to host the shaded jar - hoodie-utilities can now be pulled in as compile time dependency - Lets users test their DeltaStreamer transformers for e.g - Tested the docker demo works & takes in the bundle - Doc changes to follow, to move DeltaStreamer commands to bundle jar	2019-05-30 22:46:24 -07:00
vinothchandar	66c0b81b49	[maven-release-plugin] prepare for next development iteration	2019-05-28 19:17:26 -07:00
vinothchandar	227785c022	[maven-release-plugin] prepare release hoodie-0.4.7	2019-05-28 19:17:15 -07:00
Balaji Varadarajan	64fec64097	Timeline Service with Incremental View Syncing support	2019-05-16 13:25:33 -07:00
vinothchandar	446f99aa0f	[maven-release-plugin] prepare for next development iteration	2019-05-14 07:29:22 -07:00
vinothchandar	cc38abecc8	[maven-release-plugin] prepare release hoodie-0.4.6	2019-05-14 07:29:11 -07:00
Abhishek Sharma	e2dcef8606	HUDI-101: added exclusion filters for signature files.	2019-05-07 18:35:18 -07:00
Omkar Joshi	738635306b	migrating kryo's dependency from twitter chill to plain kryo library	2019-05-06 20:32:00 -07:00
Balaji Varadarajan	36ef94004e	Fix Hive RT query failure in hoodie demo	2019-04-17 16:36:32 -07:00
Omkar Joshi	e35d24f31d	Revert "Replacing Apache commons-lang3 object serializer with Kryo serializer" This reverts commit `a6c45feb2c`.	2019-04-17 09:23:37 -07:00
Bhavani Sudha Saktheeswaran	83b6aa5e91	Fix multiple issues when using build_local_docker_images for setting up the demo Details here - https://issues.apache.org/jira/browse/HUDI-98	2019-04-15 10:10:05 -07:00
Balaji Varadarajan	b07110b9fd	Essential Hive packages missing in hoodie spark bundle	2019-04-09 21:42:42 -07:00
Omkar Joshi	a6c45feb2c	Replacing Apache commons-lang3 object serializer with Kryo serializer	2019-03-18 14:12:25 -07:00
Balaji Varadarajan	adc8cac743	Fix hive sync (libfb version mismatch) and deltastreamer issue (missing cmdline argument) in demo	2019-03-13 16:14:32 -07:00
vinothchandar	687395e40f	[maven-release-plugin] prepare for next development iteration	2019-02-27 07:16:27 -08:00
vinothchandar	bbf40ef987	[maven-release-plugin] prepare release hoodie-0.4.5	2019-02-27 07:16:15 -08:00
Bhavani Sudha Saktheeswaran	75c7a2622b	Create hoodie-presto bundle jar Exclude common dependencies that are available in Presto	2019-02-24 19:49:02 -08:00
Kent Yao	09f203d324	typo: bundle jar with unrecognized variables	2019-02-13 16:46:11 +08:00
Balaji Varadarajan	3a0044216c	New Features in DeltaStreamer : (1) Apply transformation when using delta-streamer to ingest data. (2) Add Hudi Incremental Source for Delta Streamer (3) Allow delta-streamer config-property to be passed as command-line (4) Add Hive Integration to Delta-Streamer and address Review comments (5) Ensure MultiPartKeysValueExtractor handle hive style partition description (6) Reuse same spark session on both source and transformer (7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource (8) Reuse Binary Avro coders (9) Add push down filter for Incremental source (10) Add Hoodie DeltaStreamer metrics to track total time taken	2019-02-11 18:22:05 -08:00

59 Commits