lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
n3nash	e109a61803	1. Fix merge on read DAG to make docker demo pass (#2092) 2. Fix repeat_count, rollback node	2020-10-28 22:34:26 -04:00
Prashant Wason	788d236c44	[HUDI-1303] Some improvements for the HUDI Test Suite. (#2128) 1. Use the DAG Node's label from the yaml as its name instead of UUID names which are not descriptive when debugging issues from logs. 2. Fix CleanNode constructor which is not correctly implemented 3. When generating upserts, allow more granular control over the number of inserts and upserts - zero or more inserts and upserts can be specified instead of always requiring both inserts and upserts. 4. Fixed generation of records of specific size - The current code was using a class variable "shouldAddMore" which was reset to false after the first record generation causing subsequent records to be of minimum size. - In this change, we pre-calculate the extra size of the complex fields. When generating records, for complex fields we read the field size from this map. 5. Refresh the timeline of the DeltaSync service before calling readFromSource. This ensures that only the newest generated data is read and data generated in the older Dag Nodes is ignored (as their AVRO files will have an older timestamp). 6. Making --workload-generator-classname an optional parameter as most probably the default will be used	2020-10-07 08:33:51 -04:00
shenh062326	581d54097c	[HUDI-1143] Change timestamp field in HoodieTestDataGenerator from double to long	2020-09-15 20:58:29 -07:00
Abhishek Modi	53d1e55110	Test Suite should work with Docker + Unit Tests	2020-09-08 22:41:14 -07:00
Dongwook	8d19ebfd0f	[HUDI-993] Let delete API use "hoodie.delete.shuffle.parallelism" (#1703) For the delete API, "hoodie.delete.shuffle.parallelism" is not used, whereas "hoodie.upsert.shuffle.parallelism" is used for upsert; this creates a performance difference between deleting through the upsert API with "EmptyHoodieRecordPayload" and the delete API in certain cases. This patch makes the following fixes in this regard. - Let deduplicateKeys method use "hoodie.delete.shuffle.parallelism" - Repartition inputRDD as "hoodie.delete.shuffle.parallelism" in case "hoodie.combine.before.delete=false"	2020-09-01 12:55:31 -04:00
Bhavani Sudha Saktheeswaran	4226d75144	Moving to 0.6.1-SNAPSHOT on master branch.	2020-08-14 12:54:15 -07:00
Sivabalan Narayanan	9c24151929	[HUDI-1175] Commenting out testsuite tests from Integration tests until we investigate the CI flakiness (#1945)	2020-08-10 21:00:57 -07:00
lw0090	51ea27d665	[HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync, hudi-dla-sync (#1810) - Generalize the hive-sync module for syncing to multiple metastores - Added new options for datasource - Added new command line for delta streamer Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-05 21:34:55 -07:00
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876 ) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
n3nash	727f1df62c	[MINOR] Suppressing spark logs for hudi-integ and hudi-utilities (#1894)	2020-07-31 19:01:25 -07:00
Nishith Agarwal	2fc2b01d86	[HUDI-394] Provide a basic implementation of test suite	2020-07-30 21:21:15 -07:00
hongdd	fa419213f6	[HUDI-703] Add test for HoodieSyncCommand (#1774)	2020-07-28 08:31:43 +08:00
sathyaprakashg	df2e0c760e	HUDI-942 Increase default value number of delta commits for inline compaction (#1664) Co-authored-by: Sathyaprakash Govindasamy <sathyaprakashg@zillowgroup.com>	2020-06-10 16:16:44 -07:00
Vinoth Govindarajan	8cb86b4d36	Added python3 to the spark_base docker image to support pyspark (#1632)	2020-05-31 22:53:50 -07:00
Satish Kotha	1f6be820f3	[HUDI-758] Modify Integration test to include incremental queries for MOR tables	2020-04-08 21:56:59 -07:00
lamber-ken	90227eeda7	[HUDI-673] Rename hudi-hive-bundle to hudi-hive-sync-bundle	2020-03-07 21:44:35 +08:00
lamber-ken	ccbf543607	[HUDI-654] Rename hudi-hive to hudi-hive-sync	2020-03-06 22:13:16 +08:00
yanghua	0dc8e493aa	Moving to 0.6.0-SNAPSHOT on master branch.	2020-03-01 15:08:30 +08:00
lamber-ken	323fffad0d	[HUDI-606] Improve execute build_local_docker_images.sh script	2020-02-26 19:38:19 +08:00
lamber-ken	11fb2c2614	[HUDI-580] Fix incorrect license header in files	2020-02-25 08:54:26 -08:00
lamber-ken	cdb028f1f3	[MINOR] Fix missing groupId / version property of dependency	2020-01-25 09:19:55 -08:00
leesf	fc8d4a71ad	[MINOR] fix license issue (#1273)	2020-01-23 02:03:49 +08:00
leesf	ed54eb20a5	[MINOR] Add missing licenses (#1271)	2020-01-22 08:06:45 -05:00
lamber-ken	a54535ed5a	[MINOR] Fix invalid maven repo address (#1265)	2020-01-21 04:41:59 -08:00
leesf	6e59c1c777	Moving to 0.5.2-SNAPSHOT on master branch.	2020-01-20 10:51:33 -08:00
wenningd	292c1e2ff4	[HUDI-238] Make Hudi support Scala 2.12 (#1226) * [HUDI-238] Rename scala related artifactId & add maven profile to support Scala 2.12	2020-01-17 14:02:21 -08:00
vinoth chandar	c2c0f6b13d	[HUDI-509] Renaming code in sync with cWiki restructuring (#1212 ) - Storage Type replaced with Table Type (remaining instances) - View types replaced with query types; - ReadOptimized view referred as Snapshot Query - TableFileSystemView sub interfaces renamed to BaseFileOnly and Slice Views - HoodieDataFile renamed to HoodieBaseFile - Hive Sync tool will register RO tables for MOR with a `_ro` suffix - Datasource/Deltastreamer options renamed accordingly - Support fallback to old config values as well, so migration is painless - Config for controlling _ro suffix addition - Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView	2020-01-16 23:58:47 -08:00
yuehan124	c78092d2d3	[HUDI-501] Execute docker/setup_demo.sh in any directory	2020-01-06 10:26:06 -08:00
lamber-ken	d9fbe33339	[HOTFIX] Fix error configuration item of dockerfile-maven-plugin	2019-11-19 16:30:03 -08:00
Balaji Varadarajan	f7c2f8cedc	[HUDI-329] Presto Containers for integration test must allow newly built local jars to override	2019-11-13 17:35:34 -08:00
Mehrotra	92c69f5703	Migrate integration tests to spark 2.4.4	2019-11-13 16:53:24 -08:00
Sivabalan Narayanan	23b303e4b1	[HUDI-218] Adding Presto support to Integration Test (#1003)	2019-11-11 06:21:49 -08:00
Balaji Varadarajan	a6390aefc4	[HUDI-312] Make docker hdfs cluster ephemeral. This is needed to fix flakiness in integration tests. Also, Fix DeltaStreamer hanging issue due to uncaught exception	2019-11-01 11:49:59 -07:00
leesf	b19bed442d	[HUDI-296] Explore use of spotless to auto fix formatting errors (#945) - Add spotless format fixing to project - One time reformatting for conformity - Build fails for formatting changes and mvn spotless:apply autofixes them	2019-10-10 05:19:40 -07:00
Balaji Varadarajan	9b66ea41fd	[HUDI-121] Remove leftover notice file and replace com.uber.hoodie with org.apache.hudi in log4j properties	2019-10-04 09:18:57 -07:00
Balaji Varadarajan	6da2f9ac7c	[HUDI-287] Address comments during review of release candidate 1. Remove LICENSE and NOTICE files in hoodie child modules. 2. Remove developers and contributor section from pom 3. Also ensure any failures in validation script are reported appropriately 4. Make hoodie parent pom consistent with that of its parent apache-21 (https://github.com/apache/maven-apache-parent/blob/apache-21/pom.xml)	2019-10-03 09:00:07 -07:00
Balaji Varadarajan	6e8a28bcae	HUDI-121 : Address comments during RC2 voting 1. Remove dnl utils jar from git 2. Add LICENSE Headers in missing files 3. Fix NOTICE and LICENSE in all HUDI packages and in top-level 4. Fix License wording in certain HUDI source files 5. Include non java/scala code in RAT licensing check 6. Use whitelist to include dependencies as part of timeline-server bundling	2019-09-30 15:42:15 -07:00
Balaji Varadarajan	c1e7d0e5a6	[HUDI-121] Update Release notes and fix master version	2019-09-17 09:50:30 -07:00
Balaji Varadarajan	7190c022bb	[HUDI-249] Updating Notice files	2019-09-13 13:50:58 -07:00
Balaji Varadarajan	d2525c31b7	Moving to 0.6.0-SNAPSHOT on master branch.	2019-09-13 09:58:29 -07:00
Balaji Varadarajan	58623631d4	[HUDI-249] Update Release-notes. Add sign-artifacts to POM and release related scripts. Add missing license headers	2019-09-13 08:41:29 -07:00
leesf	8b150a3c6b	[HUDI-230] Add missing Apache License in some files	2019-08-30 09:38:28 -07:00
Balaji Varadarajan	5f9fa82f47	HUDI-124 : Exclude jdk.tools from hadoop-common and update Notice files (#858)	2019-08-28 16:20:47 -07:00
Vinoth Chandar	78e0721507	[HUDI-159] Precursor cleanup to reduce build warnings	2019-08-26 19:41:00 -07:00
vinoth chandar	6edf0b9def	[HUDI-68] Pom cleanup & demo automation (#846) - [HUDI-172] Cleanup Maven POM/Classpath - Fix ordering of dependencies in poms, to enable better resolution - Idea is to place more specific ones at the top - And place dependencies which use them below them - [HUDI-68] : Automate demo steps on docker setup - Move hive queries from hive cli to beeline - Standardize on taking query input from text command files - Deltastreamer ingest, also does hive sync in a single step - Spark Incremental Query materialized as a derived Hive table using datasource - Fix flakiness in HDFS spin up and output comparison - Code cleanup around streamlining and loc reduction - Also fixed pom to not shade some hive classes in spark, to enable hive sync	2019-08-22 20:18:50 -07:00
Bhavani Sudha Saktheeswaran	92eed6aca8	[HUDI-82] Adds Presto integration in Docker demo (#847)	2019-08-22 19:40:36 -07:00
Balaji Varadarajan	a4f9d7575f	HUDI-123 Rename code packages/constants to org.apache.hudi (#830 ) - Rename com.uber.hoodie to org.apache.hudi - Flag to pass com.uber.hoodie Input formats for hoodie-sync - Works with HUDI demo. - Also tested for backwards compatibility with datasets built by com.uber.hoodie packages - Migration guide : https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi	2019-08-11 17:48:17 -07:00
Balaji Varadarajan	ec965892b0	HUDI-149 - Remove platform dependencies and update NOTICE plugin	2019-08-05 08:57:15 -07:00
Balaji Varadarajan	479908fd20	HUDI-125 : Change License for all source files and update RAT configurations	2019-06-09 11:41:55 -07:00
Balaji Varadarajan	30b0f2636f	Changes related to Licensing work 1. Go through dependencies list one round to ensure compliance. Generated current NOTICE list in all submodules (other Apache projects like Flink do this). To be on the conservative side regarding licensing, NOTICE.txt lists all dependencies including transitive. Pending Compliance questions reported in https://issues.apache.org/jira/browse/LEGAL-461 2. Automate generating NOTICE.txt files to allow future package compliance issues to be identified early as part of code-review process. 3. Added NOTICE.txt and LICENSE.txt to all HUDI jars	2019-06-07 17:58:57 -07:00

63 Commits