1
0
Commit Graph

76 Commits

Author SHA1 Message Date
wenningd
d412fb2fe6 [HUDI-89] Add configOption & refactor all configs based on that (#2833)
Co-authored-by: Wenning Ding <wenningd@amazon.com>
2021-06-30 14:26:30 -07:00
Sivabalan Narayanan
5564c7ec01 [HUDI-2006] Adding more yaml templates to test suite (#3073) 2021-06-29 23:05:46 -04:00
Sivabalan Narayanan
ac72470e10 [HUDI-1851] Adding test suite long running automate scripts for docker (#2880) 2021-05-11 01:26:01 -07:00
Gary Li
050626ad6c [MINOR] Add Missing Apache License to test files (#2736) 2021-03-29 07:17:23 -07:00
garyli1019
6e803e08b1 Moving to 0.9.0-SNAPSHOT on master branch. 2021-03-24 21:37:14 +08:00
Sivabalan Narayanan
55a489c769 [1568] Fixing spark3 bundles (#2625)
- [1568] Fixing spark3 bundles
2021-03-19 14:21:36 -04:00
Sivabalan Narayanan
d5f202821b Adding fixes to test suite framework. Adding clustering node and validate async operations node. (#2400) 2021-02-12 09:29:21 -08:00
Vinoth Chandar
3719e7b388 Moving to 0.8.0-SNAPSHOT on master branch. 2021-01-20 11:31:22 -08:00
Sivabalan Narayanan
b9c2856d16 [HUDI-1535] Fix 0.7.0 snapshot (#2456)
* Revert "[MINOR] Bumping snapshot version to 0.7.0 (#2435)"

This reverts commit a43e191d6c.

* Fixing 0.7.0 snapshot bump
2021-01-19 12:20:43 -08:00
Sivabalan Narayanan
a43e191d6c [MINOR] Bumping snapshot version to 0.7.0 (#2435) 2021-01-16 09:56:28 -05:00
pengzhiwei
b83d1d3e61 [HUDI-1484] Escape the partition value in HiveSyncTool (#2363) 2020-12-28 23:02:36 -05:00
Sivabalan Narayanan
8cf6a7223f [HUDI-1331] Adding support for validating entire dataset and long running tests in test suite framework (#2168)
* trigger rebuild

* [HUDI-1156] Remove unused dependencies from HoodieDeltaStreamerWrapper Class (#1927)

* Adding support for validating records and long running tests in test sutie framework

* Adding partial validate node

* Fixing spark session initiation in Validate nodes

* Fixing validation

* Adding hive table validation to ValidateDatasetNode

* Rebasing with latest commits from master

* Addressing feedback

* Addressing comments

Co-authored-by: lamber-ken <lamberken@163.com>
Co-authored-by: linshan-ma <mabin194046@163.com>
2020-12-26 09:29:24 -08:00
Sivabalan Narayanan
a205dd10fa [HUDI-1338] Adding Delete support to test suite framework (#2172)
- Adding Delete support to test suite. 
         Added DeleteNode 
         Added support to generate delete records
2020-11-01 00:15:41 -04:00
n3nash
e109a61803 1. Fix merge on read DAG to make docker demo pass (#2092)
1. Fix merge on read DAG to make docker demo pass (#2092)
2. Fix repeat_count, rollback node
2020-10-28 22:34:26 -04:00
Prashant Wason
788d236c44 [HUDI-1303] Some improvements for the HUDI Test Suite. (#2128)
1. Use the DAG Node's label from the yaml as its name instead of UUID names which are not descriptive when debugging issues from logs.
2. Fix CleanNode constructor which is not correctly implemented
3. When generating upsets, allows more granualar control over the number of inserts and upserts - zero or more inserts and upserts can be specified instead of always requiring both inserts and upserts.
4. Fixed generation of records of specific size
   - The current code was using a class variable "shouldAddMore" which was reset to false after the first record generation causing subsequent records to be of minimum size.
   - In this change, we pre-calculate the extra size of the complex fields. When generating records, for complex fields we read the field size from this map.
5. Refresh the timeline of the DeltaSync service before calling readFromSource. This ensures that only the newest generated data is read and data generated in the older Dag Nodes is ignored (as their AVRO files will have an older timestamp).
6. Making --workload-generator-classname an optional parameter as most probably the default will be used
2020-10-07 08:33:51 -04:00
shenh062326
581d54097c [HUDI-1143] Change timestamp field in HoodieTestDataGenerator from double to long 2020-09-15 20:58:29 -07:00
Abhishek Modi
53d1e55110 Test Suite should work with Docker + Unit Tests 2020-09-08 22:41:14 -07:00
Dongwook
8d19ebfd0f [HUDI-993] Let delete API use "hoodie.delete.shuffle.parallelism" (#1703)
For Delete API, "hoodie.delete.shuffle.parallelism" isn't used as opposed to "hoodie.upsert.shuffle.parallelism" is used for upsert, this creates the performance difference between delete by upsert API with "EmptyHoodieRecordPayload" and delete API for certain cases.

This patch makes the following fixes in this regard. 
- Let deduplicateKeys method use "hoodie.delete.shuffle.parallelism"
- Repartition inputRDD as "hoodie.delete.shuffle.parallelism" in case "hoodie.combine.before.delete=false"
2020-09-01 12:55:31 -04:00
Bhavani Sudha Saktheeswaran
4226d75144 Moving to 0.6.1-SNAPSHOT on master branch. 2020-08-14 12:54:15 -07:00
Sivabalan Narayanan
9c24151929 [HUDI-1175] Commenting out testsuite tests from Integration tests until we investigate the CI flakiness (#1945) 2020-08-10 21:00:57 -07:00
lw0090
51ea27d665 [HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync, hudi-dla-sync (#1810)
- Generalize the hive-sync module for syncing to multiple metastores
- Added new options for datasource
- Added new command line for delta streamer 

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
2020-08-05 21:34:55 -07:00
vinoth chandar
539621bd33 [HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876)
- [HUDI-418] Bootstrap Index Implementation using HFile with unit-test
 - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests
 - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices
 - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices
 - [HUDI-421] Bootstrap Write Client with tests
 - [HUDI-425] Added HoodieDeltaStreamer support
 - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap
 - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly
 - [HUDI-424] Simplify Record reader implementation
 - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices
 - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables

Co-authored-by: Mehrotra <uditme@amazon.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
Co-authored-by: Balaji Varadarajan <varadarb@uber.com>
2020-08-03 20:19:21 -07:00
n3nash
727f1df62c [MINOR] Suppressing spark logs for hudi-integ and hudi-utilities (#1894) 2020-07-31 19:01:25 -07:00
Nishith Agarwal
2fc2b01d86 [HUDI-394] Provide a basic implementation of test suite 2020-07-30 21:21:15 -07:00
hongdd
fa419213f6 [HUDI-703] Add test for HoodieSyncCommand (#1774) 2020-07-28 08:31:43 +08:00
sathyaprakashg
df2e0c760e HUDI-942 Increase default value number of delta commits for inline compaction (#1664)
Co-authored-by: Sathyaprakash Govindasamy <sathyaprakashg@zillowgroup.com>
2020-06-10 16:16:44 -07:00
Vinoth Govindarajan
8cb86b4d36 Added python3 to the spark_base docker image to support pyspark (#1632) 2020-05-31 22:53:50 -07:00
Satish Kotha
1f6be820f3 [HUDI-758] Modify Integration test to include incremental queries for MOR tables 2020-04-08 21:56:59 -07:00
lamber-ken
90227eeda7 [HUDI-673] Rename hudi-hive-bundle to hudi-hive-sync-bundle 2020-03-07 21:44:35 +08:00
lamber-ken
ccbf543607 [HUDI-654] Rename hudi-hive to hudi-hive-sync 2020-03-06 22:13:16 +08:00
yanghua
0dc8e493aa Moving to 0.6.0-SNAPSHOT on master branch. 2020-03-01 15:08:30 +08:00
lamber-ken
323fffad0d [HUDI-606] Improve execute build_local_docker_images.sh script 2020-02-26 19:38:19 +08:00
lamber-ken
11fb2c2614 [HUDI-580] Fix incorrect license header in files 2020-02-25 08:54:26 -08:00
lamber-ken
cdb028f1f3 [MINOR] Fix missing groupId / version property of dependency 2020-01-25 09:19:55 -08:00
leesf
fc8d4a71ad [MINOR] fix license issue (#1273) 2020-01-23 02:03:49 +08:00
leesf
ed54eb20a5 [MINOR] Add missing licenses (#1271) 2020-01-22 08:06:45 -05:00
lamber-ken
a54535ed5a [MINOR] Fix invalid maven repo address (#1265) 2020-01-21 04:41:59 -08:00
leesf
6e59c1c777 Moving to 0.5.2-SNAPSHOT on master branch. 2020-01-20 10:51:33 -08:00
wenningd
292c1e2ff4 [HUDI-238] Make Hudi support Scala 2.12 (#1226)
* [HUDI-238] Rename scala related artifactId & add maven profile to support Scala 2.12
2020-01-17 14:02:21 -08:00
vinoth chandar
c2c0f6b13d [HUDI-509] Renaming code in sync with cWiki restructuring (#1212)
- Storage Type replaced with Table Type (remaining instances)
 - View types replaced with query types;
 - ReadOptimized view referred as Snapshot Query
 - TableFileSystemView sub interfaces renamed to BaseFileOnly and Slice Views
 - HoodieDataFile renamed to HoodieBaseFile
 - Hive Sync tool will register RO tables for MOR with a `_ro` suffix
 - Datasource/Deltastreamer options renamed accordingly
 - Support fallback to old config values as well, so migration is painless
 - Config for controlling _ro suffix addition
 - Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView
2020-01-16 23:58:47 -08:00
yuehan124
c78092d2d3 [HUDI-501] Execute docker/setup_demo.sh in any directory 2020-01-06 10:26:06 -08:00
lamber-ken
d9fbe33339 [HOTFIX] Fix error configuration item of dockerfile-maven-plugin 2019-11-19 16:30:03 -08:00
Balaji Varadarajan
f7c2f8cedc [HUDI-329] Presto Containers for integration test must allow newly built local jars to override 2019-11-13 17:35:34 -08:00
Mehrotra
92c69f5703 Migrate integration tests to spark 2.4.4 2019-11-13 16:53:24 -08:00
Sivabalan Narayanan
23b303e4b1 [HUDI-218] Adding Presto support to Integration Test (#1003) 2019-11-11 06:21:49 -08:00
Balaji Varadarajan
a6390aefc4 [HUDI-312] Make docker hdfs cluster ephemeral. This is needed to fix flakiness in integration tests. Also, Fix DeltaStreamer hanging issue due to uncaught exception 2019-11-01 11:49:59 -07:00
leesf
b19bed442d [HUDI-296] Explore use of spotless to auto fix formatting errors (#945)
- Add spotless format fixing to project
- One time reformatting for conformity
- Build fails for formatting changes and mvn spotless:apply autofixes them
2019-10-10 05:19:40 -07:00
Balaji Varadarajan
9b66ea41fd [HUDI-121] Remove leftover notice file and replace com.uber.hoodie with org.apache.hudi in log4j properties 2019-10-04 09:18:57 -07:00
Balaji Varadarajan
6da2f9ac7c [HUDI-287] Address comments during review of release candidate
1. Remove LICENSE and NOTICE files in hoodie child modules.
  2. Remove developers and contributor section from pom
  3. Also ensure any failures in validation script is reported appropriately
  4. Make hoodie parent pom consistent with that of its parent apache-21 (https://github.com/apache/maven-apache-parent/blob/apache-21/pom.xml)
2019-10-03 09:00:07 -07:00
Balaji Varadarajan
6e8a28bcae HUDI-121 : Address comments during RC2 voting
1. Remove dnl utils jar from git
2. Add LICENSE Headers in missing files
3. Fix NOTICE and LICENSE in all HUDI packages and in top-level
4. Fix License wording in certain HUDI source files
5. Include non java/scala code in RAT licensing check
6. Use whitelist to include dependencies as part of timeline-server bundling
2019-09-30 15:42:15 -07:00