Main functions:
Support CREATE TABLE for Hudi tables.
Support CTAS.
Support INSERT for Hudi tables, including dynamic-partition and static-partition inserts.
Support MERGE INTO for Hudi tables.
Support DELETE.
Support UPDATE.
All of the above work on both Spark 2 & Spark 3, based on the DataSource V1 API (see the SQL sketch after this list).
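A quick sketch of the statement shapes (the table h0 and its columns are made up for this example; real tables may need extra options, e.g. a primary key in TBLPROPERTIES):

```java
import org.apache.spark.sql.SparkSession;

public class HudiSqlDemo {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("hudi-sql-demo").getOrCreate();
    // CREATE TABLE
    spark.sql("CREATE TABLE h0 (id INT, name STRING, price DOUBLE, dt STRING) "
        + "USING hudi PARTITIONED BY (dt)");
    // CTAS
    spark.sql("CREATE TABLE h1 USING hudi AS SELECT * FROM h0");
    // Static- and dynamic-partition INSERT
    spark.sql("INSERT INTO h0 PARTITION (dt = '2021-01-01') SELECT 1, 'a1', 10.0");
    spark.sql("INSERT INTO h0 SELECT 2, 'a2', 20.0, '2021-01-02'");
    // MERGE INTO
    spark.sql("MERGE INTO h0 "
        + "USING (SELECT 1 AS id, 'a1_new' AS name, 12.0 AS price, '2021-01-01' AS dt) s "
        + "ON h0.id = s.id "
        + "WHEN MATCHED THEN UPDATE SET * "
        + "WHEN NOT MATCHED THEN INSERT *");
    // UPDATE and DELETE
    spark.sql("UPDATE h0 SET price = 11.0 WHERE id = 1");
    spark.sql("DELETE FROM h0 WHERE id = 2");
  }
}
```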
Main changes:
Add a SQL parser for Spark 2.
Add HoodieAnalysis for SQL resolution and logical plan rewrites.
Add command implementations for CREATE TABLE, INSERT, MERGE INTO & CTAS.
In order to push the update and insert logic down to the HoodieRecordPayload for MERGE INTO, I made some changes to the HoodieWriteHandler and other related classes:
1. Add an inputSchema for parsing the incoming record. The inputSchema for MERGE INTO differs from the writeSchema, because the update and insert expressions may apply transforms.
2. Add WRITE_SCHEMA to HoodieWriteConfig to pass the write schema for MERGE INTO.
3. Pass properties to HoodieRecordPayload#getInsertValue, so the insert expressions and table schema can be handed to the payload (see the sketch after this list).
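A minimal sketch of point 3, assuming a custom payload that inspects the new Properties argument (the property key shown is hypothetical, not an actual Hudi config):

```java
import java.io.IOException;
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;
import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload;
import org.apache.hudi.common.util.Option;

public class MergeIntoAwarePayload extends OverwriteWithLatestAvroPayload {
  public MergeIntoAwarePayload(GenericRecord record, Comparable orderingVal) {
    super(record, orderingVal);
  }

  @Override
  public Option<IndexedRecord> getInsertValue(Schema schema, Properties properties) throws IOException {
    // The new Properties argument can carry MERGE INTO context, e.g. the
    // serialized insert expressions and the table schema (key is hypothetical).
    String insertExpressions = properties.getProperty("example.merge.insert.expressions", "");
    // ... a real payload would evaluate these expressions against the record ...
    return getInsertValue(schema); // fall back to the default behavior in this sketch
  }
}
```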
Verify this pull request
Add TestCreateTable to test creating Hudi tables and CTAS.
Add TestInsertTable to test inserting into Hudi tables.
Add TestMergeIntoTable to test MERGE INTO on Hudi tables.
Add TestUpdateTable to test updating Hudi tables.
Add TestDeleteTable to test deleting from Hudi tables.
Add TestSqlStatement to test the currently supported DDL/DML statements.
* [HUDI-845] Added locking capability to allow multiple writers
1. Added a LockProvider API for pluggable lock implementations (an example configuration follows this list)
2. Added a resolution strategy API to allow for pluggable conflict resolution
3. Added a TableService client API to schedule table services
4. Added a Transaction Manager for wrapping actions within transactions
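As an example, a writer could opt into optimistic concurrency control with a ZooKeeper-based lock provider along these lines (key names follow Hudi's concurrency-control configs; the ZooKeeper endpoint and table values are placeholders):

```java
import java.util.Properties;

public class MultiWriterConfigExample {
  static Properties multiWriterProps() {
    Properties props = new Properties();
    // Turn on optimistic concurrency control and lazy cleaning of failed writes.
    props.setProperty("hoodie.write.concurrency.mode", "optimistic_concurrency_control");
    props.setProperty("hoodie.cleaner.policy.failed.writes", "LAZY");
    // Plug in a LockProvider implementation (ZooKeeper-based here).
    props.setProperty("hoodie.write.lock.provider",
        "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider");
    props.setProperty("hoodie.write.lock.zookeeper.url", "zk-host");           // placeholder
    props.setProperty("hoodie.write.lock.zookeeper.port", "2181");             // placeholder
    props.setProperty("hoodie.write.lock.zookeeper.lock_key", "my_table");     // placeholder
    props.setProperty("hoodie.write.lock.zookeeper.base_path", "/hudi/locks"); // placeholder
    return props;
  }
}
```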
Addresses leaks and perf degradation observed during testing; these were regressions from the original RFC-15 PoC implementation.
* Pass a single instance of HoodieTableMetadata everywhere
* Fix tests and add config for enabling metrics
- Removed the special casing of assumeDatePartitioning inside FSUtils#getAllPartitionPaths()
- Consequently, IOException is no longer thrown there, and many call sites had to be adjusted
- More diligent handling of open file handles in the metadata table
- Added a config for controlling reuse of connections
- Added a config for turning off the fallback to listing, so we can see tests fail instead of silently falling back
- Changed all ipf listing code to cache/amortize the open/close calls, for better performance
- The timeline server also reuses connections, for better performance
- Without the timeline server, when the metadata table is opened from executors, reuse is not allowed
- HoodieMetadataConfig is now passed into HoodieTableMetadata#create as an argument (see the sketch after this list)
- Fix TestHoodieBackedTableMetadata#testSync
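A hedged sketch of the reworked entry point; package locations, builder methods, and the exact create() signature are from memory and may differ between releases:

```java
import java.util.List;
import org.apache.hudi.common.config.HoodieMetadataConfig;
import org.apache.hudi.common.engine.HoodieEngineContext;
import org.apache.hudi.metadata.HoodieTableMetadata;

public class MetadataListingExample {
  static List<String> listPartitions(HoodieEngineContext context, String basePath,
                                     String spillableMapPath) throws Exception {
    // Build the config once and hand it to HoodieTableMetadata#create,
    // instead of threading individual flags through every call site.
    HoodieMetadataConfig metadataConfig = HoodieMetadataConfig.newBuilder()
        .enable(true) // serve listings from the metadata table
        .build();
    HoodieTableMetadata metadata =
        HoodieTableMetadata.create(context, metadataConfig, basePath, spillableMapPath);
    return metadata.getAllPartitionPaths();
  }
}
```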
* [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partition paths
* Added a test class for FileSystemBackedTableMetadata
Co-authored-by: Nishith Agarwal <nagarwal@uber.com>
* Added the ability to pass `properties` into payload methods, so they can perform table- or record-specific merges
* Added default methods so existing payload classes remain backwards compatible
* Added DefaultHoodiePayload to honor the ordering value while merging two records (a sketch of the idea follows this list)
* Fixed the default payload based on feedback
* Fixed a flaky MOR unit test
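A minimal sketch of the ordering idea behind DefaultHoodiePayload: prefer the record with the larger ordering value rather than blindly taking the latest write. The ordering field name "ts" is illustrative:

```java
import java.io.IOException;
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;
import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload;
import org.apache.hudi.common.util.Option;

public class OrderingAwarePayload extends OverwriteWithLatestAvroPayload {
  public OrderingAwarePayload(GenericRecord record, Comparable orderingVal) {
    super(record, orderingVal);
  }

  @Override
  @SuppressWarnings("unchecked")
  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema,
                                                        Properties properties) throws IOException {
    // Read the ordering field of the stored record ("ts" is an illustrative name).
    Comparable storedOrdering = (Comparable) ((GenericRecord) currentValue).get("ts");
    if (storedOrdering != null && storedOrdering.compareTo(orderingVal) > 0) {
      return Option.of(currentValue); // the stored record wins per the ordering field
    }
    return getInsertValue(schema); // otherwise the incoming record wins
  }
}
```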
* Update Spark APIs to be compatible with both Spark 2 & Spark 3
* Refactor the bulk insert V2 code so that Hudi can compile against Spark 3
* Add a spark3 profile to handle the fasterxml & Spark versions
* Create the hudi-spark-common module & refactor the hudi-spark related modules
Co-authored-by: Wenning Ding <wenningd@amazon.com>
1. Added the --clean-input and --clean-output parameters to clean the input and output directories before starting the job
2. Added the --delete-old-input parameter to delete older batches of already-ingested data. This helps keep the number of redundant files low.
3. Added the --input-parallelism parameter to restrict the parallelism when generating input data. This helps keep the number of generated input files low.
4. Added a start_offset option to DAG nodes. Without the ability to specify a start offset, data is generated into existing partitions; with it, the DAG can control which partitions the data is written to.
5. Fixed generation of records to produce the correct number of partitions
- In the existing implementation, the partition was chosen as a random long, which does not guarantee that exactly the requested number of partitions is created.
6. Changed the blacklistedFields variable to a Set, as that is faster than a List for membership checks.
7. Fixed integer division used with Math.ceil: dividing two integers yields an integer result unless one of them is first cast to double (illustrated below).
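Item 7 refers to a classic Java pitfall; a minimal illustration (variable names made up):

```java
public class CeilDivisionExample {
  public static void main(String[] args) {
    int totalRecords = 10;
    int recordsPerFile = 3;
    // Bug: the division happens in integer arithmetic, truncating to 3
    // before Math.ceil ever sees a fractional value.
    double wrong = Math.ceil(totalRecords / recordsPerFile);          // 3.0
    // Fix: cast one operand to double so the division is floating-point.
    double right = Math.ceil((double) totalRecords / recordsPerFile); // 4.0
    System.out.println(wrong + " vs " + right);
  }
}
```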
* [HUDI-1326] Added an API to force publish metrics and flush them.
Using the added API, metrics are published after each level of the DAG completes in hudi-test-suite.
* Code cleanups
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
1. Use the DAG node's label from the YAML as its name, instead of UUID names, which are not descriptive when debugging issues from logs.
2. Fix the CleanNode constructor, which was not correctly implemented.
3. When generating upserts, allow more granular control over the number of inserts and upserts - zero or more of each can be specified, instead of always requiring both.
4. Fixed generation of records of a specific size
- The previous code used a class variable "shouldAddMore" which was reset to false after the first record generation, causing subsequent records to be of minimum size.
- In this change, we pre-calculate the extra sizes of the complex fields into a map; when generating records, the size of each complex field is read from this map (see the sketch after this list).
5. Refresh the timeline of the DeltaSync service before calling readFromSource. This ensures that only the newest generated data is read and that data generated by older DAG nodes is ignored (as their AVRO files will have an older timestamp).
6. Made --workload-generator-classname an optional parameter, since the default will most probably be used.
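A hypothetical sketch of the pre-computed size map described in item 4 (class and field names are made up; only the compute-once/read-many pattern reflects the change):

```java
import java.util.HashMap;
import java.util.Map;

public class ComplexFieldSizeCache {
  // Computed once per schema and never reset, unlike the old "shouldAddMore"
  // flag, which was cleared after the first generated record.
  private final Map<String, Integer> extraSizeByField = new HashMap<>();

  public ComplexFieldSizeCache(Map<String, Integer> complexFieldBaseSizes, int targetExtraBytes) {
    // Pre-calculate the extra bytes each complex field should carry.
    complexFieldBaseSizes.forEach((field, baseSize) ->
        extraSizeByField.put(field, Math.max(0, targetExtraBytes - baseSize)));
  }

  public int extraSizeFor(String field) {
    // Every generated record reads the same pre-computed value.
    return extraSizeByField.getOrDefault(field, 0);
  }
}
```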
- This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules
- Simple usages of Spark via jsc.parallelize() have been redone using EngineContext#map, EngineContext#flatMap, etc. (see the sketch below)
- The changes in this PR break classes into `BaseXYZ` parent classes, free of Spark dependencies, living in `hudi-client-common`
- Classes in `hudi-spark-client` are named `SparkXYZ` and extend the parent classes, carrying all the Spark dependencies
- To simplify/clean up, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
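A hedged sketch of the jsc.parallelize() to EngineContext migration pattern (the per-partition work here is illustrative; HoodieEngineContext#map's exact signature may vary by release):

```java
import java.util.List;
import org.apache.hudi.common.engine.HoodieEngineContext;

public class EngineContextMigrationExample {
  // Before (Spark-specific, lived in hudi-client):
  //   List<String> out = jsc.parallelize(partitionPaths, parallelism)
  //       .map(path -> process(path)).collect();
  //
  // After (engine-agnostic, lives in hudi-client-common): the same work goes
  // through HoodieEngineContext, so any engine implementation can run it.
  static List<String> processAll(HoodieEngineContext context, List<String> partitionPaths) {
    int parallelism = Math.max(1, partitionPaths.size());
    return context.map(partitionPaths, path -> process(path), parallelism);
  }

  // Illustrative stand-in for the per-partition work; not a real Hudi method.
  static String process(String path) {
    return path.trim();
  }
}
```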