lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
zhangminglei	fe3f5c2d56	[HUDI-1913] Using streams instead of loops for input/output (#2962)	2021-05-19 09:13:38 +08:00
TeRS-K	be9db2c4f5	[HUDI-1055] Remove hardcoded parquet in tests (#2740) * Remove hardcoded parquet in tests * Use DataFileUtils.getInstance * Renaming DataFileUtils to BaseFileUtils Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-05-11 10:01:45 -07:00
Volodymyr Burenin	8a48d16e41	[HUDI-1707] Reduces log level for too verbose messages from info to debug level. (#2714) * Reduces log level for too verbose messages from info to debug level. * Sort config output. * Code Review : Small restructuring + rebasing to master - Fixing flaky multi delta streamer test - Using isDebugEnabled() checks - Some changes to shorten log message without moving to DEBUG Co-authored-by: volodymyr.burenin <volodymyr.burenin@cloudkitchens.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2021-05-10 07:16:02 -07:00
Nick Young	f4e3b94971	[HUDI-1742] Improve table level config priority for HoodieMultiTableDeltaStreamer (#2744)	2021-04-26 22:05:06 +08:00
Sivabalan Narayanan	3e4fa170cf	[HUDI-1835] Fixing kafka native config param for auto offset reset (#2864)	2021-04-25 12:16:09 -04:00
pengzhiwei	aacb8be521	[HUDI-1415] Read Hoodie Table As Spark DataSource Table (#2283)	2021-04-20 14:21:38 -07:00
Aditya Tiwari	ec2334ceac	[HUDI-1716]: Resolving default values for schema from dataframe (#2765) - Adding default values and setting null as first entry in UNION data types in avro schema. Co-authored-by: Aditya Tiwari <aditya.tiwari@flipkart.com>	2021-04-19 10:05:20 -04:00
Gary Li	452f5e2d66	[HOTFIX] close spark session in functional test suite and disable spark3 test for spark2 (#2727)	2021-03-29 06:04:48 -07:00
n3nash	bec70413c0	[HUDI-1728] Fix MethodNotFound for HiveMetastore Locks (#2731)	2021-03-27 10:07:10 -07:00
n3nash	01a1d7997b	[HUDI-1712] Rename & standardize config to match other configs (#2708)	2021-03-24 17:24:02 +08:00
n3nash	d7b18783bd	[HUDI-1709] Improving config names and adding hive metastore uri config (#2699)	2021-03-22 01:22:06 -07:00
Volodymyr Burenin	900de34e45	[HUDI-1650] Custom avro kafka deserializer. (#2619) * Custom avro kafka deserializer Co-authored-by: volodymyr.burenin <volodymyr.burenin@cloudkitchens.com> Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>	2021-03-20 00:51:08 -07:00
Sivabalan Narayanan	161d530f93	Fixing kafka auto.reset.offsets config param key (#2691)	2021-03-19 12:54:29 -07:00
n3nash	74241947c1	[HUDI-845] Added locking capability to allow multiple writers (#2374) * [HUDI-845] Added locking capability to allow multiple writers 1. Added LockProvider API for pluggable lock methodologies 2. Added Resolution Strategy API to allow for pluggable conflict resolution 3. Added TableService client API to schedule table services 4. Added Transaction Manager for wrapping actions within transactions	2021-03-16 16:43:53 -07:00
Ankush Kanungo	f5e31be086	[HUDI-1685] keep updating current date for every batch (#2671)	2021-03-12 15:53:01 -08:00
Sivabalan Narayanan	5cf2f2618b	[HUDI-1618] Fixing NPE with Parquet src in multi table delta streamer (#2577)	2021-03-07 16:40:40 -05:00
pengzhiwei	bc883db5de	[HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig (#2596)	2021-03-05 14:10:27 +08:00
Raymond Xu	f53bca404f	[HUDI-1655] Support custom date format and fix unsupported exception in DatePartitionPathSelector (#2621) - Add a config to allow parsing custom date format in `DatePartitionPathSelector`. Currently it assumes date partition string in the format of `yyyy-MM-dd`. - Fix a bug where `UnsupportedOperationException` was thrown when sort `eligibleFiles` in-place. Changed to sort it and store in a new list.	2021-03-04 21:01:51 -08:00
liujinhui	617cc24ad1	[HUDI-1367] Make deltaStreamer transition from dfsSouce to kafkasouce (#2227) Co-authored-by: Sivabalan Narayanan <sivabala@uber.com>	2021-02-25 07:08:13 -05:00
Sivabalan Narayanan	c9fcf964b2	[HUDI-1315] Adding builder for HoodieTableMetaClient initialization (#2534)	2021-02-20 09:54:26 +08:00
lw0090	368c1a8f5c	[HUDI-1399] support a independent clustering spark job to asynchronously clustering (#2379) * [HUDI-1481] add structured streaming and delta streamer clustering unit test * [HUDI-1399] support a independent clustering spark job to asynchronously clustering * [HUDI-1399] support a independent clustering spark job to asynchronously clustering * [HUDI-1498] Read clustering plan from requested file for inflight instant (#2389) * [HUDI-1399] support a independent clustering spark job with schedule generate instant time Co-authored-by: satishkotha <satishkotha@uber.com>	2021-01-09 17:30:16 -08:00
Udit Mehrotra	4e64226844	[HUDI-1450] Use metadata table for listing in HoodieROTablePathFilter (apache#2326) [HUDI-1394] [RFC-15] Use metadata table (if present) to get all partition paths (apache#2351)	2021-01-04 07:59:47 -08:00
lw0090	9e6889a8ce	[HUDI-1481] add structured streaming and delta streamer clustering unit test (#2360)	2020-12-27 20:27:09 -08:00
Bhavani Sudha Saktheeswaran	14d5d1100c	[HUDI-1406] Add date partition based source input selector for Delta streamer (#2264) - Adds ability to list only recent date based partitions from source data. - Parallelizes listing for faster tailing of DFSSources	2020-12-17 03:59:30 -08:00
liujinhui	62b392b49c	[HUDI-1343] Add standard schema postprocessor which would rewrite the schema using spark-avro conversion (#2192) Co-authored-by: liujh <liujh@t3go.cn>	2020-12-03 19:28:34 -08:00
wangxianghu	a23230c8c2	[HUDI-1400] Replace Operation enum with WriteOperationType (#2259)	2020-11-19 13:40:04 +08:00
Ho Tien Vu	af5ef4d49d	[HUDI-1330] handle prefix filtering at directory level (#2157) The current DFSPathSelector only ignore prefix(_, .) at the file level while files under subdirectories e.g. (.checkpoint/*) are still considered which result in bad-format exception during reading.	2020-10-20 23:20:19 -07:00
lw0090	2126f13e13	[HUDI-791] Replace null by Option in Delta Streamer (#2171)	2020-10-11 18:29:57 -07:00
Mathieu	1f7add9291	[HUDI-1089] Refactor hudi-client to support multi-engine (#1827) - This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules - Simple usages of Spark using jsc.parallelize() has been redone using EngineContext#map, EngineContext#flatMap etc - Code changes in the PR, break classes into `BaseXYZ` parent classes with no spark dependencies living in `hudi-client-common` - Classes on `hudi-spark-client` are named `SparkXYZ` extending the parent classes with all the Spark dependencies - To simplify/cleanup, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-10-01 14:25:29 -07:00
liujinhui	20b9b399c9	[HUDI-1233] Deltastreamer Kafka consumption delay reporting indicators (#2074)	2020-09-29 13:44:31 +08:00
Alexander Filipchik	c8e19e2def	[HUDI-801] Adding a way to post process schema after it is fetched (#1524) * [HUDI-801] Adding a way to post process schema after it is fetched Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com> Co-authored-by: Balaji Varadarajan <balaji.varadarajan@robinhood.com>	2020-09-19 11:18:36 -07:00
shenh062326	581d54097c	[HUDI-1143] Change timestamp field in HoodieTestDataGenerator from double to long	2020-09-15 20:58:29 -07:00
Abhishek Modi	53d1e55110	Test Suite should work with Docker + Unit Tests	2020-09-08 22:41:14 -07:00
Dongwook	8d19ebfd0f	[HUDI-993] Let delete API use "hoodie.delete.shuffle.parallelism" (#1703) For Delete API, "hoodie.delete.shuffle.parallelism" isn't used as opposed to "hoodie.upsert.shuffle.parallelism" is used for upsert, this creates the performance difference between delete by upsert API with "EmptyHoodieRecordPayload" and delete API for certain cases. This patch makes the following fixes in this regard. - Let deduplicateKeys method use "hoodie.delete.shuffle.parallelism" - Repartition inputRDD as "hoodie.delete.shuffle.parallelism" in case "hoodie.combine.before.delete=false"	2020-09-01 12:55:31 -04:00
Udit Mehrotra	e4a2d98f79	[HUDI-426] Bootstrap datasource integration (#1702)	2020-08-09 14:06:13 -07:00
Sreeram Ramji	217a84192c	[HUDI-1140] Fix Jcommander issue for --hoodie-conf in DeltaStreamer (#1898)	2020-08-04 21:42:51 -07:00
Sivabalan Narayanan	ab11ba43e1	[REVERT] "[HUDI-1058] Make delete marker configurable (#1819)" (#1914) This reverts commit `433d7d2c98`.	2020-08-04 15:20:38 -07:00
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
Shen Hong	433d7d2c98	[HUDI-1058] Make delete marker configurable (#1819)	2020-08-03 11:06:31 -04:00
n3nash	727f1df62c	[MINOR] Suppressing spark logs for hudi-integ and hudi-utilities (#1894)	2020-07-31 19:01:25 -07:00
Nishith Agarwal	2fc2b01d86	[HUDI-394] Provide a basic implementation of test suite	2020-07-30 21:21:15 -07:00
Raymond Xu	ca36c44cb3	[HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873)	2020-07-27 19:21:45 +08:00
Sivabalan Narayanan	5b6026ba43	[HUDI-802] Fixing deletes for inserts in same batch in write path (#1792) * Fixing deletes for inserts in same batch in write path * Fixing delta streamer tests * Adding tests for OverwriteWithLatestAvroPayload	2020-07-22 19:39:57 -07:00
lw0090	1ec89e9a94	[HUDI-839] Introducing support for rollbacks using marker files (#1756) * [HUDI-839] Introducing rollback strategy using marker files - Adds a new mechanism for rollbacks where it's based on the marker files generated during the write - Consequently, marker file/dir deletion now happens post commit, instead of during finalize - Marker files are also generated for AppendHandle, making it consistent throughout the write path - Until upgrade-downgrade mechanism can upgrade non-marker based inflight writes to marker based, this should only be turned on for new datasets. - Added marker dir deletion after successful commit/rollback, individual files are not deleted during finalize - Fail safe for deleting marker directories, now during timeline archival process - Added check to ensure completed instants are not rolled back using marker based strategy. This will be incorrect - Reworked tests to rollback inflight instants, instead of completed instants whenever necessary - Added an unit test for MarkerBasedRollbackStrategy Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-07-20 22:41:42 -07:00
Pratyaksh Sharma	c7f1a781ab	[HUDI-728]: Implemented custom key generator (#1433)	2020-07-09 07:35:07 -04:00
Trevor	d58644b657	[HUDI-1062]Remove unnecessary maxEvent check and add some log in KafkaOffsetGen (#1779)	2020-07-08 21:07:34 -07:00
Raymond Xu	3b9a30528b	[HUDI-996] Add functional test suite for hudi-utilities (#1746) - Share resources for functional tests - Add suite for functional test classes from hudi-utilities	2020-07-05 16:44:31 -07:00
Shen Hong	89e37d5273	[HUDI-908] Add some data types to HoodieTestDataGenerator and fix some some bugs. (#1690)	2020-06-22 08:13:28 -07:00
Raymond Xu	ab724af5c4	[MINOR] Rename `TestSourceConfig` to `SourceConfigs` (#1749)	2020-06-19 12:08:19 -07:00
Litianye	ede6c9bda4	[HUDI-1006] Deltastreamer use kafkaSource with offset reset strategy:latest can't consume data (#1719)	2020-06-14 18:01:44 +08:00

124 Commits