lanyuanxiaoyao/hudi

Author	SHA1	Message	Date
wangxianghu	e206ddd431	[MINOR] Private the NoArgsConstructor of SparkMergeHelper and code clean (#2194)	2020-10-26 12:22:11 +08:00
lw0090	8545ea3856	[HUDI-1118] Cleanup rollback files residing in .hoodie folder (#2205)	2020-10-25 21:04:56 -07:00
Prashant Wason	49e855c348	[HUDI-1326] Added an API to force publish metrics and flush them. (#2152) Using the added API, publish metrics after each level of the DAG completes in hudi-test-suite. Code cleanups. Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-10-24 16:47:24 -07:00
Raymond Xu	14c4611857	[MINOR] Fix caller to SparkBulkInsertCommitActionExecutor (#2195) Fixed calling the wrong constructor	2020-10-21 19:50:10 -07:00
lw0090	4d80e1e221	[HUDI-284] add more test for UpdateSchemaEvolution (#2127) Unit test different schema evolution scenarios.	2020-10-19 07:38:04 -07:00
wangxianghu	c7d962efff	[HUDI-1328] Introduce HoodieFlinkEngineContext to hudi-flink-client (#2161)	2020-10-14 09:30:49 +08:00
satishkotha	0d407342ef	[HUDI-1304] Add unit test for testing compaction on replaced file groups (#2150)	2020-10-12 16:48:29 -07:00
Raymond Xu	c5e10d668f	[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2167) Remove APIs in `HoodieTestUtils` - `createCommitFiles` - `createDataFile` - `createNewLogFile` - `createCompactionRequest` Migrated usages in `TestCleaner#testPendingCompactions`. Also improved some API names in `HoodieTestTable`.	2020-10-12 14:39:10 +08:00
hj2016	c0472d3317	[HUDI-1184] Fix the support of hbase index partition path change (#1978) When the HBase index is used and a record's partition changes to another partition, the path does not change to match the value of the partition column. Co-authored-by: huangjing <huangjing@clinbrain.com>	2020-10-11 19:05:57 -07:00
dugenkui	b58daf29ba	[MINOR] remove unused generics type (#2163)	2020-10-11 18:38:42 -07:00
vinoyang	eafd7bf289	[MINOR] Fix wrong javadoc and refactor some naming issues (#2156)	2020-10-09 15:09:26 -07:00
Raymond Xu	1d1d91d444	[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2143) Remove APIs in `HoodieTestUtils` - listAllDataFilesAndLogFilesInPath - listAllLogFilesInPath - listAllDataFilesInPath - writeRecordsToLogFiles - createCleanFiles - createPendingCleanFiles Migrate the callers to use `HoodieTestTable` and `HoodieWriteableTestTable` with new APIs added - listAllBaseAndLogFiles - listAllLogFiles - listAllBaseFiles - withLogAppends - addClean - addInflightClean Also added related APIs in `FileCreateUtils` - createCleanFile - createRequestedCleanFile - createInflightCleanFile	2020-10-09 10:21:27 +08:00
Pratyaksh Sharma	524193eb4b	[HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode (#1566) Co-authored-by: Balaji Varadarajan <balaji.varadarajan@robinhood.com>	2020-10-06 20:34:03 -07:00
lw0090	fdae388626	[HUDI-1203] add port configuration for EmbeddedTimelineService (#2142)	2020-10-05 11:36:54 -07:00
Prashant Wason	6c610b91ef	[HUDI-1305] Added an API to shutdown and remove the metrics reporter. (#2132) This helps in removing the reporter once the test has completed and prevents log pollution from unnecessary metric logs. - Added an API to shutdown the metrics reporter after tests.	2020-10-04 09:30:04 -07:00
Mathieu	1f7add9291	[HUDI-1089] Refactor hudi-client to support multi-engine (#1827) - This change breaks `hudi-client` into `hudi-client-common` and `hudi-spark-client` modules - Simple usages of Spark using jsc.parallelize() have been redone using EngineContext#map, EngineContext#flatMap etc - Code changes in the PR break classes into `BaseXYZ` parent classes with no Spark dependencies living in `hudi-client-common` - Classes in `hudi-spark-client` are named `SparkXYZ`, extending the parent classes with all the Spark dependencies - To simplify/cleanup, HoodieIndex#fetchRecordLocation has been removed and its usages in tests replaced with alternatives Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-10-01 14:25:29 -07:00
satishkotha	a99e93bed5	[HUDI-1072] Introduce REPLACE top level action. Implement insert_overwrite operation on top of replace action (#2048)	2020-09-29 17:04:25 -07:00
Raymond Xu	1be0b06ef8	[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2112) Remove APIs in HoodieTestUtils - HoodieTestUtils#createInflightCommitFiles - HoodieTestUtils#getCommitFilePath - HoodieTestUtils#doesCommitExist and migrate usages to HoodieTestTable in - hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRollbacksCommand.java - hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestUpgradeDowngradeCommand.java - hudi-cli/src/test/java/org/apache/hudi/cli/integ/ITTestCommitsCommand.java - hudi-cli/src/test/java/org/apache/hudi/cli/testutils/HoodieTestCommitMetadataGenerator.java - hudi-client/src/test/java/org/apache/hudi/client/TestHoodieClientOnCopyOnWriteStorage.java	2020-09-26 21:21:47 +08:00
dugenkui	ae68b2b355	[MINOR] fix typos (#2116)	2020-09-26 20:40:33 +08:00
dugenkui	6837118c21	[MINOR] Improve description (#2113)	2020-09-25 22:21:37 +08:00
lw0090	fcc497eff1	[HUDI-1268] fix UpgradeDowngrade fs Rename issue for hdfs and aliyun oss (#2099)	2020-09-22 09:57:20 -07:00
Kaiux	8087016504	[HUDI-1213] Set Default for the bootstrap config : hoodie.bootstrap.full.input.provider (#2087)	2020-09-22 03:28:19 -07:00
Raymond Xu	7c45894f43	[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2094) Migrate deprecated APIs in HoodieTestUtils to HoodieTestTable for test classes - TestClientRollback - TestCopyOnWriteRollbackActionExecutor Use FileCreateUtils APIs in CompactionTestUtils. Then remove unused deprecated APIs after migration.	2020-09-19 17:55:24 +08:00
Raymond Xu	3201665295	[HUDI-995] Use HoodieTestTable in more classes (#2079) Migrate test data prep logic in - TestStatsCommand - TestHoodieROTablePathFilter Re-implement methods for creating new commit times in HoodieTestUtils and HoodieClientTestHarness - Move relevant APIs to HoodieTestTable - Migrate usages After changing to HoodieTestTable APIs, removed unused deprecated APIs in HoodieTestUtils	2020-09-17 09:29:07 -07:00
shenh062326	581d54097c	[HUDI-1143] Change timestamp field in HoodieTestDataGenerator from double to long	2020-09-15 20:58:29 -07:00
Raymond Xu	83e39e2b17	[HUDI-781] Add HoodieWriteableTestTable (#2040) - Introduce HoodieWriteableTestTable for writing records into files - Migrate writeParquetFiles() in HoodieClientTestUtils to HoodieWriteableTestTable - Adopt HoodieWriteableTestTable for test cases in - ITTestRepairsCommand.java - TestHoodieIndex.java - TestHoodieKeyLocationFetchHandle.java - TestHoodieGlobalBloomIndex.java - TestHoodieBloomIndex.java - Renamed HoodieTestTable and FileCreateUtils APIs - dataFile changed to baseFile	2020-09-07 17:54:36 +08:00
Dongwook	8d19ebfd0f	[HUDI-993] Let delete API use "hoodie.delete.shuffle.parallelism" (#1703) The delete API does not use "hoodie.delete.shuffle.parallelism", whereas upsert uses "hoodie.upsert.shuffle.parallelism". In certain cases this creates a performance difference between deleting through the upsert API with "EmptyHoodieRecordPayload" and deleting through the delete API. This patch makes the following fixes in this regard. - Let deduplicateKeys method use "hoodie.delete.shuffle.parallelism" - Repartition inputRDD as "hoodie.delete.shuffle.parallelism" in case "hoodie.combine.before.delete=false"	2020-09-01 12:55:31 -04:00
Prashant Wason	6461927eac	[HUDI-960] Implementation of the HFile base and log file format. (#1804) 1. Includes HFileWriter and HFileReader 2. Includes HFileInputFormat for both snapshot and realtime input format for Hive 3. Unit test for new code 4. IT for using HFile format and querying using Hive (Presto and SparkSQL are not supported) Advantage: HFile file format saves data as binary key-value pairs. This implementation chooses the following values: 1. Key = Hoodie Record Key (as bytes) 2. Value = Avro encoded GenericRecord (as bytes) HFile allows efficient lookup of a record by key or range of keys. Hence, this base file format is well suited to applications like RFC-15, RFC-08 which will benefit from the ability to look up records by key or search in a range of keys without having to read the entire data/log format. Limitations: HFile storage format has certain limitations when used as a general purpose data storage format. 1. Does not have an implemented reader for Presto and SparkSQL 2. Is not a columnar file format and hence may lead to lower compression levels and greater IO on query side due to lack of column pruning Other changes: - Remove databricks/avro from pom - Fix HoodieClientTestUtils from not using scala imports/reflection based conversion etc - Breaking up limitFileSize(), per parquet and hfile base files - Added three new configs for HoodieHFileConfig - prefetchBlocksOnOpen, cacheDataInL1, dropBehindCacheCompaction - Throw UnsupportedException in HFileReader.getRecordKeys() - Updated HoodieCopyOnWriteTable to create the correct merge handle (HoodieSortedMergeHandle for HFile and HoodieMergeHandle otherwise) - Fixing checkstyle Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-31 08:05:59 -07:00
Raymond Xu	0360bef217	[MINOR] Improve helper methods in TestCleaner (#2052) - Use private static assert methods - Use ParameterizedTest - Rename HoodieTestTable APIs	2020-08-29 14:06:25 +08:00
Mathieu	fa81248247	[HUDI-531] Add java doc for hudi test suite general classes (#1900)	2020-08-28 08:44:40 +08:00
Mathieu	7e68c42eb1	[HUDI-1223] Remove unused UpdateHandler class in HoodieCopyOnWriteTable (#2032)	2020-08-26 08:46:19 +08:00
Trevor	6a4dc7384c	[HUDI-1218] Introduce BulkInsertSortMode as Independent class (#2021)	2020-08-25 19:04:13 +08:00
Trevor	7291607ae3	[MINOR] Remove unused log code in HoodieReadClient (#2000)	2020-08-22 21:45:50 +08:00
Shen Hong	1d09c02f1c	[HUDI-1083] Optimization in determining insert bucket location for a given key (#1868) - To determine insert bucket location for a given key, hudi walks through all insert buckets with O(N) cost, while this patch adds an optimization to make it O(logN).	2020-08-22 07:41:39 -04:00
Raymond Xu	3a2ae16961	[HUDI-781] Introduce HoodieTestTable for test preparation (#1997)	2020-08-21 11:46:33 +08:00
Mathieu	34c8c9e3ea	[MINOR] Move HoodieUpgradeDowngradeException to exception package (#1993)	2020-08-20 23:12:20 +08:00
Mathieu	b883b6d268	[HUDI-1122] Introduce a kafka implementation of hoodie write commit ca… (#1886)	2020-08-20 23:00:59 +08:00
Mathieu	bd7814dadf	[HUDI-1206] Remove unused variable in Compactor (#1994)	2020-08-20 18:18:36 +08:00
Ryan Pifer	1137b0b343	Fix HBASE index MOR tables not considering record index valid	2020-08-19 14:55:59 -07:00
Abhishek Modi	bedbb825e0	[HUDI-1025] Meter RPC calls in HoodieWrapperFileSystem (#1916)	2020-08-18 22:42:05 +08:00
Bhavani Sudha Saktheeswaran	4226d75144	Moving to 0.6.1-SNAPSHOT on master branch.	2020-08-14 12:54:15 -07:00
vinoth chandar	9bde6d616c	[HUDI-1190] Introduce @PublicAPIClass and @PublicAPIMethod annotations to mark public APIs (#1965) - Maturity levels one of : evolving, stable, deprecated - Took a pass and marked out most of the existing public API	2020-08-13 23:28:17 -07:00
Sivabalan Narayanan	379cf0786f	[HUDI-1013] Adding Bulk Insert V2 implementation (#1834) - Adding ability to use native spark row writing for bulk_insert - Controlled by `ENABLE_ROW_WRITER_OPT_KEY` datasource write option - Introduced KeyGeneratorInterface in hudi-client, moved KeyGenerator back to hudi-spark - Simplified the new API additions to just two new methods : getRecordKey(row), getPartitionPath(row) - Fixed all built-in key generators with new APIs - Made the field position map lazily created upon the first call to row based apis - Implemented native row based key generators for CustomKeyGenerator - Fixed all the tests, with these new APIs Co-authored-by: Balaji Varadarajan <varadarb@uber.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-13 00:33:39 -07:00
wenningd	8b928e9bca	[HUDI-808] Support cleaning bootstrap source data (#1870) Co-authored-by: Wenning Ding <wenningd@amazon.com> Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>	2020-08-11 01:43:46 -07:00
Balaji Varadarajan	626f78f6f6	Revert "[HUDI-781] Introduce HoodieTestTable for test preparation (#1871)" This reverts commit `b2e703d442`.	2020-08-10 22:13:02 -07:00
Raymond Xu	b2e703d442	[HUDI-781] Introduce HoodieTestTable for test preparation (#1871)	2020-08-11 09:44:03 +08:00
Sivabalan Narayanan	858eda85d7	[HUDI-1098] Adding OptimisticConsistencyGuard to be used during FinalizeWrite (#1912)	2020-08-09 17:51:37 -07:00
Sivabalan Narayanan	ff53e8f0b6	[HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback (#1858) - This pull request adds upgrade/downgrade infra for smooth transition from list based rollback to marker based rollback - A new property called hoodie.table.version is added to the hoodie.properties file as part of this. Whenever hoodie is launched with a newer table version, i.e. 1 (or moving from pre 0.6.0 to 0.6.0), an upgrade step will be executed automatically to adhere to marker based rollback. - This automatic upgrade step will happen just once per dataset, as hoodie.table.version will be updated in the property file once the upgrade is completed - Similarly, a command line tool for downgrading is added in case a user wants to downgrade hoodie from table version 1 to 0 or move from hoodie 0.6.0 to pre 0.6.0 - Added UpgradeDowngrade to assist in upgrading or downgrading hoodie table - Added Interfaces for upgrade and downgrade and concrete implementations for upgrading from 0 to 1 and downgrading from 1 to 0. - Made some changes to ListingBasedRollbackHelper to expose just rollback stats w/o performing actual rollback, which will be consumed by Upgrade infra - Reworking failure handling for upgrade/downgrade - Changed tests accordingly, added one test around left over cleanup - New tables now write table version into hoodie.properties - Clean up code naming, abstractions. Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-09 15:32:43 -07:00
Udit Mehrotra	e4a2d98f79	[HUDI-426] Bootstrap datasource integration (#1702)	2020-08-09 14:06:13 -07:00
liujinhui	6b349b7711	[HUDI-210] Hudi Supports Prometheus Pushgateway (#1931) Co-authored-by: leesf <leesf@apache.org>	2020-08-09 15:29:54 +08:00

282 Commits
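
The [HUDI-1083] entry in the log describes replacing an O(N) walk over insert buckets with an O(logN) lookup. The commit message does not say how this was done, so the following is only a sketch of one common way to get that bound: precompute cumulative bucket weights once, then binary-search them for each key. All names here (`build_cumulative_weights`, `locate_bucket`, `pick_bucket`) and the MD5-based key hashing are assumptions for illustration, not Hudi's actual code.

```python
from bisect import bisect_right
from itertools import accumulate
import hashlib


def build_cumulative_weights(weights):
    """Turn per-bucket weights into running totals (done once, O(N))."""
    return list(accumulate(weights))


def locate_bucket(cumulative, fraction):
    """Return the index of the bucket covering `fraction` of the total weight.

    `fraction` is expected in [0, 1]. The binary search makes this O(log N).
    """
    target = fraction * cumulative[-1]
    # First bucket whose running total is strictly greater than the target.
    index = bisect_right(cumulative, target)
    # Clamp so that fraction == 1.0 still maps to the last bucket.
    return min(index, len(cumulative) - 1)


def pick_bucket(cumulative, record_key):
    """Hash a record key to a stable fraction, then locate its bucket."""
    digest = hashlib.md5(record_key.encode("utf-8")).hexdigest()
    fraction = int(digest, 16) / float(1 << 128)
    return locate_bucket(cumulative, fraction)


if __name__ == "__main__":
    # Hypothetical weights: bucket 0 gets 20%, bucket 1 gets 30%, bucket 2 gets 50%.
    cumulative = build_cumulative_weights([2, 3, 5])
    for key in ("key-001", "key-002", "key-003"):
        print(key, "->", pick_bucket(cumulative, key))
```

Building the cumulative list costs O(N) once per set of buckets, after which each key lookup is a single binary search rather than a scan over every bucket.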