When unit tests run on shared machines (e.g. a Jenkins cluster), they sometimes fail with a BindException while starting the HDFS cluster. This happens because the chosen port may already be bound by another process on the same machine. The fix is to retry the port selection a few times.
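A minimal sketch of the retry idea (the helper names are illustrative, not the actual patch):

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Sketch: if the chosen port gets grabbed by another process on the
// shared machine before the cluster binds it, pick a new port and retry.
public class RetryOnBindException {
  private static final int MAX_ATTEMPTS = 3;

  // Picks a free ephemeral port; it can still be taken by another process
  // between this call and the actual bind, hence the retry loop below.
  static int pickFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }

  interface PortBoundStarter<T> {
    T start(int port) throws IOException; // stands in for MiniDFSCluster startup
  }

  static <T> T withRetries(PortBoundStarter<T> starter) throws IOException {
    for (int attempt = 1; ; attempt++) {
      try {
        return starter.start(pickFreePort());
      } catch (BindException e) {
        if (attempt >= MAX_ATTEMPTS) {
          throw e; // still colliding after a few tries; give up
        }
        // otherwise loop around and select a fresh port
      }
    }
  }
}
```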
- Introduce HoodieWriteableTestTable for writing records into files (a usage sketch follows this list)
- Migrate writeParquetFiles() in HoodieClientTestUtils to HoodieWriteableTestTable
- Adopt HoodieWriteableTestTable for test cases in:
- ITTestRepairsCommand.java
- TestHoodieIndex.java
- TestHoodieKeyLocationFetchHandle.java
- TestHoodieGlobalBloomIndex.java
- TestHoodieBloomIndex.java
- Renamed HoodieTestTable and FileCreateUtils APIs
- dataFile changed to baseFile
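A hypothetical usage sketch of the new test table; the fluent method names here are assumptions based on the bullets above, not the verbatim API:

```java
// Hypothetical sketch: set up base files for a test via the new test table
// instead of HoodieClientTestUtils.writeParquetFiles().
// 'metaClient', 'SCHEMA', and 'records' are assumed to exist in the test.
HoodieWriteableTestTable testTable = HoodieWriteableTestTable.of(metaClient, SCHEMA);
testTable.addCommit("001")
         .withInserts("2016/03/15", "file-id-1", records);
```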
The Delete API does not use "hoodie.delete.shuffle.parallelism", whereas upsert uses "hoodie.upsert.shuffle.parallelism"; in certain cases this creates a performance gap between deleting via the upsert API with "EmptyHoodieRecordPayload" and the Delete API.
This patch makes the following fixes in this regard.
- Let the deduplicateKeys() method use "hoodie.delete.shuffle.parallelism"
- Repartition the input RDD to "hoodie.delete.shuffle.parallelism" partitions when "hoodie.combine.before.delete=false" (both fixes are sketched below)
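A sketch of the combined intent, with the parallelism passed explicitly so the source of the value is visible (not the exact patch):

```java
import org.apache.hudi.common.model.HoodieKey;
import org.apache.spark.api.java.JavaRDD;

public class DeleteParallelismSketch {
  // Sketch: make the delete path honor "hoodie.delete.shuffle.parallelism"
  // the same way upsert honors "hoodie.upsert.shuffle.parallelism".
  static JavaRDD<HoodieKey> prepareKeysForDelete(JavaRDD<HoodieKey> keys,
                                                 int deleteParallelism,
                                                 boolean combineBeforeDelete) {
    return combineBeforeDelete
        // deduplicateKeys path: dedupe at the configured delete parallelism
        ? keys.distinct(deleteParallelism)
        // no combine step, so repartition explicitly; otherwise the delete
        // inherits whatever partitioning the input RDD happens to have
        : keys.repartition(deleteParallelism);
  }
}
```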
* [HUDI-960] Implementation of the HFile base and log file format.
1. Includes HFileWriter and HFileReader
2. Includes HFileInputFormat, covering both the snapshot and realtime input formats for Hive
3. Unit tests for the new code
4. Integration test writing the HFile format and querying through Hive (Presto and SparkSQL are not supported)
Advantage:
HFile file format saves data as binary key-value pairs. This implementation chooses the following values:
1. Key = Hoodie Record Key (as bytes)
2. Value = Avro encoded GenericRecord (as bytes)
HFile allows efficient lookup of a record by key or by a range of keys. Hence, this base file format is well suited to applications like RFC-15 and RFC-08, which benefit from the ability to look up records by key, or to search within a range of keys, without reading the entire data/log file.
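A minimal sketch of the layout and a point lookup, assuming the HBase 1.x HFile API (Hudi's actual HoodieHFileWriter/HFileReader wiring differs):

```java
import java.util.Map;
import java.util.SortedMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.KeyValueUtil;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileContext;
import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;

public class HFileBaseFileSketch {

  // Write recordKey -> Avro-encoded-record pairs. HFile requires keys to
  // arrive in sorted order, which is why merges must emit sorted records.
  static void write(Configuration conf, FileSystem fs, Path path,
                    SortedMap<String, byte[]> records) throws Exception {
    HFileContext ctx = new HFileContextBuilder().withBlockSize(64 * 1024).build();
    try (HFile.Writer writer = HFile.getWriterFactory(conf, new CacheConfig(conf))
        .withPath(fs, path).withFileContext(ctx).create()) {
      for (Map.Entry<String, byte[]> e : records.entrySet()) {
        writer.append(new KeyValue(e.getKey().getBytes(), new byte[0], new byte[0],
            HConstants.LATEST_TIMESTAMP, e.getValue()));
      }
    }
  }

  // Point lookup: seek straight to a record key instead of scanning the file.
  static byte[] lookup(Configuration conf, FileSystem fs, Path path, String key)
      throws Exception {
    HFile.Reader reader = HFile.createReader(fs, path, new CacheConfig(conf), conf);
    reader.loadFileInfo();
    HFileScanner scanner = reader.getScanner(false, true); // no block cache, pread
    if (scanner.seekTo(KeyValueUtil.createFirstOnRow(key.getBytes())) == 0) {
      Cell cell = scanner.getKeyValue();
      return CellUtil.cloneValue(cell); // the Avro-encoded GenericRecord bytes
    }
    return null; // key not present in this file
  }
}
```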
Limitations:
HFile storage format has certain limitations when used as a general purpose data storage format.
1. Does not have an implemented reader for Presto and SparkSQL
2. Is not a columnar file format, and hence may yield lower compression and greater I/O on the query side due to the lack of column pruning
Other changes:
- Remove databricks/avro from pom
- Fix HoodieClientTestUtils to stop relying on Scala imports/reflection-based conversion, etc.
- Break up limitFileSize() per base file type (Parquet and HFile)
- Add three new configs to HoodieHFileConfig: prefetchBlocksOnOpen, cacheDataInL1, dropBehindCacheCompaction
- Throw UnsupportedOperationException in HFileReader.getRecordKeys()
- Update HoodieCopyOnWriteTable to create the correct merge handle (HoodieSortedMergeHandle for HFile, HoodieMergeHandle otherwise)
* Fixing checkstyle
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
- To determine the insert bucket for a given key, Hudi walks through all insert buckets at O(N) cost; this patch optimizes the lookup to O(log N).
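A sketch of the idea (illustrative, not the patch's exact code): precompute cumulative bucket weights once, then replace the linear scan with a single TreeMap lookup.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: map a record's hash fraction in [0, 1) to an insert bucket in
// O(log N) via cumulative weights, instead of walking all buckets.
public class InsertBucketIndexSketch {

  private final TreeMap<Double, Integer> cumulativeWeightToBucket = new TreeMap<>();

  // weights.get(i) is bucket i's share of the inserts; they sum to ~1.0.
  public InsertBucketIndexSketch(List<Double> weights) {
    double cumulative = 0.0;
    for (int bucket = 0; bucket < weights.size(); bucket++) {
      cumulative += weights.get(bucket);
      // bucket i owns the range (previous cumulative, cumulative]
      cumulativeWeightToBucket.put(cumulative, bucket);
    }
  }

  public int bucketFor(double hashFraction) {
    Map.Entry<Double, Integer> entry = cumulativeWeightToBucket.ceilingEntry(hashFraction);
    // guard against floating-point rounding at the very top of the range
    return entry != null ? entry.getValue()
                         : cumulativeWeightToBucket.lastEntry().getValue();
  }
}
```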
- Added the ability to use native Spark row writing for bulk_insert
- Controlled by `ENABLE_ROW_WRITER_OPT_KEY` datasource write option
- Introduced KeyGeneratorInterface in hudi-client, moved KeyGenerator back to hudi-spark
- Simplified the new API additions to just two new methods: getRecordKey(row) and getPartitionPath(row) (see the sketch after this list)
- Fixed all built-in key generators to implement the new APIs
- Made the field position map lazily created upon the first call to the row-based APIs
- Implemented native row-based key generators for CustomKeyGenerator
- Fixed all the tests with these new APIs
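A sketch of the new surface area. The interface shape follows the bullets above (modifiers and generics in the actual hudi-client code may differ), and the option key shown is what ENABLE_ROW_WRITER_OPT_KEY is assumed to resolve to:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// The engine-facing interface reduced to the two new row-based methods
// (the existing Avro-record variants are elided here).
interface KeyGeneratorInterface {
  String getRecordKey(Row row);
  String getPartitionPath(Row row);
}

class RowWriterUsage {
  // Opting into native row writing for bulk_insert via the datasource option.
  static void bulkInsert(Dataset<Row> df, String basePath) {
    df.write().format("hudi")
        .option("hoodie.datasource.write.operation", "bulk_insert")
        .option("hoodie.datasource.write.row.writer.enable", "true") // assumed key
        .mode("append")
        .save(basePath);
  }
}
```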
Co-authored-by: Balaji Varadarajan <varadarb@uber.com>
Co-authored-by: Vinoth Chandar <vinoth@apache.org>
The purpose of this pull request is to implement the changes required on the Hudi side to integrate bootstrapped tables with Presto. Testing was done against Presto 0.232, and the following changes were identified to make it work:
The UseRecordReaderFromInputFormat annotation is required on HoodieParquetInputFormat as well, because reads of bootstrapped tables need to go through the record reader to perform the merge. On the Presto side, this annotation is already handled.
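A sketch of the change; the annotation name comes from the description above, and its package is an assumption:

```java
import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
import org.apache.hudi.hadoop.UseRecordReaderFromInputFormat; // assumed package

// Sketch: annotating the input format signals Presto to route reads
// through its record reader, which is where the bootstrap merge happens.
@UseRecordReaderFromInputFormat
public class HoodieParquetInputFormat extends MapredParquetInputFormat {
  // existing implementation unchanged
}
```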
We need to maintain VIRTUAL_COLUMN_NAMES internally because Presto's bundled Hive version (hive-apache-1.2.2) defines VirtualColumn as a class, whereas the version Hudi depends on defines it as an enum.
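A sketch of the internally maintained set; the entries mirror Hive's standard virtual columns, though the exact set Hudi keeps is an assumption:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch: keep the virtual column names as plain strings so the code works
// whether VirtualColumn is an enum (Hudi's Hive dependency) or a class
// (Presto's hive-apache-1.2.2).
public class VirtualColumnNames {
  public static final Set<String> VIRTUAL_COLUMN_NAMES = new HashSet<>(Arrays.asList(
      "INPUT__FILE__NAME",
      "BLOCK__OFFSET__INSIDE__FILE",
      "ROW__OFFSET__INSIDE__BLOCK",
      "RAW__DATA__SIZE",
      "GROUPING__ID",
      "ROW__ID"));
}
```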
Dependency changes in hudi-presto-bundle to avoid runtime exceptions.