lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
dugenkui	d4d4c8c899	[MINOR] Fix typo and others (#2164 ) * remove HoodieSerializationException that will never be thrown * remove unused method, make HoodieException more readable * fix typo	2020-10-11 17:52:44 -07:00
lw0090	585ce0094d	[HUDI-1301] use spark INCREMENTAL mode query hudi dataset support schema version. (#2125 )	2020-10-10 20:53:41 +08:00
dugenkui	00271af64e	[MINOR] Fix typo (#2159 ) * fix typo * fix typo	2020-10-09 14:52:55 -07:00
Raymond Xu	1d1d91d444	[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2143 ) * [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable Remove APIs in `HoodieTestUtils` - listAllDataFilesAndLogFilesInPath - listAllLogFilesInPath - listAllDataFilesInPath - writeRecordsToLogFiles - createCleanFiles - createPendingCleanFiles Migrate the callers to use `HoodieTestTable` and `HoodieWriteableTestTable` with new APIs added - listAllBaseAndLogFiles - listAllLogFiles - listAllBaseFiles - withLogAppends - addClean - addInflightClean Also added related APIs in `FileCreateUtils` - createCleanFile - createRequestedCleanFile - createInflightCleanFile	2020-10-09 10:21:27 +08:00
Shen Hong	b335459c80	[HUDI-1208] Ordering Field should be optional when precombine is turned off (#2088 )	2020-10-04 11:34:21 -07:00
satishkotha	a99e93bed5	[HUDI-1072] Introduce REPLACE top level action. Implement insert_overwrite operation on top of replace action (#2048 )	2020-09-29 17:04:25 -07:00
hongdd	32c9cad52c	[HUDI-840] Avoid blank file created by HoodieLogFormatWriter (#1567 )	2020-09-29 08:02:15 -07:00
leesf	b0f1b736f8	[MINOR] Fix checkstyle (#2117 )	2020-09-26 22:25:19 +08:00
Raymond Xu	1be0b06ef8	[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2112 ) Remove APIs in HoodieTestUtils - HoodieTestUtils#createInflightCommitFiles - HoodieTestUtils#getCommitFilePath - HoodieTestUtils#doesCommitExist and migrate usages to HoodieTestTable in - hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRollbacksCommand.java - hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestUpgradeDowngradeCommand.java - hudi-cli/src/test/java/org/apache/hudi/cli/integ/ITTestCommitsCommand.java - hudi-cli/src/test/java/org/apache/hudi/cli/testutils/HoodieTestCommitMetadataGenerator.java - hudi-client/src/test/java/org/apache/hudi/client/TestHoodieClientOnCopyOnWriteStorage.java	2020-09-26 21:21:47 +08:00
dugenkui	ae68b2b355	[MINOR] fix typos (#2116 )	2020-09-26 20:40:33 +08:00
Raymond Xu	7c45894f43	[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2094 ) Migrate deprecated APIs in HoodieTestUtils to HoodieTestTable for test classes - TestClientRollback - TestCopyOnWriteRollbackActionExecutor Use FileCreateUtils APIs in CompactionTestUtils. Then remove unused deprecated APIs after migration.	2020-09-19 17:55:24 +08:00
Raymond Xu	3201665295	[HUDI-995] Use HoodieTestTable in more classes (#2079 ) * [HUDI-995] Use HoodieTestTable in more classes Migrate test data prep logic in - TestStatsCommand - TestHoodieROTablePathFilter Re-implement methods for create new commit times in HoodieTestUtils and HoodieClientTestHarness - Move relevant APIs to HoodieTestTable - Migrate usages After changing to HoodieTestTable APIs, removed unused deprecated APIs in HoodieTestUtils	2020-09-17 09:29:07 -07:00
shenh062326	581d54097c	[HUDI-1143] Change timestamp field in HoodieTestDataGenerator from double to long	2020-09-15 20:58:29 -07:00
Karl-WangSK	a1cff8abae	[HUDI-1255] Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage (#2056 ) Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage ## Brief change log Update the current value for only the fields that you want to change. The default payload OverwriteWithLatestAvroPayload overwrites the whole record when compared to `orderingVal`. This doesn't meet our need when we just want to change specified fields. For example: (suppose Default value is null) ``` current Value Field: name age gender Value: karl 20 male ``` ``` insert Value Field: name age gender Value: null 30 null ``` ``` After insert: Field: name age gender Value: karl 30 male ``` ## Verify this pull request Added TestOverwriteNonDefaultsWithLatestAvroPayload to verify the change.	2020-09-09 21:54:21 -07:00
linshan-ma	063a98fc2b	[HUDI-1254] TypedProperties can not get values by initializing an existing properties (#2059 )	2020-09-09 23:42:41 +08:00
Abhishek Modi	53d1e55110	Test Suite should work with Docker + Unit Tests	2020-09-08 22:41:14 -07:00
wenningd	2fee087f0f	[HUDI-1181] Fix decimal type display issue for record key field (#1953 ) * [HUDI-1181] Fix decimal type display issue for record key field * Remove getNestedFieldVal method from DataSourceUtils * resolve comments Co-authored-by: Wenning Ding <wenningd@amazon.com>	2020-09-08 17:50:54 -07:00
Prashant Wason	fe7c9e71eb	[MINOR] Fix BindException when running tests on shared machines. (#2070 ) When unit tests are run on shared machines (e.g. jenkins cluster), the unit tests sometimes fail due to BindException in starting HDFS Cluster. This is because the port chosen may have been bound by another process using the same machine. The fix is to retry the port selection a few times.	2020-09-07 19:30:45 -07:00
Raymond Xu	83e39e2b17	[HUDI-781] Add HoodieWriteableTestTable (#2040 ) - Introduce HoodieWriteableTestTable for writing records into files - Migrate writeParquetFiles() in HoodieClientTestUtils to HoodieWriteableTestTable - Adopt HoodieWriteableTestTable for test cases in - ITTestRepairsCommand.java - TestHoodieIndex.java - TestHoodieKeyLocationFetchHandle.java - TestHoodieGlobalBloomIndex.java - TestHoodieBloomIndex.java - Renamed HoodieTestTable and FileCreateUtils APIs - dataFile changed to baseFile	2020-09-07 17:54:36 +08:00
Sreeram Ramji	6537af2676	[HUDI-1153] Spark DataSource and Streaming Write must fail when operation type is misconfigured (#2014 )	2020-09-04 09:08:30 -07:00
Prashant Wason	6461927eac	[HUDI-960] Implementation of the HFile base and log file format. (#1804 ) * [HUDI-960] Implementation of the HFile base and log file format. 1. Includes HFileWriter and HFileReader 2. Includes HFileInputFormat for both snapshot and realtime input format for Hive 3. Unit test for new code 4. IT for using HFile format and querying using Hive (Presto and SparkSQL are not supported) Advantage: HFile file format saves data as binary key-value pairs. This implementation chooses the following values: 1. Key = Hoodie Record Key (as bytes) 2. Value = Avro encoded GenericRecord (as bytes) HFile allows efficient lookup of a record by key or range of keys. Hence, this base file format is well suited to applications like RFC-15, RFC-08 which will benefit from the ability to look up records by key or search in a range of keys without having to read the entire data/log format. Limitations: HFile storage format has certain limitations when used as a general purpose data storage format. 1. Does not have an implemented reader for Presto and SparkSQL 2. Is not a columnar file format and hence may lead to lower compression levels and greater IO on query side due to lack of column pruning Other changes: - Remove databricks/avro from pom - Fix HoodieClientTestUtils from not using scala imports/reflection based conversion etc - Breaking up limitFileSize(), per parquet and hfile base files - Added three new configs for HoodieHFileConfig - prefetchBlocksOnOpen, cacheDataInL1, dropBehindCacheCompaction - Throw UnsupportedException in HFileReader.getRecordKeys() - Updated HoodieCopyOnWriteTable to create the correct merge handle (HoodieSortedMergeHandle for HFile and HoodieMergeHandle otherwise) * Fixing checkstyle Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-31 08:05:59 -07:00
Satish Kotha	4dbeabffa3	[HUDI-1228] Add utility method to query extra metadata	2020-08-28 12:23:47 -07:00
Balajee Nagasubramaniam	cc555ba188	[HUDI-1133] Tune buffer sizes for the diskbased external spillable map	2020-08-25 14:23:58 -07:00
Satish Kotha	492ddcbb06	[HUDI-1191] Add incremental meta client API to query partitions modified in a time window	2020-08-25 12:40:10 -07:00
Prashant Wason	218d4a6836	[HUDI-1135] Make timeline server timeout settings configurable.	2020-08-24 18:09:00 -07:00
Prashant Wason	9b1f16b604	[HUDI-1136] Add back findInstantsAfterOrEquals to the HoodieTimeline class.	2020-08-24 18:08:17 -07:00
Mathieu	f8dcd5334e	[HUDI-1217] Improve avroToBytes method of HoodieAvroUtils (#2018 )	2020-08-24 17:33:28 +08:00
Raymond Xu	3a2ae16961	[HUDI-781] Introduce HoodieTestTable for test preparation (#1997 )	2020-08-21 11:46:33 +08:00
Abhishek Modi	bedbb825e0	[HUDI-1025] Meter RPC calls in HoodieWrapperFileSystem (#1916 )	2020-08-18 22:42:05 +08:00
vinoth chandar	9bde6d616c	[HUDI-1190] Introduce @PublicAPIClass and @PublicAPIMethod annotations to mark public APIs (#1965 ) - Maturity levels one of : evolving, stable, deprecated - Took a pass and marked out most of the existing public API	2020-08-13 23:28:17 -07:00
Sivabalan Narayanan	379cf0786f	[HUDI-1013] Adding Bulk Insert V2 implementation (#1834 ) - Adding ability to use native spark row writing for bulk_insert - Controlled by `ENABLE_ROW_WRITER_OPT_KEY` datasource write option - Introduced KeyGeneratorInterface in hudi-client, moved KeyGenerator back to hudi-spark - Simplified the new API additions to just two new methods : getRecordKey(row), getPartitionPath(row) - Fixed all built-in key generators with new APIs - Made the field position map lazily created upon the first call to row based apis - Implemented native row based key generators for CustomKeyGenerator - Fixed all the tests, with these new APIs Co-authored-by: Balaji Varadarajan <varadarb@uber.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-13 00:33:39 -07:00
wenningd	8b928e9bca	[HUDI-808] Support cleaning bootstrap source data (#1870 ) Co-authored-by: Wenning Ding <wenningd@amazon.com> Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>	2020-08-11 01:43:46 -07:00
Balaji Varadarajan	626f78f6f6	Revert "[HUDI-781] Introduce HoodieTestTable for test preparation (#1871 )" This reverts commit `b2e703d442`.	2020-08-10 22:13:02 -07:00
Raymond Xu	b2e703d442	[HUDI-781] Introduce HoodieTestTable for test preparation (#1871 )	2020-08-11 09:44:03 +08:00
Sivabalan Narayanan	858eda85d7	[HUDI-1098] Adding OptimisticConsistencyGuard to be used during FinalizeWrite (#1912 )	2020-08-09 17:51:37 -07:00
Sivabalan Narayanan	ff53e8f0b6	[HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback (#1858 ) - This pull request adds upgrade/downgrade infra for smooth transition from list based rollback to marker based rollback* - A new property called hoodie.table.version is added to hoodie.properties file as part of this. Whenever hoodie is launched with newer table version i.e. 1 (or moving from pre 0.6.0 to 0.6.0), an upgrade step will be executed automatically to adhere to marker based rollback.* - This automatic upgrade step will happen just once per dataset as the hoodie.table.version will be updated in property file after upgrade is completed once* - Similarly, a command line tool for Downgrading is added in case a user wants to downgrade hoodie from table version 1 to 0 or move from hoodie 0.6.0 to pre 0.6.0* - Added UpgradeDowngrade to assist in upgrading or downgrading hoodie table - Added Interfaces for upgrade and downgrade and concrete implementations for upgrading from 0 to 1 and downgrading from 1 to 0. - Made some changes to ListingBasedRollbackHelper to expose just rollback stats w/o performing actual rollback, which will be consumed by Upgrade infra - Reworking failure handling for upgrade/downgrade - Changed tests accordingly, added one test around left over cleanup - New tables now write table version into hoodie.properties - Clean up code naming, abstractions. Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-09 15:32:43 -07:00
Udit Mehrotra	e4a2d98f79	[HUDI-426] Bootstrap datasource integration (#1702 )	2020-08-09 14:06:13 -07:00
wenningd	9fe2d2b14a	[HUDI-427] [HUDI-971] Implement CLI support for performing bootstrap (#1869 ) * [HUDI-971] Clean partitions & fileIds returned by HFileBootstrapIndex * [HUDI-427] Implement CLI support for performing bootstrap Co-authored-by: Wenning Ding <wenningd@amazon.com> Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>	2020-08-08 12:37:29 -07:00
Raymond Xu	5ee676e34f	[MINOR] Move a test method to Transformations (#1934 ) - Move TestHoodieKeyLocationFetchHandle#getRecordsPerPartition to Transformations - Improve some var namings	2020-08-08 18:25:55 +08:00
Gary Li	4f74a84607	[HUDI-69] Support Spark Datasource for MOR table - RDD approach (#1848 ) - This PR implements Spark Datasource for MOR table in the RDD approach. - Implemented SnapshotRelation - Implemented HudiMergeOnReadRDD - Implemented separate Iterator to handle merge and unmerge record reader. - Added TestMORDataSource to verify this feature. - Clean up test file name, add tests for mixed query type tests - We can now revert the change made in DefaultSource Co-authored-by: Vinoth Chandar <vchandar@confluent.io>	2020-08-07 00:28:14 -07:00
Balaji Varadarajan	7a2429f5ba	[HUDI-575] Spark Streaming with async compaction support (#1752 )	2020-08-05 07:50:15 -07:00
Sivabalan Narayanan	ab11ba43e1	[REVERT] "[HUDI-1058] Make delete marker configurable (#1819 )" (#1914 ) This reverts commit `433d7d2c98`.	2020-08-04 15:20:38 -07:00
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876 ) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
Shen Hong	433d7d2c98	[HUDI-1058] Make delete marker configurable (#1819 )	2020-08-03 11:06:31 -04:00
Raymond Xu	10e4268792	[HUDI-995] Use Transformations, Assertions and SchemaTestUtil (#1884 ) - Consolidate transform functions for tests in Transformations.java - Consolidate assertion functions for tests in Assertions.java - Make use of SchemaTestUtil for loading schema from resource	2020-08-01 20:57:18 +08:00
Nishith Agarwal	2fc2b01d86	[HUDI-394] Provide a basic implementation of test suite	2020-07-30 21:21:15 -07:00
Raymond Xu	ca36c44cb3	[HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873 )	2020-07-27 19:21:45 +08:00
Raymond Xu	0cb24e4a2d	[MINOR] Use HoodieActiveTimeline.COMMIT_FORMATTER (#1874 )	2020-07-24 18:48:56 -07:00
Gary Li	467d097dae	[MINOR] Add Databricks File System to StorageSchemes (#1877 )	2020-07-24 18:47:09 -07:00
Sivabalan Narayanan	5b6026ba43	[HUDI-802] Fixing deletes for inserts in same batch in write path (#1792 ) * Fixing deletes for inserts in same batch in write path * Fixing delta streamer tests * Adding tests for OverwriteWithLatestAvroPayload	2020-07-22 19:39:57 -07:00
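
The field-level merge described in the [HUDI-1255] entry above (a1cff8abae) can be sketched roughly as follows. This is only an illustration in Python, not Hudi's Java implementation: `merge_non_defaults`, the dict-based records, and the `defaults` argument are hypothetical names introduced here, and the default value of every field is assumed to be null (`None`), as in the commit's own example.

```python
from typing import Any, Dict, Optional


def merge_non_defaults(
    current: Dict[str, Any],
    incoming: Dict[str, Any],
    defaults: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: overwrite only fields whose incoming value is non-default."""
    # Assumption: a field with no explicit default has a default of None (null).
    defaults = defaults or {}
    merged = dict(current)  # start from the stored record
    for field, value in incoming.items():
        # An incoming value equal to the field's default means "leave this field alone".
        if value != defaults.get(field):
            merged[field] = value
    return merged


# The example from the commit message.
current = {"name": "karl", "age": 20, "gender": "male"}
incoming = {"name": None, "age": 30, "gender": None}
merged = merge_non_defaults(current, incoming)
print(merged)  # {'name': 'karl', 'age': 30, 'gender': 'male'}
```

By contrast, a whole-record overwrite such as the one the commit attributes to OverwriteWithLatestAvroPayload would replace `name` and `gender` with null as well.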

200 Commits