lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
Mathieu	f8dcd5334e	[HUDI-1217] Improve avroToBytes method of HoodieAvroUtils (#2018 )	2020-08-24 17:33:28 +08:00
Raymond Xu	3a2ae16961	[HUDI-781] Introduce HoodieTestTable for test preparation (#1997 )	2020-08-21 11:46:33 +08:00
Abhishek Modi	bedbb825e0	[HUDI-1025] Meter RPC calls in HoodieWrapperFileSystem (#1916 )	2020-08-18 22:42:05 +08:00
vinoth chandar	9bde6d616c	[HUDI-1190] Introduce @PublicAPIClass and @PublicAPIMethod annotations to mark public APIs (#1965 ) - Maturity levels one of : evolving, stable, deprecated - Took a pass and marked out most of the existing public API	2020-08-13 23:28:17 -07:00
Sivabalan Narayanan	379cf0786f	[HUDI-1013] Adding Bulk Insert V2 implementation (#1834 ) - Adding ability to use native spark row writing for bulk_insert - Controlled by `ENABLE_ROW_WRITER_OPT_KEY` datasource write option - Introduced KeyGeneratorInterface in hudi-client, moved KeyGenerator back to hudi-spark - Simplified the new API additions to just two new methods : getRecordKey(row), getPartitionPath(row) - Fixed all built-in key generators with new APIs - Made the field position map lazily created upon the first call to row based apis - Implemented native row based key generators for CustomKeyGenerator - Fixed all the tests, with these new APIs Co-authored-by: Balaji Varadarajan <varadarb@uber.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-13 00:33:39 -07:00
wenningd	8b928e9bca	[HUDI-808] Support cleaning bootstrap source data (#1870 ) Co-authored-by: Wenning Ding <wenningd@amazon.com> Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>	2020-08-11 01:43:46 -07:00
Balaji Varadarajan	626f78f6f6	Revert "[HUDI-781] Introduce HoodieTestTable for test preparation (#1871 )" This reverts commit `b2e703d442`.	2020-08-10 22:13:02 -07:00
Raymond Xu	b2e703d442	[HUDI-781] Introduce HoodieTestTable for test preparation (#1871 )	2020-08-11 09:44:03 +08:00
Sivabalan Narayanan	858eda85d7	[HUDI-1098] Adding OptimisticConsistencyGuard to be used during FinalizeWrite (#1912 )	2020-08-09 17:51:37 -07:00
Sivabalan Narayanan	ff53e8f0b6	[HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback (#1858 ) - This pull request adds upgrade/downgrade infra for smooth transition from list based rollback to marker based rollback - A new property called hoodie.table.version is added to hoodie.properties file as part of this. Whenever hoodie is launched with newer table version, i.e. 1 (or moving from pre 0.6.0 to 0.6.0), an upgrade step will be executed automatically to adhere to marker based rollback. - This automatic upgrade step will happen just once per dataset, as hoodie.table.version will be updated in the property file after the upgrade is completed - Similarly, a command line tool for downgrading is added in case a user wants to downgrade hoodie from table version 1 to 0 or move from hoodie 0.6.0 to pre 0.6.0 - Added UpgradeDowngrade to assist in upgrading or downgrading hoodie table - Added Interfaces for upgrade and downgrade and concrete implementations for upgrading from 0 to 1 and downgrading from 1 to 0. - Made some changes to ListingBasedRollbackHelper to expose just rollback stats w/o performing actual rollback, which will be consumed by Upgrade infra - Reworking failure handling for upgrade/downgrade - Changed tests accordingly, added one test around left over cleanup - New tables now write table version into hoodie.properties - Clean up code naming, abstractions. Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-08-09 15:32:43 -07:00
Udit Mehrotra	e4a2d98f79	[HUDI-426] Bootstrap datasource integration (#1702 )	2020-08-09 14:06:13 -07:00
wenningd	9fe2d2b14a	[HUDI-427] [HUDI-971] Implement CLI support for performing bootstrap (#1869 ) * [HUDI-971] Clean partitions & fileIds returned by HFileBootstrapIndex * [HUDI-427] Implement CLI support for performing bootstrap Co-authored-by: Wenning Ding <wenningd@amazon.com> Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>	2020-08-08 12:37:29 -07:00
Raymond Xu	5ee676e34f	[MINOR] Move a test method to Transformations (#1934 ) - Move TestHoodieKeyLocationFetchHandle#getRecordsPerPartition to Transformations - Improve some var namings	2020-08-08 18:25:55 +08:00
Gary Li	4f74a84607	[HUDI-69] Support Spark Datasource for MOR table - RDD approach (#1848 ) - This PR implements Spark Datasource for MOR table in the RDD approach. - Implemented SnapshotRelation - Implemented HudiMergeOnReadRDD - Implemented separate Iterator to handle merge and unmerge record reader. - Added TestMORDataSource to verify this feature. - Clean up test file name, add tests for mixed query type tests - We can now revert the change made in DefaultSource Co-authored-by: Vinoth Chandar <vchandar@confluent.io>	2020-08-07 00:28:14 -07:00
Balaji Varadarajan	7a2429f5ba	[HUDI-575] Spark Streaming with async compaction support (#1752 )	2020-08-05 07:50:15 -07:00
Sivabalan Narayanan	ab11ba43e1	[REVERT] "[HUDI-1058] Make delete marker configurable (#1819 )" (#1914 ) This reverts commit `433d7d2c98`.	2020-08-04 15:20:38 -07:00
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876 ) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
Shen Hong	433d7d2c98	[HUDI-1058] Make delete marker configurable (#1819 )	2020-08-03 11:06:31 -04:00
Raymond Xu	10e4268792	[HUDI-995] Use Transformations, Assertions and SchemaTestUtil (#1884 ) - Consolidate transform functions for tests in Transformations.java - Consolidate assertion functions for tests in Assertions.java - Make use of SchemaTestUtil for loading schema from resource	2020-08-01 20:57:18 +08:00
Nishith Agarwal	2fc2b01d86	[HUDI-394] Provide a basic implementation of test suite	2020-07-30 21:21:15 -07:00
Raymond Xu	ca36c44cb3	[HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873 )	2020-07-27 19:21:45 +08:00
Raymond Xu	0cb24e4a2d	[MINOR] Use HoodieActiveTimeline.COMMIT_FORMATTER (#1874 )	2020-07-24 18:48:56 -07:00
Gary Li	467d097dae	[MINOR] Add Databricks File System to StorageSchemes (#1877 )	2020-07-24 18:47:09 -07:00
Sivabalan Narayanan	5b6026ba43	[HUDI-802] Fixing deletes for inserts in same batch in write path (#1792 ) * Fixing deletes for inserts in same batch in write path * Fixing delta streamer tests * Adding tests for OverwriteWithLatestAvroPayload	2020-07-22 19:39:57 -07:00
DeyinZhong	743ef322b8	[HUDI-871] Add support for Tencent Cloud Object Storage(COS) (#1855 ) Co-authored-by: deyzhong <deyzhong@tencent.com>	2020-07-22 17:40:19 +08:00
lw0090	1ec89e9a94	[HUDI-839] Introducing support for rollbacks using marker files (#1756 ) * [HUDI-839] Introducing rollback strategy using marker files - Adds a new mechanism for rollbacks where it's based on the marker files generated during the write - Consequently, marker file/dir deletion now happens post commit, instead of during finalize - Marker files are also generated for AppendHandle, making it consistent throughout the write path - Until upgrade-downgrade mechanism can upgrade non-marker based inflight writes to marker based, this should only be turned on for new datasets. - Added marker dir deletion after successful commit/rollback, individual files are not deleted during finalize - Fail safe for deleting marker directories, now during timeline archival process - Added check to ensure completed instants are not rolled back using marker based strategy. This will be incorrect - Reworked tests to rollback inflight instants, instead of completed instants whenever necessary - Added a unit test for MarkerBasedRollbackStrategy Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-07-20 22:41:42 -07:00
Udit Mehrotra	1aae437257	[HUDI-1102] Add common useful Spark related and Table path detection utilities (#1841 ) Co-authored-by: Mehrotra <uditme@amazon.com>	2020-07-18 16:16:32 -07:00
Raymond Xu	3b9a30528b	[HUDI-996] Add functional test suite for hudi-utilities (#1746 ) - Share resources for functional tests - Add suite for functional test classes from hudi-utilities	2020-07-05 16:44:31 -07:00
Prashant Wason	2603cfb33e	[HUDI-684] Introduced abstraction for writing and reading different types of base file formats. (#1687 ) Notable changes: 1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format 2. HoodieDataBlock abstraction for creating specific data blocks for base file formats. (e.g. Parquet has HoodieAvroDataBlock) 3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format 4. HiveSyncTool accepts the base file format as a CLI parameter 5. HoodieDeltaStreamer accepts the base file format as a CLI parameter 6. HoodieSparkSqlWriter accepts the base file format as a parameter	2020-06-25 23:46:55 -07:00
Shen Hong	89e37d5273	[HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs. (#1690 )	2020-06-22 08:13:28 -07:00
hongdd	f3a701757b	[HUDI-696] Add unit test for CommitsCommand (#1724 )	2020-06-18 21:42:13 +08:00
garyli1019	e9cab67b80	[HUDI-988] Fix More Unit Test Flakiness	2020-06-07 23:14:46 -07:00
Balaji Varadarajan	fb283934a3	[HUDI-990] Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly. Also ensure timeline gets reloaded after we revert committed transactions	2020-06-04 02:52:21 -07:00
cxzl25	7c59095314	[HUDI-975] Add unit tests in TestHoodieTableFileSystemView to test view for non-partitioned table (#1692 )	2020-06-01 07:23:28 -07:00
Sivabalan Narayanan	5a0d3f1cf9	[HUDI-786] Fixing read beyond inline length in InlineFS (#1616 )	2020-05-28 12:59:11 -07:00
dengziming	bde7a7043e	[HUDI-476]: Add hudi-examples module (#1151 ) - add hoodie delta streamer mock source example and dfs source and kafka source examples Signed-off-by: dengziming <dengziming1993@gmail.com> - add defaultSparkConf utils method - change version of hudi-examples to 0.5.2-SNAPSHOT - change the artifactId of hudi-spark and hudi-utilities - alter some code to adapt kafka2.0 - Update scripts - Add license	2020-05-28 01:44:39 +08:00
Raymond Xu	03f136361a	[HUDI-811] Restructure test packages in hudi-common (#1644 ) * [HUDI-811] Restructure test packages in hudi-common	2020-05-27 16:28:17 +08:00
Raymond Xu	6c450957ce	[HUDI-690] Filter out inflight compaction in exporter (#1667 )	2020-05-26 09:23:34 -07:00
hongdd	802d16c8c9	[HUDI-707] Add unit test for StatsCommand (#1645 )	2020-05-21 18:28:04 +08:00
Pratyaksh Sharma	6a0aa9a645	[HUDI-803] Replaced use of NullNode with JsonProperties.NULL_VALUE in HoodieAvroUtils (#1538 ) - added more test cases in TestHoodieAvroUtils.class Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-05-20 09:04:43 -07:00
Balaji Varadarajan	e6f3bf10cf	[HUDI-858] Allow multiple operations to be executed within a single commit (#1633 )	2020-05-18 19:27:24 -07:00
Joey	2600d2de8d	[MINOR] Fix apache-rat violations (#1639 ) * MINOR Fix apache-rat violations. Also, enabling RAT for hudi-utilities and hudi-integ-test	2020-05-18 11:16:49 -07:00
Sivabalan Narayanan	29edf4b3b8	[HUDI-407] Adding Simple Index to Hoodie. (#1402 ) This index finds the location by joining incoming records with records from base files.	2020-05-17 18:32:24 -07:00
Balaji Varadarajan	3c9da2e5f0	[HUDI-895] Remove unnecessary listing .hoodie folder when using timeline server (#1636 )	2020-05-17 18:18:53 -07:00
Mathieu	25a0080b2f	[HUDI-714] Add javadoc and comments to hudi write method link (#1409 )	2020-05-16 08:36:51 -04:00
Alexander Filipchik	83796b3189	[HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine (#1513 ) * Adding proper default to hudi metadata fields and proper handling to rewrite routine * Handle fields declared with a null default Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>	2020-05-13 18:04:38 -07:00
liujinhui	32ea4c70ff	[HUDI-869] Add support for alluxio (#1608 )	2020-05-13 21:00:34 +08:00
vinoth chandar	f92b9fdcc4	[MINOR] Fix hardcoding of ports in TestHoodieJmxMetrics (#1606 )	2020-05-10 19:23:26 -04:00
Udit Mehrotra	d54b4b8a52	[HUDI-838] Support schema from HoodieCommitMetadata for HiveSync (#1559 ) Co-authored-by: Mehrotra <uditme@amazon.com>	2020-05-07 16:33:09 -07:00
Alexander Filipchik	e783ab1749	[HUDI-784] Addressing issue with log reader on GCS (#1516 ) Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>	2020-05-07 13:05:32 -07:00


174 Commits