lanyuanxiaoyao/hudi - hudi - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
wenningd	9fe2d2b14a	[HUDI-427] [HUDI-971] Implement CLI support for performing bootstrap (#1869 ) * [HUDI-971] Clean partitions & fileIds returned by HFileBootstrapIndex * [HUDI-427] Implement CLI support for performing bootstrap Co-authored-by: Wenning Ding <wenningd@amazon.com> Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>	2020-08-08 12:37:29 -07:00
Raymond Xu	5ee676e34f	[MINOR] Move a test method to Transformations (#1934 ) - Move TestHoodieKeyLocationFetchHandle#getRecordsPerPartition to Transformations - Improve some var namings	2020-08-08 18:25:55 +08:00
Gary Li	4f74a84607	[HUDI-69] Support Spark Datasource for MOR table - RDD approach (#1848 ) - This PR implements Spark Datasource for MOR table in the RDD approach. - Implemented SnapshotRelation - Implemented HudiMergeOnReadRDD - Implemented separate Iterator to handle merge and unmerge record reader. - Added TestMORDataSource to verify this feature. - Clean up test file name, add tests for mixed query type tests - We can now revert the change made in DefaultSource Co-authored-by: Vinoth Chandar <vchandar@confluent.io>	2020-08-07 00:28:14 -07:00
Balaji Varadarajan	7a2429f5ba	[HUDI-575] Spark Streaming with async compaction support (#1752 )	2020-08-05 07:50:15 -07:00
Sivabalan Narayanan	ab11ba43e1	[REVERT] "[HUDI-1058] Make delete marker configurable (#1819 )" (#1914 ) This reverts commit `433d7d2c98`.	2020-08-04 15:20:38 -07:00
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876 ) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
Shen Hong	433d7d2c98	[HUDI-1058] Make delete marker configurable (#1819 )	2020-08-03 11:06:31 -04:00
Raymond Xu	10e4268792	[HUDI-995] Use Transformations, Assertions and SchemaTestUtil (#1884 ) - Consolidate transform functions for tests in Transformations.java - Consolidate assertion functions for tests in Assertions.java - Make use of SchemaTestUtil for loading schema from resource	2020-08-01 20:57:18 +08:00
Nishith Agarwal	2fc2b01d86	[HUDI-394] Provide a basic implementation of test suite	2020-07-30 21:21:15 -07:00
Raymond Xu	ca36c44cb3	[HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873 )	2020-07-27 19:21:45 +08:00
Raymond Xu	0cb24e4a2d	[MINOR] Use HoodieActiveTimeline.COMMIT_FORMATTER (#1874 )	2020-07-24 18:48:56 -07:00
Gary Li	467d097dae	[MINOR] Add Databricks File System to StorageSchemes (#1877 )	2020-07-24 18:47:09 -07:00
Sivabalan Narayanan	5b6026ba43	[HUDI-802] Fixing deletes for inserts in same batch in write path (#1792 ) * Fixing deletes for inserts in same batch in write path * Fixing delta streamer tests * Adding tests for OverwriteWithLatestAvroPayload	2020-07-22 19:39:57 -07:00
DeyinZhong	743ef322b8	[HUDI-871] Add support for Tencent Cloud Object Storage(COS) (#1855 ) Co-authored-by: deyzhong <deyzhong@tencent.com>	2020-07-22 17:40:19 +08:00
lw0090	1ec89e9a94	[HUDI-839] Introducing support for rollbacks using marker files (#1756 ) * [HUDI-839] Introducing rollback strategy using marker files - Adds a new mechanism for rollbacks where it's based on the marker files generated during the write - Consequently, marker file/dir deletion now happens post commit, instead of during finalize - Marker files are also generated for AppendHandle, making it consistent throughout the write path - Until upgrade-downgrade mechanism can upgrade non-marker based inflight writes to marker based, this should only be turned on for new datasets. - Added marker dir deletion after successful commit/rollback, individual files are not deleted during finalize - Fail safe for deleting marker directories, now during timeline archival process - Added check to ensure completed instants are not rolled back using marker based strategy. This will be incorrect - Reworked tests to rollback inflight instants, instead of completed instants whenever necessary - Added a unit test for MarkerBasedRollbackStrategy Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-07-20 22:41:42 -07:00
Udit Mehrotra	1aae437257	[HUDI-1102] Add common useful Spark related and Table path detection utilities (#1841 ) Co-authored-by: Mehrotra <uditme@amazon.com>	2020-07-18 16:16:32 -07:00
Raymond Xu	3b9a30528b	[HUDI-996] Add functional test suite for hudi-utilities (#1746 ) - Share resources for functional tests - Add suite for functional test classes from hudi-utilities	2020-07-05 16:44:31 -07:00
Prashant Wason	2603cfb33e	[HUDI-684] Introduced abstraction for writing and reading different types of base file formats. (#1687 ) Notable changes: 1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format 2. HoodieDataBlock abstraction for creating specific data blocks for base file formats. (e.g. Parquet has HoodieAvroDataBlock) 3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format 4. HiveSyncTool accepts the base file format as a CLI parameter 5. HoodieDeltaStreamer accepts the base file format as a CLI parameter 6. HoodieSparkSqlWriter accepts the base file format as a parameter	2020-06-25 23:46:55 -07:00
Shen Hong	89e37d5273	[HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs. (#1690 )	2020-06-22 08:13:28 -07:00
hongdd	f3a701757b	[HUDI-696] Add unit test for CommitsCommand (#1724 )	2020-06-18 21:42:13 +08:00
garyli1019	e9cab67b80	[HUDI-988] Fix More Unit Test Flakiness	2020-06-07 23:14:46 -07:00
Balaji Varadarajan	fb283934a3	[HUDI-990] Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly. Also ensure timeline gets reloaded after we revert committed transactions	2020-06-04 02:52:21 -07:00
cxzl25	7c59095314	[HUDI-975] Add unit tests in TestHoodieTableFileSystemView to test view for non-partitioned table (#1692 )	2020-06-01 07:23:28 -07:00
Sivabalan Narayanan	5a0d3f1cf9	[HUDI-786] Fixing read beyond inline length in InlineFS (#1616 )	2020-05-28 12:59:11 -07:00
dengziming	bde7a7043e	[HUDI-476]: Add hudi-examples module (#1151 ) add hoodie delta streamer mock source example and dfs source and kafka source examples Signed-off-by: dengziming <dengziming1993@gmail.com> add defaultSparkConf utils method change version of hudi-examples to 0.5.2-SNAPSHOT change the artifactId of hudi-spark and hudi-utilities alter some code to adapt kafka2.0 Update scripts Add license	2020-05-28 01:44:39 +08:00
Raymond Xu	03f136361a	[HUDI-811] Restructure test packages in hudi-common (#1644 ) * [HUDI-811] Restructure test packages in hudi-common	2020-05-27 16:28:17 +08:00
Raymond Xu	6c450957ce	[HUDI-690] Filter out inflight compaction in exporter (#1667 )	2020-05-26 09:23:34 -07:00
hongdd	802d16c8c9	[HUDI-707] Add unit test for StatsCommand (#1645 )	2020-05-21 18:28:04 +08:00
Pratyaksh Sharma	6a0aa9a645	[HUDI-803] Replaced use of NullNode with JsonProperties.NULL_VALUE in HoodieAvroUtils (#1538 ) - added more test cases in TestHoodieAvroUtils.class Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-05-20 09:04:43 -07:00
Balaji Varadarajan	e6f3bf10cf	[HUDI-858] Allow multiple operations to be executed within a single commit (#1633 )	2020-05-18 19:27:24 -07:00
Joey	2600d2de8d	[MINOR] Fix apache-rat violations (#1639 ) * MINOR Fix apache-rat violations. Also, enabling RAT for hudi-utilities and hudi-integ-test	2020-05-18 11:16:49 -07:00
Sivabalan Narayanan	29edf4b3b8	[HUDI-407] Adding Simple Index to Hoodie. (#1402 ) This index finds the location by joining incoming records with records from base files.	2020-05-17 18:32:24 -07:00
Balaji Varadarajan	3c9da2e5f0	[HUDI-895] Remove unnecessary listing .hoodie folder when using timeline server (#1636 )	2020-05-17 18:18:53 -07:00
Mathieu	25a0080b2f	[HUDI-714] Add javadoc and comments to hudi write method link (#1409 ) * [HUDI-714] Add javadoc and comments to hudi write method link	2020-05-16 08:36:51 -04:00
Alexander Filipchik	83796b3189	[HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine (#1513 ) * Adding proper default to hudi metadata fields and proper handling to rewrite routine * Handle fields declared with a null default Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>	2020-05-13 18:04:38 -07:00
liujinhui	32ea4c70ff	[HUDI-869] Add support for alluxio (#1608 )	2020-05-13 21:00:34 +08:00
vinoth chandar	f92b9fdcc4	[MINOR] Fix hardcoding of ports in TestHoodieJmxMetrics (#1606 )	2020-05-10 19:23:26 -04:00
Udit Mehrotra	d54b4b8a52	[HUDI-838] Support schema from HoodieCommitMetadata for HiveSync (#1559 ) Co-authored-by: Mehrotra <uditme@amazon.com>	2020-05-07 16:33:09 -07:00
Alexander Filipchik	e783ab1749	[HUDI-784] Addressing issue with log reader on GCS (#1516 ) Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>	2020-05-07 13:05:32 -07:00
Raymond Xu	366bb10d8c	[HUDI-812] Migrate hudi common tests to JUnit 5 (#1590 ) * [HUDI-812] Migrate hudi-common tests to JUnit 5	2020-05-06 19:15:20 +08:00
vinoth chandar	c4b71622b9	[MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability (#1575 ) - reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc	2020-04-30 09:19:39 -07:00
Raymond Xu	06dae30297	[HUDI-810] Migrate ClientTestHarness to JUnit 5 (#1553 )	2020-04-28 23:38:16 +08:00
vinoth chandar	19ca0b5629	[HUDI-785] Refactor compaction/savepoint execution based on ActionExecutor abstraction (#1548 ) - Savepoint and compaction classes moved to table.action.* packages - HoodieWriteClient#savepoint(...) returns void - Renamed HoodieCommitArchiveLog -> HoodieTimelineArchiveLog - Fixed tests to take into account the additional validation done - Moved helper code into CompactHelpers and SavepointHelpers	2020-04-25 18:26:44 -07:00
Raymond Xu	6e15eebd81	[HUDI-809] Migrate CommonTestHarness to JUnit 5 (#1530 )	2020-04-22 14:10:25 +08:00
Prashant Wason	62bd3e7ded	[HUDI-757] Added hudi-cli command to export metadata of Instants. Example: hudi:db.table-> export instants --localFolder /tmp/ --limit 5 --actions clean,rollback,commit --desc false	2020-04-21 12:41:19 -07:00
n3nash	332072bc6d	[HUDI-371] Supporting hive combine input format for realtime tables (#1503 )	2020-04-20 20:40:06 -07:00
Mathieu	2a2f31d919	[MINOR] Remove redundant code and fix typo in HoodieDefaultTimeline (#1535 )	2020-04-21 09:40:22 +08:00
baobaoyeye	75523657a4	[MINOR] use Option and fix description in toString method (#1527 ) * [MINOR] fix some places are not elegant, as a newcomer	2020-04-18 12:51:37 +08:00
Raymond Xu	acdc4a8d00	[HUDI-798] Migrate to Mockito Jupiter for JUnit 5 (#1521 )	2020-04-16 16:07:32 +08:00
Prashant Wason	19d29ac7d0	[HUDI-741] Added checks to validate Hoodie's schema evolution. HUDI specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that the data written using the old schema can be read using the new schema. Code changes: 1. Added a new config in HoodieWriteConfig to enable schema validation check (disabled by default) 2. Moved code that reads schema from base/log files into hudi-common from hudi-hive-sync 3. Added writerSchema to the extraMetadata of compaction commits in MOR table. This is same as that for commits on COW table. Testing changes: 4. Extended TestHoodieClientBase to add insertBatch API which allows inserting a new batch of unique records into a HUDI table 5. Added a unit test to verify schema evolution for both COW and MOR tables. 6. Added unit tests for schema compatibility checks.	2020-04-15 23:34:59 -07:00

169 Commits