lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
wenningd	9fe2d2b14a	[HUDI-427] [HUDI-971] Implement CLI support for performing bootstrap (#1869 ) * [HUDI-971] Clean partitions & fileIds returned by HFileBootstrapIndex * [HUDI-427] Implement CLI support for performing bootstrap Co-authored-by: Wenning Ding <wenningd@amazon.com> Co-authored-by: Balaji Varadarajan <vbalaji@apache.org>	2020-08-08 12:37:29 -07:00
Gary Li	4f74a84607	[HUDI-69] Support Spark Datasource for MOR table - RDD approach (#1848 ) - This PR implements Spark Datasource for MOR table in the RDD approach. - Implemented SnapshotRelation - Implemented HudiMergeOnReadRDD - Implemented separate Iterator to handle merge and unmerge record reader. - Added TestMORDataSource to verify this feature. - Clean up test file name, add tests for mixed query type tests - We can now revert the change made in DefaultSource Co-authored-by: Vinoth Chandar <vchandar@confluent.io>	2020-08-07 00:28:14 -07:00
Sivabalan Narayanan	ab11ba43e1	[REVERT] "[HUDI-1058] Make delete marker configurable (#1819 )" (#1914 ) This reverts commit `433d7d2c98`.	2020-08-04 15:20:38 -07:00
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876 ) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
Shen Hong	433d7d2c98	[HUDI-1058] Make delete marker configurable (#1819 )	2020-08-03 11:06:31 -04:00
Gary Li	467d097dae	[MINOR] Add Databricks File System to StorageSchemes (#1877 )	2020-07-24 18:47:09 -07:00
Sivabalan Narayanan	5b6026ba43	[HUDI-802] Fixing deletes for inserts in same batch in write path (#1792 ) * Fixing deletes for inserts in same batch in write path * Fixing delta streamer tests * Adding tests for OverwriteWithLatestAvroPayload	2020-07-22 19:39:57 -07:00
DeyinZhong	743ef322b8	[HUDI-871] Add support for Tencent Cloud Object Storage(COS) (#1855 ) Co-authored-by: deyzhong <deyzhong@tencent.com>	2020-07-22 17:40:19 +08:00
lw0090	1ec89e9a94	[HUDI-839] Introducing support for rollbacks using marker files (#1756 ) * [HUDI-839] Introducing rollback strategy using marker files - Adds a new mechanism for rollbacks where it's based on the marker files generated during the write - Consequently, marker file/dir deletion now happens post commit, instead of during finalize - Marker files are also generated for AppendHandle, making it consistent throughout the write path - Until upgrade-downgrade mechanism can upgrade non-marker based inflight writes to marker based, this should only be turned on for new datasets. - Added marker dir deletion after successful commit/rollback, individual files are not deleted during finalize - Fail safe for deleting marker directories, now during timeline archival process - Added check to ensure completed instants are not rolled back using marker based strategy. This will be incorrect - Reworked tests to rollback inflight instants, instead of completed instants whenever necessary - Added a unit test for MarkerBasedRollbackStrategy Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-07-20 22:41:42 -07:00
Udit Mehrotra	1aae437257	[HUDI-1102] Add common useful Spark related and Table path detection utilities (#1841 ) Co-authored-by: Mehrotra <uditme@amazon.com>	2020-07-18 16:16:32 -07:00
Prashant Wason	2603cfb33e	[HUDI-684] Introduced abstraction for writing and reading different types of base file formats. (#1687 ) Notable changes: 1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format 2. HoodieDataBlock abstraction for creating specific data blocks for base file formats. (e.g. Parquet has HoodieAvroDataBlock) 3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format 4. HiveSyncTool accepts the base file format as a CLI parameter 5. HoodieDeltaStreamer accepts the base file format as a CLI parameter 6. HoodieSparkSqlWriter accepts the base file format as a parameter	2020-06-25 23:46:55 -07:00
Shen Hong	89e37d5273	[HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs. (#1690 )	2020-06-22 08:13:28 -07:00
garyli1019	e9cab67b80	[HUDI-988] Fix More Unit Test Flakiness	2020-06-07 23:14:46 -07:00
Balaji Varadarajan	fb283934a3	[HUDI-990] Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly. Also ensure timeline gets reloaded after we revert committed transactions	2020-06-04 02:52:21 -07:00
Sivabalan Narayanan	5a0d3f1cf9	[HUDI-786] Fixing read beyond inline length in InlineFS (#1616 )	2020-05-28 12:59:11 -07:00
dengziming	bde7a7043e	[HUDI-476]: Add hudi-examples module (#1151 ) add hoodie delta streamer mock source example and dfs source and kafka source examples Signed-off-by: dengziming <dengziming1993@gmail.com> add defaultSparkConf utils method change version of hudi-examples to 0.5.2-SNAPSHOT change the artifactId of hudi-spark and hudi-utilities alter some code to adapt kafka2.0 Update scripts Add license	2020-05-28 01:44:39 +08:00
Raymond Xu	03f136361a	[HUDI-811] Restructure test packages in hudi-common (#1644 )	2020-05-27 16:28:17 +08:00
Raymond Xu	6c450957ce	[HUDI-690] Filter out inflight compaction in exporter (#1667 )	2020-05-26 09:23:34 -07:00
Pratyaksh Sharma	6a0aa9a645	[HUDI-803] Replaced use of NullNode with JsonProperties.NULL_VALUE in HoodieAvroUtils (#1538 ) - added more test cases in TestHoodieAvroUtils.class Co-authored-by: Vinoth Chandar <vinoth@apache.org>	2020-05-20 09:04:43 -07:00
Balaji Varadarajan	e6f3bf10cf	[HUDI-858] Allow multiple operations to be executed within a single commit (#1633 )	2020-05-18 19:27:24 -07:00
Joey	2600d2de8d	[MINOR] Fix apache-rat violations (#1639 ) * MINOR Fix apache-rat violations. Also, enabling RAT for hudi-utilities and hudi-integ-test	2020-05-18 11:16:49 -07:00
Sivabalan Narayanan	29edf4b3b8	[HUDI-407] Adding Simple Index to Hoodie. (#1402 ) This index finds the location by joining incoming records with records from base files.	2020-05-17 18:32:24 -07:00
Balaji Varadarajan	3c9da2e5f0	[HUDI-895] Remove unnecessary listing .hoodie folder when using timeline server (#1636 )	2020-05-17 18:18:53 -07:00
Mathieu	25a0080b2f	[HUDI-714] Add javadoc and comments to hudi write method link (#1409 )	2020-05-16 08:36:51 -04:00
Alexander Filipchik	83796b3189	[HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine (#1513 ) * Adding proper default to hudi metadata fields and proper handling to rewrite routine * Handle fields declared with a null default Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>	2020-05-13 18:04:38 -07:00
liujinhui	32ea4c70ff	[HUDI-869] Add support for alluxio (#1608 )	2020-05-13 21:00:34 +08:00
Udit Mehrotra	d54b4b8a52	[HUDI-838] Support schema from HoodieCommitMetadata for HiveSync (#1559 ) Co-authored-by: Mehrotra <uditme@amazon.com>	2020-05-07 16:33:09 -07:00
Alexander Filipchik	e783ab1749	[HUDI-784] Addressing issue with log reader on GCS (#1516 ) Co-authored-by: Alex Filipchik <alex.filipchik@csscompany.com>	2020-05-07 13:05:32 -07:00
vinoth chandar	c4b71622b9	[MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability (#1575 ) - reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc	2020-04-30 09:19:39 -07:00
Prashant Wason	62bd3e7ded	[HUDI-757] Added hudi-cli command to export metadata of Instants. Example: hudi:db.table-> export instants --localFolder /tmp/ --limit 5 --actions clean,rollback,commit --desc false	2020-04-21 12:41:19 -07:00
n3nash	332072bc6d	[HUDI-371] Supporting hive combine input format for realtime tables (#1503 )	2020-04-20 20:40:06 -07:00
Mathieu	2a2f31d919	[MINOR] Remove redundant code and fix typo in HoodieDefaultTimeline (#1535 )	2020-04-21 09:40:22 +08:00
baobaoyeye	75523657a4	[MINOR] use Option and fix description in toString method (#1527 ) * [MINOR] fix some places that are not elegant, as a newcomer	2020-04-18 12:51:37 +08:00
Prashant Wason	19d29ac7d0	[HUDI-741] Added checks to validate Hoodie's schema evolution. HUDI specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that the data written using the old schema can be read using the new schema. Code changes: 1. Added a new config in HoodieWriteConfig to enable schema validation check (disabled by default) 2. Moved code that reads schema from base/log files into hudi-common from hudi-hive-sync 3. Added writerSchema to the extraMetadata of compaction commits in MOR table. This is same as that for commits on COW table. Testing changes: 4. Extended TestHoodieClientBase to add insertBatch API which allows inserting a new batch of unique records into a HUDI table 5. Added a unit test to verify schema evolution for both COW and MOR tables. 6. Added unit tests for schema compatibility checks.	2020-04-15 23:34:59 -07:00
vinoth chandar	661b0b3bab	[HUDI-761] Refactoring rollback and restore actions using the ActionExecutor abstraction (#1492 ) - rollback() and restore() table level APIs introduced - Restore is implemented by wrapping calls to rollback executor - Existing tests transparently cover this, since it's just a refactor	2020-04-13 08:29:19 -07:00
Balaji Varadarajan	17bf930342	[HUDI-770] Organize upsert/insert API implementation under a single package (#1495 )	2020-04-12 23:11:00 -07:00
Pratyaksh Sharma	6d7ca2cf7e	[HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema (#1427 )	2020-04-12 17:55:26 -07:00
Shen Hong	5d717a28f4	[HUDI-782] Add support of Aliyun object storage service. (#1506 )	2020-04-12 10:06:30 +08:00
satishkotha	c0f96e0726	[HUDI-687] Stop incremental reader on RO table when there is a pending compaction (#1396 )	2020-04-10 10:45:41 -07:00
Ramachandran Madtas Subramaniam	f5f34bb1c1	[HUDI-568] Improve unit test coverage Classes improved: * HoodieTableMetaClient * RocksDBDAO * HoodieRealtimeFileSplit	2020-04-09 10:15:34 -07:00
Zhiyuan Zhao	b5d093a21b	[MINOR] Clear up the redundant comment. (#1489 )	2020-04-06 16:31:54 +08:00
vinoth chandar	eaf6cc2d90	[HUDI-756] Organize Cleaning Action execution into a single package in hudi-client (#1485 ) - Introduced a thin abstraction ActionExecutor, that all actions will implement - Pulled cleaning code from table, writeclient into a single package - CleanHelper is now CleanPlanner, HoodieCleanClient is no longer around - Minor refactor of HoodieTable factory method - HoodieTable.create() methods with and without metaclient passed in - HoodieTable constructor now does not do a redundant instantiation - Fixed existing unit tests to work at the HoodieWriteClient level	2020-04-04 00:07:34 -07:00
Shaofeng Shi	78b3194e82	[HUDI-751] Fix some coding issues reported by FindBugs (#1470 )	2020-03-31 21:19:32 +08:00
lamber-ken	dbc9acd23a	[HUDI-716] Exception: Not an Avro data file when running HoodieCleanClient.runClean (#1432 )	2020-03-30 11:19:17 -07:00
Suneel Marthi	fa36082554	[HUDI-746] Reduce build warnings < 10 (#1465 )	2020-03-30 11:46:52 +08:00
vinoth chandar	e057c27603	[HUDI-744] Restructure hudi-common and clean up files under util packages (#1462 ) - Brings more order and cohesion to the classes in hudi-common - Utils classes related to a particular concept (avro, timeline,...) are placed near to the package - common.fs package now contains all the filesystem level classes including wrapper filesystem - bloom.filter package renamed to just bloom - config package contains classes that help store properties - common.fs.inline package contains all the inline filesystem classes/impl - common.table.timeline now consolidates all timeline related classes - common.table.view consolidates all the classes related to filesystem view metadata - common.table.timeline.versioning contains all classes related to versioning of timeline - Fix few unit tests as a result - Moved the test packages around to match the source file move - Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos	2020-03-29 10:58:49 -07:00
Sivabalan Narayanan	ac73bdcdc3	[HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile (#1176 ) * Adding InlineFileSystem to support embedding any file format (parquet, hfile, etc). Supports reading the embedded file using respective readers.	2020-03-28 12:13:35 -04:00
Suneel Marthi	04449f33fe	[HUDI-743]: Remove FileIOUtils.close() (#1461 )	2020-03-28 18:03:15 +08:00
Suneel Marthi	8c3001363d	HUDI-479: Eliminate or Minimize use of Guava if possible (#1159 )	2020-03-28 03:11:32 -04:00
Zhiyuan Zhao	0241b21f77	[HUDI-65] commitTime rename to instantTime (#1431 )	2020-03-22 18:06:00 -07:00

132 Commits