lanyuanxiaoyao/hudi - hudi

Author	SHA1	Message	Date
vinoth chandar	539621bd33	[HUDI-242] Support for RFC-12/Bootstrapping of external datasets to hudi (#1876) - [HUDI-418] Bootstrap Index Implementation using HFile with unit-test - [HUDI-421] FileSystem View Changes to support Bootstrap with unit-tests - [HUDI-424] Implement Query Side Integration for querying tables containing bootstrap file slices - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-421] Bootstrap Write Client with tests - [HUDI-425] Added HoodieDeltaStreamer support - [HUDI-899] Add a knob to change partition-path style while performing metadata bootstrap - [HUDI-900] Metadata Bootstrap Key Generator needs to handle complex keys correctly - [HUDI-424] Simplify Record reader implementation - [HUDI-423] Implement upsert functionality for handling updates to these bootstrap file slices - [HUDI-420] Hoodie Demo working with hive and sparkSQL. Also, Hoodie CLI working with bootstrap tables Co-authored-by: Mehrotra <uditme@amazon.com> Co-authored-by: Vinoth Chandar <vinoth@apache.org> Co-authored-by: Balaji Varadarajan <varadarb@uber.com>	2020-08-03 20:19:21 -07:00
hongdd	fa419213f6	[HUDI-703] Add test for HoodieSyncCommand (#1774)	2020-07-28 08:31:43 +08:00
hongdd	12ef8c9249	[HUDI-708] Add temps show and unit test for TempViewCommand (#1770)	2020-07-23 08:43:46 +08:00
Prashant Wason	2603cfb33e	[HUDI-684] Introduced abstraction for writing and reading different types of base file formats. (#1687) Notable changes: 1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format 2. HoodieDataBlock abstraction for creating specific data blocks for base file formats. (e.g. Parquet has HoodieAvroDataBlock) 3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format 4. HiveSyncTool accepts the base file format as a CLI parameter 5. HoodieDeltaStreamer accepts the base file format as a CLI parameter 6. HoodieSparkSqlWriter accepts the base file format as a parameter	2020-06-25 23:46:55 -07:00
hongdd	f3a701757b	[HUDI-696] Add unit test for CommitsCommand (#1724)	2020-06-18 21:42:13 +08:00
hongdd	5099a91edd	[HUDI-709] Add unit test for UtilsCommand (#1686)	2020-06-18 19:54:14 +08:00
hongdd	fcabc8fbca	[HUDI-1019] Clean refresh command in CLI (#1725)	2020-06-14 14:30:28 +08:00
hongdd	802d16c8c9	[HUDI-707] Add unit test for StatsCommand (#1645)	2020-05-21 18:28:04 +08:00
rolandjohann	244d47494e	[HUDI-888] fix NullPointerException in HoodieCompactor (#1622)	2020-05-20 04:22:35 -07:00
hongdd	161a798337	[HUDI-706] Add unit test for SavepointsCommand (#1624)	2020-05-19 18:36:01 +08:00
hongdd	57132f79bb	[HUDI-705] Add unit test for RollbacksCommand (#1611)	2020-05-18 14:04:06 +08:00
hongdd	3a2fe13fcb	[HUDI-701] Add unit test for HDFSParquetImportCommand (#1574)	2020-05-14 19:15:49 +08:00
Balaji Varadarajan	8d0e23173b	[HUDI-820] cleaner repair command should only inspect clean metadata files (#1542)	2020-05-11 09:25:54 +08:00
hongdd	f921469afc	[HUDI-704] Add test for RepairsCommand (#1554)	2020-05-07 23:02:28 +08:00
vinoth chandar	c4b71622b9	[MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability (#1575) - reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc	2020-04-30 09:19:39 -07:00
hongdd	9059bce977	[HUDI-702] Add test for HoodieLogFileCommand (#1522)	2020-04-29 18:47:27 +08:00
vinoth chandar	19ca0b5629	[HUDI-785] Refactor compaction/savepoint execution based on ActionExecutor abstraction (#1548) - Savepoint and compaction classes moved to table.action.* packages - HoodieWriteClient#savepoint(...) returns void - Renamed HoodieCommitArchiveLog -> HoodieTimelineArchiveLog - Fixed tests to take into account the additional validation done - Moved helper code into CompactHelpers and SavepointHelpers	2020-04-25 18:26:44 -07:00
Prashant Wason	62bd3e7ded	[HUDI-757] Added hudi-cli command to export metadata of Instants. Example: hudi:db.table-> export instants --localFolder /tmp/ --limit 5 --actions clean,rollback,commit --desc false	2020-04-21 12:41:19 -07:00
Prashant Wason	19d29ac7d0	[HUDI-741] Added checks to validate Hoodie's schema evolution. HUDI specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that the data written using the old schema can be read using the new schema. Code changes: 1. Added a new config in HoodieWriteConfig to enable schema validation check (disabled by default) 2. Moved code that reads schema from base/log files into hudi-common from hudi-hive-sync 3. Added writerSchema to the extraMetadata of compaction commits in MOR table. This is the same as that for commits on COW table. Testing changes: 4. Extended TestHoodieClientBase to add insertBatch API which allows inserting a new batch of unique records into a HUDI table 5. Added a unit test to verify schema evolution for both COW and MOR tables. 6. Added unit tests for schema compatibility checks.	2020-04-15 23:34:59 -07:00
hongdd	644c1cc8bd	[HUDI-698] Add unit test for CleansCommand (#1449)	2020-04-14 17:54:47 +08:00
vinoth chandar	661b0b3bab	[HUDI-761] Refactoring rollback and restore actions using the ActionExecutor abstraction (#1492) - rollback() and restore() table level APIs introduced - Restore is implemented by wrapping calls to rollback executor - Existing tests transparently cover this, since it's just a refactor	2020-04-13 08:29:19 -07:00
hongdd	a464a2972e	[HUDI-700] Add unit test for FileSystemViewCommand (#1490)	2020-04-11 10:12:21 +08:00
hongdd	4e5c8671ef	[HUDI-740] Fix can not specify the sparkMaster and code clean for SparkUtil (#1452)	2020-04-08 21:33:15 +08:00
Shaofeng Shi	78b3194e82	[HUDI-751] Fix some coding issues reported by FindBugs (#1470)	2020-03-31 21:19:32 +08:00
lamber-ken	dbc9acd23a	[HUDI-716] Exception: Not an Avro data file when running HoodieCleanClient.runClean (#1432)	2020-03-30 11:19:17 -07:00
Suneel Marthi	fa36082554	[HUDI-746] Reduce build warnings < 10 (#1465)	2020-03-30 11:46:52 +08:00
vinoth chandar	e057c27603	[HUDI-744] Restructure hudi-common and clean up files under util packages (#1462) - Brings more order and cohesion to the classes in hudi-common - Utils classes related to a particular concept (avro, timeline,...) are placed near to the package - common.fs package now contains all the filesystem level classes including wrapper filesystem - bloom.filter package renamed to just bloom - config package contains classes that help store properties - common.fs.inline package contains all the inline filesystem classes/impl - common.table.timeline now consolidates all timeline related classes - common.table.view consolidates all the classes related to filesystem view metadata - common.table.timeline.versioning contains all classes related to versioning of timeline - Fix few unit tests as a result - Moved the test packages around to match the source file move - Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos	2020-03-29 10:58:49 -07:00
leesf	07c3c5d797	[HUDI-679] Make io package Spark free (#1460) * [HUDI-679] Make io package Spark free	2020-03-29 16:54:00 +08:00
Suneel Marthi	8c3001363d	HUDI-479: Eliminate or Minimize use of Guava if possible (#1159)	2020-03-28 03:11:32 -04:00
Zhiyuan Zhao	0241b21f77	[HUDI-65] commitTime rename to instantTime (#1431)	2020-03-22 18:06:00 -07:00
hongdd	3ef9e885ca	[HUDI-715] Fix duplicate name in TableCommand (#1410)	2020-03-16 17:19:57 +08:00
hongdd	0f892ef62c	[HUDI-692] Add delete savepoint for cli (#1397) * Add delete savepoint for cli * Add check * Move JavaSparkContext to try	2020-03-11 16:49:02 -07:00
satishkotha	7194514aff	[HUDI-689] Change CLI command names to not have overlap (#1392)	2020-03-11 16:29:54 -07:00
Satish Kotha	3d3781810c	[CLI] Add export to table	2020-03-06 08:53:23 -08:00
vinoth chandar	71170fafe7	[HUDI-554] Cleanup package structure in hudi-client (#1346) - Just package, class moves and renames with the following intent - `client` now has all the various client classes, that do the transaction management - `func` renamed to `execution` and some helpers moved to `client/utils` - All compaction code under `io` now under `table/compact` - Rollback code under `table/rollback` and in general all code for individual operations under `table` - `exception` `config`, `metrics` left untouched - Moved the tests also accordingly - Fixed some flaky tests	2020-02-27 08:05:58 -08:00
Suneel Marthi	078d4825d9	[HUDI-624]: Split some of the code from PR for HUDI-479 (#1344)	2020-02-21 14:22:21 +08:00
Satish Kotha	20ed2516d3	[HUDI-571] Add show archived compaction(s) to CLI	2020-02-14 10:58:28 -08:00
lamber-ken	01c868ab86	[HUDI-574] Fix CLI counts small file inserts as updates (#1321)	2020-02-13 22:20:58 +08:00
Satish Kotha	63b42166b1	CLI - add option to print additional commit metadata	2020-02-12 14:11:24 -08:00
Satish Kotha	462fd02556	[HUDI-571] Add 'commits show archived' command to CLI	2020-02-05 11:25:34 -08:00
lamber-ken	46842f4e92	[MINOR] Remove the declaration of thrown RuntimeException (#1305)	2020-02-05 23:23:20 +08:00
Balaji Varadarajan	923e2b4a1e	[HUDI-535] Ensure Compaction Plan is always written in .aux folder to avoid 0.5.0/0.5.1 reader-writer compatibility issues (#1229)	2020-01-17 10:56:35 -08:00
vinoth chandar	baa6b5e889	[HUDI-537] Introduce `repair overwrite-hoodie-props` CLI command (#1241)	2020-01-17 01:21:44 -08:00
vinoth chandar	c2c0f6b13d	[HUDI-509] Renaming code in sync with cWiki restructuring (#1212) - Storage Type replaced with Table Type (remaining instances) - View types replaced with query types; - ReadOptimized view referred as Snapshot Query - TableFileSystemView sub interfaces renamed to BaseFileOnly and Slice Views - HoodieDataFile renamed to HoodieBaseFile - Hive Sync tool will register RO tables for MOR with a `_ro` suffix - Datasource/Deltastreamer options renamed accordingly - Support fallback to old config values as well, so migration is painless - Config for controlling _ro suffix addition - Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView	2020-01-16 23:58:47 -08:00
leesf	04afac977d	[HUDI-248] CLI doesn't allow rolling back a Delta commit	2020-01-10 16:10:35 -08:00
vinoth chandar	9706f659db	[HUDI-508] Standardizing on "Table" instead of "Dataset" across code (#1197) - Docs were talking about storage types before, cWiki moved to "Table" - Most of code already has HoodieTable, HoodieTableMetaClient - correct naming - Replacing renaming use of dataset across code/comments - Few usages in comments and use of Spark SQL DataSet remain unscathed	2020-01-07 12:52:32 -08:00
SteNicholas	a733f4ef72	[MINOR] Optimize hudi-cli module (#1136)	2020-01-04 09:05:50 -08:00
Pratyaksh Sharma	290278fc6c	[HUDI-118]: Options provided for passing properties to Cleaner, compactor and importer commands	2020-01-03 16:00:57 -08:00
hongdongdong	ff1113f3b7	[HUDI-492] Fix show env all in hudi-cli	2020-01-03 15:50:20 -08:00
Suneel Marthi	add4b1e329	Merge pull request #1143 from BigDataArtisans/outoflimit [MINOR] Fix out of limits for results	2019-12-31 02:08:54 -05:00


135 Commits