191 Commits

Author SHA1 Message Date
Prashant Wason
2603cfb33e [HUDI-684] Introduced abstraction for writing and reading different types of base file formats. (#1687)
Notable changes:
    1. HoodieFileWriter and HoodieFileReader abstractions for writer/reader side of a base file format
    2. HoodieDataBlock abstraction for creating specific data blocks for base file formats. (e.g. Parquet has HoodieAvroDataBlock)
    3. All hardcoded references to Parquet / Parquet based classes have been abstracted to call methods which accept a base file format
    4. HiveSyncTool accepts the base file format as a CLI parameter
    5. HoodieDeltaStreamer accepts the base file format as a CLI parameter
    6. HoodieSparkSqlWriter accepts the base file format as a parameter
2020-06-25 23:46:55 -07:00
hongdd
f3a701757b [HUDI-696] Add unit test for CommitsCommand (#1724) 2020-06-18 21:42:13 +08:00
hongdd
5099a91edd [HUDI-709] Add unit test for UtilsCommand (#1686) 2020-06-18 19:54:14 +08:00
hongdd
fcabc8fbca [HUDI-1019] Clean refresh command in CLI (#1725) 2020-06-14 14:30:28 +08:00
Balaji Varadarajan
a68180b179 [HUDI-988] Fix Unit Test Flakiness : Ensure all instantiations of HoodieWriteClient are closed properly. Fix bug in TestRollbacks. Make CLI unit tests for Hudi CLI check skip rendering strings 2020-06-04 02:52:21 -07:00
Raymond Xu
742c204099 [HUDI-811] Restructure test packages in hudi-client/cli (#1689) 2020-06-02 10:25:42 +08:00
Raymond Xu
03f136361a [HUDI-811] Restructure test packages in hudi-common (#1644)
2020-05-27 16:28:17 +08:00
hongdd
802d16c8c9 [HUDI-707] Add unit test for StatsCommand (#1645) 2020-05-21 18:28:04 +08:00
rolandjohann
244d47494e [HUDI-888] fix NullPointerException in HoodieCompactor (#1622) 2020-05-20 04:22:35 -07:00
hongdd
161a798337 [HUDI-706] Add unit test for SavepointsCommand (#1624) 2020-05-19 18:36:01 +08:00
hongdd
57132f79bb [HUDI-705] Add unit test for RollbacksCommand (#1611) 2020-05-18 14:04:06 +08:00
hongdd
3a2fe13fcb [HUDI-701] Add unit test for HDFSParquetImportCommand (#1574) 2020-05-14 19:15:49 +08:00
Shen Hong
295d00beea [HUDI-880] Replace part of spark context by hadoop configuration in HoodieTable. (#1614) 2020-05-11 23:33:57 -07:00
Balaji Varadarajan
8d0e23173b [HUDI-820] cleaner repair command should only inspect clean metadata files (#1542) 2020-05-11 09:25:54 +08:00
hongdd
f921469afc [HUDI-704] Add test for RepairsCommand (#1554) 2020-05-07 23:02:28 +08:00
vinoth chandar
c4b71622b9 [MINOR] Reorder HoodieTimeline#compareTimestamp arguments for better readability (#1575)
- reads nicely as (instantTime1, GREATER_THAN_OR_EQUALS, instantTime2) etc
2020-04-30 09:19:39 -07:00
hongdd
9059bce977 [HUDI-702] Add test for HoodieLogFileCommand (#1522) 2020-04-29 18:47:27 +08:00
Raymond Xu
06dae30297 [HUDI-810] Migrate ClientTestHarness to JUnit 5 (#1553) 2020-04-28 23:38:16 +08:00
vinoth chandar
19ca0b5629 [HUDI-785] Refactor compaction/savepoint execution based on ActionExecutor abstraction (#1548)
- Savepoint and compaction classes moved to table.action.* packages
 - HoodieWriteClient#savepoint(...) returns void
 - Renamed HoodieCommitArchiveLog -> HoodieTimelineArchiveLog
 - Fixed tests to take into account the additional validation done
 - Moved helper code into CompactHelpers and SavepointHelpers
2020-04-25 18:26:44 -07:00
Prashant Wason
62bd3e7ded [HUDI-757] Added hudi-cli command to export metadata of Instants.
Example:
hudi:db.table-> export instants --localFolder /tmp/ --limit 5 --actions clean,rollback,commit --desc false
2020-04-21 12:41:19 -07:00
Prashant Wason
19d29ac7d0 [HUDI-741] Added checks to validate Hoodie's schema evolution.
HUDI specific validation of schema evolution should ensure that a newer schema can be used for the dataset by checking that the data written using the old schema can be read using the new schema.

Code changes:

1. Added a new config in HoodieWriteConfig to enable schema validation check (disabled by default)
2. Moved code that reads schema from base/log files into hudi-common from hudi-hive-sync
3. Added writerSchema to the extraMetadata of compaction commits in MOR tables. This is the same as for commits on COW tables.

Testing changes:

4. Extended TestHoodieClientBase to add insertBatch API which allows inserting a new batch of unique records into a HUDI table
5. Added a unit test to verify schema evolution for both COW and MOR tables.
6. Added unit tests for schema compatibility checks.
2020-04-15 23:34:59 -07:00
hongdd
644c1cc8bd [HUDI-698] Add unit test for CleansCommand (#1449) 2020-04-14 17:54:47 +08:00
vinoth chandar
661b0b3bab [HUDI-761] Refactoring rollback and restore actions using the ActionExecutor abstraction (#1492)
- rollback() and restore() table level APIs introduced
- Restore is implemented by wrapping calls to rollback executor
- Existing tests transparently cover this, since it's just a refactor
2020-04-13 08:29:19 -07:00
hongdd
a464a2972e [HUDI-700] Add unit test for FileSystemViewCommand (#1490) 2020-04-11 10:12:21 +08:00
hongdd
4e5c8671ef [HUDI-740] Fix inability to specify the sparkMaster and clean up code for SparkUtil (#1452) 2020-04-08 21:33:15 +08:00
Ramachandran Madtas Subramaniam
639ec20412 [HUDI-562] Enable testing at debug log level
This is to ensure that tests will execute all code paths, even the ones
written under DEBUG log levels. This will improve coverage as well as
ensure there are no surprises when DEBUG log level is enabled in
production.
2020-04-02 11:14:35 -07:00
Shaofeng Shi
78b3194e82 [HUDI-751] Fix some coding issues reported by FindBugs (#1470) 2020-03-31 21:19:32 +08:00
lamber-ken
dbc9acd23a [HUDI-716] Exception: Not an Avro data file when running HoodieCleanClient.runClean (#1432) 2020-03-30 11:19:17 -07:00
Suneel Marthi
fa36082554 [HUDI-746] Reduce build warnings < 10 (#1465) 2020-03-30 11:46:52 +08:00
vinoth chandar
e057c27603 [HUDI-744] Restructure hudi-common and clean up files under util packages (#1462)
- Brings more order and cohesion to the classes in hudi-common
 - Utils classes related to a particular concept (avro, timeline,...) are placed near to the package
 - common.fs package now contains all the filesystem level classes including wrapper filesystem
 - bloom.filter package renamed to just bloom
 - config package contains classes that help store properties
 - common.fs.inline package contains all the inline filesystem classes/impl
 - common.table.timeline now consolidates all timeline related classes
 - common.table.view consolidates all the classes related to filesystem view metadata
 - common.table.timeline.versioning contains all classes related to versioning of timeline
 - Fix few unit tests as a result
 - Moved the test packages around to match the source file move
 - Rename AvroUtils to TimelineMetadataUtils & minor fixes/typos
2020-03-29 10:58:49 -07:00
leesf
07c3c5d797 [HUDI-679] Make io package Spark free (#1460)
2020-03-29 16:54:00 +08:00
Suneel Marthi
8c3001363d HUDI-479: Eliminate or Minimize use of Guava if possible (#1159) 2020-03-28 03:11:32 -04:00
hongdd
cafc87041b [HUDI-697] Add unit test for ArchivedCommitsCommand (#1424) 2020-03-23 13:46:10 +08:00
Zhiyuan Zhao
0241b21f77 [HUDI-65] commitTime rename to instantTime (#1431) 2020-03-22 18:06:00 -07:00
hongdd
f1d7bb381d [HUDI-695] Add unit test for TableCommand (#1411) 2020-03-17 14:15:30 +08:00
hongdd
3ef9e885ca [HUDI-715] Fix duplicate name in TableCommand (#1410) 2020-03-16 17:19:57 +08:00
hongdd
55e6d34815 [HUDI-694] Add unit test for SparkEnvCommand (#1401)
* Add test for SparkEnvCommand
2020-03-16 11:52:40 +08:00
hongdd
0f892ef62c [HUDI-692] Add delete savepoint for cli (#1397)
* Add delete savepoint for cli
* Add check
* Move JavaSparkContext to try
2020-03-11 16:49:02 -07:00
satishkotha
7194514aff [HUDI-689] Change CLI command names to not have overlap (#1392) 2020-03-11 16:29:54 -07:00
Satish Kotha
3d3781810c [CLI] Add export to table 2020-03-06 08:53:23 -08:00
vinoth chandar
71170fafe7 [HUDI-554] Cleanup package structure in hudi-client (#1346)
- Just package, class moves and renames with the following intent
 - `client` now has all the various client classes, that do the transaction management
 - `func` renamed to `execution` and some helpers moved to `client/utils`
 - All compaction code under `io` now under `table/compact`
 - Rollback code under `table/rollback` and in general all code for individual operations under `table`
 - `exception` `config`, `metrics` left untouched
 - Moved the tests also accordingly
 - Fixed some flaky tests
2020-02-27 08:05:58 -08:00
Suneel Marthi
078d4825d9 [HUDI-624]: Split some of the code from PR for HUDI-479 (#1344) 2020-02-21 14:22:21 +08:00
Satish Kotha
20ed2516d3 [HUDI-571] Add show archived compaction(s) to CLI 2020-02-14 10:58:28 -08:00
lamber-ken
01c868ab86 [HUDI-574] Fix CLI counts small file inserts as updates (#1321) 2020-02-13 22:20:58 +08:00
Satish Kotha
63b42166b1 CLI - add option to print additional commit metadata 2020-02-12 14:11:24 -08:00
Satish Kotha
462fd02556 [HUDI-571] Add 'commits show archived' command to CLI 2020-02-05 11:25:34 -08:00
lamber-ken
46842f4e92 [MINOR] Remove the declaration of thrown RuntimeException (#1305) 2020-02-05 23:23:20 +08:00
Balaji Varadarajan
923e2b4a1e [HUDI-535] Ensure Compaction Plan is always written in .aux folder to avoid 0.5.0/0.5.1 reader-writer compatibility issues (#1229) 2020-01-17 10:56:35 -08:00
vinoth chandar
baa6b5e889 [HUDI-537] Introduce repair overwrite-hoodie-props CLI command (#1241) 2020-01-17 01:21:44 -08:00
vinoth chandar
c2c0f6b13d [HUDI-509] Renaming code in sync with cWiki restructuring (#1212)
- Storage Type replaced with Table Type (remaining instances)
 - View types replaced with query types;
 - ReadOptimized view referred to as Snapshot Query
 - TableFileSystemView sub interfaces renamed to BaseFileOnly and Slice Views
 - HoodieDataFile renamed to HoodieBaseFile
 - Hive Sync tool will register RO tables for MOR with a `_ro` suffix
 - Datasource/Deltastreamer options renamed accordingly
 - Support fallback to old config values as well, so migration is painless
 - Config for controlling _ro suffix addition
 - Renaming DataFile to BaseFile across DTOs, HoodieFileSlice and AbstractTableFileSystemView
2020-01-16 23:58:47 -08:00