Commit Graph

60 Commits

Author SHA1 Message Date
Vinoth Chandar
0cd186c899 Multi FS Support
- Reviving PR 191, to make FileSystem creation off actual path
- Streamline all filesystem access to HoodieTableMetaClient
- Hadoop Conf from Spark Context serialized & passed to executor code too
- Pick up env vars prefixed with HOODIE_ENV_ into Configuration object
- Cleanup usage of FSUtils.getFS, piggybacking off HoodieTableMetaClient.getFS
- Adding s3a to supported schemes & support escaping "." in env vars
- Tests use HoodieTestUtils.getDefaultHadoopConf
2018-01-17 23:34:21 -08:00
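The environment-variable pickup described in the commit above can be illustrated with a minimal sketch. Only the HOODIE_ENV_ prefix comes from the commit message; the `_DOT_` escape token, the function name and the use of a plain dict in place of a Hadoop Configuration object are assumptions made for this illustration, not the project's actual code.

```python
import os

# The prefix is taken from the commit message. The "_DOT_" escape token is an
# assumption for this sketch: env var names cannot contain ".", so some
# escape is needed to express keys such as "fs.s3a.access.key".
ENV_PREFIX = "HOODIE_ENV_"
DOT_ESCAPE = "_DOT_"


def conf_overrides_from_env(environ):
    """Return config key/value pairs found in prefixed environment variables."""
    overrides = {}
    for name, value in environ.items():
        if name.startswith(ENV_PREFIX):
            # Strip the prefix, then turn the escape token back into dots.
            key = name[len(ENV_PREFIX):].replace(DOT_ESCAPE, ".")
            overrides[key] = value
    return overrides


if __name__ == "__main__":
    # Prints whatever prefixed variables are set in the current environment.
    print(conf_overrides_from_env(dict(os.environ)))
```

With this sketch, a variable named HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key would map to the key fs.s3a.access.key, and unprefixed variables are ignored.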
Nishith Agarwal
44839b88c6 Removing compaction action type and associated compaction timeline operations, replace with commit action type 2018-01-09 09:56:15 -08:00
Vinoth Chandar
e45679f5e2 Reformatting code per Google Code Style all over 2017-11-12 23:19:02 -08:00
Nishith Agarwal
c7d63a7622 1) Separated rollback as a table operation 2) Implement rollback for MOR 2017-10-12 07:36:46 -07:00
Vinoth Chandar
e1fe3ab937 [maven-release-plugin] prepare for next development iteration 2017-10-02 22:42:54 -07:00
Vinoth Chandar
50139fe904 [maven-release-plugin] prepare release hoodie-0.4.0 2017-10-02 22:42:32 -07:00
Vinoth Chandar
64e0573aca Adding hoodie-spark to support Spark Datasource for Hoodie
- Write with COW/MOR paths work fully
- Read with RO view works on both storages*
- Incremental view supported on COW
- Refactored out HoodieReadClient methods, to just contain key based access
- HoodieDataSourceHelpers class can be now used to construct inputs to datasource
- Tests in hoodie-client using new helpers and mechanisms
- Basic tests around save modes & insert/upserts (more to follow)
- Bumped up scala to 2.11, since 2.10 is deprecated & complains with scalatest
- Updated documentation to describe usage
- New sample app written using the DataSource API
2017-10-02 20:44:53 -07:00
Nishith Agarwal
63f1b12355 adding ability to read archived files written in log format 2017-08-25 14:40:07 -07:00
Prasanna Rajaperumal
7d3963b4ab Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR) 2017-06-30 11:41:23 -07:00
Nishith Agarwal
3eba812a1b [maven-release-plugin] prepare for next development iteration 2017-06-30 11:17:07 -07:00
Nishith Agarwal
06d44daea3 [maven-release-plugin] prepare release hoodie-0.3.9 2017-06-30 11:16:58 -07:00
Vinoth Chandar
c00f1a9ed9 Refactoring HoodieTableFileSystemView using FileGroups/FileSlices
- Merged all filter* and get* methods
- New constructor takes filestatus[]
- All existing tests pass
- FileGroup is all files that belong to a fileID within a partition
- FileSlice is a generation of data and log files, starting at a base commit
2017-06-22 17:16:13 -07:00
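The FileGroup and FileSlice definitions in the commit above can be sketched as a two-level grouping. This is an illustration only: the tuple layout, the class and function names, and the sample file names are hypothetical and do not reflect the actual HoodieTableFileSystemView API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileSlice:
    """One generation of files for a file ID, starting at a base commit."""
    base_commit: str
    data_file: Optional[str] = None
    log_files: List[str] = field(default_factory=list)


def build_file_groups(files):
    """Group files into file groups, then into slices per base commit.

    `files` is an iterable of hypothetical tuples:
    (partition, file_id, base_commit, kind, name), with kind "data" or "log".
    Returns {(partition, file_id): {base_commit: FileSlice}}.
    """
    groups = defaultdict(dict)
    for partition, file_id, base_commit, kind, name in files:
        # A file group is keyed by file ID within a partition.
        slices = groups[(partition, file_id)]
        # A file slice is keyed by the base commit inside that group.
        file_slice = slices.setdefault(base_commit, FileSlice(base_commit))
        if kind == "data":
            file_slice.data_file = name
        else:
            file_slice.log_files.append(name)
    return groups
```

Keying the outer map by (partition, file_id) and the inner map by base commit mirrors the two definitions directly: every file with the same file ID in a partition ends up in one group, and each base commit within it gets its own slice.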
Prasanna Rajaperumal
0ed3fac5e3 [maven-release-plugin] prepare for next development iteration 2017-06-16 11:03:17 -07:00
Prasanna Rajaperumal
45732e440c [maven-release-plugin] prepare release hoodie-0.3.8 2017-06-16 10:59:58 -07:00
Prasanna Rajaperumal
933cc8071f [maven-release-plugin] prepare for next development iteration 2017-05-24 14:02:50 -07:00
Prasanna Rajaperumal
bebae06b5b [maven-release-plugin] prepare release hoodie-0.3.7 2017-05-24 14:02:41 -07:00
Vinoth Chandar
da17c5c607 Introduce getCommitsAndCompactionsTimeline() explicitly & adjust usage across code base 2017-05-01 21:48:27 -07:00
Vinoth Chandar
bae0528013 Cleanup calls to HoodieTimeline.compareTimeStamps 2017-05-01 21:48:27 -07:00
Prasanna Rajaperumal
c3258039f0 [maven-release-plugin] prepare for next development iteration 2017-04-27 11:00:56 -07:00
Prasanna Rajaperumal
de1bdad756 [maven-release-plugin] prepare release hoodie-0.3.6 2017-04-27 11:00:45 -07:00
Vinoth Chandar
2b6322318c CR feedback 2017-04-03 18:28:01 -07:00
Vinoth Chandar
f9fd16069d FSUtils.getAllPartitionsPaths() works based on .hoodie_partition_metadata
- Clean/rollback/write paths covered by existing tests
- Snapshot copier fixed to copy metadata file also, and test fixed
- Existing tables need to be repaired by addition of metadata, before this can be rolled out
2017-04-03 18:28:01 -07:00
Vinoth Chandar
3129770fd0 Create .hoodie_partition_metadata in each partition, linking back to basepath
- Concurrency handled via taskID, failure recovery handled via renames
- Falls back to search 3 levels up
- CLI tool has command to add this to existing tables
2017-04-03 18:28:01 -07:00
Prasanna Rajaperumal
57ab7a2405 [maven-release-plugin] prepare for next development iteration 2017-03-31 14:58:55 -07:00
Prasanna Rajaperumal
803c635098 [maven-release-plugin] prepare release hoodie-0.3.5 2017-03-31 14:58:46 -07:00
Prasanna Rajaperumal
f4bb44c1b1 Update snapshot version to 0.3.5-SNAPSHOT 2017-03-31 14:54:54 -07:00
ovj
21898907c1 tool for importing hive tables (in parquet format) into hoodie dataset (#89)
* tool for importing hive tables (in parquet format) into hoodie dataset

* review fixes

* review fixes

* review fixes
2017-03-21 14:42:13 -07:00
Prasanna Rajaperumal
d83b671ada Implement Savepoints and required metadata timeline - Part 2 2017-03-13 23:09:29 -07:00
prazanna
6f36e1eaaf Implement Savepoints and required metadata timeline (#86)
- Introduce avro to save clean metadata with details about the last commit that was retained
- Save rollback metadata in the meta timeline
- Create savepoint metadata and add API to createSavepoint, deleteSavepoint and rollbackToSavepoint
- Savepointed commit should not be rolled back, cleaned or archived
- Introduce CLI commands to show, create and rollback to savepoints
- Write unit tests to test savepoints and rollbackToSavepoints
2017-03-13 15:12:03 -07:00
prazanna
eb46e7c72b Implement Merge on Read Storage (#76)
1. Create HoodieTable abstraction for commits and fileSystemView
2. HoodieMergeOnReadTable created
3. View is now always obtained from the table and the correct view based on the table type is returned
2017-02-21 16:24:38 -08:00
Prasanna Rajaperumal
ccd8cb2407 Take 2: Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0
- Refactored timelines to be a single timeline for all active events and one for archived events. CommitTimeline and other timelines can be inferred by applying a filter on the activeTimeline
- Introduced HoodieInstant to abstract the action type, the commit time and whether the instant is in flight (isInFlight)
- Implemented other review comments
2017-02-21 16:23:53 -08:00
Prasanna Rajaperumal
8ee777a9bb Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0
The following is the gist of changes done

- All low-level code for creating a commit lived in HoodieClient, which made it hard to share code if there was a compaction commit.
- HoodieTableMetadata contained a mix of metadata and file filtering. (Also, a few operations required a FileSystem to be passed in because they were called from TaskExecutors, while others had FileSystem as a global variable.) Merge-on-read requires a lot of that code, but has to change slightly how it operates on the metadata and how it filters the files, so the two sets of operations are split into HoodieTableMetaClient and TableFileSystemView.
- Everything (active commits, archived commits, cleaner log, save point log and in future delta and compaction commits) in HoodieTableMetaClient is a HoodieTimeline. A timeline is a series of instants, with a built-in concept of inflight and completed commit markers.
- A timeline can be queried for ranges and containment, and can also be used to create new data points (create a new commit etc). Commit (and all the above metadata) creation/deletion is streamlined in a timeline
- Multiple timelines can be merged into a single timeline, giving us an audit timeline to whatever happened in a hoodie dataset. This also helps with #55.
- Move to java 8 and introduce java 8 succinct syntax in refactored code
2017-02-21 16:23:53 -08:00
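The timeline model described in the two refactoring commits above (a series of instants that can be filtered, queried for ranges and containment, and merged) can be sketched as follows. All class and method names here are hypothetical, and the range semantics (start exclusive, end inclusive) are an assumption of this sketch rather than something the commit messages state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    action: str             # e.g. "commit", "clean", "savepoint"
    timestamp: str          # sortable commit time string
    inflight: bool = False  # True until the action completes


class Timeline:
    """An ordered series of instants (illustrative, not the Hoodie API)."""

    def __init__(self, instants):
        self.instants = sorted(instants, key=lambda i: i.timestamp)

    def filter(self, predicate):
        # Derived timelines (e.g. commits only) are filters over this one.
        return Timeline(i for i in self.instants if predicate(i))

    def completed(self):
        return self.filter(lambda i: not i.inflight)

    def in_range(self, start, end):
        # Assumed semantics: start exclusive, end inclusive.
        return self.filter(lambda i: start < i.timestamp <= end)

    def contains(self, instant):
        return instant in self.instants

    def merge(self, other):
        # Merging yields a single audit view across both timelines.
        return Timeline(self.instants + other.instants)
```

In this sketch a commit timeline is obtained as `active.filter(lambda i: i.action == "commit")`, which matches the idea that specific timelines are inferred by filtering the active timeline instead of being stored separately.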
Prasanna Rajaperumal
283269e57f [maven-release-plugin] prepare for next development iteration 2017-02-20 16:52:25 -08:00
Prasanna Rajaperumal
d5a5f2ddff [maven-release-plugin] prepare release hoodie-0.3.0 2017-02-20 16:52:04 -08:00
Prasanna Rajaperumal
be1dd9444f [maven-release-plugin] prepare for next development iteration 2017-02-20 16:09:05 -08:00
Prasanna Rajaperumal
47583e280f [maven-release-plugin] prepare release hoodie-0.2.14 2017-02-20 16:08:45 -08:00
Prasanna Rajaperumal
2d49711cce Changing the current development version to 0.2.14-SNAPSHOT 2017-02-20 16:01:24 -08:00
Prasanna Rajaperumal
cc58a4c3e0 [maven-release-plugin] prepare for next development iteration 2017-02-20 15:49:45 -08:00
Prasanna Rajaperumal
dd03038254 [maven-release-plugin] prepare release hoodie-0.2.13 2017-02-20 15:49:20 -08:00
Prasanna Rajaperumal
57a0b7a781 [maven-release-plugin] prepare for next development iteration 2017-02-20 15:35:19 -08:00
Prasanna Rajaperumal
9828bd8019 [maven-release-plugin] prepare release hoodie-0.2.12 2017-02-20 15:35:03 -08:00
Prasanna Rajaperumal
8f12163166 [maven-release-plugin] prepare for next development iteration 2017-02-20 15:00:35 -08:00
Prasanna Rajaperumal
6e6f6efb94 [maven-release-plugin] prepare release hoodie-0.2.11 2017-02-20 15:00:16 -08:00
Prasanna Rajaperumal
693d751506 [maven-release-plugin] prepare for next development iteration 2017-01-10 22:27:35 -08:00
Prasanna Rajaperumal
e9866bb4d9 [maven-release-plugin] prepare release hoodie-0.2.10 2017-01-10 22:27:28 -08:00
Prasanna Rajaperumal
1ced46ab3e [maven-release-plugin] prepare for next development iteration 2017-01-05 20:04:35 -08:00
Prasanna Rajaperumal
e9f0d4d0bf [maven-release-plugin] prepare release hoodie-0.2.9 2017-01-05 20:04:28 -08:00
Prasanna Rajaperumal
7171ea6909 [maven-release-plugin] prepare for next development iteration 2017-01-05 19:43:31 -08:00
Prasanna Rajaperumal
c1f2d1e456 [maven-release-plugin] prepare release hoodie-0.2.8 2017-01-05 19:43:25 -08:00
vinoth chandar
501ef9d4da Merge pull request #10 from vinothchandar/master
Adding hoodie-utilities module
2016-12-28 16:38:12 -08:00