124 Commits

Author SHA1 Message Date
Vinoth Chandar
3129770fd0 Create .hoodie_partition_metadata in each partition, linking back to basepath
- Concurrency handled via taskID, failure recovery handled via renames
 - Falls back to search 3 levels up
 - Cli tool has command to add this to existing tables
2017-04-03 18:28:01 -07:00
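
The marker-file idea in the commit above can be illustrated with a small sketch. This is not Hoodie's Java implementation: it is written in Python for brevity, and the temp-file naming scheme, the `basepath=` file format, and the use of a `.hoodie` folder as the fallback search target are all assumptions made for illustration. Only the file name `.hoodie_partition_metadata` and the 3-level fallback come from the commit message.

```python
import os

META_FILE = ".hoodie_partition_metadata"  # file name taken from the commit message
MAX_LEVELS_UP = 3                         # fallback depth taken from the commit message


def write_partition_metadata(partition_dir, base_path, task_id):
    """Write the marker through a task-specific temp file, then rename it into place.

    Each task writes its own temp file (hypothetical naming scheme), so concurrent
    writers never share a half-written file; the rename publishes a complete file.
    """
    final_path = os.path.join(partition_dir, META_FILE)
    if os.path.exists(final_path):
        return final_path
    tmp_path = f"{final_path}_{task_id}"
    with open(tmp_path, "w") as f:
        f.write(f"basepath={base_path}\n")  # hypothetical file format
    os.replace(tmp_path, final_path)
    return final_path


def find_base_path(partition_dir):
    """Return the base path from the marker, else search up to 3 parent levels."""
    marker = os.path.join(partition_dir, META_FILE)
    if os.path.exists(marker):
        with open(marker) as f:
            for line in f:
                key, _, value = line.strip().partition("=")
                if key == "basepath":
                    return value
    # Fallback (assumption): a parent directory holding a ".hoodie" folder is the base path.
    current = partition_dir
    for _ in range(MAX_LEVELS_UP):
        current = os.path.dirname(current)
        if os.path.isdir(os.path.join(current, ".hoodie")):
            return current
    return None
```

With a layout such as `table/2017/04/03`, the fallback reaches `table` on the third step up; once the marker exists, the lookup no longer depends on partition depth.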
Prasanna Rajaperumal
1e802ad4f2 Move HoodieAvroReader to hoodie-common, it will be used for compaction and in the record reader 2017-04-03 13:58:35 -07:00
Prasanna Rajaperumal
aee136777b Fixes needed to run merge-on-read testing on production scale data 2017-04-02 22:25:47 -07:00
Prasanna Rajaperumal
57ab7a2405 [maven-release-plugin] prepare for next development iteration 2017-03-31 14:58:55 -07:00
Prasanna Rajaperumal
803c635098 [maven-release-plugin] prepare release hoodie-0.3.5 2017-03-31 14:58:46 -07:00
Prasanna Rajaperumal
f4bb44c1b1 Update snapshot version to 0.3.5-SNAPSHOT 2017-03-31 14:54:54 -07:00
Prasanna Rajaperumal
77e54e78f8 Create the partition path if it does not exist when listing data files in a partition 2017-03-28 05:20:15 -07:00
Yash Sharma
e3b273e9fd formatting for docs 2017-03-28 05:08:54 -07:00
Yash Sharma
bca7e7dae4 improve documentation 2017-03-28 05:08:54 -07:00
Yash Sharma
d6f94b998d Hoodie operability with S3 2017-03-28 05:08:54 -07:00
prazanna
a7cd021f26 Update incremental pull query documentation 2017-03-23 16:20:54 -07:00
prazanna
0e3f635adb remove hardcoding of autoClean 2017-03-23 15:54:26 -07:00
Zeeshan Qureshi
a94f3a638e Pass table path as argument to HoodieClientExample 2017-03-23 08:12:20 -07:00
fishie9
b7047ab4fb Pass in String StorageLevel for WriteStatus (#113) 2017-03-23 04:31:30 -07:00
ovj
b02910c588 few fixes to quick start document (#112) 2017-03-22 18:25:26 -07:00
prazanna
f1b7afad21 Add config for index parallelism and make clean public (#109)
* Add config for index parallelism and make clean public

* Review comments on clean api modification
2017-03-21 17:36:46 -07:00
ovj
21898907c1 tool for importing hive tables (in parquet format) into hoodie dataset (#89)
* tool for importing hive tables (in parquet format) into hoodie dataset

* review fixes

* review fixes

* review fixes
2017-03-21 14:42:13 -07:00
prazanna
d835710c51 Metadata timeline marks an already complete instant as complete again (#98) 2017-03-17 12:42:26 -07:00
Prasanna Rajaperumal
d83b671ada Implement Savepoints and required metadata timeline - Part 2 2017-03-13 23:09:29 -07:00
prazanna
6f36e1eaaf Implement Savepoints and required metadata timeline (#86)
- Introduce avro to save clean metadata with details about the last commit that was retained
- Save rollback metadata in the meta timeline
- Create savepoint metadata and add API to createSavepoint, deleteSavepoint and rollbackToSavepoint
- A savepointed commit should not be rolled back, cleaned, or archived
- introduce cli commands to show, create and rollback to savepoints
- Write unit tests to test savepoints and rollbackToSavepoints
2017-03-13 15:12:03 -07:00
vinoth chandar
69d3950a32 Revamped Deltastreamer (#93)
* Add analytics to site

* Fix ugly favicon

* New & Improved HoodieDeltaStreamer

 - Can incrementally consume from HDFS or Kafka, with exactly-once semantics!
 - Supports Json/Avro data, Source can also do custom things
 - Source is totally pluggable, via reflection
 - Key generation is pluggable, currently added SimpleKeyGenerator
 - Schema provider is pluggable, currently Filebased schemas
 - Configurable field to break ties during preCombine
 - Finally, can also plug in the HoodieRecordPayload, to get other merge types than overwriting
 - Handles efficient avro serialization in Spark

 Pending :
 - Rewriting of HiveIncrPullSource
 - Hive sync via hoodie-hive
 - Cleanup & tests

* Minor fixes from master rebase

* Implementation of HiveIncrPullSource
 - Copies commit by commit from source to target

* Adding TimestampBasedKeyGenerator
 - Supports unix time & date strings
2017-03-13 12:41:29 -07:00
Vinoth Chandar
c3257b9680 Fix ugly favicon 2017-03-12 20:30:42 -07:00
Vinoth Chandar
b252633fab Add analytics to site 2017-03-12 20:30:42 -07:00
prazanna
404726031d Adding Siddhartha Gunda as a contributor
for his contribution on the delete api
2017-03-04 01:36:21 -08:00
siddharthagunda
348a48aa80 Add delete support to Hoodie (#85) 2017-03-04 01:33:49 -08:00
Prasanna Rajaperumal
41e08018fc Fixing minor documentation fixes 2017-03-02 11:42:04 -08:00
Prasanna Rajaperumal
d84aea3512 Fixing minor documentation fixes 2017-03-02 11:39:40 -08:00
prazanna
8a2a9ae764 Making minor documentation fixes 2017-03-02 11:35:09 -08:00
vinoth chandar
116a78094f Cleanup code based on Java8 Lambdas (#84) 2017-02-27 15:52:13 -08:00
Wei Yan
c4fa585b27 Switch some info log to debug (#83)
* Switch some info log to debug

* fix a typo

* remove HoodieTableMetadata file
2017-02-23 20:12:36 -08:00
Prasanna Rajaperumal
fe5c5e8021 Test Failure in Travis-ci 2017-02-21 20:25:01 -08:00
Prasanna Rajaperumal
1132f3533d Merge and pull master commits 2017-02-21 17:53:28 -08:00
prazanna
eb46e7c72b Implement Merge on Read Storage (#76)
1. Create HoodieTable abstraction for commits and fileSystemView
2. HoodieMergeOnReadTable created
3. View is now always obtained from the table and the correct view based on the table type is returned
2017-02-21 16:24:38 -08:00
prazanna
11d2fd3428 Introduce RealtimeTableView and Implement HoodieRealtimeTableCompactor (#73) 2017-02-21 16:24:18 -08:00
Prasanna Rajaperumal
48fbb0f425 Implement reliable log file management for Merge on read, which is fault tolerant and allows random block level access on avro file 2017-02-21 16:23:53 -08:00
Prasanna Rajaperumal
ccd8cb2407 Take 2: Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0
- Refactored timelines to be a single timeline for all active events and one for archived events. CommitTimeline and other timelines can be inferred by applying a filter on the activeTimeline
- Introduced HoodieInstant to abstract the different types of action, the commit time, and whether the instant isInFlight
- Implemented other review comments
2017-02-21 16:23:53 -08:00
Prasanna Rajaperumal
8ee777a9bb Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0
The following is the gist of the changes:

- All low-level code for creating a commit was in HoodieClient, which made it hard to share that code with a compaction commit.
- HoodieTableMetadata contained a mix of metadata and file-filtering logic. (Also, a few operations required a FileSystem to be passed in because they were called from TaskExecutors, while others had FileSystem as a global variable.) Merge-on-read requires a lot of that code but has to change slightly how it operates on the metadata and how it filters the files, so the two sets of operations are split into HoodieTableMetaClient and TableFileSystemView.
- Everything (active commits, archived commits, cleaner log, savepoint log and, in future, delta and compaction commits) in HoodieTableMetaClient is a HoodieTimeline. A timeline is a series of instants, with a built-in concept of inflight and completed commit markers.
- A timeline can be queried for ranges and containment, and can also be used to create a new data point (a new commit, etc.). Creation and deletion of commits (and all the above metadata) is streamlined in a timeline.
- Multiple timelines can be merged into a single timeline, giving us an audit timeline of whatever happened in a hoodie dataset. This also helps with #55.
- Move to Java 8 and introduce Java 8 succinct syntax in refactored code
2017-02-21 16:23:53 -08:00
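
The timeline idea described in the commit above (a sorted series of instants that can be filtered, queried, and merged) can be sketched roughly as follows. This is an illustrative Python model, not the Java API from the refactoring: the class names `Instant` and `Timeline`, every method name, and the choice of a half-open `(start, end]` range are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True, order=True)
class Instant:
    """One event on a timeline: a commit time, an action type, and an inflight flag."""
    timestamp: str          # commit time as a sortable string (assumed format)
    action: str             # e.g. "commit", "clean", "savepoint"
    inflight: bool = False  # True until the action completes


class Timeline:
    """A sorted series of instants (hypothetical model of the concept)."""

    def __init__(self, instants: Iterable[Instant]):
        self.instants: List[Instant] = sorted(instants)

    def filter_action(self, action: str) -> "Timeline":
        # A per-action timeline is inferred by filtering the full timeline.
        return Timeline(i for i in self.instants if i.action == action)

    def completed(self) -> "Timeline":
        return Timeline(i for i in self.instants if not i.inflight)

    def contains(self, instant: Instant) -> bool:
        return instant in self.instants

    def in_range(self, start: str, end: str) -> "Timeline":
        # Assumed semantics: start exclusive, end inclusive.
        return Timeline(i for i in self.instants if start < i.timestamp <= end)

    def merge(self, other: "Timeline") -> "Timeline":
        # Merging timelines yields one audit view over all instants.
        return Timeline(set(self.instants) | set(other.instants))
```

In this model a commit timeline is simply `active.filter_action("commit").completed()`, and merging the active and archived timelines gives the combined audit view the commit message mentions.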
Prasanna Rajaperumal
283269e57f [maven-release-plugin] prepare for next development iteration 2017-02-20 16:52:25 -08:00
Prasanna Rajaperumal
d5a5f2ddff [maven-release-plugin] prepare release hoodie-0.3.0 2017-02-20 16:52:04 -08:00
Prasanna Rajaperumal
0e234ac0ef Moving to Spark 2.1.0 2017-02-20 16:47:52 -08:00
Prasanna Rajaperumal
be1dd9444f [maven-release-plugin] prepare for next development iteration 2017-02-20 16:09:05 -08:00
Prasanna Rajaperumal
47583e280f [maven-release-plugin] prepare release hoodie-0.2.14 2017-02-20 16:08:45 -08:00
Prasanna Rajaperumal
2d49711cce Changing the current development version to 0.2.14-SNAPSHOT 2017-02-20 16:01:24 -08:00
Prasanna Rajaperumal
4a47d26818 Fixing a javadoc lint issue 2017-02-20 15:57:58 -08:00
Prasanna Rajaperumal
cc58a4c3e0 [maven-release-plugin] prepare for next development iteration 2017-02-20 15:49:45 -08:00
Prasanna Rajaperumal
dd03038254 [maven-release-plugin] prepare release hoodie-0.2.13 2017-02-20 15:49:20 -08:00
Prasanna Rajaperumal
7178cb5a3f Fixing a javadoc lint issue 2017-02-20 15:41:32 -08:00
Prasanna Rajaperumal
57a0b7a781 [maven-release-plugin] prepare for next development iteration 2017-02-20 15:35:19 -08:00
Prasanna Rajaperumal
9828bd8019 [maven-release-plugin] prepare release hoodie-0.2.12 2017-02-20 15:35:03 -08:00
Prasanna Rajaperumal
8f12163166 [maven-release-plugin] prepare for next development iteration 2017-02-20 15:00:35 -08:00