23 Commits

Author SHA1 Message Date
ovj
21898907c1 tool for importing hive tables (in parquet format) into hoodie dataset (#89)
* tool for importing hive tables (in parquet format) into hoodie dataset

* review fixes

* review fixes

* review fixes
2017-03-21 14:42:13 -07:00
vinoth chandar
69d3950a32 Revamped Deltastreamer (#93)
* Add analytics to site

* Fix ugly favicon

* New & Improved HoodieDeltaStreamer

 - Can incrementally consume from HDFS or Kafka, with exactly-once semantics!
 - Supports JSON/Avro data; a Source can also do custom things
 - Source is fully pluggable, via reflection
 - Key generation is pluggable; SimpleKeyGenerator is currently provided
 - Schema provider is pluggable; currently file-based schemas
 - Configurable field to break ties during preCombine
 - Finally, can also plug in the HoodieRecordPayload, to get merge types other than overwriting
 - Handles efficient avro serialization in Spark
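
The tie-breaking during preCombine listed above can be pictured as keeping, among records that share a key, the one with the largest value in a configured ordering field. The following is a minimal Python sketch of that idea, not the Hoodie implementation; `pre_combine`, `key_field` and `ordering_field` are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def pre_combine(records: Iterable[Record], key_field: str, ordering_field: str) -> List[Record]:
    """Keep one record per key: the one with the largest ordering value.

    Hypothetical helper for illustration only. On an exact tie in the
    ordering field, the record seen first is kept.
    """
    latest: Dict[Any, Record] = {}
    for rec in records:
        key = rec[key_field]
        current = latest.get(key)
        if current is None or rec[ordering_field] > current[ordering_field]:
            latest[key] = rec
    return list(latest.values())
```

A batch with two versions of key `a` would collapse to the version whose ordering field is larger, while records with distinct keys pass through unchanged.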

 Pending :
 - Rewriting of HiveIncrPullSource
 - Hive sync via hoodie-hive
 - Cleanup & tests

* Minor fixes from master rebase

* Implementation of HiveIncrPullSource
 - Copies commit by commit from source to target

* Adding TimestampBasedKeyGenerator
 - Supports unix time & date strings
2017-03-13 12:41:29 -07:00
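
To illustrate what "supports unix time & date strings" could mean for a timestamp-based key generator, here is a small Python sketch that normalizes either input form into one formatted string. It is an assumption-laden illustration, not the actual TimestampBasedKeyGenerator: the function name `timestamp_key`, its parameters, the UTC interpretation and the default output format are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def timestamp_key(
    value: Union[int, float, str],
    date_format: Optional[str] = None,
    output_format: str = "%Y/%m/%d",
) -> str:
    """Turn a unix time or a date string into a formatted key component.

    Hypothetical helper: if date_format is None, value is read as seconds
    since the epoch; otherwise it is parsed with that strptime pattern.
    Both are interpreted as UTC (an assumption made for this sketch).
    """
    if date_format is None:
        ts = datetime.fromtimestamp(float(value), tz=timezone.utc)
    else:
        ts = datetime.strptime(str(value), date_format).replace(tzinfo=timezone.utc)
    return ts.strftime(output_format)
```

For example, `timestamp_key(1489363200)` and `timestamp_key("2017-03-13 12:41:29", "%Y-%m-%d %H:%M:%S")` both yield `2017/03/13`.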
Prasanna Rajaperumal
48fbb0f425 Implement reliable log file management for merge-on-read that is fault tolerant and allows random block-level access on Avro files 2017-02-21 16:23:53 -08:00
Prasanna Rajaperumal
8ee777a9bb Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0
The gist of the changes:

- All the low-level code for creating a commit lived in HoodieClient, which made it hard to share with a compaction commit.
- HoodieTableMetadata mixed metadata handling with file filtering. (Also, a few operations required a FileSystem to be passed in because they were called from TaskExecutors, while others used a FileSystem held as a global variable.) Merge-on-read needs much of that code, but has to operate on the metadata and filter the files slightly differently, so the two sets of operations are split into HoodieTableMetaClient and TableFileSystemView.
- Everything in HoodieTableMetaClient (active commits, archived commits, cleaner log, save point log and, in future, delta and compaction commits) is a HoodieTimeline. A timeline is a series of instants, with a built-in concept of inflight and completed commit markers.
- A timeline can be queried for ranges and containment, and can also be used to create new data points (a new commit, etc.). Creation and deletion of commits (and of all the metadata above) is streamlined in a timeline.
- Multiple timelines can be merged into a single timeline, giving us an audit timeline of whatever happened in a hoodie dataset. This also helps with #55.
- Move to Java 8 and introduce Java 8's succinct syntax in the refactored code.
2017-02-21 16:23:53 -08:00
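
The timeline idea in the commit above (a series of instants that are either inflight or completed, queryable by range and containment, and mergeable into an audit timeline) can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the HoodieTimeline API: the `Instant` and `Timeline` classes, their fields and method names are hypothetical, and timestamps are assumed to be fixed-width strings so that string order matches time order.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True, order=True)
class Instant:
    """One point on a timeline (hypothetical shape for illustration)."""
    timestamp: str          # fixed-width, so lexicographic order == time order
    action: str             # e.g. "commit", "clean", "savepoint"
    completed: bool = True  # False models an inflight instant


class Timeline:
    """An ordered, de-duplicated series of instants."""

    def __init__(self, instants: Iterable[Instant] = ()):
        self._instants: List[Instant] = sorted(set(instants))

    def instants(self) -> List[Instant]:
        return list(self._instants)

    def completed(self) -> "Timeline":
        """Only the instants that have finished."""
        return Timeline(i for i in self._instants if i.completed)

    def in_range(self, start: str, end: str) -> "Timeline":
        """Instants with start < timestamp <= end."""
        return Timeline(i for i in self._instants if start < i.timestamp <= end)

    def contains(self, timestamp: str) -> bool:
        return any(i.timestamp == timestamp for i in self._instants)

    def merge(self, other: "Timeline") -> "Timeline":
        """Combine two timelines into one ordered audit timeline."""
        return Timeline(self._instants + other._instants)
```

Merging a timeline of commits with a timeline of cleaner actions in this sketch yields a single ordered view, which can then be filtered to completed instants or narrowed to a time range.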
Prasanna Rajaperumal
283269e57f [maven-release-plugin] prepare for next development iteration 2017-02-20 16:52:25 -08:00
Prasanna Rajaperumal
d5a5f2ddff [maven-release-plugin] prepare release hoodie-0.3.0 2017-02-20 16:52:04 -08:00
Prasanna Rajaperumal
be1dd9444f [maven-release-plugin] prepare for next development iteration 2017-02-20 16:09:05 -08:00
Prasanna Rajaperumal
47583e280f [maven-release-plugin] prepare release hoodie-0.2.14 2017-02-20 16:08:45 -08:00
Prasanna Rajaperumal
2d49711cce Changing the current development version to 0.2.14-SNAPSHOT 2017-02-20 16:01:24 -08:00
Prasanna Rajaperumal
cc58a4c3e0 [maven-release-plugin] prepare for next development iteration 2017-02-20 15:49:45 -08:00
Prasanna Rajaperumal
dd03038254 [maven-release-plugin] prepare release hoodie-0.2.13 2017-02-20 15:49:20 -08:00
Prasanna Rajaperumal
57a0b7a781 [maven-release-plugin] prepare for next development iteration 2017-02-20 15:35:19 -08:00
Prasanna Rajaperumal
9828bd8019 [maven-release-plugin] prepare release hoodie-0.2.12 2017-02-20 15:35:03 -08:00
Prasanna Rajaperumal
8f12163166 [maven-release-plugin] prepare for next development iteration 2017-02-20 15:00:35 -08:00
Prasanna Rajaperumal
6e6f6efb94 [maven-release-plugin] prepare release hoodie-0.2.11 2017-02-20 15:00:16 -08:00
Prasanna Rajaperumal
693d751506 [maven-release-plugin] prepare for next development iteration 2017-01-10 22:27:35 -08:00
Prasanna Rajaperumal
e9866bb4d9 [maven-release-plugin] prepare release hoodie-0.2.10 2017-01-10 22:27:28 -08:00
Prasanna Rajaperumal
1ced46ab3e [maven-release-plugin] prepare for next development iteration 2017-01-05 20:04:35 -08:00
Prasanna Rajaperumal
e9f0d4d0bf [maven-release-plugin] prepare release hoodie-0.2.9 2017-01-05 20:04:28 -08:00
Prasanna Rajaperumal
7171ea6909 [maven-release-plugin] prepare for next development iteration 2017-01-05 19:43:31 -08:00
Prasanna Rajaperumal
c1f2d1e456 [maven-release-plugin] prepare release hoodie-0.2.8 2017-01-05 19:43:25 -08:00
Vinoth Chandar
84e0a7f68a Fixing version in hoodie-utilities pom.xml 2016-12-28 16:43:16 -08:00
Vinoth Chandar
0c854faebe Adding hoodie-utilities module 2016-12-28 15:42:30 -08:00