lanyuanxiaoyao/hudi: internal version - hudi - Gitea: Git with a cup of tea

Prasanna Rajaperumal 8ee777a9bb Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0

The gist of the changes:

- All low-level code for creating a commit lived in HoodieClient, which made it hard to share with a compaction commit.
- HoodieTableMetadata mixed metadata access with file filtering. (Also, a few operations required a FileSystem to be passed in because they were called from TaskExecutors, while others used a FileSystem held as a global variable.) Merge-on-read needs much of that code but will operate on the metadata and filter files slightly differently, so the two sets of operations are split into HoodieTableMetaClient and TableFileSystemView.
- Everything in HoodieTableMetaClient (active commits, archived commits, cleaner log, savepoint log and, in future, delta and compaction commits) is a HoodieTimeline. A timeline is a series of instants with a built-in concept of inflight and completed commit markers.
- A timeline can be queried for ranges and for containment, and can also be used to create new data points (a new commit, etc.). Creation and deletion of commits (and all the metadata above) is streamlined through a timeline.
- Multiple timelines can be merged into a single timeline, giving us an audit timeline of whatever happened in a hoodie dataset. This also helps with #55.
- Move to Java 8 and introduce Java 8's succinct syntax in the refactored code.
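
The timeline idea described above (a sorted series of instants with inflight and completed markers, queryable by range and containment, and mergeable into an audit view) can be illustrated with a minimal Java 8 sketch. Every name here (`TimelineSketch`, `Instant`, `Timeline`, `completedOnly`, `inRange`, `mergeWith`, `describe`) is hypothetical and chosen for illustration; this is not the actual HoodieTimeline API. The half-open range semantics and the fixed-width string timestamps are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch only: these names are NOT the real HoodieTimeline API.
final class TimelineSketch {

    enum State { INFLIGHT, COMPLETED }

    /** One point on a timeline: an action at a timestamp, in some state. */
    static final class Instant {
        final String action;    // e.g. "commit", "clean", "savepoint"
        final String timestamp; // assumed fixed-width, so string order is time order
        final State state;

        Instant(String action, String timestamp, State state) {
            this.action = action;
            this.timestamp = timestamp;
            this.state = state;
        }
    }

    /** A series of instants kept sorted by timestamp; queries return new timelines. */
    static final class Timeline {
        private final List<Instant> instants;

        Timeline(List<Instant> unsorted) {
            this.instants = unsorted.stream()
                .sorted(Comparator.comparing((Instant i) -> i.timestamp))
                .collect(Collectors.toList());
        }

        /** Keeps only instants whose marker says the action finished. */
        Timeline completedOnly() {
            return new Timeline(instants.stream()
                .filter(i -> i.state == State.COMPLETED)
                .collect(Collectors.toList()));
        }

        /** Range query (assumed semantics): instants in (startExclusive, endInclusive]. */
        Timeline inRange(String startExclusive, String endInclusive) {
            return new Timeline(instants.stream()
                .filter(i -> i.timestamp.compareTo(startExclusive) > 0
                          && i.timestamp.compareTo(endInclusive) <= 0)
                .collect(Collectors.toList()));
        }

        /** True if any instant on this timeline has the given timestamp. */
        boolean contains(String timestamp) {
            return instants.stream().anyMatch(i -> i.timestamp.equals(timestamp));
        }

        /** Merges two timelines into one sorted timeline, e.g. for auditing. */
        Timeline mergeWith(Timeline other) {
            List<Instant> all = new ArrayList<>(instants);
            all.addAll(other.instants);
            return new Timeline(all);
        }

        /** Renders each instant as "timestamp:action:STATE". */
        List<String> describe() {
            return instants.stream()
                .map(i -> i.timestamp + ":" + i.action + ":" + i.state)
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        Timeline commits = new Timeline(Arrays.asList(
            new Instant("commit", "004", State.INFLIGHT),
            new Instant("commit", "001", State.COMPLETED),
            new Instant("commit", "002", State.COMPLETED)));
        Timeline cleans = new Timeline(Arrays.asList(
            new Instant("clean", "003", State.COMPLETED)));

        // Merge the commit and cleaner timelines into a single audit view.
        System.out.println(commits.mergeWith(cleans).describe());
        // Only completed commits after "001", up to and including "004".
        System.out.println(commits.completedOnly().inRange("001", "004").describe());
    }
}
```

Returning a new `Timeline` from each query keeps the operations composable, so a filter such as `completedOnly()` can be chained with `inRange(...)` or applied to a merged timeline without special cases.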

2017-02-21 16:23:53 -08:00

deploy

Add ossrh profile to publish maven artifacts to oss.sonatype.org (synced with maven central)

2016-12-21 14:17:35 -08:00

docs

Adding admin guide, guide for sql queries and incr processing

2017-02-19 20:33:21 -08:00

hoodie-cli

Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0

2017-02-21 16:23:53 -08:00

hoodie-client

Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0

2017-02-21 16:23:53 -08:00

hoodie-common

Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0

2017-02-21 16:23:53 -08:00

hoodie-hadoop-mr

Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0

2017-02-21 16:23:53 -08:00

hoodie-hive

[maven-release-plugin] prepare for next development iteration

2017-02-20 16:52:25 -08:00

hoodie-utilities

Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0

2017-02-21 16:23:53 -08:00

_config.yml

Set theme jekyll-theme-minimal

2016-12-29 16:53:39 -08:00

.gitignore

Importing Hoodie Client from internal repo

2016-12-16 14:34:42 -08:00

.travis.yml

Make hoodie run on travis-ci

2016-12-20 19:26:48 -08:00

LICENSE.txt

Importing Hoodie Client from internal repo

2016-12-16 14:34:42 -08:00

pom.xml

Refactor hoodie-common and create right abstractions for Hoodie Storage V2.0

2017-02-21 16:23:53 -08:00

README.md

Shorten README and point to site

2017-01-09 11:30:46 -08:00

README.md

Hoodie manages storage of large analytical datasets on HDFS and serves them out via two types of tables:

- Read Optimized Table - Provides excellent query performance via purely columnar storage (e.g. Parquet)
- Near-Real-Time Table (WIP) - Provides queries on real-time data, using a combination of columnar and row-based storage (e.g. Parquet + Avro)

For more, head over here

Languages

Java 81.4%

Scala 16.7%

ANTLR 0.9%

Shell 0.8%

Dockerfile 0.2%