
Making minor documentation fixes

prazanna
2017-03-02 11:35:09 -08:00
committed by GitHub
parent 116a78094f
commit 8a2a9ae764


@@ -18,12 +18,9 @@ Hoodie manages storage of large analytical datasets on [HDFS](http://hadoop.apac
 {% include image.html file="hoodie_intro_1.png" alt="hoodie_intro_1.png" %}
-By carefully managing how data is laid out on storage & how its exposed to queries, Hoodie is able to power a rich data ecosystem where external sources can be ingested into Hadoop in near-real time.
-The ingested data is then available for interactive SQL Engines like [Presto](https://prestodb.io) & [Spark](https://spark.apache.org/sql/),
-while at the same time capable of being consumed incrementally from processing/ETL frameoworks like [Hive](https://hive.apache.org/) & [Spark](https://spark.apache.org/docs/latest/) to build derived (hoodie) datasets.
+By carefully managing how data is laid out in storage & how its exposed to queries, Hoodie is able to power a rich data ecosystem where external sources can be ingested into Hadoop in near real-time. The ingested data is then available for interactive SQL Engines like [Presto](https://prestodb.io) & [Spark](https://spark.apache.org/sql/), while at the same time capable of being consumed incrementally from processing/ETL frameworks like [Hive](https://hive.apache.org/) & [Spark](https://spark.apache.org/docs/latest/) to build derived (Hoodie) datasets.
 Hoodie broadly consists of a self contained Spark library to build datasets and integrations with existing query engines for data access.
-{% include callout.html content="Hoodie is a young project. Near-Real time Table implementation is currently underway. Get involved [here](https://github.com/uber/hoodie/projects/1)" type="info" %}
+{% include callout.html content="Hoodie is a new project. Near Real-Time Table implementation is currently underway. Get involved [here](https://github.com/uber/hoodie/projects/1)" type="info" %}