---
title: Hoodie Overview
keywords: homepage
tags:
sidebar: mydoc_sidebar
permalink: index.html
summary: Hoodie lowers data latency across the board, while simultaneously achieving orders of magnitude of efficiency over traditional batch processing.
---
Hoodie manages storage of large analytical datasets on HDFS and serves them out via two types of tables:
- Read Optimized Table - Provides excellent query performance via purely columnar storage (e.g. Parquet)
- Near-Real time Table - Provides queries on real-time data, using a combination of columnar & row based storage (e.g. Parquet + Avro)
{% include image.html file="hoodie_intro_1.png" alt="hoodie_intro_1.png" %}
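The intuition behind the two table types can be shown with a small conceptual sketch. This is not Hoodie's actual implementation or API; it only illustrates how a Near-Real time view can be answered by overlaying recent row-based updates (e.g. Avro log records) on top of a columnar base snapshot (e.g. a Parquet file), with the record keys and fields below being purely hypothetical:

```python
def near_real_time_view(base_records, log_records):
    """Merge row-based delta records into the columnar base snapshot by key.

    Later updates for the same key win, so the merged view reflects
    real-time data while the base file stays purely columnar.
    """
    merged = {r["key"]: r for r in base_records}  # read-optimized snapshot
    for record in log_records:                    # row-based deltas, in commit order
        merged[record["key"]] = record
    return list(merged.values())

# Hypothetical example records
base = [{"key": "trip-1", "fare": 10.0}, {"key": "trip-2", "fare": 7.5}]
log = [{"key": "trip-2", "fare": 9.0}, {"key": "trip-3", "fare": 4.2}]

view = near_real_time_view(base, log)
```

A Read Optimized query would scan only `base`; the near-real time view pays a small merge cost to also surface the records in `log`.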
By carefully managing how data is laid out on storage & how it is exposed to queries, Hoodie is able to power a rich data ecosystem where external sources can be ingested into Hadoop in near-real time. The ingested data is then available for interactive SQL engines like Presto & Spark, while at the same time capable of being consumed incrementally from processing/ETL frameworks like Hive & Spark to build derived (hoodie) datasets.
Hoodie broadly consists of a self-contained Spark library to build datasets and integrations with existing query engines for data access.
{% include callout.html content="Hoodie is a young project. Near-Real time Table implementation is currently underway. Get involved here" type="info" %}