hudi/docs/index.md

Commit 958f7ceda6 by Vinoth Chandar, 2017-01-04 20:50:44 -08:00: Adding Documentation for Getting Started Section

- Overview, Use Cases, Powered By are very detailed
- Cleaned up QuickStart
- Redistribute the content from README to correct pages, to be improved upon
- Switch to blue theme


---
title: Hoodie Overview
keywords: homepage
tags: [getting_started]
sidebar: mydoc_sidebar
permalink: index.html
summary: Hoodie lowers data latency across the board, while simultaneously achieving orders of magnitude of efficiency over traditional batch processing.
---

Hoodie manages storage of large analytical datasets on HDFS and serves them out via two types of tables:

  • Read Optimized Table - Provides excellent query performance via purely columnar storage (e.g. Parquet)
  • Near-Real time Table - Provides queries on real-time data, using a combination of columnar & row-based storage (e.g. Parquet + Avro)

{% include image.html file="hoodie_intro_1.png" alt="hoodie_intro_1.png" %}
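To make the two table types concrete, here is a minimal sketch in plain Python (not the Hoodie API) of how a Near-Real time view can be derived: row-based log updates are overlaid, per record key, on top of the columnar base snapshot that Read Optimized queries see. The function name, record keys, and field names are all hypothetical, purely for illustration.

```python
# Conceptual sketch (not the Hoodie API): a Near-Real time view merges a
# columnar base snapshot (e.g. Parquet) with newer row-based updates
# (e.g. Avro log records), keyed by record key.

def merge_views(base_records, log_records):
    """Overlay row-based log records on top of columnar base records.

    base_records: {record_key: row} snapshot from the columnar base files
    log_records:  [(record_key, row), ...] updates written since, in order
    """
    merged = dict(base_records)      # start from the read-optimized view
    for key, row in log_records:     # apply newer updates in arrival order
        merged[key] = row            # last write wins per record key
    return merged

base = {"id-1": {"fare": 10.0}, "id-2": {"fare": 25.0}}
log = [("id-2", {"fare": 27.5}), ("id-3", {"fare": 5.0})]
view = merge_views(base, log)
# Read Optimized queries see only `base`; Near-Real time queries see `view`,
# which reflects the update to id-2 and the newly inserted id-3.
```

Columnar base files keep scans fast, while the row-based log absorbs writes cheaply; the merge above is the price paid at query time for freshness.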

By carefully managing how data is laid out on storage and how it is exposed to queries, Hoodie is able to power a rich data ecosystem where external sources can be ingested into Hadoop in near-real time. The ingested data is then available to interactive SQL engines like Presto & Spark, while at the same time being consumable incrementally from processing/ETL frameworks like Hive & Spark to build derived (hoodie) datasets.
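The incremental-consumption idea above can be sketched as follows, again in plain Python rather than the Hoodie API: a downstream job pulls only records committed after its last checkpoint, instead of rescanning the whole dataset as classic batch ETL would. The `_commit_time` field name and the sample values are hypothetical.

```python
# Conceptual sketch (not the Hoodie API): incremental consumption pulls only
# records whose commit time is later than the consumer's last checkpoint.

def incremental_pull(records, last_commit_time):
    """Return records committed strictly after `last_commit_time`."""
    return [r for r in records if r["_commit_time"] > last_commit_time]

dataset = [
    {"_commit_time": "20170101", "key": "a", "val": 1},
    {"_commit_time": "20170103", "key": "b", "val": 2},
    {"_commit_time": "20170104", "key": "a", "val": 3},
]

# A consumer checkpointed at commit 20170102 sees only the two newer records.
changed = incremental_pull(dataset, "20170102")
```

This is what lets derived datasets stay fresh at a fraction of the cost of full recomputation: each run processes only the delta since the previous checkpoint.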

Hoodie broadly consists of a self-contained Spark library for building datasets, plus integrations with existing query engines for data access.

{% include callout.html content="Hoodie is a young project. Near-Real time Table implementation is currently underway. Get involved here" type="info" %}