1
0
Files
hudi/docs/index.md
2017-03-02 11:39:40 -08:00

27 lines
1.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: Hoodie Overview
keywords: homepage
tags: [getting_started]
sidebar: mydoc_sidebar
permalink: index.html
summary: "Hoodie lowers data latency across the board, while simultaneously achieving orders of magnitude of efficiency over traditional batch processing."
---
Hoodie manages storage of large analytical datasets on [HDFS](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) and serve them out via two types of tables
* **Read Optimized Table** - Provides excellent query performance via purely columnar storage (e.g. [Parquet](https://parquet.apache.org/))
* **Near-Real time Table** - Provides queries on real-time data, using a combination of columnar & row based storage (e.g Parquet + [Avro](http://avro.apache.org/docs/current/mr.html))
{% include image.html file="hoodie_intro_1.png" alt="hoodie_intro_1.png" %}
By carefully managing how data is laid out in storage & how its exposed to queries, Hoodie is able to power a rich data ecosystem where external sources can be ingested into Hadoop in near real-time. The ingested data is then available for interactive SQL Engines like [Presto](https://prestodb.io) & [Spark](https://spark.apache.org/sql/), while at the same time capable of being consumed incrementally from processing/ETL frameworks like [Hive](https://hive.apache.org/) & [Spark](https://spark.apache.org/docs/latest/) to build derived (Hoodie) datasets.
Hoodie broadly consists of a self contained Spark library to build datasets and integrations with existing query engines for data access.
{% include callout.html content="Hoodie is a new project. Near Real-Time Table implementation is currently underway. Get involved [here](https://github.com/uber/hoodie/projects/1)" type="info" %}