Balaji Varadarajan 1032fc3e54 [HUDI-137] Hudi cleaning state changes should be consistent with compaction actions

Before this change, the Cleaner deleted old file versions first and only then recorded the deleted files in .clean files.
With that ordering, file deletions cannot be tracked if the cleaner fails after deleting files but before writing the .clean metadata.
This is fine for regular file-system view generation, but incremental timeline syncing relies on clean/commit/compaction metadata to keep a consistent file-system view.

Cleaner state transitions are now similar to those of compaction.

1. Requested : HoodieWriteClient.scheduleClean() selects the list of files that need to be deleted and stores them in metadata
2. Inflight : HoodieWriteClient marks the state as inflight before it starts deleting
3. Completed : HoodieWriteClient marks the state as completed after finishing the deletion according to the cleaner plan
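
The three states above can be illustrated with a minimal, in-memory sketch. This is not Hudi's implementation: `CleanerStateSketch`, `CleanAction`, `markInflight` and `deletePlanned` are hypothetical names, and real clean actions are persisted on the timeline rather than held in memory. The sketch only shows why recording the plan before deleting lets a failed clean be retried without losing track of what was removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CleanerStateSketch {

    // Hypothetical states mirroring the three steps in the commit message.
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    // Hypothetical in-memory stand-in for one clean action.
    static final class CleanAction {
        final List<String> plannedFiles;                       // plan fixed at scheduling time
        final List<String> deletedFiles = new ArrayList<>();   // files removed so far
        State state = State.REQUESTED;

        CleanAction(List<String> plannedFiles) {
            this.plannedFiles = new ArrayList<>(plannedFiles);
        }

        // REQUESTED -> INFLIGHT, done before any file is touched.
        void markInflight() {
            if (state != State.REQUESTED) {
                throw new IllegalStateException("expected REQUESTED but was " + state);
            }
            state = State.INFLIGHT;
        }

        // Deletes files from the plan. Stops early once maxDeletes files have
        // been removed in this attempt, which simulates a crash mid-clean.
        void deletePlanned(int maxDeletes) {
            if (state != State.INFLIGHT) {
                throw new IllegalStateException("expected INFLIGHT but was " + state);
            }
            int done = 0;
            for (String file : plannedFiles) {
                if (deletedFiles.contains(file)) {
                    continue; // already removed by an earlier attempt
                }
                if (done == maxDeletes) {
                    return;   // simulated failure: state stays INFLIGHT
                }
                deletedFiles.add(file);
                done++;
            }
            state = State.COMPLETED; // every planned file has been removed
        }
    }

    public static void main(String[] args) {
        CleanAction action = new CleanAction(Arrays.asList("f1.parquet", "f2.parquet"));
        action.markInflight();
        action.deletePlanned(1);                  // first attempt fails after one file
        System.out.println(action.state);         // prints INFLIGHT
        action.deletePlanned(Integer.MAX_VALUE);  // retry resumes from the stored plan
        System.out.println(action.state);         // prints COMPLETED
    }
}
```

Because the plan exists before the first deletion, a retry after a failure works from the same file list, and a reader of the metadata can tell that a clean was in progress.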

2019-11-11 10:40:16 -08:00

docker

[HUDI-218] Adding Presto support to Integration Test (#1003)

2019-11-11 06:21:49 -08:00

hudi-cli

[HUDI-137] Hudi cleaning state changes should be consistent with compaction actions

2019-11-11 10:40:16 -08:00

hudi-client

[HUDI-137] Hudi cleaning state changes should be consistent with compaction actions

2019-11-11 10:40:16 -08:00

hudi-common

[HUDI-137] Hudi cleaning state changes should be consistent with compaction actions

2019-11-11 10:40:16 -08:00

hudi-hadoop-mr

[HUDI-314] Fix multi partition keys error when querying a realtime table

2019-11-02 19:49:04 -07:00

hudi-hive

[HUDI-137] Hudi cleaning state changes should be consistent with compaction actions

2019-11-11 10:40:16 -08:00

hudi-integ-test

[HUDI-218] Adding Presto support to Integration Test (#1003)

2019-11-11 06:21:49 -08:00

hudi-spark

[HUDI-290] Normalize test class name of all test classes (#951)

2019-10-22 20:19:11 -07:00

hudi-timeline-service

[HUDI-137] Hudi cleaning state changes should be consistent with compaction actions

2019-11-11 10:40:16 -08:00

hudi-utilities

[HUDI-253]: added validations for schema provider class (#995)

2019-11-11 06:03:44 -08:00

packaging

[MINOR] Move all repository declarations to parent pom (#966)

2019-10-22 20:17:13 -07:00

scripts

[HUDI-121] Fix issues in release scripts

2019-10-16 03:33:57 -07:00

style

[HUDI-121] Fix licensing issues found during RC voting by general incubator group

2019-10-16 02:09:02 -07:00

_config.yml

[HUDI-230] Add missing Apache License in some files

2019-08-30 09:38:28 -07:00

.gitignore

[HUDI-68] Pom cleanup & demo automation (#846)

2019-08-22 20:18:50 -07:00

.travis.yml

[MINOR] Fix no output in travis (#984)

2019-10-29 21:17:45 -07:00

DISCLAIMER-WIP

[HUDI-121] Fix licensing issues found during RC voting by general incubator group

2019-10-16 02:09:02 -07:00

LICENSE

[HUDI-121] Fix licensing issues found during RC voting by general incubator group

2019-10-16 02:09:02 -07:00

NOTICE

[MINOR] Add incubating to NOTICE and README.md

2019-10-09 21:42:29 -07:00

pom.xml

Bump httpclient from 4.3.2 to 4.3.6 (#980)

2019-11-01 05:22:31 -07:00

README.md

[DOCS] Change Hudi acronyms to plural

2019-11-10 12:39:58 -08:00

README.md

Hudi

Apache Hudi (Incubating) (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage).

Features

Upsert support with fast, pluggable indexing
Atomically publish data with rollback support
Snapshot isolation between writer & queries
Savepoints for data recovery
Manages file sizes, layout using statistics
Async compaction of row & columnar data
Timeline metadata to track lineage

Hudi provides the ability to query via three types of views:

Read Optimized View - Provides excellent snapshot query performance via purely columnar storage (e.g. Parquet).
Incremental View - Provides a change stream with records inserted or updated after a point in time.
Real-time View - Provides snapshot queries on real-time data, using a combination of columnar & row-based storage (e.g. Parquet + Avro).
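
To make the idea behind the Incremental View concrete, here is a small conceptual sketch. It does not use any Hudi API: `IncrementalViewSketch`, `Row` and `changedAfter` are hypothetical names, and the sketch assumes each record carries the commit time that last wrote it as a string that sorts chronologically. It only shows the "records changed after a point in time" semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IncrementalViewSketch {

    // Hypothetical record: a key plus the commit time that last wrote it.
    static final class Row {
        final String key;
        final String commitTime;

        Row(String key, String commitTime) {
            this.key = key;
            this.commitTime = commitTime;
        }
    }

    // Returns the keys of rows written strictly after sinceCommitTime.
    // Assumes commit times are strings that compare in chronological order.
    static List<String> changedAfter(List<Row> rows, String sinceCommitTime) {
        List<String> keys = new ArrayList<>();
        for (Row row : rows) {
            if (row.commitTime.compareTo(sinceCommitTime) > 0) {
                keys.add(row.key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("a", "20191101000000"),
            new Row("b", "20191110000000"),
            new Row("c", "20191111000000"));
        // Only rows committed after the checkpoint are returned.
        System.out.println(changedAfter(rows, "20191105000000")); // prints [b, c]
    }
}
```

A consumer would typically remember the latest commit time it has processed and pass it as the checkpoint on the next pull.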

Learn more about Hudi at https://hudi.apache.org

Building Apache Hudi from source

Hudi requires Java 8 on a *nix system. Check out the code and build the Maven project from the command line:

# Checkout code and build
git clone https://github.com/apache/incubator-hudi.git && cd incubator-hudi
mvn clean package -DskipTests -DskipITs

Quickstart

Try https://hudi.apache.org/quickstart.html to quickly explore Hudi's capabilities using spark-shell.

Languages

Java 81.4%

Scala 16.7%

ANTLR 0.9%

Shell 0.8%

Dockerfile 0.2%