Balaji Varadarajan 3a0044216c New Features in DeltaStreamer :

(1) Apply transformation when using delta-streamer to ingest data
(2) Add Hudi Incremental Source for Delta Streamer
(3) Allow delta-streamer config-property to be passed on the command line
(4) Add Hive Integration to Delta-Streamer and address review comments
(5) Ensure MultiPartKeysValueExtractor handles hive style partition description
(6) Reuse same spark session on both source and transformer
(7) Support extracting partition fields from _hoodie_partition_path for HoodieIncrSource
(8) Reuse Binary Avro coders
(9) Add push down filter for Incremental source
(10) Add Hoodie DeltaStreamer metrics to track total time taken

2019-02-11 18:22:05 -08:00

deploy

Add ossrh profile to publish maven artifacts to oss.sonatype.org (synced with maven central)

2016-12-21 14:17:35 -08:00

docker

Docker Container Build and Run setup with foundations for adding docker integration tests. Docker images built with Hadoop 2.8.4 Hive 2.3.3 and Spark 2.3.1 and published to docker-hub

2018-10-02 09:28:21 +05:30

docs

New Features in DeltaStreamer :

2019-02-11 18:22:05 -08:00

hoodie-cli

New Features in DeltaStreamer :

2019-02-11 18:22:05 -08:00

hoodie-client

New Features in DeltaStreamer :

2019-02-11 18:22:05 -08:00

hoodie-common

New Features in DeltaStreamer :

2019-02-11 18:22:05 -08:00

hoodie-hadoop-mr

New Features in DeltaStreamer :

2019-02-11 18:22:05 -08:00

hoodie-hive

New Features in DeltaStreamer :

2019-02-11 18:22:05 -08:00

hoodie-integ-test

Ensure Hoodie works for non-partitioned Hive table

2018-12-12 13:35:16 -08:00

hoodie-spark

New Features in DeltaStreamer :

2019-02-11 18:22:05 -08:00

hoodie-utilities

New Features in DeltaStreamer :

2019-02-11 18:22:05 -08:00

packaging

New Features in DeltaStreamer :

2019-02-11 18:22:05 -08:00

style

General enhancements

2018-12-18 12:52:39 -08:00

_config.yml

Set theme jekyll-theme-minimal

2016-12-29 16:53:39 -08:00

.gitignore

Importing Hoodie Client from internal repo

2016-12-16 14:34:42 -08:00

.travis.yml

Add m2 directory to Travis cache

2018-12-31 10:31:12 -08:00

CHANGELOG.md

Added CHANGELOG.md and updated community contributions guideline

2017-06-16 10:48:37 -07:00

LICENSE.txt

Importing Hoodie Client from internal repo

2016-12-16 14:34:42 -08:00

pom.xml

New Features in DeltaStreamer :

2019-02-11 18:22:05 -08:00

README.md

Update README.md

2017-12-10 07:50:37 -08:00

RELEASE_NOTES.md

Update RELEASE_NOTES for release 0.4.4

2018-09-28 11:05:24 +05:30

README.md

Hudi

Hudi (pronounced Hoodie) stands for Hadoop Upserts anD Incrementals. Hudi manages the storage of large analytical datasets on HDFS and serves them via two types of tables:

Read Optimized Table - Provides excellent query performance via purely columnar storage (e.g. Parquet)
Near-Real time Table (WIP) - Provides queries on real-time data, using a combination of columnar and row-based storage (e.g. Parquet + Avro)

For more, head over here

Languages

Java 81.4%

Scala 16.7%

ANTLR 0.9%

Shell 0.8%

Dockerfile 0.2%