Vinoth Chandar 85dd265b7b Improving out of box experience for data source

- Fixes #246
 - Bump up default parallelism to 1500, to handle large upserts
 - Add docs on s3 configuration & tuning tips with tested spark knobs
 - Fix bug to not duplicate hoodie metadata fields when input dataframe is another hoodie dataset
 - Improve speed of ROTablePathFilter by removing directory check
 - Move to spark-avro 4.0 to handle issue with nested fields with same name
 - Keep AvroConversionUtils in sync with spark-avro 4.0

2018-06-10 19:16:44 -07:00

deploy

Add ossrh profile to publish maven artifacts to oss.sonatype.org (synced with maven central)

2016-12-21 14:17:35 -08:00

docs

Improving out of box experience for data source

2018-06-10 19:16:44 -07:00

hoodie-cli

Support union mode in HoodieRealtimeRecordReader for pure insert workloads

2018-05-10 17:39:56 -07:00

hoodie-client

Improving out of box experience for data source

2018-06-10 19:16:44 -07:00

hoodie-common

Improving out of box experience for data source

2018-06-10 19:16:44 -07:00

hoodie-hadoop-mr

Improving out of box experience for data source

2018-06-10 19:16:44 -07:00

hoodie-hive

CodeStyle formatting to conform to basic Checkstyle rules.

2018-03-30 11:09:40 -07:00

hoodie-spark

Improving out of box experience for data source

2018-06-10 19:16:44 -07:00

hoodie-utilities

CodeStyle formatting to conform to basic Checkstyle rules.

2018-03-30 11:09:40 -07:00

style

CodeStyle formatting to conform to basic Checkstyle rules.

2018-03-30 11:09:40 -07:00

_config.yml

Set theme jekyll-theme-minimal

2016-12-29 16:53:39 -08:00

.gitignore

Importing Hoodie Client from internal repo

2016-12-16 14:34:42 -08:00

.travis.yml

Update java version to 8 in travis.yml

2017-05-17 13:43:11 -07:00

CHANGELOG.md

Added CHANGELOG.md and updated community contributions guideline

2017-06-16 10:48:37 -07:00

LICENSE.txt

Importing Hoodie Client from internal repo

2016-12-16 14:34:42 -08:00

pom.xml

CodeStyle formatting to conform to basic Checkstyle rules.

2018-03-30 11:09:40 -07:00

README.md

Update README.md

2017-12-10 07:50:37 -08:00

RELEASE_NOTES.md

Update release notes for 0.4.1 (post)

2018-04-02 09:31:01 -07:00

README.md

Hudi

Hudi (pronounced Hoodie) stands for Hadoop Upserts anD Incrementals. Hudi manages storage of large analytical datasets on HDFS and serves them out via two types of tables:

Read Optimized Table - Provides excellent query performance via purely columnar storage (e.g. Parquet)
Near-Real Time Table (WIP) - Provides queries on real-time data, using a combination of columnar & row-based storage (e.g. Parquet + Avro)

For more, head over here

Languages

Java 81.4%

Scala 16.7%

ANTLR 0.9%

Shell 0.8%

Dockerfile 0.2%