Vinoth Chandar f1410bfdcd Fixes HUDI-38: Reduce memory overhead of WriteStatus

- For implicit indexes (e.g. BloomIndex), don't buffer up written records
- By default, only collect 10% of failing records to avoid OOMs
- Improves debuggability via above, since data errors can now show up in collect()
- Unit tests & fixing subclasses & adjusting tests

2019-03-28 10:32:59 -07:00

deploy

Add ossrh profile to publish maven artifacts to oss.sonatype.org (synced with maven central)

2016-12-21 14:17:35 -08:00

docker

add a script that shuts down demo cluster gracefully

2019-03-19 11:01:06 -07:00

hoodie-cli

FileSystem View must treat same fileIds present in different partitions as different file-groups and handle pending compaction correctly

2019-03-01 10:49:04 -08:00

hoodie-client

Fixes HUDI-38: Reduce memory overhead of WriteStatus

2019-03-28 10:32:59 -07:00

hoodie-common

Fixing source schema and writer schema distinction in payloads

2019-03-26 19:44:27 -07:00

hoodie-hadoop-mr

[maven-release-plugin] prepare for next development iteration

2019-02-27 07:16:27 -08:00

hoodie-hive

run_hive_sync tool must be able to handle case where there are multiple standalone jdbc jars in hive installation dir

2019-03-21 09:58:20 -07:00

hoodie-integ-test

[maven-release-plugin] prepare for next development iteration

2019-02-27 07:16:27 -08:00

hoodie-spark

Fixed HUDI-87 : Remove schemastr from BaseAvroPayload

2019-03-27 23:03:25 -07:00

hoodie-utilities

Replacing Apache commons-lang3 object serializer with Kryo serializer

2019-03-18 14:12:25 -07:00

packaging

Replacing Apache commons-lang3 object serializer with Kryo serializer

2019-03-18 14:12:25 -07:00

style

General enhancements

2018-12-18 12:52:39 -08:00

_config.yml

Set theme jekyll-theme-minimal

2016-12-29 16:53:39 -08:00

.gitignore

Importing Hoodie Client from internal repo

2016-12-16 14:34:42 -08:00

.travis.yml

Add m2 directory to Travis cache

2018-12-31 10:31:12 -08:00

CHANGELOG.md

Added CHANGELOG.md and updated community contributions guideline

2017-06-16 10:48:37 -07:00

KEYS

HUDI-75: Add KEYS

2019-03-18 07:46:25 -07:00

LICENSE.txt

Importing Hoodie Client from internal repo

2016-12-16 14:34:42 -08:00

pom.xml

Fix hive sync (libfb version mismatch) and deltastreamer issue (missing cmdline argument) in demo

2019-03-13 16:14:32 -07:00

README.md

Update site url in README

2019-02-15 21:28:39 -08:00

RELEASE_NOTES.md

Update RELEASE_NOTES for 0.4.5

2019-02-27 06:47:56 -08:00

README.md

Hudi

Hudi (pronounced Hoodie) stands for Hadoop Upserts anD Incrementals. Hudi manages the storage of large analytical datasets on HDFS and serves them out via two types of tables:

Read Optimized Table - Provides excellent query performance via purely columnar storage (e.g. Parquet)
Near-Real time Table (WIP) - Provides queries on real-time data, using a combination of columnar & row-based storage (e.g. Parquet + Avro)
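
As a rough illustration of how the two table types above differ, the toy sketch below models a columnar base file and a row-based log of upserts as plain in-memory collections. It does not use the Hudi API. The class `TableViewsSketch`, its methods, and the sample keys are all hypothetical, and the idea that the near-real-time view overlays logged upserts onto the base data at read time is an assumption made for the sake of the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not Hudi classes.
public class TableViewsSketch {

    // Read-optimized view: serve only what is in the columnar base data
    // (standing in for e.g. Parquet files).
    static Map<String, String> readOptimizedView(Map<String, String> baseFile) {
        return new LinkedHashMap<>(baseFile);
    }

    // Near-real-time view (assumed behavior): start from the base data and
    // apply each {key, value} upsert from the row-based log (standing in for
    // e.g. Avro), so later writes win over older values.
    static Map<String, String> nearRealTimeView(Map<String, String> baseFile,
                                                List<String[]> deltaLog) {
        Map<String, String> merged = new LinkedHashMap<>(baseFile);
        for (String[] upsert : deltaLog) {
            merged.put(upsert[0], upsert[1]);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("trip-1", "fare=10");
        base.put("trip-2", "fare=20");

        // One update to an existing key and one new key, not yet compacted
        // into the base data.
        List<String[]> log = Arrays.asList(
                new String[] {"trip-2", "fare=25"},
                new String[] {"trip-3", "fare=30"});

        System.out.println(readOptimizedView(base));
        // prints {trip-1=fare=10, trip-2=fare=20}
        System.out.println(nearRealTimeView(base, log));
        // prints {trip-1=fare=10, trip-2=fare=25, trip-3=fare=30}
    }
}
```

In this sketch the read-optimized view trades freshness for a simpler, purely columnar read path, while the near-real-time view pays a merge cost at query time to surface recent upserts.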

For more, head over here

Languages

Java 81.4%

Scala 16.7%

ANTLR 0.9%

Shell 0.8%

Dockerfile 0.2%