Vinoth Chandar 86209640f7 Adding range based pruning to bloom index

 - Keys compared lexicographically using String::compareTo
 - Range metadata additionally written into parquet file footers
 - Trim fat & few optimizations to speed up indexing
 - Add param to control whether input shall be cached, to speed up lookup
 - Add param to turn on/off range pruning
 - Auto compute of parallelism now simply factors in amount of comparisons done
 - More accurate parallelism computation when range pruning is on
 - tests added & hardened, docs updated

2017-08-04 13:22:13 -07:00
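
The commit message above describes pruning candidate files by key range before consulting the bloom filter. A minimal sketch of that idea follows, assuming each file exposes a minimum and maximum record key (as the range metadata in the parquet footers would provide). The class and method names (`RangePruningSketch`, `FileKeyRange`, `candidateFiles`) are hypothetical and are not taken from the Hoodie codebase; only the use of `String::compareTo` for ordering comes from the commit message.

```java
import java.util.ArrayList;
import java.util.List;

public class RangePruningSketch {

    /** Hypothetical holder for the min/max record key recorded for one file. */
    static final class FileKeyRange {
        final String fileName;
        final String minKey;
        final String maxKey;

        FileKeyRange(String fileName, String minKey, String maxKey) {
            this.fileName = fileName;
            this.minKey = minKey;
            this.maxKey = maxKey;
        }

        /** True when minKey <= key <= maxKey under String::compareTo ordering. */
        boolean mayContain(String key) {
            return minKey.compareTo(key) <= 0 && maxKey.compareTo(key) >= 0;
        }
    }

    /**
     * Returns the names of files whose key range covers the given key.
     * Only these files would still need a bloom filter check; the rest are pruned.
     */
    static List<String> candidateFiles(List<FileKeyRange> ranges, String key) {
        List<String> candidates = new ArrayList<>();
        for (FileKeyRange range : ranges) {
            if (range.mayContain(key)) {
                candidates.add(range.fileName);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<FileKeyRange> ranges = new ArrayList<>();
        ranges.add(new FileKeyRange("file-1.parquet", "key-000", "key-199"));
        ranges.add(new FileKeyRange("file-2.parquet", "key-150", "key-399"));
        ranges.add(new FileKeyRange("file-3.parquet", "key-400", "key-599"));

        // Overlapping ranges can yield more than one candidate file.
        System.out.println(candidateFiles(ranges, "key-175"));
        // A key outside every range needs no bloom filter lookup at all.
        System.out.println(candidateFiles(ranges, "key-700"));
    }
}
```

Because the comparison is lexicographic, keys order as strings rather than as numbers: `"key-9"` sorts after `"key-10"`. Pruning is therefore most effective when keys share a common, fixed-width or monotonically increasing prefix.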

Name | Last commit | Date
deploy | Add ossrh profile to publish maven artifacts to oss.sonatype.org (synced with maven central) | 2016-12-21 14:17:35 -08:00
docs | Adding range based pruning to bloom index | 2017-08-04 13:22:13 -07:00
hoodie-cli | Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR) | 2017-06-30 11:41:23 -07:00
hoodie-client | Adding range based pruning to bloom index | 2017-08-04 13:22:13 -07:00
hoodie-common | Adding range based pruning to bloom index | 2017-08-04 13:22:13 -07:00
hoodie-hadoop-mr | Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR) | 2017-06-30 11:41:23 -07:00
hoodie-hive | 1. Use HoodieLogFormat to archive commits and other actions 2. Introduced avro schema for commits and compactions and an avro wrapper schema | 2017-07-26 14:27:44 -07:00
hoodie-utilities | Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR) | 2017-06-30 11:41:23 -07:00
_config.yml | Set theme jekyll-theme-minimal | 2016-12-29 16:53:39 -08:00
.gitignore | Importing Hoodie Client from internal repo | 2016-12-16 14:34:42 -08:00
.travis.yml | Update java version to 8 in travis.yml | 2017-05-17 13:43:11 -07:00
CHANGELOG.md | Added CHANGELOG.md and updated community contributions guideline | 2017-06-16 10:48:37 -07:00
LICENSE.txt | Importing Hoodie Client from internal repo | 2016-12-16 14:34:42 -08:00
pom.xml | Pushing master to 0.4.0 as we continue to make minor releases over 0.3.8 (MVP for MOR) | 2017-06-30 11:41:23 -07:00
README.md | Hoodie operability with S3 | 2017-03-28 05:08:54 -07:00

README.md

Hoodie

Hoodie manages storage of large analytical datasets on HDFS and serves them out via two types of tables:

Read Optimized Table - Provides excellent query performance via purely columnar storage (e.g. Parquet)
Near-Real time Table (WIP) - Provides queries on real-time data, using a combination of columnar & row-based storage (e.g. Parquet + Avro)

For more, head over here

Languages: Java 81.4%, Scala 16.7%, ANTLR 0.9%, Shell 0.8%, Dockerfile 0.2%