69d3950a325a9b3cb22b6948abba1b22145783cf
* Add analytics to site
* Fix ugly favicon
* New & Improved HoodieDeltaStreamer
  - Can incrementally consume from HDFS or Kafka, with exactly-once semantics!
  - Supports Json/Avro data, Source can also do custom things
  - Source is totally pluggable, via reflection
  - Key generation is pluggable, currently added SimpleKeyGenerator
  - Schema provider is pluggable, currently Filebased schemas
  - Configurable field to break ties during preCombine
  - Finally, can also plugin the HoodieRecordPayload, to get other merge types than overwriting
  - Handles efficient avro serialization in Spark

  Pending:
  - Rewriting of HiveIncrPullSource
  - Hive sync via hoodie-hive
  - Cleanup & tests
* Minor fixes from master rebase
* Implementation of HiveIncrPullSource
  - Copies commit by commit from source to target
* Adding TimestampBasedKeyGenerator
  - Supports unix time & date strings
Hoodie manages storage of large analytical datasets on HDFS and serves them out via two types of tables:
- Read Optimized Table - Provides excellent query performance via purely columnar storage (e.g. Parquet)
- Near-Real-Time Table (WIP) - Provides queries on real-time data, using a combination of columnar & row-based storage (e.g. Parquet + Avro)
For more, head over here
Languages: Java 81.4%, Scala 16.7%, ANTLR 0.9%, Shell 0.8%, Dockerfile 0.2%