Introducing HoodieLogFormat V2 with versioning support
- HoodieLogFormat V2 supports LogFormat evolution through versioning
- LogVersion is associated with a LogBlock, not a LogFile
- Based on the version of a LogBlock, the appropriate code path is executed
- Implemented LazyReading of Hoodie Log Blocks with a Memory / IO tradeoff
- Implemented a reverse pointer to be able to traverse the log in reverse
- Introduced a new MAGIC for backwards compatibility with logs without versions
Committed by vinoth chandar
parent dfd1979c51
commit 5405a6287b
38
docs/code_and_design.md
Normal file
@@ -0,0 +1,38 @@
---
title: Code Structure
keywords: usecases
sidebar: mydoc_sidebar
permalink: code_and_design.html
---

## Code & Project Structure

* hoodie-client : Spark client library that takes a batch of inserts and updates and applies them to a Hoodie table
* hoodie-common : Common code shared between the different Hoodie artifacts

## HoodieLogFormat

The table and diagram below describe the LogFormat for Hoodie MergeOnRead. Each log file consists of one or more log blocks.
Each log block follows the format shown below.

| Field | Description |
|-------------- |------------------|
| MAGIC | A magic header that marks the start of a block |
| VERSION | The version of the LogFormat; it determines which code path handles the block as the log format evolves |
| TYPE | The type of the log block |
| HEADER LENGTH | The length of the headers, 0 if there are no headers |
| HEADER | Metadata needed for a log block, for example INSTANT_TIME, TARGET_INSTANT_TIME, SCHEMA etc. |
| CONTENT LENGTH | The length of the content of the log block |
| CONTENT | The content of the log block. For example, for a DATA_BLOCK, the content is (number of records + actual records) as a byte[] |
| FOOTER LENGTH | The length of the footers, 0 if there are no footers |
| FOOTER | Metadata needed for a log block, for example index entries or a bloom filter for the records in a DATA_BLOCK |
| LOGBLOCK LENGTH | The total number of bytes written for a log block, typically the SUM(everything_above). This is a LONG. It acts as a reverse pointer so that the log can be traversed in reverse. |

{% include image.html file="hoodie_log_format_v2.png" alt="hoodie_log_format_v2.png" %}
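
To make the layout concrete, here is a minimal Python sketch of the block format described in the table. It is not Hoodie's implementation (which is Java), and several details are assumptions made only for illustration: the `MAGIC` bytes are a made-up marker, VERSION and TYPE are taken to be 4-byte big-endian integers, and every length field is taken to be an 8-byte big-endian long (the table only states this for LOGBLOCK LENGTH). The names `LogBlock`, `encode_block`, `decode_block`, `DECODERS` and `iter_blocks_reverse` are hypothetical. The sketch shows two ideas from the table: the per-block VERSION selects the decoding path, and the trailing LOGBLOCK LENGTH lets a reader walk the file from the end.

```python
import struct
from typing import Iterator, NamedTuple

MAGIC = b"#BLOCK#"          # hypothetical marker, NOT Hoodie's real MAGIC
INT = struct.Struct(">i")   # assumed width for VERSION and TYPE
LONG = struct.Struct(">q")  # assumed width for all length fields


class LogBlock(NamedTuple):
    version: int
    block_type: int
    header: bytes
    content: bytes
    footer: bytes


def encode_block(block: LogBlock) -> bytes:
    """Serialize one block in the field order given by the table."""
    body = b"".join([
        MAGIC,
        INT.pack(block.version),
        INT.pack(block.block_type),
        LONG.pack(len(block.header)), block.header,
        LONG.pack(len(block.content)), block.content,
        LONG.pack(len(block.footer)), block.footer,
    ])
    # LOGBLOCK LENGTH: the sum of everything above, written last.
    return body + LONG.pack(len(body))


def _decode_v1(buf: bytes, pos: int, version: int) -> LogBlock:
    """Decode TYPE, HEADER, CONTENT and FOOTER starting at pos."""
    (block_type,) = INT.unpack_from(buf, pos)
    pos += INT.size
    sections = []
    for _ in range(3):  # header, content, footer
        (length,) = LONG.unpack_from(buf, pos)
        pos += LONG.size
        sections.append(buf[pos:pos + length])
        pos += length
    return LogBlock(version, block_type, *sections)


# The version is stored per block, so each block picks its own code path.
DECODERS = {1: _decode_v1}


def decode_block(buf: bytes, start: int) -> LogBlock:
    """Decode the block whose MAGIC begins at offset start."""
    if buf[start:start + len(MAGIC)] != MAGIC:
        raise ValueError(f"no MAGIC at offset {start}")
    pos = start + len(MAGIC)
    (version,) = INT.unpack_from(buf, pos)
    decoder = DECODERS.get(version)
    if decoder is None:
        raise ValueError(f"unsupported block version {version}")
    return decoder(buf, pos + INT.size, version)


def iter_blocks_reverse(buf: bytes) -> Iterator[LogBlock]:
    """Yield blocks last to first by following LOGBLOCK LENGTH backwards."""
    end = len(buf)
    while end > 0:
        if end < LONG.size:
            raise ValueError("truncated log")
        (block_len,) = LONG.unpack_from(buf, end - LONG.size)
        start = end - LONG.size - block_len
        if start < 0:
            raise ValueError("corrupt LOGBLOCK LENGTH")
        yield decode_block(buf, start)
        end = start


# Usage: two blocks appended to one log, then read back in reverse.
first = LogBlock(1, 0, b"INSTANT_TIME=001", b"records-a", b"")
second = LogBlock(1, 0, b"", b"records-b", b"bloom")
log = encode_block(first) + encode_block(second)
assert list(iter_blocks_reverse(log)) == [second, first]
```

Because the trailing length excludes itself, the reader steps back by the length plus the size of the length field to reach the start of each block. A lazy reader could use the same offsets to skip CONTENT and read it only on demand.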