hudi/docs/code_and_design.md
Nishith Agarwal 5405a6287b Introducing HoodieLogFormat V2 with versioning support
- HoodieLogFormat V2 supports LogFormat evolution through versioning
  - LogVersion is associated with a LogBlock, not a LogFile
  - Based on the version of a LogBlock, the appropriate code path is executed
- Implemented LazyReading of Hoodie Log Blocks with Memory / IO tradeoff
- Implemented Reverse pointer to be able to traverse the log in reverse
- Introduced new MAGIC for backwards compatibility with logs without versions
2018-03-06 21:14:11 -08:00


---
title: Code Structure
keywords: usecases
sidebar: mydoc_sidebar
permalink: code_and_design.html
---
## Code & Project Structure
* hoodie-client : Spark client library that takes a batch of inserts and updates and applies them to a Hoodie table
* hoodie-common : Common code shared between the different Hoodie artifacts
## HoodieLogFormat
The table and diagram below describe the LogFormat used by Hoodie MergeOnRead. Each log file consists of one or more log blocks.
Each log block follows the format shown below.
| Field | Description |
|-------|-------------|
| MAGIC | A magic header that marks the start of a block |
| VERSION | The version of the LogFormat used by this block; readers use it to select the matching code path as the log format evolves |
| TYPE | The type of the log block |
| HEADER LENGTH | The length of the headers, 0 if no headers |
| HEADER | Metadata needed for a log block, for example INSTANT_TIME, TARGET_INSTANT_TIME, SCHEMA etc. |
| CONTENT LENGTH | The length of the content of the log block |
| CONTENT | The content of the log block as a `byte[]`; for example, for a DATA_BLOCK, the content is (number of records + actual records) |
| FOOTER LENGTH | The length of the footers, 0 if no footers |
| FOOTER | Metadata needed for a log block, for example index entries, a bloom filter for records in a DATA_BLOCK etc. |
| LOGBLOCK LENGTH | The total number of bytes written for a log block, typically the SUM(everything_above), stored as a LONG. It acts as a reverse pointer so the log can be traversed in reverse. |
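
The field order in the table can be illustrated with a small round-trip sketch. This is not the Hoodie implementation: the magic bytes `#DEMO#`, the 4-byte VERSION and TYPE, the 8-byte big-endian length fields, and the names `encode_block` and `decode_block` are all assumptions made for illustration. The only width the table states is that LOGBLOCK LENGTH is a LONG.

```python
import io
import struct

# Hypothetical placeholder; NOT the real Hoodie magic bytes.
MAGIC = b"#DEMO#"


def encode_block(version, block_type, header, content, footer):
    """Serialize one log block in the field order listed in the table.

    All field widths are assumptions, except that the trailing
    LOGBLOCK LENGTH is written as an 8-byte long.
    """
    body = b"".join([
        MAGIC,
        struct.pack(">i", version),              # VERSION
        struct.pack(">i", block_type),           # TYPE
        struct.pack(">q", len(header)), header,  # HEADER LENGTH, HEADER
        struct.pack(">q", len(content)), content,  # CONTENT LENGTH, CONTENT
        struct.pack(">q", len(footer)), footer,  # FOOTER LENGTH, FOOTER
    ])
    # LOGBLOCK LENGTH: number of bytes of everything written above it.
    return body + struct.pack(">q", len(body))


def read_exact(stream, n):
    """Read exactly n bytes or fail, so truncated blocks are detected."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated log block")
    return data


def decode_block(stream):
    """Read one block from the current stream position, front to back."""
    if read_exact(stream, len(MAGIC)) != MAGIC:
        raise ValueError("stream is not positioned at the start of a block")
    version, block_type = struct.unpack(">ii", read_exact(stream, 8))
    sections = []
    for _ in range(3):  # header, content, footer
        (length,) = struct.unpack(">q", read_exact(stream, 8))
        sections.append(read_exact(stream, length))
    header, content, footer = sections
    (block_length,) = struct.unpack(">q", read_exact(stream, 8))
    expected = len(MAGIC) + 8 + 3 * 8 + len(header) + len(content) + len(footer)
    if block_length != expected:
        raise ValueError("LOGBLOCK LENGTH does not match the bytes read")
    return {
        "version": version,
        "type": block_type,
        "header": header,
        "content": content,
        "footer": footer,
        "block_length": block_length,
    }
```

A reader that supports several format versions could branch on the decoded `version` value before interpreting the header, content, and footer bytes.
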
{% include image.html file="hoodie_log_format_v2.png" alt="hoodie_log_format_v2.png" %}
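
The reverse-pointer role of LOGBLOCK LENGTH can be sketched as follows. This is a minimal illustration, not the Hoodie reader: it treats everything above the trailing length as opaque bytes, assumes an 8-byte big-endian long, and uses the hypothetical helper names `append_block` and `iter_blocks_reverse`.

```python
import io
import struct

LENGTH_SIZE = 8  # assumed on-disk size of the trailing LOGBLOCK LENGTH


def append_block(stream, block_bytes):
    """Write the block bytes followed by their length as a trailing long."""
    stream.write(block_bytes)
    stream.write(struct.pack(">q", len(block_bytes)))


def iter_blocks_reverse(stream):
    """Yield the bytes of each block, newest first, using the trailing lengths."""
    stream.seek(0, io.SEEK_END)
    end = stream.tell()
    while end > 0:
        if end < LENGTH_SIZE:
            raise ValueError("corrupt log: missing trailing length")
        # The last 8 bytes before `end` say how far back the block starts.
        stream.seek(end - LENGTH_SIZE)
        (length,) = struct.unpack(">q", stream.read(LENGTH_SIZE))
        start = end - LENGTH_SIZE - length
        if length < 0 or start < 0:
            raise ValueError("corrupt log: bad block length")
        stream.seek(start)
        yield stream.read(length)
        end = start  # the previous block ends where this one starts


log = io.BytesIO()
for payload in (b"first", b"second", b"third"):
    append_block(log, payload)

newest_first = list(iter_blocks_reverse(log))
```

Because each step needs only the last few bytes of a block to locate its start, a reader can skip backwards over blocks without parsing their contents.
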