docs: 补充同步和压缩模块文档
This commit is contained in:
79
README.md
79
README.md
@@ -61,6 +61,8 @@
|
||||
* [开发](#开发)
|
||||
* [Hudi 运行代码](#hudi-运行代码)
|
||||
* [sync](#sync)
|
||||
* [Synchronizer](#synchronizer)
|
||||
* [Compaction](#compaction)
|
||||
* [运维服务](#运维服务)
|
||||
* [部署工具](#部署工具)
|
||||
<!-- TOC -->
|
||||
@@ -935,7 +937,82 @@ query <-- yarn2: 集群信息
|
||||
|
||||
### sync
|
||||
|
||||
sync 模块包含 Hudi 运行的全部业务逻辑,Hudi on flink 的运行模式为
|
||||
sync模块包含hudi运行的全部业务逻辑。
|
||||
|
||||
项目采用flink作为hudi的运行平台。flink的运行模式为`flink on yarn application`,这个模式下每次调度服务向yarn提交任务都会在yarn上启动一个单独的flink集群(一个flink集群由一个jobmanager和多个taskmanager组成),再在这个独立的flink集群中运行sync模块。
|
||||
|
||||
```plantuml
|
||||
@startuml
|
||||
'skinparam Linetype ortho
|
||||
'skinparam Dpi 500
|
||||
|
||||
title sync模块和相关模块的关系
|
||||
|
||||
database HDFS as hdfs {
|
||||
component sync模块 as s1
|
||||
}
|
||||
|
||||
rectangle Hudi服务群 as hudi_services {
|
||||
package 调度服务 as scheduler {
|
||||
component flink运行依赖 as f1
|
||||
component hudi运行依赖 as h1
|
||||
|
||||
f1 .[hidden] h1
|
||||
}
|
||||
}
|
||||
|
||||
cloud yarn集群 as yarn {
|
||||
package Hudi任务 as hudi_task {
|
||||
component flink运行依赖 as f2
|
||||
component hudi运行依赖 as h2
|
||||
component sync模块 as s2
|
||||
|
||||
f2 .[hidden] h2
|
||||
h2 .[hidden] s2
|
||||
}
|
||||
}
|
||||
|
||||
f1 .. f2:运行依赖提交到集群
|
||||
h1 .. h2:运行依赖提交到集群
|
||||
s1 .. s2:从hdfs将sync模块下载到任务中
|
||||
@enduml
|
||||
```
|
||||
|
||||
### Synchronizer
|
||||
|
||||
数据同步模块,用于同步数据的业务代码,`com.lanyuanxiaoyao.service.sync.Synchronizer#main`包含所有的业务流程。
|
||||
|
||||
```plantuml
|
||||
@startuml
|
||||
title 主要逻辑业务流程
|
||||
start
|
||||
:接收传入参数;
|
||||
if (注册zk锁节点) is (成功) then
|
||||
:构造flink基本参数;
|
||||
:判断同步任务类型,构造不同的hudi参数;
|
||||
:启动flink任务;
|
||||
else (失败)
|
||||
stop
|
||||
endif
|
||||
partition flink任务流程 {
|
||||
repeat :接收pulsar消息;
|
||||
:转换pulsar json消息为Record对象;
|
||||
if (是否为跨天消息) is (是) then
|
||||
else (不是)
|
||||
:记录操作时间;
|
||||
endif
|
||||
repeat while (消息是否合法) is (否) not (是)
|
||||
:构造Hudi Schema;
|
||||
:Record对象转换为flink row;
|
||||
|
||||
}
|
||||
stop
|
||||
@enduml
|
||||
```
|
||||
|
||||
### Compaction
|
||||
|
||||
数据压缩模块,用于压缩数据的业务代码,`com.lanyuanxiaoyao.service.sync.Compaction#main`包含所有的业务流程。
|
||||
|
||||
## 运维服务
|
||||
|
||||
|
||||
Reference in New Issue
Block a user