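feat: add multi-platform dependency support

Provide platform-specific dependency extras to resolve dependency compatibility problems on macOS x86_64.

- Add platform-specific PDF parsing extras: pdf-win, pdf-macos-intel, pdf-macos-arm, pdf-linux
- Add platform-specific Office document extras: office-win, office-macos-intel, office-macos-arm, office-linux
- Pin hard-coded versions on macOS x86_64: docling==2.40.0, docling-parse==4.0.0
- Remove the generic pdf and office extras so that users must choose a platform
- Update SKILL.md with a detailed multi-platform dependency installation guide
- Update README.md with platform-specific installation instructions
- Add uv.lock to .gitignore
- Delete the existing uv.lock file
- Create the multi-platform-dependencies spec document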
3
.gitignore
vendored
@@ -174,6 +174,9 @@ ipython_config.py
 # pipenv
 Pipfile.lock

+# uv
+uv.lock
+
 # PEP 582
 __pypackages__/

13
README.md
@@ -6,7 +6,11 @@

 - Manage dependencies with uv; do not use the host Python
 - Dependency declaration: pyproject.toml
-- Install: uv sync
+- Install: pick the extras for your platform (see SKILL.md for details)
+  - macOS x86_64 (Intel): `uv pip install -e ".[pdf-macos-intel]"`
+  - macOS arm64 (Apple Silicon): `uv pip install -e ".[pdf-macos-arm]"`
+  - Windows: `uv pip install -e ".[pdf-win]"`
+  - Linux: `uv pip install -e ".[pdf-linux]"`

 ## Project structure

@@ -57,9 +61,12 @@ uv run mypy .
 - Encoding tests (GBK, UTF-8 BOM, and so on)
 - Consistency tests (verify that different Readers produce consistent parse results)

-Make sure all dependencies are installed before running the tests:
+Make sure all dependencies are installed before running the tests (pick the extras for your platform):
 ```bash
-uv sync
+# Example for macOS x86_64 (Intel)
+uv pip install -e ".[office-macos-intel]"
+
+# For other platforms, see the "Multi-platform dependency installation guide" in SKILL.md
 ```

 ## Code conventions
83
SKILL.md
@@ -5,7 +5,7 @@ license: MIT
 metadata:
   version: "1.0"
   author: lyxy
-compatibility: Requires Python 3.11+. Prefer running through the lyxy-runner-python skill (it manages dependencies automatically). When falling back to the host Python, install the dependencies manually: DOCX(docling unstructured markitdown pypandoc-binary python-docx markdownify chardet) / XLSX(docling unstructured markitdown pandas tabulate chardet) / PPTX(docling unstructured markitdown python-pptx markdownify chardet) / PDF(docling unstructured unstructured-paddleocr markitdown pypdf markdownify chardet) / HTML(trafilatura domscribe markitdown html2text beautifulsoup4 httpx chardet) / HTTP enhancements(pyppeteer selenium)
+compatibility: Requires Python 3.11+. Prefer running through the lyxy-runner-python skill (it manages dependencies automatically). When falling back to the host Python, install the dependencies manually for your platform: Windows(pdf-win/office-win) / macOS Intel(pdf-macos-intel/office-macos-intel, requires Python 3.12) / macOS ARM(pdf-macos-arm/office-macos-arm) / Linux(pdf-linux/office-linux). See the "Multi-platform dependency installation guide" section.
 ---

 # Unified Document Parsing Skill
@@ -117,6 +117,87 @@ python scripts/lyxy_document_reader.py document.docx -s "\d{4}-\d{2}-\d{2}"
 python scripts/lyxy_document_reader.py document.docx -s "keyword" -n 5
 ```

+### Multi-platform dependency installation guide
+
+**Important**: this project provides platform-specific dependency configurations. Choose the extra that matches your platform.
+
+#### Platform detection
+
+Detect your platform before installing:
+
+```bash
+# macOS / Linux
+uname -m  # prints the architecture: x86_64 or arm64
+uname -s  # prints the system: Darwin or Linux
+
+# Windows PowerShell
+$env:OS  # or inspect the environment variables
+
+# Cross-platform detection with Python
+python -c "import platform; print(f'{platform.system()}-{platform.machine()}')"
+```
+
+#### PDF parsing dependencies
+
+Pick the install command for your platform:
+
+**Windows x86_64**
+```bash
+uv run --with "lyxy-document[pdf-win]" scripts/lyxy_document_reader.py file.pdf
+```
+- Dependencies: docling, unstructured, PaddleOCR
+- Python: >=3.11
+- Notes: none
+
+**macOS x86_64 (Intel)**
+⚠️ **Special platform**: requires specific pinned versions
+```bash
+uv run --python 3.12 --with "lyxy-document[pdf-macos-intel]" scripts/lyxy_document_reader.py file.pdf
+```
+- Dependencies: docling==2.40.0, docling-parse==4.0.0, numpy<2
+- Python: **must be 3.12**
+- Notes:
+  - `docling-parse` 5.x ships no x86_64 wheel, so 4.0.0 must be used
+  - `easyocr` (docling's OCR backend) is incompatible with NumPy 2.x
+
+**macOS arm64 (Apple Silicon)**
+```bash
+uv run --with "lyxy-document[pdf-macos-arm]" scripts/lyxy_document_reader.py file.pdf
+```
+- Dependencies: docling, unstructured
+- Python: >=3.11
+- Notes: none
+
+**Linux**
+```bash
+uv run --with "lyxy-document[pdf-linux]" scripts/lyxy_document_reader.py file.pdf
+```
+- Dependencies: docling, unstructured
+- Python: >=3.11
+- Notes: none
+
+#### Office document dependencies
+
+**Windows x86_64**
+```bash
+uv run --with "lyxy-document[office-win]" scripts/lyxy_document_reader.py file.docx
+```
+
+**macOS x86_64 (Intel)**
+```bash
+uv run --python 3.12 --with "lyxy-document[office-macos-intel]" scripts/lyxy_document_reader.py file.docx
+```
+
+**macOS arm64 (Apple Silicon)**
+```bash
+uv run --with "lyxy-document[office-macos-arm]" scripts/lyxy_document_reader.py file.docx
+```
+
+**Linux**
+```bash
+uv run --with "lyxy-document[office-linux]" scripts/lyxy_document_reader.py file.docx
+```
+
 ### Installing dependencies in the host Python environment

 When lyxy-runner-python is unavailable, install the dependencies manually according to the document type:
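The platform-detection step and the per-platform extras in the SKILL.md guide above could be combined into one small helper. The following is a minimal sketch and not part of this commit: `select_extra` is a hypothetical function name, and the mapping assumes that `platform.machine()` reports `AMD64` on 64-bit Windows and that every Linux architecture uses the single `-linux` extra, which the guide does not state explicitly.

```python
import platform
from typing import Optional


def select_extra(kind: str, system: Optional[str] = None,
                 machine: Optional[str] = None) -> str:
    """Return the platform-specific extra name, e.g. "pdf-macos-intel".

    kind is "pdf" or "office"; system and machine default to the values
    reported by the running interpreter.
    """
    if kind not in ("pdf", "office"):
        raise ValueError(f"unknown extra kind: {kind!r}")

    system = system or platform.system()
    machine = (machine or platform.machine()).lower()

    if system == "Windows" and machine in ("amd64", "x86_64"):
        suffix = "win"
    elif system == "Darwin" and machine == "x86_64":
        suffix = "macos-intel"
    elif system == "Darwin" and machine == "arm64":
        suffix = "macos-arm"
    elif system == "Linux":
        # Assumption: the guide lists one Linux entry for all architectures.
        suffix = "linux"
    else:
        raise RuntimeError(f"unsupported platform: {system}-{machine}")
    return f"{kind}-{suffix}"
```

For instance, `select_extra("pdf", "Darwin", "x86_64")` yields `pdf-macos-intel`, which can then be passed to `uv run --with "lyxy-document[pdf-macos-intel]"`. Per the guide, that platform additionally needs `--python 3.12`, which this sketch does not handle.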
@@ -3,11 +3,11 @@ schema: spec-driven
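 context: |
   # Project conventions
   - Language: Chinese only (communication/comments/docs/code)
-  - Python: always run with uv (scripts/ad-hoc commands via uv run python -c); the host python is off limits/no installing packages on the host
+  - Python: in this project, always run with uv (scripts/ad-hoc commands via uv run python -c); the host python is off limits/no installing packages on the host
   - Dependencies: declared in pyproject.toml, installed with uv
   - Host environment: never pollute its configuration; ask the user before any operation that needs it
   - Developer docs: README.md, update as needed on every iteration; no emoji/special characters
-  - Skill docs: SKILL.md, update as needed on every iteration
+  - Skill docs: SKILL.md, update as needed on every iteration (written for AI, and must assume an environment without uv)
   - Tests: every requirement must come with comprehensive tests
   - Tasks: never create tasks that change git state (push/commit, etc.); read-only git is allowed (status/log/diff, etc.)
   - Code: module files of 150-300 lines; errors need custom exceptions + clear messages + location context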
86
openspec/specs/multi-platform-dependencies/spec.md
Normal file
@@ -0,0 +1,86 @@
+# Multi-platform dependency management
+
+## Purpose
+
+Provide platform-specific dependency configurations to resolve platform-specific compatibility problems (such as the docling-parse version restriction on macOS x86_64). Requiring users to choose a platform-specific extra ensures that dependencies install and run correctly on every platform.
+
+## Requirements
+
+### Requirement: Platform-specific dependency configuration
+The system must provide platform-specific dependency extras in `pyproject.toml`.
+
+#### Scenario: Platform-specific extras for PDF parsing
+- **WHEN** a user inspects the `[project.optional-dependencies]` table in `pyproject.toml`
+- **THEN** the system must provide the following PDF parsing extras:
+  - `pdf-win`: PDF parsing dependencies for Windows x86_64
+  - `pdf-macos-intel`: PDF parsing dependencies for macOS x86_64 (Intel)
+  - `pdf-macos-arm`: PDF parsing dependencies for macOS arm64 (Apple Silicon)
+  - `pdf-linux`: PDF parsing dependencies for Linux
+
+#### Scenario: Special version constraints on macOS x86_64
+- **WHEN** a user installs the `pdf-macos-intel` extra
+- **THEN** the system must use the following hard-coded versions:
+  - `docling==2.40.0`
+  - `docling-parse==4.0.0`
+
+#### Scenario: Platform-specific extras for Office documents
+- **WHEN** a user inspects the Office document extras in `pyproject.toml`
+- **THEN** the system must provide the following combined extras:
+  - `office-win`: the full set of Office document dependencies for Windows x86_64
+  - `office-macos-intel`: the full set of Office document dependencies for macOS x86_64 (Intel)
+  - `office-macos-arm`: the full set of Office document dependencies for macOS arm64 (Apple Silicon)
+  - `office-linux`: the full set of Office document dependencies for Linux
+
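+### Requirement: Remove the generic extras
+The system must remove the generic, platform-independent extras so that users have to choose a platform explicitly.
+
+#### Scenario: The PDF extra does not exist
+- **WHEN** a user tries to install the `lyxy-document[pdf]` extra
+- **THEN** the system must fail or prompt the user to choose a platform-specific extra
+
+#### Scenario: The Office extra does not exist
+- **WHEN** a user tries to install the `lyxy-document[office]` extra
+- **THEN** the system must fail or prompt the user to choose a platform-specific extra
+
+### Requirement: Platform detection documentation
+The system must document platform detection methods and platform-specific installation instructions in `SKILL.md`.
+
+#### Scenario: Platform detection commands
+- **WHEN** a user reads the multi-platform dependency installation guide in `SKILL.md`
+- **THEN** the system must provide detection commands for the following platforms:
+  - macOS / Linux: `uname -m` and `uname -s`
+  - Windows: PowerShell environment variable check
+  - Cross-platform detection with Python: `import platform; print(f'{platform.system()}-{platform.machine()}')`
+
+#### Scenario: Special notes for macOS x86_64
+- **WHEN** a user on macOS x86_64 reads the installation instructions for the PDF parsing dependencies
+- **THEN** the system must state the following special requirements explicitly:
+  - Python 3.12 is required
+  - `docling-parse` 5.x ships no x86_64 wheel, so 4.0.0 must be used
+
+#### Scenario: Installation command for every platform
+- **WHEN** a user reads `SKILL.md`
+- **THEN** the system must provide a clear `uv run` command example for every platform (Windows/macOS Intel/macOS ARM/Linux)
+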
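+### Requirement: Lock file management
+The system must ignore the `uv.lock` file and keep it out of version control.
+
+#### Scenario: gitignore configuration
+- **WHEN** a user inspects the project's `.gitignore` file
+- **THEN** the system must include a `uv.lock` entry in that file
+
+#### Scenario: Flexible dependency installation
+- **WHEN** a user installs dependencies with `uv run --with`
+- **THEN** the system must resolve dependencies dynamically for the current platform instead of relying on a pre-generated lock file
+
+### Requirement: Duplicated but explicit dependencies
+The system may declare the same dependency in several platform extras to keep the configuration clear and simple.
+
+#### Scenario: Repeating base dependencies
+- **WHEN** a user inspects the different platform extras
+- **THEN** the system may repeat the base dependencies (such as `markitdown[pdf]`, `pypdf`, `markdownify`) in every extra
+- **AND** these repeated declarations must use the same version
+
+#### Scenario: Simplicity of maintenance comes first
+- **WHEN** a developer needs to change a dependency version
+- **THEN** the system prefers simple, explicit duplicated declarations over complex dependency references or constraint files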
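The "duplicated but explicit" requirement above says that base dependencies repeated across platform extras must keep the same version. A check along the following lines could enforce that. It is a sketch under stated assumptions, not code from this commit: `split_requirement`, `find_inconsistencies`, and the `EXTRAS` mapping are hypothetical, the mapping is an abbreviated excerpt of the extras, and the requirement parsing is a simplified regular expression rather than a full PEP 508 parser.

```python
import re
from collections import defaultdict

# Abbreviated, hypothetical excerpt of [project.optional-dependencies].
EXTRAS = {
    "pdf-macos-intel": ["docling==2.40.0", "markitdown[pdf]>=0.1.0", "pypdf>=4.0.0"],
    "pdf-macos-arm": ["docling>=2.0.0", "markitdown[pdf]>=0.1.0", "pypdf>=4.0.0"],
    "pdf-linux": ["docling>=2.0.0", "markitdown[pdf]>=0.1.0", "pypdf>=4.0.0"],
}

# Package name, optional [extras], then whatever version specifier follows.
_REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9._-]+(?:\[[^\]]*\])?)\s*(.*?)\s*$")


def split_requirement(requirement):
    """Split "name[extra]>=1.0" into ("name[extra]", ">=1.0")."""
    match = _REQUIREMENT.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return match.group(1).lower(), match.group(2)


def find_inconsistencies(extras, allowed=()):
    """Map each package declared with differing specifiers to those specifiers.

    Packages named in `allowed` are intentional exceptions and are skipped.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for extra, requirements in extras.items():
        for requirement in requirements:
            name, spec = split_requirement(requirement)
            seen[name][spec].append(extra)
    return {
        name: dict(specs)
        for name, specs in seen.items()
        if len(specs) > 1 and name not in allowed
    }
```

On this excerpt the only package reported is `docling`, because `pdf-macos-intel` pins it on purpose; calling `find_inconsistencies(EXTRAS, allowed={"docling"})` returns an empty mapping. In a real project the mapping would be read from `pyproject.toml`, for example with the standard-library `tomllib` module on Python 3.11+.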
@@ -9,36 +9,104 @@ dependencies = [
 ]

 [project.optional-dependencies]
-docx = [
+# Platform-specific DOCX parsing extras
+docx-win = [
     "docling>=2.0.0",
-    "unstructured>=0.12.0",
-    "markitdown>=0.1.0",
+    "unstructured[docx]>=0.12.0",
+    "markitdown[docx]>=0.1.0",
     "pypandoc-binary>=1.13.0",
     "python-docx>=1.1.0",
     "markdownify>=0.12.0",
 ]
-xlsx = [
+docx-unix = [
     "docling>=2.0.0",
-    "unstructured>=0.12.0",
-    "markitdown>=0.1.0",
+    "unstructured[docx]>=0.12.0",
+    "markitdown[docx]>=0.1.0",
+    "pypandoc-binary>=1.13.0",
+    "python-docx>=1.1.0",
+    "markdownify>=0.12.0",
+]
+
+# Platform-specific XLSX parsing extras
+xlsx-win = [
+    "docling>=2.0.0",
+    "unstructured[xlsx]>=0.12.0",
+    "markitdown[xlsx]>=0.1.0",
     "pandas>=2.0.0",
     "tabulate>=0.9.0",
 ]
-pptx = [
+xlsx-unix = [
     "docling>=2.0.0",
-    "unstructured>=0.12.0",
-    "markitdown>=0.1.0",
+    "unstructured[xlsx]>=0.12.0",
+    "markitdown[xlsx]>=0.1.0",
+    "pandas>=2.0.0",
+    "tabulate>=0.9.0",
+]
+
+# Platform-specific PPTX parsing extras
+pptx-win = [
+    "docling>=2.0.0",
+    "unstructured[pptx]>=0.12.0",
+    "markitdown[pptx]>=0.1.0",
     "python-pptx>=0.6.0",
     "markdownify>=0.12.0",
 ]
-pdf = [
+pptx-unix = [
     "docling>=2.0.0",
-    "unstructured>=0.12.0",
+    "unstructured[pptx]>=0.12.0",
+    "markitdown[pptx]>=0.1.0",
+    "python-pptx>=0.6.0",
+    "markdownify>=0.12.0",
+]
+
+# Platform-specific PDF parsing extras
+pdf-win = [
+    "docling>=2.0.0",
+    "unstructured[pdf]>=0.12.0",
     "unstructured-paddleocr>=0.1.0",
-    "markitdown>=0.1.0",
+    "paddlepaddle==2.6.2",
+    "ml-dtypes>=0.3.0",
+    "markitdown[pdf]>=0.1.0",
     "pypdf>=4.0.0",
     "markdownify>=0.12.0",
 ]
+pdf-macos-intel = [
+    "docling==2.40.0",
+    "docling-parse==4.0.0",
+    "markitdown[pdf]>=0.1.0",
+    "pypdf>=4.0.0",
+    "markdownify>=0.12.0",
+]
+pdf-macos-arm = [
+    "docling>=2.0.0",
+    "unstructured[pdf]>=0.12.0",
+    "markitdown[pdf]>=0.1.0",
+    "pypdf>=4.0.0",
+    "markdownify>=0.12.0",
+]
+pdf-linux = [
+    "docling>=2.0.0",
+    "unstructured[pdf]>=0.12.0",
+    "markitdown[pdf]>=0.1.0",
+    "pypdf>=4.0.0",
+    "markdownify>=0.12.0",
+]
+
+# Platform-specific combined Office document extras
+office-win = [
+    "lyxy-document[docx-win,xlsx-win,pptx-win,pdf-win]",
+]
+office-macos-intel = [
+    "lyxy-document[docx-unix,xlsx-unix,pptx-unix,pdf-macos-intel]",
+]
+office-macos-arm = [
+    "lyxy-document[docx-unix,xlsx-unix,pptx-unix,pdf-macos-arm]",
+]
+office-linux = [
+    "lyxy-document[docx-unix,xlsx-unix,pptx-unix,pdf-linux]",
+]
+
+# Other extras (not platform-specific)
 html = [
     "trafilatura>=1.10.0",
     "domscribe>=0.1.0",
@@ -51,14 +119,11 @@ http = [
     "pyppeteer>=2.0.0",
     "selenium>=4.18.0",
 ]
-office = [
-    "lyxy-document[docx,xlsx,pptx,pdf]",
-]
 web = [
     "lyxy-document[html,http]",
 ]
 full = [
-    "lyxy-document[office,web]",
+    "lyxy-document[office-macos-arm,web]",
 ]
 dev = [
     "pytest>=8.0.0",
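Removing the generic `office` extra means every combined extra that referenced it has to change as well, which is why `full` now points at `office-macos-arm`. A small check could catch a self-reference that still names a removed extra. This is a sketch only and not part of the commit: `undefined_references` is a hypothetical helper, `EXTRAS` is an abbreviated stand-in for the real table, and the pattern only recognises the plain `lyxy-document[a,b]` form used in this file.

```python
import re

PROJECT = "lyxy-document"

# Abbreviated, hypothetical stand-in for [project.optional-dependencies].
EXTRAS = {
    "docx-unix": ["python-docx>=1.1.0"],
    "pdf-macos-arm": ["pypdf>=4.0.0"],
    "html": ["trafilatura>=1.10.0"],
    "http": ["selenium>=4.18.0"],
    "office-macos-arm": ["lyxy-document[docx-unix,pdf-macos-arm]"],
    "web": ["lyxy-document[html,http]"],
    "full": ["lyxy-document[office-macos-arm,web]"],
}


def undefined_references(extras, project=PROJECT):
    """Map each extra to the self-referenced extras that are not defined."""
    pattern = re.compile(r"^\s*" + re.escape(project) + r"\[([^\]]+)\]\s*$")
    missing = {}
    for extra, requirements in extras.items():
        for requirement in requirements:
            match = pattern.match(requirement)
            if match is None:
                continue  # an ordinary third-party requirement
            for name in (part.strip() for part in match.group(1).split(",")):
                if name not in extras:
                    missing.setdefault(extra, []).append(name)
    return missing
```

With the table as written the result is empty. If `full` still referenced `lyxy-document[office,web]`, the helper would report `office` as undefined for `full`.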