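refactor: streamline chardet dependency configuration, keeping it only in the HTML reader

- Remove chardet from the dependency lists of the pdf/docx/xlsx/pptx readers
- Keep chardet in the html reader's dependency configuration (the only reader that actually uses it)
- Update README.md to drop the unnecessary chardet dependency notes
- Simplify the test commands by removing chardet from the non-HTML reader tests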
README.md (22 lines changed)
@@ -77,8 +77,8 @@ DEPENDENCIES = {
 First, verify that the project runs correctly:
 
 ```bash
-# Test the --advice option (requires only chardet)
-uv run --with chardet python scripts/lyxy_document_reader.py test.pdf --advice
+# Test the --advice option (no extra dependencies needed)
+uv run python scripts/lyxy_document_reader.py test.pdf --advice
 ```
 
 ### Run the basic tests
@@ -87,7 +87,6 @@ uv run --with chardet python scripts/lyxy_document_reader.py test.pdf --advice
 # Run the CLI tests (verifies basic project functionality)
 uv run \
     --with pytest \
-    --with chardet \
     pytest tests/test_cli/test_main.py::TestCLIAdviceOption -v
 ```
 
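@@ -95,10 +94,9 @@ uv run \
 
 ### Test prerequisites
 
-Importing the `HtmlReader` module loads `cleaner.py`, but the third-party libraries in `cleaner.py` are now imported dynamically, so only the base dependency is required:
-- **chardet**: encoding detection
+Importing the `HtmlReader` module loads `cleaner.py`, but the third-party libraries in `cleaner.py` are now imported dynamically, so no extra dependencies are required.
 
-`beautifulsoup4` is needed only when the HTML cleaning feature is actually used; importing the module does not depend on it.
+`beautifulsoup4` and `chardet` are needed only when the HTML features are actually used; importing the module does not depend on them.
 
 ### How to add a new Reader
 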
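The README change above relies on third-party libraries being imported dynamically, so that merely importing the HTML reader does not require `chardet`. The project's actual `cleaner.py` is not shown in this diff; the following is only a minimal sketch of the deferred-import pattern. The function name `decode_html_bytes` and the UTF-8 fallback are assumptions made for illustration, while `chardet.detect` is the library's real detection call.

```python
def decode_html_bytes(raw: bytes, default: str = "utf-8") -> str:
    """Decode raw HTML bytes, importing chardet only when this function runs.

    Hypothetical helper: the name and the fallback policy are illustrative
    assumptions, not code taken from the project.
    """
    encoding = default
    try:
        import chardet  # deferred import: not required at module import time
    except ImportError:
        chardet = None  # assumption: use the default encoding if chardet is absent

    if chardet is not None and raw:
        # chardet.detect returns a dict whose "encoding" value may be None.
        guess = chardet.detect(raw).get("encoding")
        if guess:
            encoding = guess

    try:
        return raw.decode(encoding, errors="replace")
    except LookupError:
        # The detected codec name is unknown to Python: use the default instead.
        return raw.decode(default, errors="replace")
```

With this shape, a test suite that only imports the module (for example the CLI `--advice` tests) would not need `--with chardet`, whereas a run that actually decodes HTML would.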
@@ -116,7 +114,6 @@ uv run \
 uv run \
     --with pytest \
     --with pytest-cov \
-    --with chardet \
     pytest
 ```
 
@@ -130,7 +127,6 @@ uv run \
     --with pypandoc-binary \
     --with python-docx \
     --with markdownify \
-    --with chardet \
     pytest tests/test_readers/test_docx/
 ```
 
@@ -143,7 +139,6 @@ uv run \
     --with "markitdown[xlsx]" \
     --with pandas \
     --with tabulate \
-    --with chardet \
     pytest tests/test_readers/test_xlsx/
 ```
 
@@ -156,7 +151,6 @@ uv run \
     --with "markitdown[pptx]" \
     --with python-pptx \
     --with markdownify \
-    --with chardet \
     pytest tests/test_readers/test_pptx/
 ```
 
@@ -170,7 +164,6 @@ uv run \
     --with "markitdown[pdf]" \
     --with pypdf \
     --with markdownify \
-    --with chardet \
     --with reportlab \
     pytest tests/test_readers/test_pdf/
 
@@ -184,7 +177,6 @@ uv run \
     --with "markitdown[pdf]" \
     --with pypdf \
     --with markdownify \
-    --with chardet \
     --with reportlab \
     pytest tests/test_readers/test_pdf/
 ```
@@ -205,23 +197,20 @@ uv run \
 
 #### Run a specific test file or method
 ```bash
-# Run a specific test file (CLI tests require only chardet)
+# Run a specific test file (CLI tests need no extra dependencies)
 uv run \
     --with pytest \
-    --with chardet \
     pytest tests/test_cli/test_main.py
 
 # Run only the --advice tests (no extra dependencies needed)
 uv run \
     --with pytest \
-    --with chardet \
     pytest tests/test_cli/test_main.py::TestCLIAdviceOption
 
 # Run a specific test class or method
 uv run \
     --with pytest \
     --with docling \
-    --with chardet \
     pytest tests/test_cli/test_main.py::TestCLIDefaultOutput::test_default_output_docx
 ```
 
@@ -230,7 +219,6 @@ uv run \
 uv run \
     --with pytest \
     --with pytest-cov \
-    --with chardet \
     pytest --cov=scripts --cov-report=term-missing
 ```
 
@@ -30,8 +30,7 @@ DEPENDENCIES = {
 "unstructured[pdf]",
 "markitdown[pdf]",
 "pypdf",
-"markdownify",
-"chardet"
+"markdownify"
 ]
 },
 "Darwin-x86_64": {
@@ -42,8 +41,7 @@ DEPENDENCIES = {
 "numpy<2",
 "markitdown[pdf]",
 "pypdf",
-"markdownify",
-"chardet"
+"markdownify"
 ]
 }
 },
@@ -56,8 +54,7 @@ DEPENDENCIES = {
 "markitdown[docx]",
 "pypandoc-binary",
 "python-docx",
-"markdownify",
-"chardet"
+"markdownify"
 ]
 }
 },
@@ -69,8 +66,7 @@ DEPENDENCIES = {
 "unstructured[xlsx]",
 "markitdown[xlsx]",
 "pandas",
-"tabulate",
-"chardet"
+"tabulate"
 ]
 }
 },
@@ -82,8 +78,7 @@ DEPENDENCIES = {
 "unstructured[pptx]",
 "markitdown[pptx]",
 "python-pptx",
-"markdownify",
-"chardet"
+"markdownify"
 ]
 }
 },
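To illustrate how per-reader dependency lists like the ones edited above relate to the `uv run --with ...` commands in the README, here is a rough sketch. The mapping `READER_DEPENDENCIES` and the helper `build_uv_command` are hypothetical and much simpler than the project's real `DEPENDENCIES` structure, which is keyed by platform and is only partially visible in this diff. The `docx` entries are the ones shown in the hunk above; the `html` list is an assumption based on the README note that `beautifulsoup4` and `chardet` are needed for HTML features.

```python
from typing import Dict, List

# Hypothetical, trimmed-down mapping for illustration only.
# After this commit, chardet appears under "html" and nowhere else.
READER_DEPENDENCIES: Dict[str, List[str]] = {
    "html": ["beautifulsoup4", "chardet"],  # assumed list, not shown in the diff
    "docx": ["markitdown[docx]", "pypandoc-binary", "python-docx", "markdownify"],
}


def build_uv_command(reader: str, script_args: List[str]) -> List[str]:
    """Build a `uv run` argument list with one `--with` flag per dependency."""
    cmd = ["uv", "run"]
    for package in READER_DEPENDENCIES[reader]:
        cmd += ["--with", package]
    # The script path matches the one used in the README examples.
    return cmd + ["python", "scripts/lyxy_document_reader.py", *script_args]


if __name__ == "__main__":
    print(" ".join(build_uv_command("docx", ["test.docx", "--advice"])))
```

Keeping each package in exactly one reader's list means a command built for a non-HTML reader never pulls in `chardet`, which is the effect this commit aims for.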