test: add a comprehensive test suite covering all Reader implementations

- Increase the number of tests from 83 to 193 (+132%)
- Raise code coverage from 48% to 69% (+21 percentage points, a 44% relative increase)
- Create separate tests for every Reader implementation of each document format
- Add cross-Reader consistency tests
- Add 4 test specifications (cli-testing, exception-testing, reader-testing, test-fixtures)
- Update the test statistics in the README

Test coverage:
- DOCX: python-docx, markitdown, docling, native-xml, pypandoc, unstructured
- PDF: pypdf, markitdown, docling, docling-ocr, unstructured, unstructured-ocr
- HTML: html2text, markitdown, trafilatura, domscribe
- PPTX: python-pptx, markitdown, docling, native-xml, unstructured
- XLSX: pandas, markitdown, docling, native-xml, unstructured
- CLI: all command-line options and error handling

All 193 tests pass.
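
As a rough illustration of what a cross-Reader consistency check could look like, here is a minimal sketch. It is not the test code from this commit: `reader_a`, `reader_b`, `normalize`, and `find_inconsistent_readers` are hypothetical names, and the two readers are stand-ins that return fixed Markdown rather than calling the real libraries.

```python
import re
from typing import Callable, Dict, List


# Hypothetical stand-ins for two Reader implementations of the same format.
# Real readers would parse the file at `path`; these return fixed Markdown.
def reader_a(path: str) -> str:
    return "# Title\n\nHello   world\n"


def reader_b(path: str) -> str:
    return "Title\n=====\n\nHello world"


def normalize(markdown: str) -> str:
    """Reduce Markdown to lowercase words so formatting differences are ignored."""
    return " ".join(re.findall(r"\w+", markdown.lower()))


def find_inconsistent_readers(
    readers: Dict[str, Callable[[str], str]], path: str
) -> List[str]:
    """Return the names of readers whose normalized text differs from the first reader's."""
    outputs = {name: normalize(read(path)) for name, read in readers.items()}
    baseline = next(iter(outputs.values()))
    return [name for name, text in outputs.items() if text != baseline]


# Both stand-ins yield the same words, so no reader is reported as inconsistent.
mismatches = find_inconsistent_readers({"a": reader_a, "b": reader_b}, "sample.docx")
assert mismatches == []
```

Comparing normalized words rather than raw output is one possible design choice: it tolerates differences in heading style and whitespace between libraries while still catching missing or altered text.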
2026-03-08 22:20:21 +08:00
parent c35bbc90b5
commit 7eab1dcef1
53 changed files with 3094 additions and 259 deletions

@@ -50,13 +50,13 @@ def output_result(
     elif args.lines:
         print(len(content.split("\n")))
     elif args.titles:
-        from core.markdown import extract_titles
+        from scripts.core.markdown import extract_titles
         titles = extract_titles(content)
         for title in titles:
             print(title)
     elif args.title_content:
-        from core.markdown import extract_title_content
+        from scripts.core.markdown import extract_title_content
         title_content = extract_title_content(content, args.title_content)
         if title_content is None:
@@ -64,7 +64,7 @@ def output_result(
             sys.exit(1)
         print(title_content, end="")
     elif args.search:
-        from core.markdown import search_markdown
+        from scripts.core.markdown import search_markdown
         search_result = search_markdown(content, args.search, args.context)
         if search_result is None: