Skill/skills/lyxy-reader-html/references/examples.md
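lanyuanxiaoyao 6b4fcf2647 Create the lyxy-reader-html skill
- Add skill: lyxy-reader-html, for parsing HTML files and the content of web pages given by URL
- Support URL download (fallback priority: pyppeteer → selenium → httpx → urllib)
- Support HTML parsing (fallback priority: trafilatura → domscribe → MarkItDown → html2text)
- Support queries: full-text extraction, word count, line count, heading extraction, section extraction, regex search
- Add spec: html-document-parsing
- Archive change: create-lyxy-reader-html-skill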
2026-03-08 02:02:03 +08:00
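
The fallback priorities in the commit notes (for example pyppeteer → selenium → httpx → urllib) can be read as a try-in-order chain: use the first backend that works, otherwise move on to the next. The following is a minimal sketch of that idea only. The helper `first_success` and the three stand-in backends are hypothetical and are not taken from `scripts/parser.py`.

```python
def first_success(candidates, source):
    """Try each (name, func) pair in order and return (name, result)
    from the first one that produces non-empty output."""
    errors = []
    for name, func in candidates:
        try:
            result = func(source)
        except Exception as exc:  # also covers ImportError for a missing optional library
            errors.append(f"{name}: {exc}")
            continue
        if result and result.strip():
            return name, result
        errors.append(f"{name}: empty result")
    raise RuntimeError("all candidates failed: " + "; ".join(errors))


# Hypothetical stand-ins for real backends, used only to exercise the chain.
def broken_backend(source):
    raise ImportError("library not installed")


def empty_backend(source):
    return "   "


def plain_backend(source):
    return f"content of {source}"


CHAIN = [
    ("first", broken_backend),
    ("second", empty_backend),
    ("third", plain_backend),
]

name, text = first_success(CHAIN, "page.html")
print(name, text)
```

Treating an empty result the same as an exception is a design choice in this sketch; it lets a backend that installs fine but extracts nothing still hand over to the next one.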

# Examples
## URL input - extract the full document content
```bash
# Using uv (recommended)
uv run --with trafilatura --with domscribe --with markitdown --with html2text --with httpx --with beautifulsoup4 scripts/parser.py https://example.com
# Using Python directly
python scripts/parser.py https://example.com
```
## HTML file input - extract the full document content
```bash
# Using uv (recommended)
uv run --with trafilatura --with domscribe --with markitdown --with html2text --with beautifulsoup4 scripts/parser.py page.html
# Using Python directly
python scripts/parser.py page.html
```
## Get the document word count
```bash
uv run --with trafilatura --with html2text --with beautifulsoup4 scripts/parser.py -c https://example.com
```
## Get the document line count
```bash
uv run --with trafilatura --with html2text --with beautifulsoup4 scripts/parser.py -l https://example.com
```
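To make the two counts concrete, here is a small sketch of what `-c` and `-l` might report for extracted text. The source does not say whether `-c` counts whitespace-separated words or individual characters (the latter is common for Chinese text), so both are shown. The function names are hypothetical.

```python
def count_words(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_chars(text):
    """Count characters, ignoring whitespace (a common measure for CJK text)."""
    return sum(1 for ch in text if not ch.isspace())


def count_lines(text):
    """Count lines; a trailing newline does not add an extra line."""
    return len(text.splitlines())


sample = "# Title\n\nHello world\n"
print(count_words(sample), count_chars(sample), count_lines(sample))
```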
## Extract all headings
```bash
uv run --with trafilatura --with html2text --with beautifulsoup4 scripts/parser.py -t https://example.com
```
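As an illustration of heading extraction, the sketch below collects ATX-style headings from Markdown text. It assumes the parser's output is Markdown, which the source does not state. `extract_headings` is a hypothetical name, not the function used by `scripts/parser.py`.

```python
import re

# One to six '#' characters, whitespace, then the title; optional closing '#'s are dropped.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*#*\s*$")


def extract_headings(markdown):
    """Return (level, title) pairs, skipping lines inside fenced code blocks."""
    headings = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


doc = "# Intro\ntext\n## About Us\n```bash\n# a shell comment\n```\n### Team\n"
print(extract_headings(doc))
```

Skipping fenced blocks matters here because shell comments such as `# Using uv` would otherwise be reported as level-1 headings.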
## Extract a specific section
```bash
uv run --with trafilatura --with html2text --with beautifulsoup4 scripts/parser.py -tc "About Us" https://example.com
```
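One plausible reading of section extraction is: find the heading whose title matches, then return everything up to the next heading of the same or a higher level. The sketch below implements that reading for Markdown text. It is an assumption about the behavior of `-tc`, and `extract_section` is a hypothetical helper.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def extract_section(markdown, title):
    """Return the section under the first heading equal to `title`,
    including nested subsections, or "" if no such heading exists.
    Fenced code blocks are not special-cased in this sketch."""
    lines = markdown.splitlines()
    start = None
    level = 0
    for i, line in enumerate(lines):
        match = HEADING_RE.match(line)
        if not match:
            continue
        if start is None:
            if match.group(2) == title:
                start, level = i, len(match.group(1))
        elif len(match.group(1)) <= level:
            # A heading at the same or a higher level ends the section.
            return "\n".join(lines[start:i]).strip()
    if start is None:
        return ""
    return "\n".join(lines[start:]).strip()


doc = (
    "# Home\nwelcome\n"
    "## About Us\nWe parse HTML.\n"
    "### Team\nTwo people.\n"
    "## Contact\nmail\n"
)
print(extract_section(doc, "About Us"))
```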
## Search for a keyword
```bash
uv run --with trafilatura --with html2text --with beautifulsoup4 scripts/parser.py -s "keyword" -n 3 https://example.com
```
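The commit notes describe the search as a regex search. The sketch below shows a line-based regex search that returns surrounding lines for each hit, on the assumption that `-n` sets the number of context lines; the source does not confirm what `-n` means. `search_with_context` is a hypothetical name.

```python
import re


def search_with_context(text, pattern, context=2):
    """Return (line_number, surrounding_lines) for every line matching `pattern`.
    Line numbers are 1-based; `context` lines are kept on each side of a hit."""
    lines = text.splitlines()
    regex = re.compile(pattern)
    results = []
    for i, line in enumerate(lines):
        if regex.search(line):
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            results.append((i + 1, lines[lo:hi]))
    return results


text = "a\nb\nkeyword here\nd\ne\n"
for line_no, block in search_with_context(text, r"key\w+", context=1):
    print(line_no, block)
```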
## Fall back to running Python directly
Use this only when the lyxy-runner-python skill is not available:
```bash
python3 scripts/parser.py https://example.com
```