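Create the lyxy-reader-html skill

- Add skill lyxy-reader-html for parsing HTML files and web pages fetched by URL
- Support URL download, falling back in priority order: pyppeteer → selenium → httpx → urllib
- Support HTML parsing, falling back in priority order: trafilatura → domscribe → MarkItDown → html2text
- Support queries: full-text extraction, word count, line count, title extraction, section extraction, and regex search
- Add spec html-document-parsing
- Archive change create-lyxy-reader-html-skill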
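
Both priority chains in the commit message follow the same pattern: try the preferred backend first and move to the next one when it fails. The following is a minimal sketch of that pattern, not the code in `scripts/parser.py`. The helper `first_successful` and the stand-in backends `fetch_with_httpx` and `fetch_with_urllib` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical type: a backend takes the source (URL or HTML) and returns text.
Backend = Callable[[str], str]


def first_successful(
    source: str, backends: Iterable[Tuple[str, Backend]]
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Try each backend in priority order; return (name, result, failures)."""
    failures: List[str] = []
    for name, backend in backends:
        try:
            result = backend(source)
        except Exception as exc:  # e.g. missing dependency or network error
            failures.append(f"{name}: {exc}")
            continue
        if result and result.strip():
            return name, result, failures
        failures.append(f"{name}: empty result")
    return None, None, failures


# Stand-in backends for illustration only; they do not touch the network.
def fetch_with_httpx(url: str) -> str:
    raise ImportError("httpx is not installed")


def fetch_with_urllib(url: str) -> str:
    return "<html><body>ok</body></html>"


name, html, failures = first_successful(
    "https://example.com",
    [("httpx", fetch_with_httpx), ("urllib", fetch_with_urllib)],
)
print(name, failures)
```

Collecting the failure messages rather than discarding them makes it possible to report why every backend was skipped if the whole chain fails.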
59
skills/lyxy-reader-html/references/examples.md
Normal file
@@ -0,0 +1,59 @@
# Examples

## URL input - extract the full document content

```bash
# Using uv (recommended)
uv run --with trafilatura --with domscribe --with markitdown --with html2text --with httpx --with beautifulsoup4 scripts/parser.py https://example.com

# Using Python directly
python scripts/parser.py https://example.com
```

## HTML file input - extract the full document content

```bash
# Using uv (recommended)
uv run --with trafilatura --with domscribe --with markitdown --with html2text --with beautifulsoup4 scripts/parser.py page.html

# Using Python directly
python scripts/parser.py page.html
```

## Get the document word count

```bash
uv run --with trafilatura --with html2text --with beautifulsoup4 scripts/parser.py -c https://example.com
```

## Get the document line count

```bash
uv run --with trafilatura --with html2text --with beautifulsoup4 scripts/parser.py -l https://example.com
```

## Extract all titles

```bash
uv run --with trafilatura --with html2text --with beautifulsoup4 scripts/parser.py -t https://example.com
```
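
To make the title query concrete, here is a rough sketch of how headings could be collected from parsed output. It assumes the parser produces Markdown with ATX headings (`# Title`), which this file does not confirm. `extract_titles` is a hypothetical helper, not a function from `scripts/parser.py`.

```python
import re

# Assumption: the parsed document is Markdown with ATX headings ("# Title").
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def extract_titles(markdown: str) -> list:
    """Return (level, title) pairs, skipping lines inside fenced code blocks."""
    titles = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on every fence line
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            titles.append((len(match.group(1)), match.group(2)))
    return titles


sample = "# Home\n\ntext\n\n## About Us\n\n```\n# not a heading\n```\n"
print(extract_titles(sample))
```

Tracking fence state keeps shell comments inside code blocks from being reported as headings.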

## Extract a specific section

```bash
uv run --with trafilatura --with html2text --with beautifulsoup4 scripts/parser.py -tc "About Us" https://example.com
```
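
One plausible reading of section extraction is "return the named heading and everything under it, up to the next heading of the same or a higher level". The sketch below implements that reading under the assumption that the parsed output is Markdown. `extract_section` is a hypothetical helper, and the actual matching rules of `-tc` may differ.

```python
import re

# Assumption: sections are delimited by Markdown ATX headings.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def extract_section(markdown: str, title: str) -> str:
    """Return the section headed by `title`, including nested subsections."""
    collected = []
    level = None  # heading level of the matched section, once found
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if level is None:
            if match and match.group(2) == title:
                level = len(match.group(1))
                collected.append(line)
            continue
        # Stop at the next heading of the same or a higher level.
        if match and len(match.group(1)) <= level:
            break
        collected.append(line)
    return "\n".join(collected).strip()


sample = (
    "# Home\nintro\n## About Us\nWe parse HTML.\n"
    "### Team\nTwo people.\n## Contact\nmail\n"
)
print(extract_section(sample, "About Us"))
```

For brevity this sketch does not skip fenced code blocks, so a `#` comment inside a fence would be treated as a heading.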

## Search for a keyword

```bash
uv run --with trafilatura --with html2text --with beautifulsoup4 scripts/parser.py -s "keyword" -n 3 https://example.com
```
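
The commit message describes the search as a regex search. The sketch below shows one way such a search could return each match with surrounding lines. It assumes that `-n` sets the number of context lines, which this file does not state, and `search_with_context` is a hypothetical helper rather than the script's actual function.

```python
import re


def search_with_context(text: str, pattern: str, context: int = 2) -> list:
    """Return one text snippet per matching line, padded with context lines."""
    lines = text.splitlines()
    regex = re.compile(pattern)
    results = []
    for index, line in enumerate(lines):
        if regex.search(line):
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            results.append("\n".join(lines[start:end]))
    return results


sample = "line one\nline two\nthe keyword is here\nline four\nline five"
hits = search_with_context(sample, r"key\w+", context=1)
print(hits)
```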

## Falling back to direct Python execution

Use this only when the lyxy-runner-python skill is not available:

```bash
python3 scripts/parser.py https://example.com
```