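lanyuanxiaoyao 47038475d4 refactor: split the HTML downloader into a subpackage
Split scripts/readers/html/downloader.py (263 lines) into a downloader/ subpackage so that each downloader is maintained independently:

- Create the downloader/ subpackage, containing __init__.py, common.py, and 4 downloader modules
- common.py centralizes the shared configuration (USER_AGENT, CHROME_ARGS, etc.)
- All downloaders share one interface: download(url: str) -> Tuple[Optional[str], Optional[str]]
- Define a DOWNLOADERS list in __init__.py for explicit registration, following the parser pattern
- Update the import in html/__init__.py: from .downloader import download_html
- Add complete type annotations to improve maintainability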
2026-03-09 01:13:42 +08:00
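
The explicit registration described in the commit message could look roughly like the sketch below. It is illustrative only: the two stub downloaders, the (name, function) pairs, the fallback loop, and the reading of the returned tuple as (html, error message) are assumptions, not code from the repository.

```python
from typing import Callable, List, Optional, Tuple

# Assumed meaning of the tuple: (html, error_message).
DownloadResult = Tuple[Optional[str], Optional[str]]
Downloader = Callable[[str], DownloadResult]


def failing_download(url: str) -> DownloadResult:
    """Hypothetical downloader that always fails."""
    return None, f"stub failure for {url}"


def static_download(url: str) -> DownloadResult:
    """Hypothetical downloader that always succeeds with a fixed page."""
    return f"<html><body>{url}</body></html>", None


# Explicit registration: order defines the fallback priority.
DOWNLOADERS: List[Tuple[str, Downloader]] = [
    ("failing", failing_download),
    ("static", static_download),
]


def download_html(url: str) -> DownloadResult:
    """Try each registered downloader in turn and return the first success."""
    errors: List[str] = []
    for name, downloader in DOWNLOADERS:
        content, error = downloader(url)
        if content is not None:
            return content, None
        errors.append(f"{name}: {error}")
    # Every downloader failed: report all collected errors.
    return None, "; ".join(errors)
```

Keeping the registry as a plain list makes the fallback order visible in one place, and adding a downloader only requires appending one entry.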

66 lines
2.4 KiB
Python
"""Shared configuration for the downloaders."""
# Shared configuration
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
WINDOW_SIZE = "1920,1080"
LANGUAGE_SETTING = "zh-CN,zh"
# Chrome launch arguments (shared by pyppeteer and selenium)
CHROME_ARGS = [
    "--no-sandbox",
    "--disable-dev-shm-usage",
    "--disable-gpu",
    "--disable-software-rasterizer",
    "--disable-extensions",
    "--disable-background-networking",
    "--disable-default-apps",
    "--disable-sync",
    "--disable-translate",
    "--hide-scrollbars",
    "--metrics-recording-only",
    "--mute-audio",
    "--no-first-run",
    "--safebrowsing-disable-auto-update",
    "--blink-settings=imagesEnabled=false",
    "--disable-plugins",
    "--disable-ipc-flooding-protection",
    "--disable-renderer-backgrounding",
    "--disable-background-timer-throttling",
    "--disable-hang-monitor",
    "--disable-prompt-on-repost",
    "--disable-client-side-phishing-detection",
    "--disable-component-update",
    "--disable-domain-reliability",
    # A repeated Chromium switch typically keeps only its last value, so the
    # disabled features are passed as one comma-separated list.
    "--disable-features=site-per-process,IsolateOrigins,VizDisplayCompositor,WebRTC",
    f"--window-size={WINDOW_SIZE}",
    f"--lang={LANGUAGE_SETTING}",
    f"--user-agent={USER_AGENT}",
]
# Script that hides automation fingerprints (shared by pyppeteer and selenium)
HIDE_AUTOMATION_SCRIPT = """
() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3, 4, 5] });
    Object.defineProperty(navigator, 'languages', { get: () => ['zh-CN', 'zh'] });
}
"""
# Extended hide-automation script for pyppeteer (also handles notification permission queries)
HIDE_AUTOMATION_SCRIPT_PUPPETEER = """
() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3, 4, 5] });
    Object.defineProperty(navigator, 'languages', { get: () => ['zh-CN', 'zh'] });
    // Bind the native method so the fallback call keeps its receiver;
    // calling it unbound raises "Illegal invocation".
    const originalQuery = window.navigator.permissions.query.bind(window.navigator.permissions);
    window.navigator.permissions.query = (parameters) => (
        parameters.name === 'notifications' ?
            Promise.resolve({ state: Notification.permission }) :
            originalQuery(parameters)
    );
}
"""
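
The file only defines constants, so the following is a minimal sketch of how a downloader module might consume them. The helper names `pyppeteer_launch_options` and `as_iife` are hypothetical, and the constants are shortened stand-ins for the ones defined above.

```python
from typing import Any, Dict, List

# Stand-ins for the constants defined in common.py, shortened for the example.
CHROME_ARGS: List[str] = ["--no-sandbox", "--disable-gpu"]
HIDE_AUTOMATION_SCRIPT = """
() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
}
"""


def pyppeteer_launch_options(headless: bool = True) -> Dict[str, Any]:
    """Build the options dict that would be passed to pyppeteer.launch()."""
    # Copy the list so callers cannot mutate the shared constant.
    return {"headless": headless, "args": list(CHROME_ARGS)}


def as_iife(function_source: str) -> str:
    """Wrap a JS function expression so it runs as soon as it is evaluated."""
    return f"({function_source.strip()})();"
```

With pyppeteer, the function string can be handed to `page.evaluateOnNewDocument` as-is. With selenium, one option is to add each entry of `CHROME_ARGS` through `Options.add_argument` and to register the script via `driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source": as_iife(HIDE_AUTOMATION_SCRIPT)})`; the wrapper matters there because the source is evaluated as a script, and a bare arrow function expression would never be called.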