
Python Web Scraping: Bypassing the TLS Fingerprint Check of Cloudflare's Five-Second Shield with curl_cffi

curl_cffi

What is a TLS fingerprint?

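TLS fingerprinting is a technique for identifying and verifying TLS (Transport Layer Security) communication.

A TLS fingerprint is derived from the information exchanged during the TLS handshake, such as the cipher suites, protocol version, and encryption algorithms. Because TLS implementations differ in the cipher suites, protocol versions, and algorithms they offer, comparing fingerprints shows whether traffic comes from the expected source or target.

TLS fingerprints can be used to detect security threats such as network spoofing, man-in-the-middle attacks, and espionage, and also to identify and manage devices and applications.

Put simply, bypassing the check means disguising the ja3_text value so that the request is not blocked, mainly by changing the list of supported encryption algorithms.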
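To make the ja3_text idea concrete, here is a minimal sketch of how a JA3 string and its hash are commonly built: five ClientHello fields written as decimal numbers, values within a field joined by "-", fields joined by ",", and the result hashed with MD5. The function names and the sample numbers are illustrative assumptions, not values captured from a real browser.

```python
import hashlib
from typing import Iterable, Tuple


def is_grease(value: int) -> bool:
    """Return True for GREASE placeholders (0x0a0a, 0x1a1a, ..., 0xfafa), which JA3 skips."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3(
    version: int,
    ciphers: Iterable[int],
    extensions: Iterable[int],
    curves: Iterable[int],
    point_formats: Iterable[int],
) -> Tuple[str, str]:
    """Build the JA3 text and its MD5 hash from ClientHello fields."""

    def join(values: Iterable[int]) -> str:
        # Order is preserved: the same values in a different order give a different hash.
        return "-".join(str(v) for v in values if not is_grease(v))

    text = ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )
    return text, hashlib.md5(text.encode("ascii")).hexdigest()


# Hypothetical sample values, chosen only to show the format.
text, digest = ja3(771, [0x0A0A, 4865, 4866, 4867], [0, 10, 11], [29, 23, 24], [0])
print(text)    # 771,4865-4866-4867,0-10-11,29-23-24,0
print(digest)  # 32-character hex MD5 of the text above
```

Since the hash depends on both the values and their order, a client that offers a different cipher list than a real browser ends up with a different fingerprint, which is what an impersonation library has to reproduce.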

Use the Python library curl_cffi, whose main selling point is impersonating a variety of browser fingerprints:

pip install --upgrade curl_cffi

Supported impersonation targets, provided by curl-impersonate:

    edge99 = "edge99"
    edge101 = "edge101"
    chrome99 = "chrome99"
    chrome100 = "chrome100"
    chrome101 = "chrome101"
    chrome104 = "chrome104"
    chrome107 = "chrome107"
    chrome110 = "chrome110"
    chrome99_android = "chrome99_android"
    safari15_3 = "safari15_3"
    safari15_5 = "safari15_5"
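One way to see what these targets actually change is to request a fingerprint echo service once per target and group the targets by the hash the server reports. The sketch below assumes the "ja3n_hash" field returned by https://tls.browserleaks.com/json; the helper names are hypothetical, and the demonstration uses a stub with made-up hashes so that it runs without network access.

```python
from typing import Callable, Dict, Iterable, List

# Targets copied from the list above; other curl_cffi versions may support a different set.
TARGETS = [
    "edge99", "edge101", "chrome99", "chrome100", "chrome101", "chrome104",
    "chrome107", "chrome110", "chrome99_android", "safari15_3", "safari15_5",
]


def fetch_ja3n(target: str) -> str:
    """Ask the echo service which JA3N hash it observed for this target (needs network)."""
    from curl_cffi import requests  # imported lazily so the grouping helper works without it

    r = requests.get("https://tls.browserleaks.com/json", impersonate=target)
    return r.json()["ja3n_hash"]


def group_by_fingerprint(
    targets: Iterable[str], fetch: Callable[[str], str] = fetch_ja3n
) -> Dict[str, List[str]]:
    """Map each observed hash to the list of targets that produced it."""
    groups: Dict[str, List[str]] = {}
    for target in targets:
        groups.setdefault(fetch(target), []).append(target)
    return groups


# Offline demonstration with a stub fetcher; these hashes are made up.
fake_hashes = {"chrome110": "hash-a", "chrome107": "hash-a", "safari15_5": "hash-b"}
print(group_by_fingerprint(fake_hashes, fetch=fake_hashes.__getitem__))
```

Calling group_by_fingerprint(TARGETS) with the default fetcher sends one real request per target, so it is best run sparingly.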

A test request with curl_cffi

    from curl_cffi import requests

    # Notice the impersonate parameter
    r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome110")
    print(r.json())
    # output: {..., "ja3n_hash": "aa56c057ad164ec4fdcb7a5a283be9fc", ...}
    # the ja3n fingerprint should be the same as the target browser's

    # http/socks proxies are supported
    proxies = {"https": "http://localhost:3128"}
    r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome110", proxies=proxies)

    proxies = {"https": "socks://localhost:3128"}
    r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome110", proxies=proxies)

The following example tests scraping a literature site: https://onlinelibrary.wiley.com/

    from curl_cffi import requests

    url = "https://onlinelibrary.wiley.com/action/doSearch?AllField=ADC&sortBy=Earliest&startPage=0&pageSize=10"
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh,zh-CN;q=0.9",
        "Cache-Control": "no-cache",
        "Pragma": "no-cache",
        "Referer": "https://onlinelibrary.wiley.com/",
        "Sec-Ch-Ua-Mobile": "?0",
        "Sec-Ch-Ua-Platform": "\"macOS\"",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "same-site",
        "Sec-Fetch-User": "?1",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
    }

    s = requests.Session()
    # Note: the User-Agent above claims Chrome 116 while the TLS fingerprint imitates chrome110;
    # keeping the two consistent is probably safer.
    # verify=False turns off certificate verification; use it for testing only.
    response = s.get(url, impersonate="chrome110", headers=headers, verify=False)
    if response.status_code == 200:
        # Success: print the decoded page body
        result = response.content.decode()
        print(result)
    else:
        # Failure: print the status code so a block can be diagnosed
        print(response.status_code)

The original page is fetched in full and printed, which shows that the fingerprint check was bypassed.
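A 200 status code is a reasonable first check, but it can help to test explicitly whether a response is a Cloudflare challenge page rather than the real content. The sketch below is a heuristic under stated assumptions: it relies on the cf-mitigated response header and on two body markers that challenge pages typically contain. The function name and marker list are hypothetical and may need adjusting per site.

```python
from typing import Mapping

# Assumed markers: the challenge page title and the path its scripts are loaded from.
CHALLENGE_MARKERS = ("Just a moment...", "challenge-platform")


def looks_like_cloudflare_challenge(
    status_code: int, headers: Mapping[str, str], body: str
) -> bool:
    """Heuristically decide whether a response is a Cloudflare challenge page."""
    # Header names are case-insensitive, so normalize them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("cf-mitigated", "").lower() == "challenge":
        return True
    # Only inspect the body for status codes a challenge is usually served with.
    if status_code in (403, 503):
        return any(marker in body for marker in CHALLENGE_MARKERS)
    return False


# Made-up responses, for illustration only.
blocked = looks_like_cloudflare_challenge(
    403, {"CF-Mitigated": "challenge"}, "<title>Just a moment...</title>"
)
passed = looks_like_cloudflare_challenge(
    200, {"Content-Type": "text/html"}, "<html>search results</html>"
)
print(blocked, passed)  # True False
```

In the scraping example, the function would receive the status code, headers, and decoded body of the response, and a True result suggests trying a different impersonation target or header set.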

Source:

GitHub - yifeikong/curl_cffi: Python binding for curl-impersonate via cffi. A http client that can impersonate browser tls/ja3/http2 fingerprints.
