
Web Scraper Series (3): Scraping Images from the Daguerre Board

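import ast
import re
import requests
from bs4 import BeautifulSoup


# Step 1: find a working proxy
def proxy():
    # Each line of the file is expected to hold a dict literal usable as `proxies`
    with open(r'ip_proxies\有效ip.txt', 'r', encoding='utf-8') as f:
        for ip in f:
            try:
                # literal_eval parses the literal without executing arbitrary code
                proxies = ast.literal_eval(ip.strip())
                if requests.get('http://t66y.com/index.php', proxies=proxies, timeout=2).status_code == 200:
                    return proxies
            except Exception:
                # Malformed line or unreachable proxy: move on to the next one
                pass
    # Returns None implicitly when no proxy in the file works


proxies = proxy()
print(proxies)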
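
Step 1 turns each line of the proxy file into a dictionary before testing it against the site. The parsing half of that step can be tried without any network access. Below is a minimal sketch, assuming each line holds a Python dict literal; `parse_proxy_line` and the sample lines are hypothetical and not part of the original script. It uses `ast.literal_eval`, which accepts only literals, so a line containing a function call is rejected instead of executed.

```python
import ast


def parse_proxy_line(line):
    """Return the proxies dict described by one line, or None if the line is unusable."""
    try:
        value = ast.literal_eval(line.strip())
    except (ValueError, SyntaxError):
        # Not a valid Python literal (garbage text, empty line, or a function call)
        return None
    # Only dictionaries make sense as a `proxies` argument
    return value if isinstance(value, dict) else None


# Hypothetical sample lines, not taken from the real proxy file
sample_lines = [
    "{'http': 'http://10.0.0.1:8080'}\n",
    "not a dict\n",
    "__import__('os').getcwd()\n",
    "\n",
]
candidates = [p for p in map(parse_proxy_line, sample_lines) if p is not None]
print(candidates)
```

Only the first sample line survives; the others are skipped without raising, which mirrors the "try the next line" behavior of the loop in step 1.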

# Step 2: build the pool of page links
url = 'http://t66y.com/index.php'
url2 = 'http://t66y.com/thread0806.php?fid=16'
headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,\
image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
           'Accept-Encoding': 'gzip, deflate',
           'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7&
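
The listing breaks off inside the `headers` dictionary, so the code that actually builds the link pool is not shown. As a rough sketch of what such a step could look like, the example below collects links from a listing page and keeps the ones that match a keyword. Everything here is an assumption: the `LinkCollector` and `collect_links` names, the sample markup, and the `thread/` keyword are hypothetical, and the standard library's `html.parser` is swapped in for BeautifulSoup so the sketch runs without third-party packages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href:
            # Resolve relative links against the page they came from
            self.links.append(urljoin(self.base_url, href))


def collect_links(html, base_url, keyword):
    """Return de-duplicated links whose URL contains `keyword`, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    pool = []
    for link in parser.links:
        if keyword in link and link not in pool:
            pool.append(link)
    return pool


# Hypothetical listing markup; the real page layout is not shown in the article
sample_html = '''
<a href="thread/1001.html">first</a>
<a href="thread/1001.html">first again</a>
<a href="/about.html">about</a>
<a href="thread/1002.html">second</a>
'''
pool = collect_links(sample_html, 'http://example.com/board/', 'thread/')
print(pool)
```

In a real run the HTML would come from a `requests.get(...)` call made with the `headers` and `proxies` from the steps above, and the keyword would need to match the site's actual thread URL pattern.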