
Python Selenium crawling through a proxy, plus scraping a proxy-list site with Python and saving the IPs

A common complaint is that Selenium still reports the local IP after a proxy has been set. In practice this usually comes down to the options object never reaching the driver (the old chrome_options keyword is deprecated and was removed in Selenium 4), a malformed --proxy-server value, or a free proxy that is dead or "transparent" and forwards your real address anyway.

Using a proxy with Selenium

from selenium import webdriver

# Route all Chrome traffic through an HTTP proxy
options = webdriver.ChromeOptions()
options.add_argument("--proxy-server=http://101.37.79.125:3128")
# Pass the options via the `options` keyword: the original `chrome_options`
# keyword is deprecated and was removed in Selenium 4
driver = webdriver.Chrome(options=options)
driver.maximize_window()
driver.get('url')  # replace 'url' with the page you want to crawl
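
To confirm the proxy actually took effect, load a page that echoes the caller's IP and read the response back. A minimal sketch, assuming Selenium 4 and the public echo service httpbin.org (both of which are my additions, not part of the original post):

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument("--proxy-server=http://101.37.79.125:3128")  # same proxy as above
driver = webdriver.Chrome(options=options)
try:
    driver.get('http://httpbin.org/ip')  # httpbin echoes the origin IP as JSON
    time.sleep(2)                        # crude wait for the body to render
    body = driver.find_element(By.TAG_NAME, 'body').text
    print(body)  # should show the proxy's IP (101.37.79.125), not your own
finally:
    driver.quit()

If the page loads but still shows your own address, the --proxy-server argument never reached Chrome; if it fails to load at all, the proxy itself is probably dead.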

Scraping a proxy-list site with Python and saving the IPs locally (tested by the author at the time of writing). Note that xicidaili.com appears to have gone offline since; the same table-parsing approach applies to other proxy-list sites.

import random

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36'}


def xici_ip(page):
    # Scrape `page` list pages of domestic HTTPS proxies from xicidaili
    for num_page in range(1, page + 1):
        url_part = "http://www.xicidaili.com/wn/"
        url = url_part + str(num_page)  # URL of one list page
        r = requests.get(url, headers=headers)
        if r.status_code == 200:
            soup = BeautifulSoup(r.text, 'lxml')
            trs = soup.find_all('tr')
            for i in range(1, len(trs)):  # trs[0] is the table header row
                tds = trs[i].find_all('td')
                ip_item = tds[1].text + ':' + tds[2].text  # "ip:port"
                # print('page ' + str(num_page) + ', item ' + str(i) + ': ' + ip_item)
                with open(r'D:\ip.txt', 'a', encoding='utf-8') as f:
                    f.write(ip_item + '\n')
                # time.sleep(1)
    return 'saved successfully'


def get_ip():
    # Pick one saved "ip:port" line at random
    with open(r'D:\ip.txt', 'r', encoding='utf-8') as f:
        lines = f.readlines()
    return random.choice(lines)


def check_ip():
    # Return a requests-style proxies dict if the random proxy answers, else None.
    # requests matches proxy keys case-sensitively, so use lowercase 'http'/'https';
    # the original uppercase 'HTTPS' key was never matched for http:// URLs.
    ip = get_ip().strip()
    proxies = {'http': 'http://' + ip, 'https': 'http://' + ip}
    try:
        r = requests.get('http://httpbin.org/ip', headers=headers, proxies=proxies, timeout=10)
        if r.status_code == 200:
            return proxies
    except Exception as e:
        print(e)


def main():
    xici_ip(1)
    result = check_ip()
    if result is None:  # dead proxy: try one more random pick
        result = check_ip()
    return result


if __name__ == '__main__':
    print(main())
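
The two scripts can be wired together by letting check_ip() hand a verified proxy to Selenium. A hedged sketch, assuming the scraper above was saved as proxy_pool.py (that module name is my invention for the example):

from selenium import webdriver

from proxy_pool import check_ip  # the scraper above, saved as proxy_pool.py (hypothetical name)

proxies = check_ip()  # e.g. {'http': 'http://1.2.3.4:8080', 'https': 'http://1.2.3.4:8080'}
if proxies:
    ip_port = proxies['http'].replace('http://', '')  # back to bare "ip:port"
    options = webdriver.ChromeOptions()
    options.add_argument('--proxy-server=http://' + ip_port)
    driver = webdriver.Chrome(options=options)
    driver.get('http://httpbin.org/ip')  # should now report the scraped proxy's IP
    driver.quit()
else:
    print('No working proxy found; try scraping more pages with xici_ip().')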
