
Day one of writing a Python crawler: practicing on Baidu, getting blocked by the anti-crawl "<title>百度安全验证</title>" page, and how I got past it


This is my first blog post and my first time learning to write a crawler. I just want to share what I ran into, so please bear with me.

To start with, I built a custom User-Agent pool by hand instead of installing the fake-useragent package (pip install fake-useragent) and pulling a random UA from it with something like print(ua.ie).
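For comparison, the package route would look roughly like this (a minimal sketch of fake-useragent, which I did not end up using; its UserAgent class exposes attributes such as .ie and .random that return UA strings):

  # sketch of the fake-useragent approach (not the one used in this post)
  # pip install fake-useragent
  from fake_useragent import UserAgent

  ua = UserAgent()
  print(ua.ie)      # a random Internet Explorer UA string
  print(ua.random)  # a random UA string from any browser family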

Further down is the program after I fixed the first error; what I wrote the first time was:

  # User-Agent dictionary, kept in a separate module that the main script imports as ua_info
  # (note: a dict literal with duplicate keys keeps only the last entry)
  ua = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:65.0) Gecko/20100101 Firefox/65.0",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:65.0) Gecko/20100101 Firefox/65.0"
        }
  # main.py
  import urllib.request
  from urllib import request
  import ua_info   # the module holding the UA definitions above

  url = 'http://www.baidu.com/'
  headers = ua_info.a            # a is a plain UA string here, not a dict
  req = request.Request(url=url, headers=headers)
  res = urllib.request.urlopen(req)
  #html = res.read().decode('utf-8')
  print(html)                    # html is never assigned, because the line above is commented out

The first problem I ran into:

  Traceback (most recent call last):
    File "C:\Programs\Python\pythonProject\main.py", line 25, in <module>
      req = request.Request(url=url, headers=headers)
    File "C:\Programs\Python\Python39\lib\urllib\request.py", line 326, in __init__
      for key, value in headers.items():
  AttributeError: 'str' object has no attribute 'items'

  Process finished with exit code 1
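The cause: request.Request expects headers to be a dictionary of header names to values (its __init__ loops over headers.items() to attach them), while ua_info.a is a plain string. Wrapping the string in a dict under the 'User-Agent' key is all that is needed:

  # headers must be a mapping, not a bare UA string
  headers = {'User-Agent': ua_info.a}
  req = request.Request(url=url, headers=headers)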

The program after fixing that first problem:

  # ua_info.py – the User-Agent pool; a is the UA picked for this run
  import random

  ua_list = [
      'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11',
      'Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11',
      'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
      'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
      'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
      'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0',
      'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1',
      'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
  ]
  a = random.choice(ua_list)   # pick one UA when the module is imported
  print(a)
  # main.py
  import urllib.request
  from urllib import request, parse
  import ua_info

  url = 'http://www.baidu.com/'
  rs1 = ua_info.a
  headers = {'User-Agent': rs1}        # now a proper dict, as Request expects
  # 1. build the request object and wrap the UA info into it
  # req = request.Request(url=url, headers=headers)
  query_string = {
      'wd': '爬虫'
  }
  result = parse.urlencode(query_string)            # percent-encode the search keyword
  url1 = 'http://www.baidu.com/s?{}'.format(result)
  req = request.Request(url=url1, headers=headers)
  res = urllib.request.urlopen(req)
  html = res.read().decode('utf-8')
  print(html)
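A small variation on the same idea, as a sketch (assuming the pool lives in ua_info.ua_list as above): pick a fresh UA on every request instead of one fixed value chosen at import time.

  import random
  from urllib import request, parse
  import ua_info  # assumed module name, holding the ua_list defined above

  def fetch(keyword):
      # build the search URL and rotate the User-Agent on every call
      query = parse.urlencode({'wd': keyword})
      url = 'http://www.baidu.com/s?{}'.format(query)
      headers = {'User-Agent': random.choice(ua_info.ua_list)}
      req = request.Request(url=url, headers=headers)
      with request.urlopen(req) as res:
          return res.read().decode('utf-8')

  print(fetch('爬虫')[:200])   # print just the start of the returned page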

I ran my script about five times, and then this came back:

  <!DOCTYPE html>
  <html lang="zh-CN">
  <head>
  <meta charset="utf-8">
  <title>百度安全验证</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta name="apple-mobile-web-app-capable" content="yes">
  <meta name="apple-mobile-web-app-status-bar-style" content="black">
  <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0">
  <meta name="format-detection" content="telephone=no, email=no">
  <link rel="shortcut icon" href="https://www.baidu.com/favicon.ico" type="image/x-icon">
  <link rel="icon" sizes="any" mask href="https://www.baidu.com/img/baidu.svg">
  <meta http-equiv="X-UA-Compatible" content="IE=Edge">
  <meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
  <link rel="stylesheet" href="https://wappass.bdimg.com/static/touch/css/api/mkdjump_0635445.css" />
  </head>
  <body>
  <div class="timeout hide">
  <div class="timeout-img"></div>
  <div class="timeout-title">网络不给力,请稍后重试</div>
  <button type="button" class="timeout-button">返回首页</button>
  </div>
  <div class="timeout-feedback hide">
  <div class="timeout-feedback-icon"></div>
  <p class="timeout-feedback-title">问题反馈</p>
  </div>

Searching Baidu for a solution, the advice was to add one more field to headers (the post also explained where to find the value). Adding it did solve the problem:

  headers = {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36 Edg/83.0.478.50',
      'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
  }
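Dropped into the earlier script, that looks roughly like this (a sketch; the User-Agent and Accept values are copied from a real browser request, the rest is unchanged):

  from urllib import request, parse

  headers = {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36 Edg/83.0.478.50',
      'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
  }
  url = 'http://www.baidu.com/s?{}'.format(parse.urlencode({'wd': '爬虫'}))
  req = request.Request(url=url, headers=headers)
  html = request.urlopen(req).read().decode('utf-8')
  print('百度安全验证' in html)   # False once the verification page stops coming back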

Out of curiosity I then read up on the ongoing back-and-forth between crawlers and anti-crawling measures:

Article link: 反爬虫策略及破解方法 (anti-crawler strategies and how to get around them), by 特洛伊-Micro on 博客园: https://www.cnblogs.com/micro-chen/p/8676312.html

I also tried the code below. It works as well, but it produces a warning:

headers={'User-Agent':'Baiduspider'}
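For reference, that variant dropped into the same request would be roughly as follows (a sketch; presumably it gets through because the check treats what looks like Baidu's own spider more leniently, so it is not something to rely on):

  from urllib import request

  req = request.Request(
      url='http://www.baidu.com/',
      headers={'User-Agent': 'Baiduspider'},   # pretend to be Baidu's own crawler
  )
  print(request.urlopen(req).read().decode('utf-8')[:200])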
