
Scraping an image website

Image site URL: http://www.cosplay8.com/pic/chinacos/ (the 国内cos section of Cosplay中国, a Cosplay / anime / photo gallery site)

About the site: it is a cosplay site, and sites of this kind tend to disappear from the internet, so to keep a copy of the data we scrape it.
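Before running the full crawl, it is worth a quick sanity check that the list-page URL pattern and the link regex still match the site's markup. A minimal sketch (the list_22_9.html page and the regex are taken from the script further down):

import re
import urllib.request

# Pre-flight check: fetch one list page and count the gallery links it contains.
url = "http://www.cosplay8.com/pic/chinacos/list_22_9.html"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36'}
req = urllib.request.Request(url=url, headers=headers)
html = urllib.request.urlopen(req, timeout=30).read().decode('utf-8')
links = re.findall(r'<li><a href="(.*?).html">', html)
print("Found " + str(len(links)) + " gallery links on this list page")

If this prints zero, the site layout has changed and the regular expressions need updating before the crawl is run.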

Source code:

import urllib.request
import re
import requests

# Regular expressions: gallery links on a list page, the big image on a detail page, and the page title
x1 = re.compile(r'<li><a href="(.*?).html">')
c1 = re.compile(r"<img src='(.*?)' id='bigimg'", re.S)
d1 = re.compile(r'<title>(.*?)</title>')

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36'}

for i in range(9, 99):  # list pages list_22_9.html through list_22_98.html
    baseurl = "http://www.cosplay8.com/pic/chinacos/list_22_"
    url = baseurl + str(i) + ".html"
    res = urllib.request.Request(url=url, headers=headers)
    try:
        respone = urllib.request.urlopen(res, timeout=1000)
    except Exception as err:
        print("Exception occurred: " + str(err))
        continue
    respones = respone.read().decode('utf-8')
    x2 = re.findall(x1, respones)  # all detail-page links on this list page
    for x3 in x2:
        for a1 in range(2, 11):  # sub-pages _2.html through _10.html of one gallery
            lasturl = 'http://www.cosplay8.com' + x3 + '_' + str(a1) + '.html'  # detail-page URL
            res = urllib.request.Request(url=lasturl, headers=headers)
            try:
                respone = urllib.request.urlopen(res, timeout=100)
            except Exception as err:
                print("Exception occurred: " + str(err))
                continue
            respones = respone.read().decode('utf-8')
            c2 = re.findall(c1, respones)  # relative path of the big image
            d2 = re.findall(d1, respones)  # page title, used as the file name
            for c3 in c2:
                url1 = "http://www.cosplay8.com" + c3
                print(url1)
                for d3 in d2:
                    try:
                        respones1 = requests.get(url1, timeout=5).content
                    except Exception as err:
                        print("Exception occurred: " + str(err))
                        continue
                    try:
                        # set the save path yourself; the cosplay folder must already exist
                        with open('cosplay\\' + d3 + '.jpg', mode='wb') as f:
                            f.write(respones1)
                            print('Saving wallpaper')
                            print('Downloaded ' + str(a1) + ' images')
                    except Exception as err:
                        print("Exception occurred: " + str(err))
        print('Image download finished')
    print('Finished list page ' + str(i))
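One fragile point in the script above: each file is saved under the page's raw <title> text, and the cosplay folder must already exist. Titles on this site can contain characters such as | that are not legal in Windows file names, which makes the open() call fail. Below is a small hardening sketch; the save_image helper name and the out_dir default are illustrative choices, not part of the original script.

import os
import re
import requests

def save_image(img_url, title, out_dir='cosplay'):
    # Replace characters that are illegal in Windows file names with underscores.
    safe_name = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    os.makedirs(out_dir, exist_ok=True)              # create the output folder if it is missing
    data = requests.get(img_url, timeout=5).content  # download the image bytes
    path = os.path.join(out_dir, safe_name + '.jpg')
    with open(path, mode='wb') as f:
        f.write(data)
    return path

Calling a helper like this in place of the requests.get / open block in the inner loop keeps the rest of the crawl logic unchanged.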

Some of the results:

(screenshots of the downloaded images)