Scraping Baidu Images with a Web Crawler

I won't walk through the analysis step by step. Here is the post I learned from: https://blog.csdn.net/qq_35371031/article/details/81207966
On to the code:

import os
import re
import sys
import time
import hashlib
import urllib.parse

import requests


class picture:
    """
    Scrape images from Baidu Image Search (image.baidu.com).
    """
    def __init__(self, picture_name, picture_number=100, path='picture'):
        # Images are saved under <path>/<picture_name>/
        self.save_path = os.path.join(path, picture_name)
        self.picture_number = int(picture_number)
        self.start_time = time.time()
        self.picture_name = picture_name
        self.header = {
            'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
            'accept-language': 'zh-CN,zh;q=0.9',
            'cache-control': 'max-age=0',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'
        }
        # Create the folder (and any parents); no error if it already exists
        os.makedirs(self.save_path, exist_ok=True)
        self.start()

    def start(self):
        # rn=60 requests 60 results per page; pn is the offset of the first result
        for i in range(0, self.picture_number, 60):
            self.get_picture_content(i)
        # Report timing here: __del__ is not guaranteed to run at a predictable time
        print("Took {:.2f}s".format(time.time() - self.start_time))

    def get_picture_content(self, count):
        url = 'https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&rn=60&word={0}&pn={1}'.format(
            urllib.parse.quote(self.picture_name), count)
        print(url)
        r = requests.get(url, headers=self.header, timeout=10)
        if r.status_code != 200:
            sys.exit("Error accessing Baidu Images")
        # Capture each thumbURL value up to its closing quote (not only .jpg links)
        link_url = re.findall(r'"thumbURL":"(.*?)"', r.text)
        # On the last page, only take the images that are still missing
        new_count = min(60, self.picture_number - count)
        # Slicing also avoids an IndexError when a page has fewer results
        for link in link_url[:new_count]:
            res = requests.get(link, headers=self.header, timeout=10)
            if res.status_code != 200:
                # Skip a broken image instead of aborting the whole run
                print('Error accessing image link: ' + link)
                continue
            self.save_picture(res.content, link)

    def save_picture(self, content, picture_name):
        # Name each file after the MD5 hash of its URL so names never collide
        file_name = hashlib.md5(picture_name.encode()).hexdigest() + '.jpg'
        with open(os.path.join(self.save_path, file_name), 'wb') as f:
            f.write(content)


if __name__ == "__main__":
    picture_name = input("Enter the kind of image to scrape: ")
    number = input("Enter how many images to download: ")
    pic = picture(picture_name, number)
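The per-page count is the easiest part to get wrong. With `rn=60`, the page starting at offset `pn` should only take `picture_number - pn` images once fewer than 60 are still needed. Below is a minimal offline sketch of that paging arithmetic, plus URL building and thumbnail extraction. It assumes the `acjson` endpoint behaves as the URL in the script implies: `pn` is an offset and `rn` is a page size. I am inferring this, not quoting any Baidu documentation. `plan_pages`, `build_search_url` and `extract_thumb_urls` are hypothetical helper names.

```python
import re
import urllib.parse

# Endpoint and parameters copied from the script's URL; their meaning is inferred.
BASE_URL = "https://image.baidu.com/search/acjson"


def plan_pages(total, page_size=60):
    """Return (offset, count) pairs whose counts add up to `total`."""
    pages = []
    for offset in range(0, total, page_size):
        # The last page only takes whatever is still missing
        pages.append((offset, min(page_size, total - offset)))
    return pages


def build_search_url(word, offset, page_size=60):
    """Build one result-page URL; urlencode percent-encodes the query word as UTF-8."""
    params = {
        "tn": "resultjson_com",
        "ipn": "rj",
        "rn": page_size,
        "word": word,
        "pn": offset,
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def extract_thumb_urls(text):
    """Return every "thumbURL" value found in a raw response body."""
    return re.findall(r'"thumbURL":"(.*?)"', text)


if __name__ == "__main__":
    for offset, count in plan_pages(130):
        print(offset, count, build_search_url("猫", offset))
```

For example, `plan_pages(130)` gives `[(0, 60), (60, 60), (120, 10)]`, so exactly 130 images are requested in total. A regex over the raw text is used instead of `json.loads`. I'm assuming the response may not always be strict JSON, so the regex is the more tolerant option.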

I didn't add multithreading. In my local test the script downloaded 1000 images.
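The script downloads images one at a time, so a thread pool is the natural next step. Below is a minimal sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`; this is my own suggestion, not part of the author's script. `download_all`, `fetch_bytes` and `save_bytes` are hypothetical helpers. `fetch_bytes` uses `urllib.request` instead of `requests` so the block has no third-party dependency, and the `fetch` parameter lets you plug in a `requests`-based function instead.

```python
import concurrent.futures
import hashlib
import os
import urllib.request


def fetch_bytes(url, timeout=10):
    """Download one URL with the standard library (hypothetical stand-in for requests.get)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def save_bytes(folder, url, content):
    """Write content to <folder>/<md5 of url>.jpg and return the path."""
    name = hashlib.md5(url.encode()).hexdigest() + ".jpg"
    path = os.path.join(folder, name)
    with open(path, "wb") as f:
        f.write(content)
    return path


def download_all(urls, folder, fetch=fetch_bytes, max_workers=8):
    """Fetch urls concurrently; return (saved_paths, failed_urls)."""
    os.makedirs(folder, exist_ok=True)
    saved, failed = [], []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so failures can be reported
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in concurrent.futures.as_completed(futures):
            url = futures[future]
            try:
                content = future.result()
            except Exception:
                # Network errors are collected instead of stopping the run
                failed.append(url)
                continue
            # Files are written from the main thread, so no locking is needed
            saved.append(save_bytes(folder, url, content))
    return saved, failed
```

To use it, pass in the thumbnail URLs collected from each result page, for example `saved, failed = download_all(urls, "picture/cat")`. Only the network fetches run in parallel; the files are still written by the main thread. A small `max_workers` value is probably wiser, since many parallel requests to the same host are more likely to be throttled.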

PS:
That's all there is to it?? (doge)

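Disclaimer: This article was contributed by a user and does not represent the position of [wpsshop Blog]. Copyright belongs to the original author, and this site assumes no legal responsibility. If you find infringing content, please contact us. When reposting, please credit the source: https://www.wpsshop.cn/w/我家自动化/article/detail/801082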