Python in Practice | Scraping Dangdang's TOP 500 Bestsellers

Target page: Dangdang book bestseller list http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-24hours-0-0-1-1
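The code in this post builds each page URL by appending a page number to a fixed prefix, so the trailing `1` in the target URL appears to be the page index. A minimal sketch of that idea follows. `BASE_URL` and `build_page_urls` are hypothetical names introduced here, and the page count of 25 is taken from the loop in the post's code, not verified against the site.

```python
# Assumption: the last segment of the list URL is the page index.
# BASE_URL is the target URL from this post with the trailing page number removed.
BASE_URL = 'http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-24hours-0-0-1-'


def build_page_urls(base_url=BASE_URL, first=1, last=25):
    """Return the list-page URLs for pages first..last (inclusive)."""
    return [f'{base_url}{page}' for page in range(first, last + 1)]


if __name__ == '__main__':
    for url in build_page_urls(last=3):
        print(url)
```

If the list really holds 500 books over 25 pages, each page would carry 20 entries, which is worth checking against a live page before relying on it.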

Scraping result: each book is appended to book.txt as one JSON object per line.
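Because the script in this post writes one JSON object per line to `book.txt`, the file can be loaded back line by line. The following is a minimal sketch, not part of the original script; `read_books` is a hypothetical helper name, and the default path simply mirrors the filename used in the post.

```python
import json


def read_books(path='book.txt'):
    """Load a file holding one JSON object per line into a list of dicts."""
    books = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                books.append(json.loads(line))
    return books


if __name__ == '__main__':
    for book in read_books():
        print(book.get('title'), book.get('price'))
```

Since the script opens the file in append mode, running it twice produces duplicate lines, so deleting `book.txt` between runs keeps the data clean.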

Code:

    import json
    import re

    import requests


    def request_dangdang(url):
        """Fetch a page and return its HTML, or None on any failure."""
        try:
            # A timeout keeps the script from hanging on a stalled connection.
            response = requests.get(url, timeout=10)
            if response.status_code == 200:
                return response.text
        except requests.RequestException:
            pass
        return None


    def parse_result(html):
        """Yield one dict per book found in the list-page HTML."""
        # re.S lets '.' match newlines, so one pattern can span a whole <li> block.
        pattern = re.compile(
            r'<li>.*?list_num.*?(\d+)\.</div>'
            r'.*?<img src="(.*?)"'
            r'.*?class="name".*?title="(.*?)">'
            r'.*?class="star">.*?class="tuijian">(.*?)</span>'
            r'.*?class="publisher_info">.*?target="_blank">(.*?)</a>'
            r'.*?class="biaosheng">.*?<span>(.*?)</span></div>'
            r'.*?<p><span\sclass="price_n">&yen;(.*?)</span>.*?</li>',
            re.S,
        )
        for item in re.findall(pattern, html):
            yield {
                'range': item[0],      # rank on the list
                'image': item[1],      # cover image URL
                'title': item[2],
                'recommend': item[3],  # recommendation text
                'author': item[4],
                'times': item[5],      # count shown in the "biaosheng" block
                'price': item[6],
            }


    def write_item_to_file(item):
        """Append one book to book.txt as a single JSON line."""
        with open('book.txt', 'a', encoding='utf-8') as f:
            # ensure_ascii=False keeps Chinese text readable in the file.
            f.write(json.dumps(item, ensure_ascii=False) + '\n')


    def main(page):
        # Note: this URL is the five-star list (fivestars, recent30), not the
        # bestsellers URL given as the target page; swap the prefix to change lists.
        url = 'http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-' + str(page)
        html = request_dangdang(url)
        if html is None:  # request failed; skip this page
            return
        for item in parse_result(html):
            write_item_to_file(item)


    if __name__ == "__main__":
        for i in range(1, 26):  # pages 1 to 25
            main(i)
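To see how the non-greedy `.*?` groups combined with `re.S` pull fields out of each `<li>` block, here is a reduced sketch that can be run offline. The HTML in `SAMPLE_HTML` is hand-written and heavily simplified for illustration, not real Dangdang markup, and `parse_books` is a hypothetical name; the pattern keeps only three of the seven groups used in the post.

```python
import re

# Hand-written, simplified stand-in for a list page (an assumption, not real markup).
SAMPLE_HTML = '''
<li>
<div class="list_num red">1.</div>
<div class="name"><a href="#" title="Book A">Book A</a></div>
<p><span class="price_n">&yen;35.00</span></p>
</li>
<li>
<div class="list_num red">2.</div>
<div class="name"><a href="#" title="Book B">Book B</a></div>
<p><span class="price_n">&yen;42.50</span></p>
</li>
'''

# Each '.*?' skips as little text as possible before the next literal,
# and re.S allows those skips to cross line breaks.
PATTERN = re.compile(
    r'<li>.*?list_num.*?(\d+)\.</div>'
    r'.*?class="name".*?title="(.*?)">'
    r'.*?class="price_n">&yen;(.*?)</span>.*?</li>',
    re.S,
)


def parse_books(html):
    """Return a list of dicts with rank, title and price for each <li> block."""
    return [
        {'rank': int(rank), 'title': title, 'price': float(price)}
        for rank, title, price in PATTERN.findall(html)
    ]


if __name__ == '__main__':
    for book in parse_books(SAMPLE_HTML):
        print(book)
```

A regex like this breaks as soon as the page layout changes, so for anything long-lived an HTML parser is usually the sturdier choice.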

Reference: https://blog.csdn.net/weixin_42469142/article/details/89856325
