
Scraping Travel Website Information with a Python Crawler


I. Required libraries

    import requests
    from lxml import html
    from openpyxl import Workbook

II. Target website

url = 'https://place.qyer.com/china/citylist-0-0-1/'
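
The trailing number in this URL appears to be the page index: the source code later in this article substitutes 1 through 9 into that position. A minimal sketch of generating those page URLs follows; `BASE_URL` and `page_urls` are hypothetical names introduced here for illustration, and only the URL pattern comes from the article.

```python
# Hypothetical helper names; only the URL pattern is taken from the article.
BASE_URL = 'https://place.qyer.com/china/citylist-0-0-{}/'


def page_urls(first=1, last=9):
    """Return the list-page URLs for pages first..last (inclusive)."""
    return [BASE_URL.format(page) for page in range(first, last + 1)]


urls = page_urls()
print(urls[0])
print(len(urls))
```

Note that `range(1, 10)` stops at 9, so the article's loop visits nine pages, not ten.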

III. Inspecting the site's network requests
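
The relative XPath expressions used in the source code suggest that each city is an `li` element containing an `h3/a` name, a `p` with class `beento`, and a `p` with class `pois` that holds the attraction links. The fragment below is invented to match that shape; it is not copied from the live page, and the real markup may differ. To stay dependency-free, this sketch uses the standard library's `xml.etree.ElementTree` rather than `lxml`, so the paths can be checked offline. `SAMPLE` and `text_or_default` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Invented sample markup mirroring the structure the article's XPaths expect.
SAMPLE = """
<ul>
  <li>
    <h3><a href="#">CityA</a></h3>
    <p class="beento">100 people have been here</p>
    <p class="pois"><a href="#"> Spot1 </a><a href="#"> Spot2 </a></p>
  </li>
  <li>
    <h3><a href="#">CityB</a></h3>
    <p class="pois"><a href="#">Spot3</a></p>
  </li>
</ul>
"""


def text_or_default(element, path, default=''):
    """Return the stripped text of the first match, or default if absent."""
    node = element.find(path)
    if node is None or node.text is None:
        return default
    return node.text.strip()


rows = []
for li in ET.fromstring(SAMPLE).findall('./li'):
    name = text_or_default(li, './h3/a')
    beento = text_or_default(li, "./p[@class='beento']")
    # Strip whitespace around each attraction name before joining.
    pois = [a.text.strip() for a in li.findall("./p[@class='pois']/a") if a.text]
    rows.append([name, beento, ','.join(pois)])

print(rows)
```

The second `li` deliberately lacks the `beento` paragraph. Indexing an empty XPath result with `[0]` would raise `IndexError` in that case, whereas the helper falls back to an empty string.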

IV. Source code

    # -*- coding: utf-8 -*-
    import requests
    from lxml import html
    from openpyxl import Workbook

    # Create the Excel workbook and select its active sheet
    wb = Workbook()
    ws = wb.active


    def getpage(url):
        """Fetch one city-list page and append a row per city to the sheet."""
        # Request headers that mimic a browser
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36'}
        # Request the page and read the HTML
        r = requests.get(url, headers=headers, timeout=10)
        retext = r.text
        # Parse the HTML
        ht = html.fromstring(retext)
        # Select the city entries with XPath (one <li> per city)
        city = ht.xpath('/html/body/div[5]/div/div[1]/ul/li')
        for i in city:
            name = i.xpath('./h3/a/text()')[0]
            beento = i.xpath('./p[@class="beento"]/text()')[0]
            # Attraction names: strip whitespace and join with commas
            pois = i.xpath('./p[@class="pois"]/a/text()')
            pois = [place.strip() for place in pois]
            datalist = [name, beento, ','.join(pois)]
            ws.append(datalist)


    # Crawl list pages 1 through 9
    for i in range(1, 10):
        url = 'https://place.qyer.com/china/citylist-0-0-{}/'.format(i)
        getpage(url)

    # Save the workbook. A relative file name writes to the current working
    # directory (the author's own project folder); change the path as needed.
    wb.save("旅游景点.xlsx")
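
Windows paths written as ordinary string literals are easy to get wrong, because sequences such as `\0` are interpreted as escapes. As a small sketch, assuming the goal is simply to control where the workbook is written, the output path can be built with `pathlib`. `output_path` and its `out_dir` parameter are hypothetical names; only the file name `旅游景点.xlsx` comes from the article.

```python
from pathlib import Path


def output_path(out_dir='.', name='旅游景点.xlsx'):
    """Build the workbook path under out_dir, creating the folder if needed."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / name


target = output_path()
print(target)

# Without the r prefix, '\04' is a single control character (an octal escape);
# a raw string keeps the backslash and both digits.
print(len('\04'), len(r'\04'))
```

With openpyxl, the workbook could then be saved with `wb.save(str(target))`.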
