
Scraping Douban Movie Comments: Content, Star Rating, Comment Time, and Vote Count


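Hi everyone, I'm 带我去滑雪, and every day I share a small trick with you!

This post scrapes each Douban movie comment's author, comment time, star rating, vote count, and text. Without further ado, here is the code: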

```python
import re
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Headers are the same for every page, so define them once outside the loop.
# Replace the Cookie value with the one from your own logged-in Douban session
# (copy it from your browser's developer tools).
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36',
    'Referer': 'https://movie.douban.com/subject/30334073/comments?sort=time&status=P',
    'Cookie': 'YOUR_DOUBAN_COOKIE',
}

items = []
for i in range(0, 25):  # 25 pages x 20 comments = up to 500 comments
    url = f'https://movie.douban.com/subject/30334073/comments?start={20*i}&limit=20&status=P&sort=new_score'
    r = requests.get(url, headers=headers)
    r.raise_for_status()  # stop early on a blocked or failed request
    time.sleep(1)  # pause between requests to avoid being rate-limited
    soup = BeautifulSoup(r.text, 'html.parser')
    comments_list = soup.find_all('div', class_='comment-item')
    for comment in comments_list:
        votes = comment.find('span', class_='votes vote-count').text
        content = comment.find('span', class_='short').text
        info = comment.find('span', class_='comment-info')
        author = info.find('a').text
        comment_time = comment.find('span', class_='comment-time').get('title')
        # The rating span carries a class such as "allstar40" (40 -> 4 stars).
        # Comments without a rating have no such span, so leave the field empty
        # instead of reading a character from an unrelated span.
        star_span = info.find('span', class_=re.compile(r'^allstar'))
        star = star_span.get('class')[0][-2] if star_span else ''
        items.append([author, comment_time, star, votes, content])

# Columns: commenter, comment time, star rating, vote count, comment text
df = pd.DataFrame(items, columns=['评论人', '评论时间', '星级', '支持人数', '评论内容'])
df.to_csv('调音师.csv', encoding='utf_8_sig')
```
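The loop pages through the comments by changing the `start` query parameter in steps of 20. Hand-writing the query string is easy to get wrong, so here is a minimal sketch that builds the same page URLs with the standard library's `urllib.parse.urlencode`. The helper name `build_page_urls` is hypothetical, and the meaning of `status=P` (viewers who have already watched the film) is an assumption about Douban's parameters, not something stated in the original post.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/subject/30334073/comments"
PAGE_SIZE = 20  # comments per page, matching limit=20 in the script


def build_page_urls(pages: int, page_size: int = PAGE_SIZE) -> list[str]:
    """Return one comments-page URL per page, each offset by page_size."""
    urls = []
    for i in range(pages):
        query = urlencode({
            "start": i * page_size,  # offset of the first comment on this page
            "limit": page_size,
            "status": "P",           # assumed: P = users who have watched the film
            "sort": "new_score",     # same sort order as the original URL
        })
        urls.append(f"{BASE_URL}?{query}")
    return urls
```

`build_page_urls(25)` yields offsets 0, 20, ..., 480, i.e. the same 25 pages (up to 500 comments) that the script requests.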

Sample output:

[Figure: screenshot of the scraped results]
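In the output, 星级 is a single character sliced from a class name such as `allstar40`, while 支持人数 and 评论时间 are raw strings. If you want to analyze the CSV, converting those fields to typed values first helps. The sketch below is an illustration under two assumptions: that the rating class follows the `allstarN0` pattern the original `[-2]` indexing relies on, and that the `comment-time` title looks like `2019-04-03 12:34:56`. The helper names are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Assumption: a rating is marked by a class like "allstar40",
# where the tens digit is the number of stars.
STAR_CLASS = re.compile(r"^allstar([1-5])0$")


def parse_star(class_names: list[str]) -> Optional[int]:
    """Return the 1-5 star rating encoded in a span's class list, or None if unrated."""
    for name in class_names:
        match = STAR_CLASS.match(name)
        if match:
            return int(match.group(1))
    return None


def parse_votes(text: str) -> int:
    """Turn the vote-count text into an int, treating blank text as 0."""
    text = text.strip()
    return int(text) if text.isdigit() else 0


def parse_time(title: str) -> datetime:
    """Parse the comment-time title, assumed to be 'YYYY-MM-DD HH:MM:SS'."""
    return datetime.strptime(title.strip(), "%Y-%m-%d %H:%M:%S")
```

In the scraping loop you would pass `parse_star` the class list of the rating span (for example `span.get('class')`), so unrated comments come out as `None` rather than a stray character.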

If you need the dataset, you can download it from Baidu Netdisk (permanent link):

Link: https://pan.baidu.com/s/173deLlgLYUz789M3KHYw-Q?pwd=0ly6
Extraction code: 2138


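More quality content is posted regularly; see my homepage.

If you have questions, contact me by email: 1736732074@qq.com

My WeChat: TCB1736732074

Like and follow so you can find me next time!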

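Disclaimer: This article was contributed by a user and does not represent the position of 【wpsshop博客】. Copyright belongs to the original author, and the site assumes no legal responsibility. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/小小林熬夜学编程/article/detail/687009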