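Hello everyone, I'm 带我去滑雪 ("Take Me Skiing"), and I share one small tip every day!
This post scrapes Douban movie comments, collecting each comment's author, timestamp, star rating, vote count, and text. Without further ado, here is the code: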
- import requests
- from bs4 import BeautifulSoup
- import pandas as pd
- import time
-
- # Request headers copied from a logged-in browser session; replace the Cookie with your own
- headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36',
-     'Referer':'https://movie.douban.com/subject/30334073/comments?sort=time&status=P',
-     'Cookie':'bid=4HaXgwTES9U; __gads=ID=85e62e18d05513eb-2291e0501ccb00d5:T=1629877067:RT=1629877067:S=ALNI_MZYsnYWOu5VfO1vceNcKg66gwaMZQ; ll="118209"; __yadk_uid=ccg5plgEoNnVKRg6YOB3aKAChcQneXdk; _vwo_uuid_v2=DD8C0C94BE8722E387E94ECAB6722025A|642230c75b7a8e04a58060320d542d9e; ct=y; push_doumail_num=0; push_noty_num=0; _ga=GA1.2.637371737.1629877067; UM_distinctid=17bd361c41028e-096ad5aa89803-a7d193d-1fa400-17bd361c411840; Hm_lvt_19fc7b106453f97b6a84d64302f21a04=1631339005; __utmv=30149280.6183; ap_v=0,6.0; __utmc=30149280; __utmz=30149280.1632719355.16.2.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; __utmc=223695111; __utmz=223695111.1632719356.13.5.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; __utma=30149280.637371737.1629877067.1632719355.1632722102.17; __utma=223695111.1603523566.1629877067.1632719356.1632722102.14; __utmb=223695111.0.10.1632722102; _pk_ref.100001.4cf6=%5B%22%22%2C%22%22%2C1632722102%2C%22https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3DubNOD-vH_WgE_3tx3fkI3PF0djcVWGVrXh1AaMJu2SH2-5ojOwvOmXLUmvW-Sk2R%26wd%3D%26eqid%3D97dfe06d000c888d00000003615151f6%22%5D; _pk_ses.100001.4cf6=*; __utmb=30149280.3.10.1632722102; dbcl2="150297594:qnZRek3HTwI"; ck=_D-k; _pk_id.100001.4cf6=6a177a97f3dfd6a4.1629877067.14.1632724817.1632719534.'}
-
- items=[]
-
- # 25 pages of 20 comments each (start=0, 20, ..., 480)
- for i in range(0,25):
-     # The original query string read "limit=20=P"; "&status=P" is restored here to match the Referer
-     url=f'https://movie.douban.com/subject/30334073/comments?start={20*i}&limit=20&status=P&sort=new_score'
-     r=requests.get(url,headers=headers)
-     time.sleep(1)  # pause one second between requests
-
-     soup=BeautifulSoup(r.text,'html.parser')
-     comments_list=soup.find_all('div',class_="comment-item")
-     for comment in comments_list:
-         votes=comment.find('span',class_='votes vote-count').text
-         content=comment.find('span',class_='short').text
-         author=comment.find('span',class_="comment-info").find('a').text
-         comment_time=comment.find('span',class_="comment-time").get('title')
-         # The second span in comment-info carries the rating; the star count is the
-         # second-to-last character of its first class (assumed to look like 'allstar40').
-         # When the user left no rating, that span is something else, so star is left empty.
-         star_class=comment.find('span',class_="comment-info").find_all('span')[1].get('class')[0]
-         star=star_class[-2] if star_class.startswith('allstar') else ''
-         item=[author,comment_time,star,votes,content]
-         items.append(item)
-
- # Columns: commenter, comment time, star rating, vote count, comment text
- df=pd.DataFrame(items,columns=['评论人','评论时间','星级','支持人数','评论内容'])
- df.to_csv('调音师.csv',encoding='utf_8_sig')
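The pagination and star-rating steps in the script above can be tried without touching the network. The following is a minimal sketch using only the standard library; `page_url` and `star_from_classes` are hypothetical helper names, not part of the original script, and the `allstar40` class pattern is an assumption about Douban's markup inferred from the `[0][-2]` indexing above.

```python
from urllib.parse import urlencode

# Same comments endpoint as the script above
BASE = "https://movie.douban.com/subject/30334073/comments"


def page_url(page, page_size=20):
    """Build the comments URL for a zero-based page index (hypothetical helper)."""
    query = urlencode({
        "start": page * page_size,  # offset of the first comment on this page
        "limit": page_size,
        "status": "P",
        "sort": "new_score",
    })
    return f"{BASE}?{query}"


def star_from_classes(classes):
    """Return the star count from a class list such as ['allstar40', 'rating'].

    Assumes rating classes start with 'allstar'; returns '' for anything else,
    for example when the span is actually the comment-time span.
    """
    if classes and classes[0].startswith("allstar"):
        return classes[0][-2]
    return ""


urls = [page_url(i) for i in range(25)]
print(urls[0])
print(urls[-1])
print(star_from_classes(["allstar40", "rating"]))  # 4
print(repr(star_from_classes(["comment-time"])))   # ''
```

Building the query with `urlencode` avoids hand-typed separators in the query string, which is where a slip such as `limit=20=P` tends to creep in.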
Sample output:
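One detail of the output file worth noting: the script saves the CSV with `encoding='utf_8_sig'`, which prepends a UTF-8 byte order mark (BOM). Spreadsheet programs such as Excel commonly rely on that marker to display Chinese text correctly. Here is a small sketch of the effect using only the standard library; the file name `demo.csv` and the row contents are made-up placeholders, not scraped data.

```python
import codecs
import csv
import os
import tempfile

# Placeholder rows: a two-column header and one made-up record
rows = [["评论人", "星级"], ["example_user", "4"]]

# Write to a temporary directory so nothing in the working directory is touched
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w", encoding="utf_8_sig", newline="") as f:
    csv.writer(f).writerows(rows)

# The raw bytes start with the UTF-8 BOM (EF BB BF)
with open(path, "rb") as f:
    raw = f.read()
has_bom = raw.startswith(codecs.BOM_UTF8)
print(has_bom)  # True

# Reading with the same codec strips the BOM again
with open(path, encoding="utf_8_sig", newline="") as f:
    read_back = list(csv.reader(f))
print(read_back == rows)  # True
```

When loading the scraped file back into pandas, passing the same `encoding='utf_8_sig'` to `pd.read_csv` keeps the BOM out of the first column name.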
If you need the dataset, you can get it from Baidu Netdisk (the link does not expire):
Link: https://pan.baidu.com/s/173deLlgLYUz789M3KHYw-Q?pwd=0ly6
Extraction code: 2138
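More content is on the way; please visit my homepage to see it.
If you have questions, you can reach me by email: 1736732074@qq.com
My WeChat: TCB1736732074
Like and follow so you can find me next time!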