1. Scrape the author of each comment
2. Scrape the time each comment was posted
3. Scrape the content of each comment
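To scrape this Ctrip data, we use selenium to drive a browser automatically: it opens the page and downloads the page content, and the driver used here is the Chrome driver. If you are not sure how to configure it, ask in the comments and I will add that part. I hope this post is helpful to you: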
from scrapy import Selector
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Start the browser (needs Chrome and a matching Chrome driver)
browser = webdriver.Chrome()
browser.get("URL (fill in the Ctrip page address yourself)")


def parse_page():
    # Give the page a moment to render before reading its source
    time.sleep(1)
    sel = Selector(text=browser.page_source)
    spans = sel.xpath('//div[@class="user-date"]/span/text()').extract()
    comments = sel.xpath('//ul[@class="comments"]/li/p/text()').extract()
    # The slicing treats the spans as groups of three per comment:
    # the author comes first and the comment time third
    authors = spans[::3]
    time_comments = spans[2::3]
    # Append one tab-separated line per comment
    with open('comments.txt', 'a+', encoding='utf-8') as f:
        for author, time_comment, comment in zip(authors, time_comments, comments):
            f.write("Commenter: " + author + '\t' + "Comment time: " + time_comment + '\t'
                    + "Comment: " + comment.strip('\n') + '\n')
    # Click the last link in the pager to move to the next page
    # (find_element_by_xpath was removed in Selenium 4)
    next_button = browser.find_element(By.XPATH, '//ul[@class="pkg_page"]/a[last()]')
    next_button.click()


if __name__ == '__main__':
    # Scrape 15 pages, then close the browser
    for i in range(15):
        parse_page()
    browser.quit()
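The `[::3]` and `[2::3]` slices in the script are the part most likely to break if the page layout changes, so it may help to try that step on its own without a browser. Below is a minimal sketch, assuming every comment contributes exactly three `span` texts (author, one field we ignore, time); the helper names `group_comment_fields` and `format_record` and the sample data are made up for illustration and are not part of the original script.

```python
from typing import List, Tuple


def group_comment_fields(spans: List[str], comments: List[str],
                         stride: int = 3) -> List[Tuple[str, str, str]]:
    """Pair each comment with its author and time.

    Assumes `spans` is a flat list holding `stride` entries per comment,
    with the author first and the time last (an assumption about the page).
    """
    authors = spans[0::stride]            # 1st span of every group
    times = spans[stride - 1::stride]     # last span of every group
    # zip stops at the shortest list, so incomplete groups are dropped.
    # Unlike the script, this strips all surrounding whitespace.
    return [(a.strip(), t.strip(), c.strip())
            for a, t, c in zip(authors, times, comments)]


def format_record(author: str, written_at: str, comment: str) -> str:
    """Build one tab-separated output line for a comment."""
    return f"Commenter: {author}\tComment time: {written_at}\tComment: {comment}\n"


if __name__ == "__main__":
    # Hypothetical sample data standing in for the XPath results
    sample_spans = ["alice", "5 points", "2019-05-01",
                    "bob", "4 points", "2019-05-02"]
    sample_comments = ["\nGreat trip\n", "Nice guide"]
    for record in group_comment_fields(sample_spans, sample_comments):
        print(format_record(*record), end="")
```

If the real page yields a different number of spans per comment, only the `stride` argument needs to change.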