import requests
from lxml import etree

# headers: in the browser, right-click the page -> Inspect -> Network,
# pick a request, and copy the User-Agent value at the bottom of its headers.
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"}

# The URL you want to scrape
url = "https://www.xinpianchang.com/discover/article?from=navigator"
response = requests.get(url, headers=headers)
print(response)  # prints <Response [200]> if the request succeeded

# The actual data extraction starts here: build the object used for XPath queries
tree = etree.HTML(response.text)  # element tree of the page

# Example 1: titles
# Find the element you want in the Elements panel; moving the cursor over the page helps locate it
elements = tree.xpath('//h2[@class="truncate block"]')
for element in elements:
    print(element.text)

# You can also right-click a node in Elements and choose Copy XPath.
# The copied XPath needs some analysis first: part of it has to be removed.
print(tree.xpath('//*[@id="__next"]/section/main/div/div/div/div/div/a/div/ul/li[1]/span[2]')[0].text)
print(tree.xpath('//*[@id="__next"]/section/main/div/div/div/div/div/a/div/ul/li[2]/span[2]')[0].text)
print(tree.xpath('/html/body/div/section/main/div/div/div/div/div/div/a/h2')[1].text)
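
The snippet above indexes XPath results directly with [0] and [1], which raises an IndexError when the page layout changes and nothing matches. Below is a minimal sketch of a more defensive way to read the first match. It runs offline against a made-up HTML string: SAMPLE_HTML and the helper first_text are hypothetical names introduced only for illustration, and the markup is not the real structure of the site.

```python
from lxml import etree

# Hypothetical HTML standing in for a downloaded page (NOT the real site markup).
SAMPLE_HTML = """
<html><body><div id="__next">
  <div class="card">
    <h2 class="truncate block">First title</h2>
    <ul><li><span>views</span><span>120</span></li>
        <li><span>likes</span><span>8</span></li></ul>
  </div>
  <div class="card">
    <h2 class="truncate block">Second title</h2>
    <ul><li><span>views</span><span>45</span></li>
        <li><span>likes</span><span>3</span></li></ul>
  </div>
</div></body></html>
"""


def first_text(tree, xpath, default=None):
    """Return the stripped text of the first node matching xpath, or default if nothing matches."""
    nodes = tree.xpath(xpath)
    if not nodes or nodes[0].text is None:
        return default
    return nodes[0].text.strip()


tree = etree.HTML(SAMPLE_HTML)

# Attribute-based XPath: every h2 whose class attribute is exactly "truncate block".
titles = [h2.text for h2 in tree.xpath('//h2[@class="truncate block"]')]

# Path-based XPath, similar in spirit to a trimmed "Copy XPath" result.
views = first_text(tree, '//*[@id="__next"]/div/ul/li[1]/span[2]')

# An XPath that matches nothing returns the default instead of raising IndexError.
missing = first_text(tree, '//h3[@class="no-such-class"]', default="N/A")

print(titles, views, missing)
```

With a real page, the same helper could be applied to the tree built from response.text. Checking response.status_code == 200 before parsing is also a reasonable precaution.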