Some recent projects needed weather data for a location in China, so I wrote a small weather scraper.
Libraries used: requests and lxml.
Weather site scraped: https://www.weatherol.cn/
JSON viewer: https://www.json.cn/
cityid source: https://blog.csdn.net/li_and_li/article/details/79602686
Open https://www.weatherol.cn/, open the browser's developer tools, refresh the page, and filter the network requests by XHR; several requests show up.
From these, the API that returns the weather information is:
https://www.weatherol.cn/api/home/getCurrAnd15dAnd24h?cityid=101180301
The cityid parameter is the city's unified code.
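Before bringing in requests, it helps to see how the query string is assembled. A minimal sketch (build_url is purely illustrative; requests builds the same URL from its params argument):

```python
from urllib.parse import urlencode

# Full endpoint from the article
API = 'https://www.weatherol.cn/api/home/getCurrAnd15dAnd24h'

def build_url(city_id: str) -> str:
    # cityid is the unified city code, e.g. '101180301'
    return API + '?' + urlencode({'cityid': city_id})

print(build_url('101180301'))
# https://www.weatherol.cn/api/home/getCurrAnd15dAnd24h?cityid=101180301
```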
Next, run the returned JSON through a viewer such as https://www.json.cn/ to see what it contains.
The full 15-day forecast is all in there, so we can extract whichever fields we need.
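Before writing the parser, it helps to see the shape of the response. Here is a hand-made stand-in: the nesting and field names mirror the real JSON, but every value below is invented for illustration.

```python
# Stand-in for the API response; field names follow the real JSON,
# values are made up.
sample = {
    'data': {
        'current': {
            'current': {'weather': '多云', 'temperature': '25'},
            'air': {'AQI': '40', 'levelIndex': '优'},
        },
        'forecast15d': [
            {'temperature_am': '26', 'temperature_pm': '15'},
            {'temperature_am': '28', 'temperature_pm': '17'},
        ],
    }
}

current = sample['data']['current']['current']
print(current['weather'], current['temperature'] + '°C')

# Today's range, as the scraper below reads it (index 1 of forecast15d):
today = sample['data']['forecast15d'][1]
print(today['temperature_pm'] + ' - ' + today['temperature_am'])
```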
The program's main structure: send a GET request with requests, then parse the response with a parse_data() function.
The code:
```python
import json
from typing import Union

import requests

# Weather API endpoint
city_weather_url = 'https://www.weatherol.cn/api/home/getCurrAnd15dAnd24h'
# Browser User-Agent string
user_agent = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/90.0.4430.212 Safari/537.36')


def get_headers():
    """Build the request headers.

    return: headers: dict
    """
    # Note: the header name is 'User-Agent' (with a hyphen)
    return {'User-Agent': user_agent}


def parse_data(weather_data):
    """Parse the returned data and extract the useful fields.

    return: weather_dict: dict
    """
    current = weather_data['data']['current']['current']
    ret_dict = {}
    ret_dict['当前天气'] = current['weather']
    ret_dict['当前温度'] = current['temperature']
    high = weather_data['data']['forecast15d'][1]['temperature_am']
    low = weather_data['data']['forecast15d'][1]['temperature_pm']
    ret_dict['今日温度'] = low + ' - ' + high
    ret_dict['风向'] = current['winddir']
    ret_dict['风速'] = current['windpower']
    ret_dict['气压'] = current['airpressure'] + 'hPa'
    ret_dict['湿度'] = current['humidity'] + '%'
    aqi = weather_data['data']['current']['air']['AQI']
    level = weather_data['data']['current']['air']['levelIndex']
    ret_dict['空气质量'] = aqi + '/' + level
    ret_dict['小提示'] = weather_data['data']['current']['tips']
    return ret_dict


def get_weather(city_id) -> Union[None, dict]:
    """Fetch the weather for a given city ID."""
    params = {'cityid': city_id}
    # Send the GET request
    response = requests.get(url=city_weather_url,
                            headers=get_headers(), params=params)
    # Parse the returned JSON string and extract the fields we want
    weather_data = json.loads(response.text)
    return parse_data(weather_data)


def test():
    weather_dict = get_weather('101180301')
    print(weather_dict)


if __name__ == '__main__':
    test()
```
Run it and check the output.
We can now fetch the full 15-day forecast from a cityid, but how do we get the cityid itself?
It turns out many people have already compiled the IDs of every city in China; we just need to parse one of those lists, save it locally, and look IDs up from it when needed.
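The lookup-table idea boils down to turning lines of the form "cityid,city name" into a dict. A minimal sketch (the sample lines below are made up, but mirror that format; Beijing's real ID is 101010100):

```python
# Sample lines in "cityid,name" form; the real page has one per city.
lines = ['101010100,北京', '101020100,上海', 'not-a-city-line']

city_ids = {}
for line in lines:
    parts = line.split(',')
    if len(parts) < 2:
        continue  # skip lines that do not follow the format
    city_ids[parts[1]] = parts[0]  # map name -> cityid

print(city_ids)
```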
A quick web search turned up a blog post that is easy to scrape:
https://blog.csdn.net/li_and_li/article/details/79602686
Press F12 to inspect the page structure, and use the Xpath Helper extension to work out the XPath.
The code:
```python
import json

import requests
from lxml import etree

url = 'https://blog.csdn.net/li_and_li/article/details/79602686'
headers = {
    'user-agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/88.0.4324.146 Safari/537.36')
}


def parse_html(html):
    """Parse the page and return the name and ID of every city it lists."""
    et = etree.HTML(html)
    citys = et.xpath('//div[@id="content_views"]/p/text()')
    ret_dict = {}
    for city in citys:
        try:
            city_info = city.split(',')
            city_id = city_info[0]
            city_name = city_info[1]
            ret_dict[city_name] = city_id
        except IndexError:
            # Skip lines that are not in "cityid,name" form
            print('err str: ' + city)
            continue
    return ret_dict


response = requests.get(url, headers=headers)
city_info = parse_html(response.text)
with open('city_id.json', 'w', encoding='utf8') as fp:
    fp.write(json.dumps(city_info, ensure_ascii=False))
```
The scraped result is saved to city_id.json.
From now on, we can look up a cityid in this file and feed it to the weather scraper above!
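Putting the two scripts together, the lookup step works like this (using a tiny hand-made mapping in place of the full scraped file; get_weather is the function from the first script):

```python
import json

# Tiny stand-in for the full scraped mapping {city name: cityid}
city_map = {'北京': '101010100', '上海': '101020100'}
with open('city_id.json', 'w', encoding='utf8') as fp:
    json.dump(city_map, fp, ensure_ascii=False)

# At lookup time, load the file and fetch the cityid by name
with open('city_id.json', encoding='utf8') as fp:
    city_ids = json.load(fp)

city_id = city_ids['北京']
print(city_id)  # pass this to get_weather(city_id) from the scraper above
```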