Web Scraping with Beautiful Soup: Example Code (Kugou Music)

Author: 盐析白兔 | 2024-07-21 23:16:48

Web scraper code example

(Scraping the Kugou Music Top 500 chart)

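Preface

This article combines third-party libraries such as Requests and Beautiful Soup to scrape information from web pages. The focus is on filtering the scraped content, using the strip and split string methods to pick out the parts that are needed.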
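To make the strip and split step concrete, here is a minimal sketch that works on hard-coded strings instead of a live page. The helper name split_title and the sample strings are hypothetical and exist only for illustration; in the scraper itself the text comes from the tags selected on the page.

```python
def split_title(raw):
    """Split a raw 'left - right' link text into two cleaned parts."""
    # strip() removes the leading and trailing whitespace (spaces, tabs,
    # newlines) that often surrounds text extracted from HTML.
    cleaned = raw.strip()
    # split(' - ', 1) cuts at the first ' - ' only, so a second ' - '
    # inside the right-hand part is left alone.
    parts = cleaned.split(' - ', 1)
    left = parts[0].strip()
    # Fall back to an empty string when there is no separator at all.
    right = parts[1].strip() if len(parts) > 1 else ''
    return left, right


# Hypothetical sample strings, not taken from the real chart.
print(split_title('\n\t  Part A - Part B \n'))
print(split_title('No separator here'))
```

Limiting the split to one cut and checking the number of parts avoids an IndexError on entries that do not contain the separator.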

Note: the body of the article follows; the example below is provided for reference.

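I. Objective

Learn how to combine Requests, XPath, and third-party libraries such as Beautiful Soup to scrape information from web pages. The code in this article uses Requests and Beautiful Soup with CSS selectors; it does not use XPath.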

II. Procedure

1. Environment (PyCharm)

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

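2. Requirements

1) Scrape the first 5 pages of the Kugou Top 500 chart on the Kugou Music website.

2) Kugou Top 500 URL: http://www.kugou.com/yy/rank/home/1-8888.html

3) The data is fetched by requesting the page URLs directly. The web version of Kugou offers no control for moving to the next page, but the pattern can be read from the URL of the first page:

http://www.kugou.com/yy/rank/home/1-8888.html

Replacing the number 1 with 2, 3, and so on gives the following pages. Each page lists 22 songs.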
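The URL pattern and the 22-songs-per-page figure can be sketched in a few lines. The names BASE_URL, page_urls, and rank_range are hypothetical helpers introduced here for illustration, and the rank arithmetic assumes that pages are filled in chart order with exactly 22 songs each.

```python
import math

# URL pattern taken from the first-page URL; only the page number changes.
BASE_URL = "http://www.kugou.com/yy/rank/home/{page}-8888.html"
SONGS_PER_PAGE = 22


def page_urls(first_page, last_page):
    """Return the chart URLs for pages first_page..last_page inclusive."""
    return [BASE_URL.format(page=n) for n in range(first_page, last_page + 1)]


def rank_range(page):
    """Return the (first, last) chart positions expected on a page.

    Assumes every page holds exactly SONGS_PER_PAGE songs in order.
    """
    first = (page - 1) * SONGS_PER_PAGE + 1
    return first, page * SONGS_PER_PAGE


for url in page_urls(1, 5):
    print(url)

print(rank_range(5))                    # (89, 110)
print(math.ceil(500 / SONGS_PER_PAGE))  # 23 pages would be needed for 500 songs
```

Under these assumptions, the first 5 pages cover chart positions 1 through 110.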

3. Code


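import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent


def spider_kugou():
    # Create the User-Agent generator once; ua.chrome yields a Chrome UA string.
    ua = UserAgent()
    # Pages 1 to 5 of the Top 500 chart, 22 songs per page.
    for i in range(1, 6):
        url = f"https://www.kugou.com/yy/rank/home/{i}-8888.html"
        headers = {'User-Agent': ua.chrome}
        # A timeout keeps the script from hanging on a stalled connection.
        resp = requests.get(url, headers=headers, timeout=10)
        # Raise on HTTP errors instead of parsing an error page.
        resp.raise_for_status()
        # The 'lxml' parser requires the lxml package to be installed.
        soup = BeautifulSoup(resp.text, 'lxml')
        ranks = soup.select('span.pc_temp_num')
        songs = soup.select('div.pc_temp_songlist > ul > li > a')
        durations = soup.select('span.pc_temp_time')
        # Renamed from "time" so the loop variable cannot shadow the time module.
        for rank, song, duration in zip(ranks, songs, durations):
            # Split on the first ' - ' only, so extra separators stay intact.
            temp = song.get_text().split(' - ', 1)
            # Assumption to verify on the live page: the link text reads
            # 'song - singer'. If it reads 'singer - song', swap temp[0] and temp[1].
            data = {
                "rank": rank.get_text().strip(),
                "song": temp[0].strip(),
                # Entries without a separator would otherwise raise an IndexError.
                "singer": temp[1].strip() if len(temp) > 1 else "",
                "time": duration.get_text().strip()
            }
            print(data)


if __name__ == '__main__':
    spider_kugou()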
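
The loop in spider_kugou relies on zip to pair up three separately selected lists. zip stops at the shortest input without raising an error, so if one selector matches fewer elements than the others, rows are dropped or misaligned silently. The sketch below shows one way to guard against that, using plain strings in place of parsed tags. The function build_rows and the sample values are hypothetical, not part of the original script.

```python
def build_rows(ranks, titles, durations):
    """Pair three parallel lists into row dicts, refusing mismatched lengths."""
    if not (len(ranks) == len(titles) == len(durations)):
        raise ValueError(
            f"selector counts differ: {len(ranks)} ranks, "
            f"{len(titles)} titles, {len(durations)} durations"
        )
    return [
        {"rank": r.strip(), "title": t.strip(), "time": d.strip()}
        for r, t, d in zip(ranks, titles, durations)
    ]


# Hypothetical sample values standing in for get_text() results.
rows = build_rows([" 1 ", " 2 "], ["Title A", "Title B"], [" 3:45 ", " 4:02 "])
for row in rows:
    print(row)
```

Failing loudly on a length mismatch makes a changed page layout visible immediately, rather than producing rows whose rank, title, and duration no longer belong together.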