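This post documents my web-scraping practice journey.
I'm guessing most people reading this are either complete beginners or "crawlers" with some basics who want to level up.
Let's improve together!
I'll try to spell out every step as clearly as I can. Let's get started.
Site: "X-gou" Music (the kugou.com search endpoint that appears in the request code in this post)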
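- import random
-
- def guid():
-     # num is in [1, 2), so hex() gives '0x1....'; [3:] keeps the last 4 hex digits
-     num = 1 + random.random()
-     res = hex(int(65536 * num))[3:]
-     return res
-
- GUID = guid() + guid() + "-" + guid() + "-" + guid() + "-" + guid() + "-" + guid() + guid() + guid()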
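To see why the `[3:]` slice always leaves exactly four hex digits, here is a small sketch that swaps the random draw for fixed fractions. `guid_segment` is a hypothetical helper introduced only for this illustration; it mirrors the arithmetic of `guid()` above and assumes the fraction lies in [0, 1), like the return value of `random.random()`.

```python
def guid_segment(fraction):
    """Same arithmetic as guid(), but with the random fraction passed in.

    Assumption: fraction is in [0, 1), like random.random().
    """
    num = 1 + fraction        # always in [1, 2)
    value = int(65536 * num)  # always in 0x10000..0x1ffff
    return hex(value)[3:]     # drop the leading '0x1', keep the low 4 hex digits


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 0.99999):
        print(fraction, guid_segment(fraction))
```

Adding 1 before multiplying is what guarantees the zero padding: without it, small values would produce fewer than four digits.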
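Next, let's look at e.Md5() again. It is really just an MD5 hash. There is a simple way to tell whether an MD5 implementation has been modified: hash a fixed input with the MD5 function from the page's JS, hash the same input with some other MD5 tool, and check whether the two results match. I simply used the md5 function from Python's hashlib library.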
- import hashlib
-
- def gen_md5(word):
-     # Accepts a string or an iterable of strings and joins it into one string
-     word = ''.join(word)
-     encode_word = word.encode('utf-8')
-     return hashlib.md5(encode_word).hexdigest()
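One way to run that comparison without a third-party site is to check against well-known MD5 test vectors. The sketch below is only an illustration: `looks_like_standard_md5` and `py_md5` are hypothetical helpers, and the `digest_fn` argument stands in for whatever produces the digest you want to check (for example, a lookup of values copied from the browser console).

```python
import hashlib

# Well-known MD5 digests of fixed inputs (from the RFC 1321 test suite).
KNOWN_VECTORS = {
    "": "d41d8cd98f00b204e9800998ecf8427e",
    "abc": "900150983cd24fb0d6963f7d28e17f72",
}


def looks_like_standard_md5(digest_fn):
    """Return True if digest_fn(text) equals standard MD5 for every known vector."""
    return all(digest_fn(text).lower() == expected
               for text, expected in KNOWN_VECTORS.items())


def py_md5(text):
    """Reference digest computed with hashlib."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(looks_like_standard_md5(py_md5))
```

If the digests copied from the page differ from these vectors, the JS implementation has been altered and plain hashlib will not reproduce it.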
The results match, so this part is easy: we now know how both mid and uuid are generated. Next up is the signed parameter, signature.
- import hashlib
- import random
-
- def gen_md5(word):
-     word = ''.join(word)
-     encode_word = word.encode('utf-8')
-     return hashlib.md5(encode_word).hexdigest()
-
- def guid():
-     num = 1 + random.random()
-     res = hex(int(65536 * num))[3:]
-     return res
-
- GUID = guid() + guid() + "-" + guid() + "-" + guid() + "-" + guid() + "-" + guid() + guid() + guid()
-
- # mid is the MD5 of the GUID string
- mid = gen_md5(GUID)
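Do another global search, set a breakpoint, and search for a song again. The breakpoint is hit, and this is where signature is generated.
After sending several requests and printing the values, it turns out that s is actually a list whose items are the parameters the endpoint requires, and the list is joined into a single string. We don't need to go to that much trouble: just copy the resulting string. A few parts of it do have to be replaced on every request, though (in the full request code these are clienttime, dfid, keyword, mid and uuid).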
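For readers who would rather rebuild the string than copy it, here is a rough sketch. It assumes, based only on the shape of the copied string used in the full request code in this post, that the list is the `key=value` pairs sorted by key, with the same fixed string placed at both ends. `build_plaintext` and `WRAPPER` are hypothetical names, and this is not taken from the site's JS.

```python
# The fixed string that appears at both ends of the copied plaintext.
WRAPPER = "NVPh5oo715z5DIWAeQlhMDsWXXQV4hwt"


def build_plaintext(params, wrapper=WRAPPER):
    """Join sorted key=value pairs (signature itself excluded) between two wrappers.

    Assumption: the page sorts the pairs by key; this matches the order seen
    in the copied string but has not been confirmed against the JS source.
    """
    pairs = [f"{key}={value}" for key, value in sorted(params.items())
             if key != "signature"]
    return wrapper + "".join(pairs) + wrapper


if __name__ == "__main__":
    demo = {"appid": "1014", "token": "", "keyword": "demo", "signature": ""}
    print(build_plaintext(demo))
```

Building the string from the dictionary keeps the plaintext and the query parameters in sync if you later change something like page or pagesize.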
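With s sorted out, let's look at method d. Set a breakpoint, send the request again, and step into the method. The variable t is the plaintext that was just passed in.
A word about method d: my gut told me it was MD5, so I went with that, and it worked; it really is MD5. As for the JS code of method d, what I saw while debugging was not an obvious MD5 call. It converts the string into a list and then assembles the digest from that. I honestly couldn't make sense of it, so if anyone knows how it works, please explain~
So for method d we can simply use MD5 directly. The complete request code is as follows: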
- import hashlib
- import random
- import time
-
- import requests
-
- headers = {
-     "authority": "complexsearch.kugou.com",
-     "accept": "*/*",
-     "accept-language": "zh-CN,zh;q=0.9",
-     "cache-control": "no-cache",
-     "pragma": "no-cache",
-     "referer": "https://www.kugou.com/",
-     "sec-ch-ua": "\"Not.A/Brand\";v=\"8\", \"Chromium\";v=\"114\", \"Google Chrome\";v=\"114\"",
-     "sec-ch-ua-mobile": "?0",
-     "sec-ch-ua-platform": "\"Windows\"",
-     "sec-fetch-dest": "script",
-     "sec-fetch-mode": "no-cors",
-     "sec-fetch-site": "same-site",
-     "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
- }
-
- def gen_md5(word):
-     word = ''.join(word)
-     encode_word = word.encode('utf-8')
-     return hashlib.md5(encode_word).hexdigest()
-
- url = "https://complexsearch.kugou.com/v2/search/song"
-
- params = {
-     "callback": "callback123",
-     "srcappid": "2919",
-     "clientver": "1000",
-     "clienttime": "",  # timestamp in milliseconds
-     "mid": "",
-     "uuid": "",
-     "dfid": "",
-     "keyword": "",  # song name
-     "page": "1",
-     "pagesize": "30",
-     "bitrate": "0",
-     "isfuzzy": "0",
-     "inputtype": "0",
-     "platform": "WebFilter",
-     "userid": "0",
-     "iscorrection": "1",
-     "privilege_filter": "0",
-     "filter": "10",
-     "token": "",
-     "appid": "1014",
-     "signature": ""  # MD5 of the plaintext string t
- }
-
- def guid():
-     num = 1 + random.random()
-     res = hex(int(65536 * num))[3:]
-     return res
-
- GUID = guid() + guid() + "-" + guid() + "-" + guid() + "-" + guid() + "-" + guid() + guid() + guid()
-
- def gen_params(word):
-     timestamp = int(time.time() * 1000)
-     dfid = '-'  # after repeated testing, '-' turned out to be enough for dfid
-     keyword = word
-     mid = gen_md5(GUID)
-     # Plaintext copied from the debugger, with the per-request fields filled in
-     t = f'NVPh5oo715z5DIWAeQlhMDsWXXQV4hwtappid=1014bitrate=0callback=callback123clienttime={timestamp}clientver=1000dfid={dfid}filter=10inputtype=0iscorrection=1isfuzzy=0keyword={keyword}mid={mid}page=1pagesize=30platform=WebFilterprivilege_filter=0srcappid=2919token=userid=0uuid={mid}NVPh5oo715z5DIWAeQlhMDsWXXQV4hwt'
-     signature = gen_md5(t)
-     params['clienttime'] = timestamp
-     params['dfid'] = dfid
-     params['keyword'] = keyword
-     params['mid'] = mid
-     params['uuid'] = mid  # uuid reuses the same value as mid
-     params['signature'] = signature
-     return params
-
- word = "song name"  # placeholder: replace with the keyword to search for
- response = requests.get(url, headers=headers, params=gen_params(word)).text
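Because the request sets `callback=callback123`, the text in `response` presumably comes back as JSONP (`callback123({...})`) rather than plain JSON. The sketch below shows one way to unwrap it. `strip_jsonp` is a hypothetical helper, and the sample payload is made up for illustration, since the real response fields are not shown in this post.

```python
import json
import re


def strip_jsonp(text):
    """Parse 'callbackName({...})' and return the decoded JSON value.

    Falls back to parsing the text as plain JSON when no wrapper is found.
    """
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", text, flags=re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)


if __name__ == "__main__":
    sample = 'callback123({"status": 1, "data": {"total": 0}})'  # made-up payload
    print(strip_jsonp(sample))
```

With the request code above, the call would be `strip_jsonp(response)`.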
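My first blog post is finally done, haha.
If anything is unclear, leave a comment and I'll reply when I see it.
If anything is wrong, I'd be grateful if the experts out there pointed it out.
Thanks.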