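1. What are request headers and request bodies, and response headers and response bodies?
2. What does a URL consist of?
3. What exactly are GET and POST requests?
4. What is Content-Type?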
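HTTP (Hyper Text Transfer Protocol) is the protocol used to transfer hypertext between World Wide Web (WWW) servers and local browsers. It is an application-layer protocol whose simplicity and speed make it well suited to distributed hypermedia information systems. It was proposed in 1990 and has been refined and extended through years of use. HTTP works on a client-server architecture: the browser, acting as the HTTP client, sends its requests to the HTTP server (the web server) via a URL, and the web server sends response information back to the client according to the request it received.
The HTTP protocol covers both the request format a browser must follow when sending data to a server and the response format a server must follow when sending data back to the browser. The information exchanged over HTTP is called an HTTP message. The message sent by the requesting side (the client) is called a request message, and the one sent by the responding side (the server) is called a response message. An HTTP message is itself plain string text made up of multiple lines of data.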
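To make the message structure concrete, here is a minimal sketch that splits a hand-written request message into its start line, headers, and body. The sample message and the helper name `split_http_message` are illustrative assumptions, not captured traffic.

```python
def split_http_message(raw: str):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The header section and the body are separated by one empty line.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body


# A hand-written sample request (illustrative only).
sample_request = (
    "POST /trans HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 7\r\n"
    "\r\n"
    "q=apple"
)

start_line, headers, body = split_http_message(sample_request)
print(start_line)
print(headers["Content-Type"])
print(body)
```

A response message has the same shape, except that its first line is a status line such as HTTP/1.1 200 OK.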
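A complete URL consists of: scheme (protocol), host (domain name or IP address), port, path, and query parameters.
For example, in https://www.baidu.com/s?wd=yuan, https is the scheme, www.baidu.com is the host (a domain name that resolves to an IP address), the port is omitted and so defaults to 443 for https (80 for plain http), /s is the path, and the query parameter is wd=yuan.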
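The same breakdown can be checked with the standard library. This is a small sketch using `urllib.parse`; the `default_ports` table is an assumption added here to show the default port for each scheme.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.baidu.com/s?wd=yuan"
parts = urlsplit(url)

# No port is written in the URL, so parts.port is None and the scheme default applies.
default_ports = {"http": 80, "https": 443}
port = parts.port or default_ports[parts.scheme]

print(parts.scheme)           # scheme (protocol)
print(parts.hostname)         # host
print(port)                   # effective port
print(parts.path)             # path
print(parse_qs(parts.query))  # query parameters as a dict of lists
```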
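Request methods: GET and POST
Data submitted with GET is appended to the URL: a ? separates the URL from the data, and parameters are joined with &, as in EditBook?name=test1&id=123456. POST places the submitted data in the request body of the HTTP message.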
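The amount of data a GET request can carry is limited in practice, because browsers (and servers) cap the length of a URL. The POST body is not subject to that URL limit, although a server may still cap the body size.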
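The difference in where the data travels can be sketched without sending anything over the network. The example below only constructs request objects with `urllib.request.Request`; the host `example.com` is a placeholder assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {"name": "test1", "id": "123456"}
query = urlencode(fields)

# GET: the data becomes the query string after "?".
get_req = Request("http://example.com/EditBook?" + query, method="GET")

# POST: the same data travels in the request body as bytes.
post_req = Request(
    "http://example.com/EditBook",
    data=query.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(get_req.full_url)
print(post_req.full_url, post_req.data)
```

The Content-Type header tells the server how the POST body is encoded, which is why it matters mostly for requests that carry a body.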
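Response status codes: the status code describes the result the server returns when a client sends it a request. From the status code, the user can tell whether the server processed the request normally or an error occurred. A status such as 200 OK consists of a three-digit number and a reason phrase.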
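As a quick illustration, the standard library's `http.HTTPStatus` enum maps each three-digit code to its reason phrase. The four codes chosen here are just common examples.

```python
from http import HTTPStatus

for code in (200, 302, 404, 500):
    status = HTTPStatus(code)
    # The first digit gives the class: 2xx success, 3xx redirection,
    # 4xx client error, 5xx server error.
    print(code, status.phrase, "class:", code // 100)
```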
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
}

res = requests.get(
    "https://www.baidu.com/",
    # headers=headers
)

# Save the response body to a local file
with open("baidu.html", "w", encoding="utf-8") as f:
    f.write(res.text)
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
    "Referer": "https://movie.douban.com/explore",
}

res = requests.get(
    "https://m.douban.com/rexxar/api/v2/movie/recommend?refresh=0&start=0&count=20&selected_categories=%7B%7D&uncollect=false&tags=",
    headers=headers
)

# Inspect the response data
print(res.text)
import requests url = "https://stock.xueqiu.com/v5/stock/screener/quote/list.json?page=1&size=30&order=desc&orderby=percent&order_by=percent&market=CN&type=sh_sz" cookie = 'xq_a_token=a0f5e0d91bc0846f43452e89ae79e08167c42068; xqat=a0f5e0d91bc0846f43452e89ae79e08167c42068; xq_r_token=76ed99965d5bffa08531a6a47501f096f61108e8; xq_id_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOi0xLCJpc3MiOiJ1YyIsImV4cCI6MTY5NTUxNTc5NCwiY3RtIjoxNjkzMjAzODIzMzAwLCJjaWQiOiJkOWQwbjRBWnVwIn0.MCIGGTGaSPe9nVuXkyrXQTlCthdURSnDtqm8dGttO2XYHeaMPSKmHQvsJmbw3OJTRnkf0KHZvgF0W3Rv-9uYe4P2Wizt0g2QzQonONjUmExABmZX0e3ara8BzBQ3b96H7dm0LV4pdBlnOW0A9PUmGRouWM7kVUOGPvd3X7GkB7M_th8pV8SZo9Iz4nzjrwQzxPBa0DlS7whbeNeXMnbnmAPp7z-eG75vdE2Pb3OyZ5Gv-FINhpQtAWo95lTxZVw5C5VHSzbR_-z8uqH6DD0xop4_wvKw5LIVwu6ZZ6TUnNFr3zGU9jWqAGgdzcKgO38dlL6uXNixa9mrKOd1OZnDig; cookiesu=431693203848858; u=431693203848858; Hm_lvt_1db88642e346389874251b5a1eded6e3=1693203851; device_id=7971eba10048692a91d87e3dad9eb9ca; s=bv11kb1wna; Hm_lpvt_1db88642e346389874251b5a1eded6e3=1693203857' headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36', "referer": "https://xueqiu.com/", "cookie": cookie, } res = requests.get(url, headers=headers) print(res.text)
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
    "Referer": "https://movie.douban.com/explore",
}

res = requests.get(
    "https://m.douban.com/rexxar/api/v2/movie/recommend?refresh=0&start=0&count=20&selected_categories=%7B%7D&uncollect=false&tags=",
    headers=headers,
    # params={
    #     # query parameters
    #     "count": "20",
    #     "tags": "悬疑"
    # }
)

# Inspect the response data
print(res.text)
import requests

while True:
    wd = input("Enter text to translate: ")
    res = requests.post(
        "https://aidemo.youdao.com/trans?",
        params={},
        headers={},
        data={
            "q": wd,
            "from": "Auto",
            "to": "Auto",
        },
    )
    print(res.json().get("translation")[0])
import requests

# (1) Download an image
url = "https://pic.netbian.com/uploads/allimg/230812/202108-16918428684ab5.jpg"
res = requests.get(url)

# Binary data is in res.content, so write the file in "wb" mode
with open("a.jpg", "wb") as f:
    f.write(res.content)

# (2) Download a video
url = "https://vd3.bdstatic.com/mda-nadbjpk0hnxwyndu/720p/h264_delogo/1642148105214867253/mda-nadbjpk0hnxwyndu.mp4?v_from_s=hkapp-haokan-hbe&auth_key=1693223039-0-0-e2da819f15bfb93409ce23540f3b10fa&bcevod_channel=searchbox_feed&pd=1&cr=2&cd=0&pt=3&logid=2639522172&vid=5423681428712102654&klogid=2639522172&abtest=112162_5"
res = requests.get(url)

with open("video.mp4", "wb") as f:
    f.write(res.content)
import os
import re

import requests

# (1) Collect every image URL on the current page
start_url = "https://pic.netbian.com/4kmeinv/"
res = requests.get(start_url)
# Escape the dot so that only a literal ".jpg" suffix matches
img_url_list = re.findall(r"uploads/allimg/.*?\.jpg", res.text)
print(img_url_list)

# (2) Download all images in a loop
for img_url in img_url_list:
    res = requests.get("https://pic.netbian.com/" + img_url)
    img_name = os.path.basename(img_url)
    with open(img_name, "wb") as f:
        f.write(res.content)
import base64
import json

import requests


def base64_api(uname, pwd, img, typeid):
    # Read the captcha image and Base64-encode it for the JSON payload
    with open(img, 'rb') as f:
        base64_data = base64.b64encode(f.read())
        b64 = base64_data.decode()
    data = {"username": uname, "password": pwd, "typeid": typeid, "image": b64}
    result = json.loads(requests.post("http://api.ttshitu.com/predict", json=data).text)
    if result['success']:
        return result["data"]["result"]
    else:
        # Note: error cases (for example, not enough human reviewers) end up here;
        # add handling logic so the script does not hang, then retry recognition
        return result["message"]


if __name__ == "__main__":
    img_path = "./v_code.jpg"
    result = base64_api(uname='yuan0316', pwd='yuan0316', img=img_path, typeid=3)
    print(result)
https://www.douyin.com/user/MS4wLjABAAAAMbqnWxzUfZegt9vrNBDz7zyqwhvG6vXiKTDxVm2wUD0
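Clicking the translate button triggers the corresponding Ajax request. Analyzing the responses shows that webtranslate is our target URL. Next, we work out which fields are the values to reverse-engineer.
Click translate once more to produce a new webtranslate request, then compare the request bodies of the two requests: the fields that change are the values we need to reverse.
The comparison shows two fields that change: sign and mysticTime. Both differ on every request, so copying them verbatim will very likely fail. We therefore need to find where they are generated and reproduce the request completely.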
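The comparison step can also be done mechanically. The sketch below diffs two captured request bodies; the helper name `changed_fields` and the sample values are made up for illustration and are not real captures.

```python
def changed_fields(first: dict, second: dict) -> list:
    """Return the keys whose values differ between two request bodies."""
    keys = sorted(set(first) | set(second))
    return [k for k in keys if first.get(k) != second.get(k)]


# Two made-up request bodies standing in for two captured webtranslate requests.
body_1 = {"i": "apple", "client": "fanyideskweb",
          "sign": "aaa111", "mysticTime": "1700000000001"}
body_2 = {"i": "apple", "client": "fanyideskweb",
          "sign": "bbb222", "mysticTime": "1700000005002"}

print(changed_fields(body_1, body_2))
```

Fields that stay constant can usually be copied as-is; only the ones reported here need to be regenerated for each request.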
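A site ships a great deal of JS code, so quickly locating where the target value is generated is the key to the whole job. The first practical technique is keyword search. Take sign: some function presumably returns the value in the end, the value is then most likely assigned to the key sign, and finally it is assembled into the request-body object. So we can guess that the function generating sign is very likely somewhere near the sign keyword. This technique is not a silver bullet, and no single technique always works; real analysis combines several. We start by searching for the keyword sign.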
The search returns a large number of matches for sign (three to five would be ideal), which makes it hard to tell which one is the real target.
Since sign is not a friendly keyword, we switch to mysticTime, because the mysticTime key sits very close to the sign key.
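There are noticeably fewer matches, so it is worth looking through them. We find a function w that contains both the mysticTime key and the sign key, and the value of sign comes from a function call, which looks very much like our target. How do we confirm it?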
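The second technique: set a breakpoint to confirm the "suspect". Put a breakpoint on the line that sets sign and click the translate button again. The click must pass through the place that really builds sign, so if the suspect location is the target entry point, the breakpoint will be hit. If it is not hit, the location was "wrongly accused" and is not the place we are looking for. The result is as follows: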
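The sign line turns green and the debugger controls appear above it, which shows that this is the entry point where sign is generated. Next, hover over k: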
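The blue link app.3b85caff.js:1 in the hover popup is the location of function k; clicking it jumps straight into k. Also note the two arguments, o and e, passed to k. Above, const o = (new Date).getTime(); makes o the current timestamp, and below, mysticTime: o, so mysticTime is also the current timestamp. What is the value of e? Inspecting variable values is routine when writing crawlers, and the simplest way is to print the variable directly in the console: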
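Be very careful here: only print variables that belong to the function where the breakpoint is paused. Never pause in function A and print a variable of function B, because the two functions may use the same variable name with different values, which causes confusion.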
Now let's go straight to function k:
It sits right above function w.
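We have now found where sign is created. The next step is to implement this generation strategy in our own code, and there are two approaches:
Translate the JS generation logic into Python
Copy the JS code and run it locally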
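Reading functions k and j shows that several variables are first joined into a string, and then an MD5 hash of that string is computed. MD5 always produces the same value for the same input, but because the input includes a timestamp, sign changes on every request. The Python implementation of the reverse-engineered logic: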
import requests import time import hashlib def get_md5(val, is_hex=True): md5 = hashlib.md5() md5.update(val.encode()) if is_hex: return md5.hexdigest() else: return md5.digest() url = "https://dict.xxx.com/webtranslate" # (1)构建逆向动态值 mysticTime = str(int(time.time() * 1000)) d = 'fanyideskweb' e = mysticTime u = 'webfanyi' t = 'fsdsogkndfokasodnaso' s = f"client={d}&mysticTime={e}&product={u}&key={t}" sign = get_md5(s) print("sign:::",sign) # (2)请求模拟 data = { "i": "apple", "from": "auto", "to": "", "dictResult": "true", "keyid": "webfanyi", "sign": sign, "client": "fanyideskweb", "product": "webfanyi", "appVersion": "1.0.0", "vendor": "web", "pointParam": "client,mysticTime,product", "mysticTime": mysticTime, "keyfrom": "fanyi.web", "mid": 1, "screen": 1, "model": 1, "network": "wifi", "abtest": 0, "yduuid": "abcdefg", } my_headers = { "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36", "Referer": "https://fanyi.xxx.com/", "Cookie": "OUTFOX_SEARCH_USER_ID_NCOO=1837136861.99783; OUTFOX_SEARCH_USER_ID=2039883963@103.156.184.202; UM_distinctid=18acc0c423c8a-067b7d9f92c33d-18525634-1d73c0-18acc0c423d1300; P_INFO=golang13121758648; ANTICSRF=cleared; NTES_OSESS=cleared; S_OINFO=" } res = requests.post(url, data=data, headers=my_headers) print(res.text)
Result:
sign::: 7bd3a03476323b1866ef981bbcd4f300 Z21kD9ZK1ke6ugku2ccWu-MeDWh3z252xRTQv-wZ6jddVo3tJLe7gIXz4PyxGl73nSfLAADyElSjjvrYdCvEP4pfohVVEX1DxoI0yhm36ytQNvu-WLU94qULZQ72aml6JKK7ArS9fJXAcsG7ufBIE0gd6fbnhFcsGmdXspZe-8whVFbRB_8Fc9JlMHh8DDXnskDhGfEscN_rfi-A-AHB3F9Vets82vIYpkGNaJOft_JA-m5cGEjo-UNRDDpkTz_NIAvo5PbATpkh7PSna2tHcE6Hou9GBtPLB67vjScwplB96-zqZKXJJEzU5HGF0oPDY_weAkXArzXyGLBPXFCnn_IWJDkGD4vqBQQAh2n52f48GD_cb-PSCT_8b-ESsKUI9NJa11XsdaUZxAc8TzrYnXwdcQbtl_kZGKhS6_rCtuNEBouA_lvM2CbS7TTtV2U4zVmJKpp-c6nt3yZePK3Av01GWn1pH_3sZbaPEx8DUjSbdp4i4iK-Mj4p2HPoph67DR7B9MFETYku_28SgP9xsKRRvFH4aHBHESWX4FDbwaU=
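The claim that sign only varies because of the timestamp can be checked in isolation. This is a small sketch that reuses the string layout and key from the code above; the helper name `make_sign` and the fixed timestamps are assumptions chosen for the demonstration.

```python
import hashlib


def make_sign(mystic_time: str, key: str = "fsdsogkndfokasodnaso") -> str:
    """Build the sign string for a given timestamp and hash it with MD5."""
    s = f"client=fanyideskweb&mysticTime={mystic_time}&product=webfanyi&key={key}"
    return hashlib.md5(s.encode()).hexdigest()


same_a = make_sign("1700000000000")
same_b = make_sign("1700000000000")
later = make_sign("1700000000001")

print(same_a == same_b)  # identical input, identical sign
print(same_a == later)   # a one-millisecond change gives a different sign
```

This is also why sign and mysticTime must be sent together: the server can recompute the hash from the timestamp it receives.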
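Much JS code is not as simple as this case, so most reverse-engineering work relies on lifting the relevant JS code out of the page. This approach does not require understanding the implementation logic, only reproducing the runtime environment.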
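We copy the code that builds the target value into a local JS file and install Node.js to run it, supplying whatever is missing and fixing each error as it is reported, until the code produces the value as smoothly as it does in the browser. The code is as follows:
Where an algorithm library is involved, there is no need to dig it out of the site; simply call the library directly.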
const cryptoJs = require("crypto")

const u = "fanyideskweb"
    , d = "webfanyi"
    , m = "client,mysticTime,product"
    , p = "1.0.0"
    , A = "web"
    , g = "fanyi.web"
    , b = 1
    , h = 1
    , f = 1
    , v = "wifi"
    , O = 0;

function j(e) {
    return cryptoJs.createHash("md5").update(e.toString()).digest("hex")
}

function k(e, t) {
    return j(`client=${u}&mysticTime=${e}&product=${d}&key=${t}`)
}

function get_sign() {
    let e = (new Date).getTime();
    let t = 'fsdsogkndfokasodnaso'
    let sign = k(e, t)
    return [sign, e]
}

console.log(get_sign())
Next, Python calls the JS function through the execjs library to complete the JS reverse-engineering:
import execjs import requests url = "https://dict.xxx.com/webtranslate" # (1)获取JS逆向动态值 with open("xxx.js") as f: js_code = f.read() js_compile = execjs.compile(js_code) sign,mysticTime = js_compile.call("get_sign") print("sign:::",sign,mysticTime) # (2)请求模拟 data = { "i": "apple", "from": "auto", "to": "", "dictResult": "true", "keyid": "webfanyi", "sign": sign, "client": "fanyideskweb", "product": "webfanyi", "appVersion": "1.0.0", "vendor": "web", "pointParam": "client,mysticTime,product", "mysticTime": mysticTime, "keyfrom": "fanyi.web", "mid": 1, "screen": 1, "model": 1, "network": "wifi", "abtest": 0, "yduuid": "abcdefg", } my_headers = { "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36", "Referer": "https://fanyi.xxx.com/", "Cookie": "OUTFOX_SEARCH_USER_ID_NCOO=1837136861.99783; OUTFOX_SEARCH_USER_ID=2039883963@103.156.184.202; UM_distinctid=18acc0c423c8a-067b7d9f92c33d-18525634-1d73c0-18acc0c423d1300; P_INFO=golang13121758648; ANTICSRF=cleared; NTES_OSESS=cleared; S_OINFO=" } res = requests.post(url, data=data, headers=my_headers) print(res.text)
Result:
sign::: 4a9052136bb96142a5b554cb6772938b 1700796415319 Z21kD9ZK1ke6ugku2ccWu-MeDWh3z252xRTQv-wZ6jddVo3tJLe7gIXz4PyxGl73nSfLAADyElSjjvrYdCvEP4pfohVVEX1DxoI0yhm36ytQNvu-WLU94qULZQ72aml6JKK7ArS9fJXAcsG7ufBIE0gd6fbnhFcsGmdXspZe-8whVFbRB_8Fc9JlMHh8DDXnskDhGfEscN_rfi-A-AHB3F9Vets82vIYpkGNaJOft_JA-m5cGEjo-UNRDDpkTz_NIAvo5PbATpkh7PSna2tHcE6Hou9GBtPLB67vjScwplB96-zqZKXJJEzU5HGF0oPDY_weAkXArzXyGLBPXFCnn_IWJDkGD4vqBQQAh2n52f48GD_cb-PSCT_8b-ESsKUI9NJa11XsdaUZxAc8TzrYnXwdcQbtl_kZGKhS6_rCtuNEBouA_lvM2CbS7TTtV2U4zVmJKpp-c6nt3yZePK3Av01GWn1pH_3sZbaPEx8DUjSbdp4i4iK-Mj4p2HPoph67DR7B9MFETYku_28SgP9xsKRRvFH4aHBHESWX4FDbwaU=
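AES is a symmetric cipher: symmetric encryption means the same key is used for both encryption and decryption.
Common symmetric ciphers include AES, DES, and 3DES. Here we discuss AES.
Installation: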
pip install pycryptodome
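The most commonly used AES modes are CBC and ECB. Many other modes exist, and all of them are AES encryption. The difference between the two is that ECB does not need an IV (initialization vector), whereas CBC does.
"""
Key length
    16: *AES-128*
    24: *AES-192*
    32: *AES-256*

MODE: the encryption mode. Common ones are ECB and CBC.
    ECB: a basic mode. The plaintext is split into blocks of equal length (padded if short), each block is encrypted on its own, and the outputs are concatenated into the ciphertext.
    CBC: a chained mode. Each plaintext block is XORed with the previous ciphertext block before being encrypted, which makes the ciphertext harder to break.
"""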
CBC encryption example (AES-128):
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad
import base64

key = '0123456789abcdef'.encode()  # Key: must be 16 bytes for AES-128
iv = b'abcdabcdabcdabcd'  # IV: must be 16 bytes, the AES block size
text = 'alex is a monkey!'  # Plaintext: its byte length must be padded to a multiple of 16

# Manual alternative: append '\0' until the byte length is a multiple of 16
# while len(text.encode('utf-8')) % 16 != 0:
#     text += '\0'
text = pad(text.encode(), 16)
print("padded text:", text)

aes = AES.new(key, AES.MODE_CBC, iv)  # Create an AES cipher object
en_text = aes.encrypt(text)  # Encrypt the plaintext
print("AES ciphertext:::", en_text)  # b"_\xf04\x7f/R\xef\xe9\x14#q\xd8A\x12\x8e\xe3\xa5\x93\x96'zOP\xc1\x85{\xad\xc2c\xddn\x86"
en_text = base64.b64encode(en_text).decode()  # Base64-encode the returned bytes
print(en_text)  # X/A0fy9S7+kUI3HYQRKO46WTlid6T1DBhXutwmPdboY=
CBC decryption example:
from Crypto.Cipher import AES
import base64
from Crypto.Util.Padding import unpad

key = '0123456789abcdef'.encode()
iv = b'abcdabcdabcdabcd'
aes = AES.new(key, AES.MODE_CBC, iv)

text = 'X/A0fy9S7+kUI3HYQRKO46WTlid6T1DBhXutwmPdboY='.encode()  # Text to decrypt
encrypted_bytes = base64.b64decode(text)  # Base64-decode into raw bytes
source = aes.decrypt(encrypted_bytes)  # Decrypt
print("AES decrypted (still padded):::", source.decode())
print("AES decrypted:::", unpad(source, 16).decode())
When doing AES encryption and decryption in Python, the ciphertext, plaintext, key, and IV passed in must all be bytes. The AES object can likewise only be constructed from bytes.
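The key and IV must have exactly the required byte length (16 bytes here), and plaintext whose byte length is not a multiple of 16 must be padded up to one.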
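To show what the padding step does, here is a minimal hand-written sketch of PKCS#7 padding, the default style of pycryptodome's `pad`. The helper names `pkcs7_pad` and `pkcs7_unpad` are assumptions for illustration; real code should keep using the library functions, which also validate the padding.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append n bytes, each of value n, so the length becomes a multiple of block_size."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes) -> bytes:
    """Strip the padding; the last byte says how many bytes to remove (no validation here)."""
    n = data[-1]
    return data[:-n]


plaintext = b"alex is a monkey!"  # 17 bytes, so 15 padding bytes are needed
padded = pkcs7_pad(plaintext)

print(len(plaintext), len(padded))
print(padded)
print(pkcs7_unpad(padded))
```

Note that input which is already a multiple of 16 bytes still receives a full extra block of padding, so that the padding can always be removed unambiguously.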
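In CBC mode the cipher object keeps state between calls, so an AES object that has already been used cannot simply be reused for another encryption or decryption. To avoid this class of error, create a new AES object every time, whatever the mode.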
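To see why CBC ties each block to the one before it while ECB does not, the sketch below runs both modes over a toy block function. Everything here is an assumption made for illustration: `toy_encrypt_block` is NOT AES and is not secure, it merely stands in for the block cipher so the chaining logic can be shown with the standard library alone.

```python
BLOCK = 16


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    # NOT AES and NOT secure: XOR with the key, then rotate left by one byte.
    mixed = xor_bytes(block, key)
    return mixed[1:] + mixed[:1]


def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(plaintext: bytes, key: bytes) -> bytes:
    # ECB: every block is encrypted independently.
    return b"".join(toy_encrypt_block(b, key) for b in split_blocks(plaintext))


def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    # CBC: XOR each block with the previous ciphertext block (the IV for the first).
    out, prev = [], iv
    for block in split_blocks(plaintext):
        prev = toy_encrypt_block(xor_bytes(block, prev), key)
        out.append(prev)
    return b"".join(out)


key = b"0123456789abcdef"
iv = b"abcdabcdabcdabcd"
plaintext = b"A" * 32  # two identical 16-byte blocks

ecb = ecb_encrypt(plaintext, key)
cbc = cbc_encrypt(plaintext, key, iv)
print(ecb[:16] == ecb[16:])  # ECB: identical blocks encrypt identically
print(cbc[:16] == cbc[16:])  # CBC: chaining makes the two blocks differ
```

The `prev` variable is the state that carries over from block to block, which is the same kind of state a real CBC cipher object holds between calls.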
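On many sites, a correctly simulated request returns plaintext data directly. This site, however, also encrypts the response data, so it has to be decrypted accordingly. The approach is the same as for the request: first locate where decryption happens. That is relatively easy, because a successful request always has a callback function that handles the response data, and the client-side decryption code can usually be traced quickly from that callback.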
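A very important concept in JS execution is the call stack. If function a calls function b, and b calls c, execution proceeds as a->b->c. If a breakpoint pauses inside c, the call stack reads c->b->a, showing that c was called by b, and b was called by a.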
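The same idea can be reproduced in Python, where `inspect.stack()` plays the role of the Call Stack panel in the browser debugger. This is only an analogy sketch; the functions a, b, and c are the hypothetical ones from the description above.

```python
import inspect


def c():
    # inspect.stack()[0] is the current frame; later entries are its callers.
    return [frame.function for frame in inspect.stack(0)[:3]]


def b():
    return c()


def a():
    return b()


print(" -> ".join(a()))
```

Reading the stack from the top down is exactly how we walk back from the paused function to whoever triggered it.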
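The breakpoint is currently at the point where sign is built, so from function w we follow the call stack to the callback that runs when the Ajax request is actually sent, and look for the decryption code there. We find the following code: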
Stepping with breakpoints confirms that the data is still encrypted before decodeData runs, and is decrypted once decodeData has finished.
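Locate the function:
Clearly this is AES decryption in 128-bit CBC mode, so the key and IV are all we need. Stepping into the function and reading the code shows that a and c are the key and IV respectively, both computed by function y from o and n. So we first print o and n: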
o and n are two fixed strings. Finally, we locate function y:
So the key and IV are the MD5 digests of these two strings.
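One detail worth checking is why the raw MD5 digest, rather than its hex string, is used here. The sketch below takes the two fixed strings that appear in the decryption code and compares the two forms; only the standard library is needed.

```python
import hashlib

o = 'ydsecret://query/key/B*RGygVywfNBwpmBaZg*WT7SIOUP2T0C9WHMZN39j^DAdaZhAnxvGcCY6VYFwnHl'
n = 'ydsecret://query/iv/C@lZe2YzHtZ2CYgaXKSVfsb7Y4QWHjITPPZ0nQp87fBeJ!Iv6v^6fvi2WN@bYpJ4'

# digest() returns the raw 16 bytes, which is exactly the AES-128 key and IV size.
key = hashlib.md5(o.encode()).digest()
iv = hashlib.md5(n.encode()).digest()

# hexdigest() is the same hash written as 32 hex characters, too long for a 16-byte key.
key_hex = hashlib.md5(o.encode()).hexdigest()

print(len(key), len(iv), len(key_hex))
```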
Decryption, too, can be implemented in Python or done by copying the JS.
import requests import time import hashlib import base64 from Crypto.Cipher import AES def get_md5(val, is_hex=True): md5 = hashlib.md5() md5.update(val.encode()) if is_hex: return md5.hexdigest() else: return md5.digest() url = "https://dict.xxx.com/webtranslate" # (1)构建逆向动态值 mysticTime = str(int(time.time() * 1000)) print(mysticTime) d = 'fanyideskweb' e = mysticTime u = 'webfanyi' t = 'fsdsogkndfokasodnaso' s = f"client={d}&mysticTime={e}&product={u}&key={t}" print("s:::", s) sign = get_md5(s) # (2)请求模拟 data = { "i": "apple", "from": "auto", "to": "", "dictResult": "true", "keyid": "webfanyi", "sign": sign, "client": "fanyideskweb", "product": "webfanyi", "appVersion": "1.0.0", "vendor": "web", "pointParam": "client,mysticTime,product", "mysticTime": mysticTime, "keyfrom": "fanyi.web", "mid": 1, "screen": 1, "model": 1, "network": "wifi", "abtest": 0, "yduuid": "abcdefg", } my_headers = { "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36", "Referer": "https://fanyi.xxx.com/", "Cookie": "OUTFOX_SEARCH_USER_ID_NCOO=1837136861.99783; OUTFOX_SEARCH_USER_ID=2039883963@103.156.184.202; UM_distinctid=18acc0c423c8a-067b7d9f92c33d-18525634-1d73c0-18acc0c423d1300; P_INFO=golang13121758648; ANTICSRF=cleared; NTES_OSESS=cleared; S_OINFO=" } res = requests.post(url, data=data, headers=my_headers) # (3)解码和解密数据 res_encrypt_base64 = res.text.replace("-", "+").replace("_", "/") print("res_encrypt_base64:::", res_encrypt_base64) # 解码 res_encrypt = base64.b64decode(res_encrypt_base64) print("res_encrypt:::", res_encrypt) # AES解密 # 密钥 o = 'ydsecret://query/key/B*RGygVywfNBwpmBaZg*WT7SIOUP2T0C9WHMZN39j^DAdaZhAnxvGcCY6VYFwnHl' n = 'ydsecret://query/iv/C@lZe2YzHtZ2CYgaXKSVfsb7Y4QWHjITPPZ0nQp87fBeJ!Iv6v^6fvi2WN@bYpJ4' key = get_md5(o, is_hex=False) # 偏移量 iv = get_md5(n, is_hex=False) # 解密 # 构建aes算法对象 aes = AES.new(key, AES.MODE_CBC, iv) source_data = aes.decrypt(res_encrypt).decode() 
print("source_data:", source_data)
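The two replace calls in the code above convert the URL-safe Base64 alphabet (which uses - and _) back to the standard one (+ and /). As a hedged alternative, the standard library's `base64.urlsafe_b64decode` does the same conversion in one step. The sketch below uses a few made-up bytes rather than a real response to show that both routes agree.

```python
import base64

# Made-up bytes chosen so that the URL-safe encoding contains both "-" and "_".
raw = bytes([0xFB, 0xEF, 0xFF, 0x00, 0x10, 0x20])
token = base64.urlsafe_b64encode(raw).decode()

# Route 1: swap the alphabet by hand, then use the standard decoder.
manual = base64.b64decode(token.replace("-", "+").replace("_", "/"))

# Route 2: let the standard library handle the URL-safe alphabet directly.
direct = base64.urlsafe_b64decode(token)

print(token)
print(manual == direct == raw)
```

After AES-CBC decryption the plaintext still carries its padding bytes, so passing the decrypted bytes through `unpad(..., 16)` from `Crypto.Util.Padding` before decoding gives a clean string.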
The copied JS code, tidied up, in xxx.js:
const cryptoJs = require("crypto") const u = "fanyideskweb" , d = "webfanyi" , m = "client,mysticTime,product" , p = "1.0.0" , A = "web" , g = "fanyi.web" , b = 1 , h = 1 , f = 1 , v = "wifi" , O = 0; function j(e) { return cryptoJs.createHash("md5").update(e.toString()).digest("hex") } function k(e, t) { return j(`client=${u}&mysticTime=${e}&product=${d}&key=${t}`) } function get_sign() { let e = (new Date).getTime(); let t = 'fsdsogkndfokasodnaso' let sign = k(e, t) return [sign, e] } console.log(get_sign()) // 解密 function y(e) { return cryptoJs.createHash("md5").update(e).digest() } function jieMi(t) { let o = 'ydsecret://query/key/B*RGygVywfNBwpmBaZg*WT7SIOUP2T0C9WHMZN39j^DAdaZhAnxvGcCY6VYFwnHl' let n = 'ydsecret://query/iv/C@lZe2YzHtZ2CYgaXKSVfsb7Y4QWHjITPPZ0nQp87fBeJ!Iv6v^6fvi2WN@bYpJ4' if (!t) return null; const a = y(o) , c = y(n) , r = cryptoJs.createDecipheriv("aes-128-cbc", a, c); let s = r.update(t, "base64", "utf-8"); return s += r.final("utf-8"), s } console.log(jieMi('Z21kD9ZK1ke6ugku2ccWu-MeDWh3z252xRTQv-wZ6jddVo3tJLe7gIXz4PyxGl73nSfLAADyElSjjvrYdCvEP4pfohVVEX1DxoI0yhm36ytQNvu-WLU94qULZQ72aml6JKK7ArS9fJXAcsG7ufBIE0gd6fbnhFcsGmdXspZe-8 whVFbRB_8Fc9JlMHh8DDXnskDhGfEscN_rfi-A-AHB3F9Vets82vIYpkGNaJOft_JA-m5cGEjo-UNRDDpkTz_NIAvo5PbATpkh7PSna2tHcE6Hou9GBtPLB67vjScwplB96-zqZKXJJEzU5HGF0oPDY_weAkXArzXyGLBPXFCnn_IWJDkGD4vqBQQAh2n52f48GD_cb-PSCT_8b-ESsKUI9NJa11XsdaUZxAc8TzrYnXwdcQbtl_kZGKhS6_rCtuNEBouA_lvM2CbS7TTtV2U4zVmJKpp-c6nt3yZePK3Av01GWn1pH_3sZbaPEx8DUjSbdp4i4iK-Mj4p2HPoph67DR7B9MFETYku_28SgP9xsKRRvFH4aHBHESWX4FDbwaU='))
Completing it by calling the JS from Python:
import execjs import requests url = "https://dict.xxx.com/webtranslate" # (1)获取JS逆向动态值 with open("xxx.js") as f: js_code = f.read() js_compile = execjs.compile(js_code) sign,mysticTime = js_compile.call("get_sign") print("sign:::",sign,mysticTime) # (2)请求模拟 data = { "i": "apple", "from": "auto", "to": "", "dictResult": "true", "keyid": "webfanyi", "sign": sign, "client": "fanyideskweb", "product": "webfanyi", "appVersion": "1.0.0", "vendor": "web", "pointParam": "client,mysticTime,product", "mysticTime": mysticTime, "keyfrom": "fanyi.web", "mid": 1, "screen": 1, "model": 1, "network": "wifi", "abtest": 0, "yduuid": "abcdefg", } my_headers = { "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36", "Referer": "https://fanyi.xxx.com/", "Cookie": "OUTFOX_SEARCH_USER_ID_NCOO=1837136861.99783; OUTFOX_SEARCH_USER_ID=2039883963@103.156.184.202; UM_distinctid=18acc0c423c8a-067b7d9f92c33d-18525634-1d73c0-18acc0c423d1300; P_INFO=golang13121758648; ANTICSRF=cleared; NTES_OSESS=cleared; S_OINFO=" } res = requests.post(url, data=data, headers=my_headers) print(res.text) # 解密数据 data = js_compile.call("jieMi",res.text) print("解密后的数据:",data)
Result:
sign::: 554c48439c46e29931f54be1b724df3f 1700799183632 Z21kD9ZK1ke6ugku2ccWu-MeDWh3z252xRTQv-wZ6jddVo3tJLe7gIXz4PyxGl73nSfLAADyElSjjvrYdCvEP4pfohVVEX1DxoI0yhm36ytQNvu-WLU94qULZQ72aml6JKK7ArS9fJXAcsG7ufBIE0gd6fbnhFcsGmdXspZe-8whVFbRB_8Fc9JlMHh8DDXnskDhGfEscN_rfi-A-AHB3F9Vets82vIYpkGNaJOft_JA-m5cGEjo-UNRDDpkTz_NIAvo5PbATpkh7PSna2tHcE6Hou9GBtPLB67vjScwplB96-zqZKXJJEzU5HGF0oPDY_weAkXArzXyGLBPXFCnn_IWJDkGD4vqBQQAh2n52f48GD_cb-PSCT_8b-ESsKUI9NJa11XsdaUZxAc8TzrYnXwdcQbtl_kZGKhS6_rCtuNEBouA_lvM2CbS7TTtV2U4zVmJKpp-c6nt3yZePK3Av01GWn1pH_3sZbaPEx8DUjSbdp4i4iK-Mj4p2HPoph67DR7B9MFETYku_28SgP9xsKRRvFH4aHBHESWX4FDbwaU= 解密后的数据: {"code":0,"dictResult":{"ec":{"exam_type":["初中","高中","CET4","CET6","考研"],"word":{"usphone":"ˈæp(ə)l","ukphone":"ˈæp(ə)l","ukspeech":"apple&type=1","trs":[{"pos":"n.","tran":"苹果"}],"wfs":[{"wf":{"name":"复数","value":"apples"}}],"return-phrase":"apple","usspeech":"apple&type=2"}}},"translateResult":[[{"tgt":"苹果","src":"apple","tgtPronounce":"pín guŏ"}]],"type":"en2zh-CHS"}
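That concludes the walkthrough of this case. Hopefully it helps you grasp the basic approach, workflow, and techniques of reverse-engineering for crawlers.
Target URL: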
# https://www.douyin.com/aweme/v1/web/aweme/post/
import requests headers = { 'authority': 'www.douyin.com', 'accept': 'application/json, text/plain, */*', 'accept-language': 'zh-CN,zh;q=0.9', 'cache-control': 'no-cache', 'cookie': 'ttwid=1%7CvQ6QCiLyIG9SJypBIXRtIfGPJXv6br9a79NgmLfR-U4%7C1697436889%7C0ea69e384e5deb1dc65f4200190d8d2f33f9c4ca6c30e10208ee46af29a5015d; passport_csrf_token=86dff46d8a31bd2d89aba8859d6b9839; passport_csrf_token_default=86dff46d8a31bd2d89aba8859d6b9839; s_v_web_id=verify_lnsi3jm8_cqfdqTI4_fYoM_4wDc_8VNE_bb4xCKSzPapH; odin_tt=554cb19fe22b9da317001c2f9de95b4e1a7d360dfdbae35d1dfc361c0088b8f4af048efc12f6935fa5d5f6100e9cbbf7580b76969425528255343c9e13dc19c8b7c995d443abeafc2ab8a498b7cf9a4c; volume_info=%7B%22isUserMute%22%3Afalse%2C%22isMute%22%3Atrue%2C%22volume%22%3A0.314%7D; FORCE_LOGIN=%7B%22videoConsumedRemainSeconds%22%3A180%7D; SEARCH_RESULT_LIST_TYPE=%22single%22; download_guide=%223%2F20231107%2F0%22; pwa2=%220%7C0%7C3%7C0%22; douyin.com; device_web_cpu_core=10; device_web_memory_size=8; webcast_local_quality=null; csrf_session_id=15b2efd04735f33bf97e434c422e4381; __ac_nonce=0654ca6e9005f3a669ca5; __ac_signature=_02B4Z6wo00f011w8rhwAAIDBHWEjuua2TSdcHKqAALJOVl79BpvTTdq7UdfJ0ZJivSubfqR7DTSRhBMZXolqgQ91ptK9wlWted1ar-p7M0KTXx7UiiwDtyagOAPIbx.TSMA0.mEGXEAGx2qhae; VIDEO_FILTER_MEMO_SELECT=%7B%22expireTime%22%3A1700127086066%2C%22type%22%3A1%7D; strategyABtestKey=%221699522286.12%22; bd_ticket_guard_client_data=eyJiZC10aWNrZXQtZ3VhcmQtdmVyc2lvbiI6MiwiYmQtdGlja2V0LWd1YXJkLWl0ZXJhdGlvbi12ZXJzaW9uIjoxLCJiZC10aWNrZXQtZ3VhcmQtcmVlLXB1YmxpYy1rZXkiOiJCQnNtWm5qUEJ6SExwVlpzZjhzV1BoYWlOQzJZM0ZNNk9iTnNoOGRQMzFmRFVtOVdDLzhXWHJ4NVFDTXZvTWZLdFNuMVlKU2ZvclVETmZ6SEkrUkF5MVE9IiwiYmQtdGlja2V0LWd1YXJkLXdlYi12ZXJzaW9uIjoxfQ%3D%3D; tt_scid=Bij66aryo2u0w.o16gYX3caPsmVWrfe-6Pk1ulTwQgNWudwhGRTGfTUIW.V7QT1Kc2a9; msToken=7OYWMqm4cfsfI0e0Ll0FUACzSrrtw96Ey0A5o6DUTNI5FksUP8JJXo88BzQ4aujOB5foj2ADof_1URtCZEzghUfOi6D1Rs4YKHbRS-5rCQYNC3wEs71j3DNXCiLGKg==; 
msToken=Noy13zPl0zuSYJ9nJ8DHykX-W5olMoMC3OmO0sEuhxH3pojze2qYwurlsL9BtfsIgFDH6YbQDFF2v5ebPWrQyUUEa3d0xNqHTtzuBYj21dzd43p3pZi56bfy6xRx; stream_recommend_feed_params=%22%7B%5C%22cookie_enabled%5C%22%3Atrue%2C%5C%22screen_width%5C%22%3A1496%2C%5C%22screen_height%5C%22%3A967%2C%5C%22browser_online%5C%22%3Atrue%2C%5C%22cpu_core_num%5C%22%3A10%2C%5C%22device_memory%5C%22%3A8%2C%5C%22downlink%5C%22%3A10%2C%5C%22effective_type%5C%22%3A%5C%224g%5C%22%2C%5C%22round_trip_time%5C%22%3A100%7D%22; IsDouyinActive=true; home_can_add_dy_2_desktop=%221%22', 'pragma': 'no-cache', 'referer': 'https://www.douyin.com/user/MS4wLjABAAAA0HwZJN6-JDCSTjxiMk-czhyZWxes8XIDEjppFXExauK8-kQTLMEH9ZdfIXxnl9tS', 'sec-ch-ua': '"Google Chrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"', 'sec-ch-ua-mobile': '?0', 'sec-ch-ua-platform': '"macOS"', 'sec-fetch-dest': 'empty', 'sec-fetch-mode': 'cors', 'sec-fetch-site': 'same-origin', 'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36', } params = { 'device_platform': 'webapp', 'aid': '6383', 'channel': 'channel_pc_web', 'sec_user_id': 'MS4wLjABAAAA0HwZJN6-JDCSTjxiMk-czhyZWxes8XIDEjppFXExauK8-kQTLMEH9ZdfIXxnl9tS', 'max_cursor': '0', 'locate_query': 'false', 'show_live_replay_strategy': '1', 'need_time_list': '1', 'time_list_query': '0', 'whale_cut_token': '', 'cut_version': '1', 'count': '18', 'publish_video_strategy_type': '2', 'pc_client_type': '1', 'version_code': '170400', 'version_name': '17.4.0', 'cookie_enabled': 'true', 'screen_width': '1496', 'screen_height': '967', 'browser_language': 'zh-CN', 'browser_platform': 'MacIntel', 'browser_name': 'Chrome', 'browser_version': '119.0.0.0', 'browser_online': 'true', 'engine_name': 'Blink', 'engine_version': '119.0.0.0', 'os_name': 'Mac OS', 'os_version': '10.15.7', 'cpu_core_num': '10', 'device_memory': '8', 'platform': 'PC', 'downlink': '10', 'effective_type': '4g', 'round_trip_time': '100', 'webid': 
'7290435875619390995', 'msToken': 'Noy13zPl0zuSYJ9nJ8DHykX-W5olMoMC3OmO0sEuhxH3pojze2qYwurlsL9BtfsIgFDH6YbQDFF2v5ebPWrQyUUEa3d0xNqHTtzuBYj21dzd43p3pZi56bfy6xRx', 'X-Bogus': 'DFSzswVYIOtANx--tFiuUgHB7tIG', } response = requests.get('https://www.douyin.com/aweme/v1/web/aweme/post/', params=params, headers=headers) print(response.text)
XHR/fetch breakpoint: /v1/web/aweme/post
Conditional breakpoint:
_0xc26b5e.openArgs[1].search("/aweme/v1/web/aweme/post/")>=0
Logpoint:
_0x2458f0['apply'](_0xc26b5e, _0x1f1790);
Conditional breakpoint:
_0x2458f0['apply'](_0xc26b5e, _0x1f1790).length == 28
Copy the entire webmssdk.es5.js into a local JS file.
Global export:
window.yuan = _0x5a8f25
Code structure:
window = global
document = {}
document.addEventListener = function addEventListener() {
}
navigator = {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36'
}

// Source code begins...

window.yuan = _0x5a8f25

// Source code ends...

// Test
// var data = '123456'
// console.log(window.yuan(data, null))
Python code:
import requests import execjs import urllib.parse with open("douyin.js") as f: js_code = f.read() js_compile = execjs.compile(js_code) headers = { 'authority': 'www.douyin.com', 'accept': 'application/json, text/plain, */*', 'accept-language': 'zh-CN,zh;q=0.9', 'cache-control': 'no-cache', 'cookie': 'ttwid=1%7CvQ6QCiLyIG9SJypBIXRtIfGPJXv6br9a79NgmLfR-U4%7C1697436889%7C0ea69e384e5deb1dc65f4200190d8d2f33f9c4ca6c30e10208ee46af29a5015d; passport_csrf_token=86dff46d8a31bd2d89aba8859d6b9839; passport_csrf_token_default=86dff46d8a31bd2d89aba8859d6b9839; s_v_web_id=verify_lnsi3jm8_cqfdqTI4_fYoM_4wDc_8VNE_bb4xCKSzPapH; odin_tt=554cb19fe22b9da317001c2f9de95b4e1a7d360dfdbae35d1dfc361c0088b8f4af048efc12f6935fa5d5f6100e9cbbf7580b76969425528255343c9e13dc19c8b7c995d443abeafc2ab8a498b7cf9a4c; volume_info=%7B%22isUserMute%22%3Afalse%2C%22isMute%22%3Atrue%2C%22volume%22%3A0.314%7D; FORCE_LOGIN=%7B%22videoConsumedRemainSeconds%22%3A180%7D; SEARCH_RESULT_LIST_TYPE=%22single%22; download_guide=%223%2F20231107%2F0%22; pwa2=%220%7C0%7C3%7C0%22; douyin.com; device_web_cpu_core=10; device_web_memory_size=8; webcast_local_quality=null; csrf_session_id=15b2efd04735f33bf97e434c422e4381; __ac_nonce=0654ca6e9005f3a669ca5; __ac_signature=_02B4Z6wo00f011w8rhwAAIDBHWEjuua2TSdcHKqAALJOVl79BpvTTdq7UdfJ0ZJivSubfqR7DTSRhBMZXolqgQ91ptK9wlWted1ar-p7M0KTXx7UiiwDtyagOAPIbx.TSMA0.mEGXEAGx2qhae; VIDEO_FILTER_MEMO_SELECT=%7B%22expireTime%22%3A1700127086066%2C%22type%22%3A1%7D; strategyABtestKey=%221699522286.12%22; bd_ticket_guard_client_data=eyJiZC10aWNrZXQtZ3VhcmQtdmVyc2lvbiI6MiwiYmQtdGlja2V0LWd1YXJkLWl0ZXJhdGlvbi12ZXJzaW9uIjoxLCJiZC10aWNrZXQtZ3VhcmQtcmVlLXB1YmxpYy1rZXkiOiJCQnNtWm5qUEJ6SExwVlpzZjhzV1BoYWlOQzJZM0ZNNk9iTnNoOGRQMzFmRFVtOVdDLzhXWHJ4NVFDTXZvTWZLdFNuMVlKU2ZvclVETmZ6SEkrUkF5MVE9IiwiYmQtdGlja2V0LWd1YXJkLXdlYi12ZXJzaW9uIjoxfQ%3D%3D; tt_scid=Bij66aryo2u0w.o16gYX3caPsmVWrfe-6Pk1ulTwQgNWudwhGRTGfTUIW.V7QT1Kc2a9; 
msToken=7OYWMqm4cfsfI0e0Ll0FUACzSrrtw96Ey0A5o6DUTNI5FksUP8JJXo88BzQ4aujOB5foj2ADof_1URtCZEzghUfOi6D1Rs4YKHbRS-5rCQYNC3wEs71j3DNXCiLGKg==; msToken=Noy13zPl0zuSYJ9nJ8DHykX-W5olMoMC3OmO0sEuhxH3pojze2qYwurlsL9BtfsIgFDH6YbQDFF2v5ebPWrQyUUEa3d0xNqHTtzuBYj21dzd43p3pZi56bfy6xRx; stream_recommend_feed_params=%22%7B%5C%22cookie_enabled%5C%22%3Atrue%2C%5C%22screen_width%5C%22%3A1496%2C%5C%22screen_height%5C%22%3A967%2C%5C%22browser_online%5C%22%3Atrue%2C%5C%22cpu_core_num%5C%22%3A10%2C%5C%22device_memory%5C%22%3A8%2C%5C%22downlink%5C%22%3A10%2C%5C%22effective_type%5C%22%3A%5C%224g%5C%22%2C%5C%22round_trip_time%5C%22%3A100%7D%22; IsDouyinActive=true; home_can_add_dy_2_desktop=%221%22', 'pragma': 'no-cache', 'referer': 'https://www.douyin.com/user/MS4wLjABAAAA0HwZJN6-JDCSTjxiMk-czhyZWxes8XIDEjppFXExauK8-kQTLMEH9ZdfIXxnl9tS', 'sec-ch-ua': '"Google Chrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"', 'sec-ch-ua-mobile': '?0', 'sec-ch-ua-platform': '"macOS"', 'sec-fetch-dest': 'empty', 'sec-fetch-mode': 'cors', 'sec-fetch-site': 'same-origin', 'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36', } params = { 'device_platform': 'webapp', 'aid': '6383', 'channel': 'channel_pc_web', # 'sec_user_id': 'MS4wLjABAAAA0HwZJN6-JDCSTjxiMk-czhyZWxes8XIDEjppFXExauK8-kQTLMEH9ZdfIXxnl9tS', 'sec_user_id': "MS4wLjABAAAAMbqnWxzUfZegt9vrNBDz7zyqwhvG6vXiKTDxVm2wUD0", 'max_cursor': '0', 'locate_query': 'false', 'show_live_replay_strategy': '1', 'need_time_list': '1', 'time_list_query': '0', 'whale_cut_token': '', 'cut_version': '1', 'count': '18', 'publish_video_strategy_type': '2', 'pc_client_type': '1', 'version_code': '170400', 'version_name': '17.4.0', 'cookie_enabled': 'true', 'screen_width': '1496', 'screen_height': '967', 'browser_language': 'zh-CN', 'browser_platform': 'MacIntel', 'browser_name': 'Chrome', 'browser_version': '119.0.0.0', 'browser_online': 'true', 'engine_name': 'Blink', 'engine_version': 
'119.0.0.0', 'os_name': 'Mac OS', 'os_version': '10.15.7', 'cpu_core_num': '10', 'device_memory': '8', 'platform': 'PC', 'downlink': '10', 'effective_type': '4g', 'round_trip_time': '100', 'webid': '7290435875619390995', 'msToken': 'Noy13zPl0zuSYJ9nJ8DHykX-W5olMoMC3OmO0sEuhxH3pojze2qYwurlsL9BtfsIgFDH6YbQDFF2v5ebPWrQyUUEa3d0xNqHTtzuBYj21dzd43p3pZi56bfy6xRx', # 'X-Bogus': 'DFSzswVYIOtANx--tFiuUgHB7tIG', } params_str = urllib.parse.urlencode(params) X_B = js_compile.call("window.yuan", params_str) print("X_B:", X_B) params["X-Bogus"] = X_B response = requests.get('https://www.douyin.com/aweme/v1/web/aweme/post/', params=params, headers=headers) print(response.text)