
Scraping Baidu Translate: use the browser's network capture tool to inspect the request method and find the parameters sent for the word "dog"

This walkthrough uses Google Chrome and PyCharm. (Performed on 2021-02-18.)

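1. Analyze the request data that the page submits, using Chrome's developer tools.

Because the translation is requested asynchronously, we only need to look at the XHR entries.

Open the Headers tab of the second request to read the request headers.

The second request turns out to be a POST, and it is sent to https://fanyi.baidu.com/v2transapi?from=en&to=zh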

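The submitted form data is:

      from: en (English)

      to: zh (Chinese)

      query: dog (the word we are looking up)

The token that follows in the form data is a fixed value. The hard part of this article is obtaining the sign value.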
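As a rough sketch, the captured form fields can be collected into a dictionary before they are posted. The helper name build_form_data is hypothetical. The extra fields transtype, simple_means_flag and domain are taken from the complete Python script at the end of this article, and the sign and token arguments are placeholders for values from your own browser session.

```python
def build_form_data(query, sign, token, src="en", dst="zh"):
    """Collect the POST fields seen in the captured v2transapi request.

    build_form_data is a hypothetical helper, not part of any library.
    """
    return {
        "from": src,             # source language, "en" in the capture
        "to": dst,               # target language, "zh" in the capture
        "query": query,          # the word being looked up
        "transtype": "enter",
        "simple_means_flag": 3,
        "sign": sign,            # computed from query, see step 2
        "token": token,          # fixed value copied from the browser
        "domain": "common",
    }


if __name__ == "__main__":
    form = build_form_data("dog", sign="<sign>", token="<token>")
    for key, value in form.items():
        print(f"{key}: {value}")
```

Keeping the fields in one helper makes it obvious that sign is the only value that has to be recomputed for each query.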

 

 

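2. Analyzing and obtaining the sign value.

Recall that the request is submitted to https://fanyi.baidu.com/v2transapi?from=en&to=zh.

In the Sources panel, use Search to look for v2transapi.

Set a breakpoint there and debug, watching how each value changes.

Hovering over a variable shows its current value. At this point the sign value has already been computed.

Click into the function f that produces sign to see how it works.

Copy this function into a new JS file and run that file from Python with the execjs package (pip install PyExecJS).

Create a file named code.js. (This is the initial test stage: the JS code here is incomplete and mainly records the debugging thought process at the time. The complete JS code appears in a later section.)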

 

    function e(r) {
        var o = r.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
        if (null === o) {
            var t = r.length;
            t > 30 && (r = "" + r.substr(0, 10) + r.substr(Math.floor(t / 2) - 5, 10) + r.substr(-10, 10))
        } else {
            for (var e = r.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), C = 0, h = e.length, f = []; h > C; C++)
                "" !== e[C] && f.push.apply(f, a(e[C].split(""))),
                C !== h - 1 && f.push(o[C]);
            var g = f.length;
            g > 30 && (r = f.slice(0, 10).join("") + f.slice(Math.floor(g / 2) - 5, Math.floor(g / 2) + 5).join("") + f.slice(-10).join(""))
        }
        var u = void 0
          , l = "" + String.fromCharCode(103) + String.fromCharCode(116) + String.fromCharCode(107);
        u = null !== i ? i : (i = window[l] || "") || "";
        for (var d = u.split("."), m = Number(d[0]) || 0, s = Number(d[1]) || 0, S = [], c = 0, v = 0; v < r.length; v++) {
            var A = r.charCodeAt(v);
            128 > A ? S[c++] = A : (2048 > A ? S[c++] = A >> 6 | 192 : (55296 === (64512 & A) && v + 1 < r.length && 56320 === (64512 & r.charCodeAt(v + 1)) ? (A = 65536 + ((1023 & A) << 10) + (1023 & r.charCodeAt(++v)),
            S[c++] = A >> 18 | 240,
            S[c++] = A >> 12 & 63 | 128) : S[c++] = A >> 12 | 224,
            S[c++] = A >> 6 & 63 | 128),
            S[c++] = 63 & A | 128)
        }
        for (var p = m, F = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(97) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(54)), D = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(51) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(98)) + ("" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(102)), b = 0; b < S.length; b++)
            p += S[b],
            p = n(p, F);
        return p = n(p, D),
        p ^= s,
        0 > p && (p = (2147483647 & p) + 2147483648),
        p %= 1e6,
        p.toString() + "." + (p ^ m)
    }
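The String.fromCharCode chains in this function only hide three short string constants. As a small sketch (the helper and constant names are my own), the same character codes can be decoded in Python to see what the JS actually uses:

```python
def from_char_codes(*codes):
    """Python counterpart of chained String.fromCharCode calls."""
    return "".join(chr(code) for code in codes)


# Property name looked up on window (variable l in the JS).
PROP_NAME = from_char_codes(103, 116, 107)
# Operation strings passed to n() (variables F and D in the JS).
OPS_F = from_char_codes(43, 45, 97) + from_char_codes(94, 43, 54)
OPS_D = (from_char_codes(43, 45, 51)
         + from_char_codes(94, 43, 98)
         + from_char_codes(43, 45, 102))

if __name__ == "__main__":
    print(PROP_NAME)  # gtk
    print(OPS_F)      # +-a^+6
    print(OPS_D)      # +-3^+b+-f
```

In other words, the function falls back to window.gtk when i is empty, and it hands the two operation strings to n. Both i and n therefore have to be available when the code runs outside the browser.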

The Python code is as follows:

    import execjs

    if __name__ == '__main__':
        with open("code.js", encoding="utf-8") as f:
            jsData = f.read()
        print(jsData)
        # Call the function e defined in the JS code, passing 'java' as the argument
        sign = execjs.compile(jsData).call("e", 'java')
        print(sign)

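Running this reports that i is not defined, so we go back to Chrome for further debugging.

Set a breakpoint at sign, step into the function, and single-step through it.

This reveals the value of i. Across repeated experiments i was always the same value, so we define it directly in the JS file. With i defined, the script still fails, this time with "execjs._exceptions.ProgramError: TypeError: 对象不支持此属性或方法" (the object does not support this property or method). Inspecting the JS code shows the cause: it calls a function n that is not defined in the file.

Using Chrome's debugger again, copy the n function into the JS file. In testing, arbitrary input then produced the correct sign value. The complete code follows.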

The complete JS code:

    function e(r) {
        // Fixed value of i observed in the Chrome debugger
        var i = "320305.131321201";
        var o = r.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
        if (null === o) {
            // No surrogate pairs: keep the first, middle and last 10 characters of long input
            var t = r.length;
            t > 30 && (r = "" + r.substr(0, 10) + r.substr(Math.floor(t / 2) - 5, 10) + r.substr(-10, 10))
        } else {
            // The page's code wraps the split in a helper a() that is not defined in this file;
            // a plain split is assumed to be equivalent here
            for (var e = r.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), C = 0, h = e.length, f = []; h > C; C++)
                "" !== e[C] && f.push.apply(f, e[C].split("")),
                C !== h - 1 && f.push(o[C]);
            var g = f.length;
            g > 30 && (r = f.slice(0, 10).join("") + f.slice(Math.floor(g / 2) - 5, Math.floor(g / 2) + 5).join("") + f.slice(-10).join(""))
        }
        var u = void 0
          , l = "" + String.fromCharCode(103) + String.fromCharCode(116) + String.fromCharCode(107);
        // i is set above, so the window lookup is never evaluated outside the browser
        u = null !== i ? i : (i = window[l] || "") || "";
        // Encode r as UTF-8 bytes into S
        for (var d = u.split("."), m = Number(d[0]) || 0, s = Number(d[1]) || 0, S = [], c = 0, v = 0; v < r.length; v++) {
            var A = r.charCodeAt(v);
            128 > A ? S[c++] = A : (2048 > A ? S[c++] = A >> 6 | 192 : (55296 === (64512 & A) && v + 1 < r.length && 56320 === (64512 & r.charCodeAt(v + 1)) ? (A = 65536 + ((1023 & A) << 10) + (1023 & r.charCodeAt(++v)),
            S[c++] = A >> 18 | 240,
            S[c++] = A >> 12 & 63 | 128) : S[c++] = A >> 12 | 224,
            S[c++] = A >> 6 & 63 | 128),
            S[c++] = 63 & A | 128)
        }
        // Mix every byte into p, then apply the final mixing step
        for (var p = m, F = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(97) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(54)), D = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(51) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(98)) + ("" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(102)), b = 0; b < S.length; b++)
            p += S[b],
            p = n(p, F);
        return p = n(p, D),
        p ^= s,
        0 > p && (p = (2147483647 & p) + 2147483648),
        p %= 1e6,
        p.toString() + "." + (p ^ m)
    }

    // Apply the operations encoded in o to r, three characters at a time
    function n(r, o) {
        for (var t = 0; t < o.length - 2; t += 3) {
            var a = o.charAt(t + 2);
            a = a >= "a" ? a.charCodeAt(0) - 87 : Number(a),
            a = "+" === o.charAt(t + 1) ? r >>> a : r << a,
            r = "+" === o.charAt(t) ? r + a & 4294967295 : r ^ a
        }
        return r
    }

The complete Python code:

    import requests
    import execjs


    def get_sign(imp):
        """Compute the sign for imp by running the function e from code.js."""
        inputData = imp
        with open("code.js", encoding="utf-8") as f:
            jsData = f.read()
        # Call the function e defined in the JS code, passing inputData as the argument
        sign = execjs.compile(jsData).call("e", inputData)
        print(sign)
        return sign


    if __name__ == '__main__':
        keyword = input("Enter the word to look up: ")
        sign = get_sign(keyword)
        # Same address as in the captured request (English to Chinese)
        post_url = 'https://fanyi.baidu.com/v2transapi?from=en&to=zh'
        # Disguise the client as a browser (User-Agent and cookie copied from Chrome)
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
            'cookie': 'BIDUPSID=E34AF9369DAEDBE07FF46AD86C12C25F; PSTM=1611745359; BAIDUID=E34AF9369DAEDBE012BBE049A73380B9:FG=1; __yjs_duid=1_18bb963bc9039c1e7e4a9cdf3d06ff831611745998081; BDUSS=F2cDhEalQ3ZHpURERofmRrWDRJWjNMUFFRZHFpNFpMU3RnaEkyMm5sSUdZRHRnRVFBQUFBJCQAAAAAAAAAAAEAAADHXttO5OzI99i8t-jSu7vYWQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAbTE2AG0xNgeE; BDUSS_BFESS=F2cDhEalQ3ZHpURERofmRrWDRJWjNMUFFRZHFpNFpMU3RnaEkyMm5sSUdZRHRnRVFBQUFBJCQAAAAAAAAAAAEAAADHXttO5OzI99i8t-jSu7vYWQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAbTE2AG0xNgeE; H_PS_PSSID=33423_33354_33272_31254_33595_33571_26350; delPer=0; PSINO=1; BAIDUID_BFESS=E34AF9369DAEDBE012BBE049A73380B9:FG=1; BDRCVFR[feWj1Vr5u3D]=I67x6TjHwwYf0; REALTIME_TRANS_SWITCH=1; FANYI_WORD_SWITCH=1; HISTORY_SWITCH=1; SOUND_SPD_SWITCH=1; SOUND_PREFER_SWITCH=1; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1613436636,1613437148,1613439320; ZD_ENTRY=baidu; Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574=1613439532; __yjsv5_shitong=1.0_7_d4088b4cfa426bf358e5f521af89684893c8_300_1613439533486_111.14.147.253_3cd8ee0a; ab_sr=1.0.0_MzRhYjAyMWUxZjA5ZTRmOTMyMjhmY2Y3OTY4MTk2ZTlmMmZlMjQ4OGQ0ZjJiNDY4MDM1YjBmZTg0ZGE1YjM4ZDcxN2Q5MWMwZjJlNGM4NTIyMjE2MGIyNWM5NTg2Y2Ey; BA_HECTOR=8g2ha420a405240l5k1g2m8ns0q; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598'
        }
        # Form data of the POST request; from and to match the captured request
        datas = {
            'from': 'en',
            'to': 'zh',
            'query': keyword,
            'transtype': 'enter',
            'simple_means_flag': 3,
            'sign': sign,
            'token': "7a831add74340d0eff30a183268ef9d0",
            'domain': 'common'
        }
        # Send the request
        response = requests.post(url=post_url, data=datas, headers=headers)
        # Decode the response; json() only works if the body really is JSON
        dic_obj = response.json()
        print(dic_obj['trans_result']['data'][0]['dst'])
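The last line of the script assumes that the decoded response always contains trans_result. A slightly more defensive version might look like the sketch below. The helper name extract_translation is hypothetical, the sample dictionary is hand-written rather than a captured response, and the idea that a rejected request (for example with a stale token, cookie or sign) comes back as JSON without that key is my assumption, not something verified in this article.

```python
def extract_translation(payload):
    """Return the first translated string from a decoded v2transapi response.

    Raises ValueError when the payload does not have the expected shape.
    """
    try:
        return payload["trans_result"]["data"][0]["dst"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"unexpected response: {payload!r}") from exc


if __name__ == "__main__":
    # Hand-written sample with the same nesting the script reads
    sample = {"trans_result": {"data": [{"dst": "狗"}]}}
    print(extract_translation(sample))
```

Raising an error that includes the whole payload makes it easier to tell a rejected request apart from a bug in the parsing code.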

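Summary:

The main difficulty here is analyzing and obtaining the sign value. Once sign is available, the result can be fetched with a POST request through the requests module, and the response's json() method turns the returned data into a Python object for further processing.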

 

 

 

 

 
