
Python web-scraping practice: saving the most popular film reviews to a CSV file


The code is as follows:

    import re
    from urllib.request import urlopen, Request

    import xlwt  # third-party package that writes legacy .xls workbooks

    url = "https://movie.douban.com/review/best/"
    headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Mobile Safari/537.36 Edg/92.0.902.55'}
    # Fetch the page with a browser-like User-Agent; undecodable bytes are ignored.
    webSourceCode = urlopen(Request(url, headers=headers)).read().decode("utf-8", "ignore")

    # Review title
    titleRe = re.compile(r'<h2><a href=".*?">(.*?)</a></h2>')
    # Author
    authorRe = re.compile(r'<a href=".*?" class="name">(.*?)</a>')
    # Time
    timeRe = re.compile(r'<span content=".*?" class="main-meta">(.*?)</span>')
    # Short summary
    contentRe = re.compile(r'<div class="short-content">.*?(.*?)&nbsp', re.S)
    # Film link block; groups 0, 3 and 4 are the page URL, the title attribute and the image URL
    wordRe = re.compile(r'<a class="subject-img" href="(.*?)">(.*?)<img alt="(.*?)" title="(.*?)" src="(.*?)" rel="v:image" />(.*?)</a>')
    nameRe = re.compile(r'<a class="subject-img" href="(.*?)"><img alt="(.*?)" title="(.*?)"(.*?)>(.*?)</a>')

    names = nameRe.findall(webSourceCode)      # extracted but not written below
    times = timeRe.findall(webSourceCode)
    words = wordRe.findall(webSourceCode)
    titles = titleRe.findall(webSourceCode)
    content = contentRe.findall(webSourceCode)
    authors = authorRe.findall(webSourceCode)  # extracted but not written below

    work_book = xlwt.Workbook(encoding='utf-8')
    # Sheet name: "Douban film reviews"
    sheet = work_book.add_sheet('豆瓣影评', cell_overwrite_ok=True)
    # Header row: film title, time, review title, page URL, summary, image URL
    for col, header in enumerate(['影片名称', '时间', '突出评论', '网站地址', '内容介绍', '图片网站']):
        sheet.write(0, col, header)

    # Each field fills its own column, starting at row 1 (row 0 holds the headers).
    print("时间==============================================================")  # time
    for row, time in enumerate(times, start=1):
        sheet.write(row, 1, time)

    print("突出评论==============================================================")  # review title
    for row, title in enumerate(titles, start=1):
        print(title)
        sheet.write(row, 2, title)

    print("内容简介==============================================================")  # summary
    for row, c in enumerate(content, start=1):
        print(c)
        sheet.write(row, 4, c)

    print("网站==============================================================")  # page URL
    for row, word in enumerate(words, start=1):
        print(word[0])
        sheet.write(row, 3, word[0])

    print("名称==============================================================")  # film title
    for row, name in enumerate(words, start=1):
        print(name[3])
        sheet.write(row, 0, name[3])

    print("图片==============================================================")  # image URL
    for row, img in enumerate(words, start=1):
        print(img[4])
        sheet.write(row, 5, img[4])

    # Save the workbook (Excel file) to a local path
    file_name = r'C:\Users\admin\PycharmProjects\pythonProject\venv\Include\data2.xls'
    work_book.save(file_name)
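
The script above saves an `.xls` workbook through `xlwt`. If a CSV file is wanted instead, the standard-library `csv` module can write the same six columns. The following is only a sketch under assumptions: the function name `write_reviews_csv`, the English column headers and the sample values are invented for illustration, and `zip` silently stops at the shortest list, so it presumes the scraped lists line up one-to-one.

```python
import csv
import io


def write_reviews_csv(fileobj, names, times, titles, links, summaries, images):
    """Write one review per row to fileobj and return the number of rows written.

    Hypothetical helper: zip() stops at the shortest input list.
    """
    writer = csv.writer(fileobj)
    # Same six columns as the spreadsheet, in the same order.
    writer.writerow(['Film', 'Time', 'Review title', 'Page URL', 'Summary', 'Image URL'])
    rows = list(zip(names, times, titles, links, summaries, images))
    writer.writerows(rows)
    return len(rows)


# Made-up sample data, written to an in-memory buffer instead of a real file.
buffer = io.StringIO(newline='')
count = write_reviews_csv(
    buffer,
    ['Film A'],
    ['2021-08-01 12:00:00'],
    ['A review title'],
    ['https://example.com/film-a'],
    ['Short summary'],
    ['https://example.com/a.jpg'],
)
print(buffer.getvalue())
```

To write a real file, the buffer could be replaced with `open('data2.csv', 'w', newline='', encoding='utf-8-sig')`; the `utf-8-sig` encoding adds a byte-order mark, which helps Excel detect UTF-8 when the cells contain Chinese text.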

Output of the run:

    时间==============================================================
    突出评论==============================================================
    内容简介==============================================================
    网站==============================================================
    名称==============================================================
    图片==============================================================
    Process finished with exit code 0
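
In this run only the section banners appear. The loops that print values (review titles, summaries, links, names, images) printed nothing, so the corresponding `findall` calls evidently returned empty lists. The page actually served may not match the markup the patterns expect. One way to narrow this down is to try each pattern on a small, known HTML fragment before pointing it at the live page. The sketch below does that for two of the patterns; the fragment and the helper `report` are invented for illustration and are not Douban's actual markup.

```python
import re

# Two of the patterns from the script, copied unchanged.
titleRe = re.compile(r'<h2><a href=".*?">(.*?)</a></h2>')
timeRe = re.compile(r'<span content=".*?" class="main-meta">(.*?)</span>')

# Invented fragment shaped the way the patterns expect (an assumption).
sample_html = '''
<span content="2021-08-01" class="main-meta">2021-08-01 12:00:00</span>
<h2><a href="https://example.com/review/1/">An example review title</a></h2>
'''


def report(label, pattern, html):
    """Print how many matches a pattern finds and return the matches."""
    matches = pattern.findall(html)
    print(f"{label}: {len(matches)} match(es) {matches}")
    return matches


sample_titles = report("title", titleRe, sample_html)
sample_times = report("time", timeRe, sample_html)
```

Running the same `report` helper on `webSourceCode` would show which patterns find nothing in the real response, and printing the first few hundred characters of `webSourceCode` would show what the server actually sent.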
