
Django Project Practice: Scraping the Toutiao Hot Board


This article uses the following project layout as its running example.

mysite2
        - manage.py
        - mysite2
        - app01

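1. Open Toutiao, analyze the page, and scrape it

Get the request URL

Once you have worked out where the site loads its data from, construct the request headers, fetch the Toutiao endpoint, and parse the response as JSON.

In each item of the response, Url is the address of the news story and Title is its headline.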

    {"data":[
        {
            "ClusterId":7072942452532842023,
            "Title":"沙特和阿联酋领导人拒接拜登电话",
            "LabelUrl":"https://p26.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png",
            "Label":"hot",
            "Url":"https://www.toutiao.com/amos_land_page/?category_name=topic_innerflow\u0026event_type=hot_board\u0026log_pb=%7B%22category_name%22%3A%22topic_innerflow%22%2C%22cluster_type%22%3A%2210%22%2C%22enter_from%22%3A%22click_category%22%2C%22entrance_hotspot%22%3A%22outside%22%2C%22event_type%22%3A%22hot_board%22%2C%22hot_board_cluster_id%22%3A%227072942452532842023%22%2C%22hot_board_impr_id%22%3A%222022030918321201021216216025C743EE%22%2C%22jump_page%22%3A%22hot_board_page%22%2C%22location%22%3A%22news_hot_card%22%2C%22page_location%22%3A%22hot_board_page%22%2C%22rank%22%3A%221%22%2C%22source%22%3A%22trending_tab%22%2C%22style_id%22%3A%2240132%22%2C%22title%22%3A%22%E6%B2%99%E7%89%B9%E5%92%8C%E9%98%BF%E8%81%94%E9%85%8B%E9%A2%86%E5%AF%BC%E4%BA%BA%E6%8B%92%E6%8E%A5%E6%8B%9C%E7%99%BB%E7%94%B5%E8%AF%9D%22%7D\u0026rank=1\u0026style_id=40132\u0026topic_id=7072942452532842023",
            "HotValue":"6753999",
            "Schema":"",
            "LabelUri":{
                "uri":"mosaic-legacy/2b29200041b9c651e8148",
                "url":"https://p26.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png",
                "width":200,
                "height":200,
                "url_list":[
                    {"url":"https://p26.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png"},
                    {"url":"https://p3.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png"},
                    {"url":"https://p9.toutiaoimg.com/img/mosaic-legacy/2b29200041b9c651e8148~cs_noop.png"}
                ],
                "image_type":1
            },
            "ClusterIdStr":"7072942452532842023",
            "ClusterType":10,
            "QueryWord":"沙特和阿联酋领导人拒接拜登电话",
            "InterestCategory":["international"],
            "Image":{
                "uri":"tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999",
                "url":"https://p6.toutiaoimg.com/img/tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999~cs_noop.png",
                "width":0,
                "height":0,
                "url_list":[
                    {"url":"https://p6.toutiaoimg.com/img/tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999~cs_noop.png"},
                    {"url":"https://p9.toutiaoimg.com/img/tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999~cs_noop.png"},
                    {"url":"https://p3.toutiaoimg.com/img/tos-cn-i-qvj2lq49k0/a7e3f7e3e8c04c37bc7f88b2340ab999~cs_noop.png"}
                ],
                "image_type":1
            },
            "LabelDesc":"热门事件"
        },
        {

        },
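To make the structure above concrete, here is a minimal sketch of pulling Title and Url out of a parsed response. The function name extract_headlines and the shortened sample payload are assumptions made for illustration and are not part of the original project; only the data, Title, and Url keys come from the response shown above.

```python
def extract_headlines(payload):
    """Return (title, url) pairs from a parsed hot-board response.

    Hypothetical helper: assumes a top-level "data" list whose items
    carry "Title" and "Url" keys, as in the sample response.
    """
    headlines = []
    for item in payload.get("data", []):
        title = item.get("Title")
        url = item.get("Url")
        # Skip items that lack either field (for example, an empty object).
        if title and url:
            headlines.append((title, url))
    return headlines


# Shortened, made-up sample in the same shape as the real response.
sample = {
    "data": [
        {
            "Title": "Example headline",
            "Url": "https://www.toutiao.com/example",
            "HotValue": "6753999",
        },
        {},
    ]
}

for title, url in extract_headlines(sample):
    print(title, url)
```

Keeping the extraction in a small function like this makes it possible to try the parsing logic on a saved response without hitting the site.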

2. Add a function to app01/views.py that scrapes the news and displays it

    import requests
    from django.shortcuts import render


    # Scrape the Toutiao hot board and display each headline as a link
    def news(req):
        # Request URL found while analyzing the page in step 1
        url = 'https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc&_signature=_02B4Z6wo00f01yG9tdQAAIDCQrd1vxaJp9chmbFAAKpR4Dqk0c56dkhdlvNsoD3I03ygIjgUcxkM0VcFYKfO0a9iJRjnl1M9yxZvlq-pgzUXDOrpi1wKoYlCVC9.llzChJ7GmTYXIDMvE.c1a6'
        headers = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        }
        # A timeout keeps the view from hanging if the site does not respond
        res = requests.get(url=url, headers=headers, timeout=10)
        # Parse the JSON body; the list of hot items sits under the "data" key
        data_all_dict = res.json()
        data_lists = data_all_dict['data']
        return render(
            req,
            'news.html',
            {"news_dicts": data_lists},
        )
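The view above assumes that the request succeeds and that the body is valid JSON with a data key. A minimal sketch of a more defensive parsing step follows. The helper name parse_hot_board is an assumption for illustration, and it works on the raw response text (res.text in the view) so that it can be tried without a network call.

```python
import json


def parse_hot_board(body_text):
    """Return the list under "data", or an empty list if the body is unusable.

    Hypothetical helper, not part of the original project.
    """
    try:
        payload = json.loads(body_text)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return []
    if not isinstance(payload, dict):
        return []
    data = payload.get("data")
    return data if isinstance(data, list) else []


print(parse_hot_board('{"data": [{"Title": "Example headline"}]}'))
print(parse_hot_board("<html>blocked</html>"))
```

In the view, data_lists = parse_hot_board(res.text) would then hand the template an empty list instead of raising when the endpoint returns something unexpected.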

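3. Create a news.html file in the app01/templates folder

The inline style='text-decoration:none;color:black' removes the underline from each hyperlink and turns the link text black. The Django template language then loops over the list of news dictionaries and renders one list item per entry.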

    <!DOCTYPE html>
    <html lang="en">
        <head>
            <meta charset="UTF-8">
            <title>Title</title>
        </head>
        <body>
            <h1>Toutiao</h1>
            <ul>
                {% for news in news_dicts %}
                    <li>
                        <a style='text-decoration:none;color:black' href="{{ news.Url }}" target="_blank">{{ news.Title }}</a><br>
                    </li>
                {% endfor %}
            </ul>
        </body>
    </html>
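For readers new to Django templates, the loop above can be approximated in plain Python. This is only a sketch of the rendered output, not a description of how Django works internally: render_items is a hypothetical name, and html.escape stands in for the automatic escaping that the template engine applies to {{ ... }} values.

```python
from html import escape


def render_items(news_dicts):
    """Build the <li> lines that the template's for loop would emit.

    Hypothetical stand-in for the template, for illustration only.
    """
    lines = []
    for news in news_dicts:
        # Escape both values so that characters such as & or " in a URL
        # or title cannot break the surrounding markup.
        href = escape(news["Url"], quote=True)
        title = escape(news["Title"])
        lines.append(
            '<li><a style="text-decoration:none;color:black" '
            f'href="{href}" target="_blank">{title}</a></li>'
        )
    return "\n".join(lines)


print(render_items([{"Title": "A & B", "Url": "https://example.com/?a=1&b=2"}]))
```

Because the escaped URL can contain sequences such as &amp;, the href value is wrapped in double quotes in this sketch.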

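4. Map the URL to the view function in mysite2/urls.py

Add the following entry to urlpatterns, with views imported from app01 (from app01 import views):

    path('news/', views.news)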

5. Start the server with python manage.py runserver 0.0.0.0:8000, then open http://127.0.0.1:8000/news/ in a browser to check that the page renders.

 
