
PyTorch: Chinese Text Classification

PyTorch HAN Chinese text classification

1. Crawler

JD.py

    import requests
    from urllib.parse import quote
    from urllib.parse import urlencode
    from lxml import etree
    import logging
    import json
    import time


    class JDSpider:
        # Crawler class: construct an instance with a product category
        # (e.g. phones, computers), then call getData to scrape the data.
        def __init__(self, categlory):
            # JD search start page for the given category (URL-encoded)
            self.startUrl = "https://search.jd.com/Search?keyword=%s&enc=utf-8" % (quote(categlory))
            # Base URL of the product comment endpoint; query parameters are appended to it
            self.commentBaseUrl = "https://club.jd.com/comment/productPageComments.action?"
            # Browser-like User-Agent sent with every request
            self.headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36"
            }
            # Product IDs collected from the search results
            self.productsId = self.getId()
            self.comtype = {0: "nagetive", 1: "medium
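The URL handling in `__init__` can be tried in isolation, without sending any request. The sketch below shows how `quote` percent-encodes a Chinese category keyword for the search URL and how `urlencode` could append query parameters to the comment base URL. The helper names `build_search_url` and `build_comment_url`, and the parameter names `productId`, `score`, and `page`, are assumptions made for illustration; the listing above does not show which parameters the original spider sends.

```python
from urllib.parse import quote, urlencode

# URL templates copied from the JDSpider constructor
SEARCH_URL = "https://search.jd.com/Search?keyword=%s&enc=utf-8"
COMMENT_BASE_URL = "https://club.jd.com/comment/productPageComments.action?"


def build_search_url(category: str) -> str:
    """Percent-encode the category (UTF-8) and place it in the search URL."""
    return SEARCH_URL % quote(category)


def build_comment_url(product_id: str, score: int, page: int) -> str:
    """Append query parameters to the comment base URL.

    The parameter names are hypothetical, chosen only to illustrate urlencode.
    """
    params = {"productId": product_id, "score": score, "page": page}
    return COMMENT_BASE_URL + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("手机"))
    print(build_comment_url("100", 1, 0))
```

Building the URLs in small pure functions keeps the encoding logic testable offline, separate from the network calls made through `requests`.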