Scraping Baidu Hot Search with Python and Generating a Word Cloud

1. Scrape the Baidu page body and save it to a txt file

import requests
from bs4 import BeautifulSoup

import my_utils  # the author's own helper module (not shown here)


def get_baidu_hot():
    url = "https://top.baidu.com/board?tab=realtime"
    # Send a browser-like User-Agent with the request
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"}
    response = requests.get(url, headers=headers)
    response.encoding = "utf-8"
    soup = BeautifulSoup(response.text, "html.parser")
    # find_all returns a ResultSet containing the page's <body> element
    txt = soup.find_all("body")
    print(txt)
    my_utils.write_file(txt)
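The article does not show `my_utils.write_file` or `my_utils.read_file`. As a rough sketch of what such helpers might look like, the following uses `pathlib` and a hypothetical default file name `baidu_hot.txt`. The signatures and the file name are assumptions, not the author's actual code.

```python
from pathlib import Path

# Hypothetical default location; the article does not name the file.
DEFAULT_PATH = Path("baidu_hot.txt")


def write_file(content, path=DEFAULT_PATH):
    """Save content as UTF-8 text and return the number of characters written.

    Non-string input (for example a BeautifulSoup ResultSet) is converted
    with str() first.
    """
    text = content if isinstance(content, str) else str(content)
    Path(path).write_text(text, encoding="utf-8")
    return len(text)


def read_file(path=DEFAULT_PATH):
    """Return the whole file as one UTF-8 string."""
    return Path(path).read_text(encoding="utf-8")
```

Writing and reading with an explicit `encoding="utf-8"` avoids depending on the platform default, which matters for Chinese text on Windows.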

2. Read the txt file and extract the JSON with a regular expression

# Read back the text saved in step 1
data = my_utils.read_file()
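The regular-expression step itself is not shown in the article. A minimal sketch of one way to do it is below. It assumes the saved page embeds its data as JSON inside an HTML comment of the form `<!--s-data:{...}-->`, with entries under `data.cards[].content[]` that carry `word` and `desc` fields. That layout, the field names, and the function name `extract_hot_items` are all assumptions and should be checked against the file actually saved.

```python
import json
import re

# Assumption: the JSON payload sits inside an HTML comment "<!--s-data:...-->".
S_DATA_PATTERN = re.compile(r"<!--s-data:(.*?)-->", re.S)


def extract_hot_items(html):
    """Return a list of {"word", "desc"} dicts found in the embedded JSON.

    Returns an empty list when the comment marker is not present.
    """
    match = S_DATA_PATTERN.search(html)
    if match is None:
        return []
    payload = json.loads(match.group(1))
    items = []
    # Assumed layout: payload["data"]["cards"][i]["content"][j]
    for card in payload.get("data", {}).get("cards", []):
        for entry in card.get("content", []):
            items.append({
                "word": entry.get("word", ""),
                "desc": entry.get("desc", ""),
            })
    return items
```

The non-greedy `(.*?)` stops at the first `-->`, so this would need adjusting if the payload itself ever contained that sequence.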

3. Store the JSON in the database

json2 = my_utils.ana_baidu(data)
# Assume the table is named "users"
table_name = "users"
# Walk the key/value pairs of each JSON record and build an INSERT statement
insert_statements = []
for record in json2:
    columns = ", ".join(f"`{key}`" for key in record)
    # Double any single quote so a value cannot end its string literal early
    values = ", ".join("'" + str(value).replace("'", "''") + "'" for value in record.values())
    statement = f"INSERT INTO {table_name} ({columns}) VALUES ({values});"
    print(statement)
    insert_statements.append(statement)
    my_sql.exe_sql(statement)
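Building SQL by string concatenation is fragile when a value contains quotes. A common alternative is to pass values as query parameters. The sketch below illustrates the idea with the standard-library `sqlite3` module so that it runs anywhere; it is not the author's `my_sql` module, and `insert_records` is a hypothetical name. With a MySQL driver such as PyMySQL the placeholder would be `%s` instead of `?`.

```python
import sqlite3


def insert_records(conn, table_name, records):
    """Insert each dict in records as one row and return the number of rows.

    Values are bound through "?" placeholders. Table and column names are
    still interpolated, so they must come from a trusted source.
    """
    inserted = 0
    for record in records:
        columns = ", ".join(f'"{key}"' for key in record)
        placeholders = ", ".join("?" for _ in record)
        sql = f'INSERT INTO "{table_name}" ({columns}) VALUES ({placeholders})'
        conn.execute(sql, tuple(record.values()))
        inserted += 1
    conn.commit()
    return inserted
```

For example, after `conn = sqlite3.connect(":memory:")` and creating a `users` table with `word` and `desc` columns, `insert_records(conn, "users", [{"word": "A", "desc": "it's fine"}])` stores the apostrophe without any manual escaping.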

4. Read the records from the database and generate the word cloud

# Fetch the 50 most recent descriptions
result_content = my_sql.query_sql("select `desc` from users order by create_time desc limit 50")
result_content = str(result_content)
# Remove the very common particle "的" so it does not appear in the cloud
result_content = result_content.replace("的", "")
my_wcloud.create_cy(result_content)
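Calling `str()` on a query result keeps the brackets and quotes of the Python representation in the text. A possible alternative, sketched below, joins the column values directly and then renders the cloud with the third-party `jieba` and `wordcloud` packages. It assumes each row is a tuple whose first element is the `desc` value; `rows_to_text`, `render_word_cloud`, the font path, and the output file name are hypothetical, and `my_wcloud.create_cy` may work differently.

```python
STOPWORDS = {"的"}


def rows_to_text(rows, stopwords=STOPWORDS):
    """Join the first column of every row into one string.

    Empty or None values are skipped, and each stop word is removed.
    """
    parts = [str(row[0]) for row in rows if row and row[0]]
    text = " ".join(parts)
    for word in stopwords:
        text = text.replace(word, "")
    return text


def render_word_cloud(text, font_path, out_path="wordcloud.png"):
    """Segment Chinese text with jieba and save a word cloud image."""
    # Imported here so rows_to_text works without the third-party packages.
    import jieba
    from wordcloud import WordCloud

    segmented = " ".join(jieba.lcut(text))
    # A font that contains Chinese glyphs is needed, otherwise words render as boxes.
    cloud = WordCloud(font_path=font_path, width=800, height=400,
                      background_color="white")
    cloud.generate(segmented)
    cloud.to_file(out_path)
    return out_path
```

Segmenting with `jieba` first matters because `WordCloud` splits on whitespace and punctuation, which Chinese sentences mostly lack.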

This produces the word cloud image.

Code:

javaDev/public_python

ssh:

git@gitee.com:wangchao_1/public_python.git
