This post shows how to use Python to build a list of your past CSDN blog posts and generate a table of contents from it.
pip install pyfreeproxy
Update 2023/4/4: the proxy used previously stopped working and could no longer be reached, so this version switches to freeproxy.
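# Update 2023/4/4: the previous proxy stopped working and could no longer be reached; switched to freeproxy
# Use Python to crawl the list of past CSDN blog posts and generate a table of contents
# Usage: python pa_article.py
# The output has the following shape:
# # 2022
# ## 202201
# - aaa
# - bbb
# ## 202202
# - ccc
# - ddd
# # 2023
# ## 202301
# - eee
# - fff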
import datetime
import json

import requests


def getCSDNTitleUrl(year, month, results_by_month):
    """Fetch one month of posts and store the parsed JSON under 'YYYY年MM月'."""
    # Skip months that lie in the future; zero-padded 'YYYYMM' strings compare correctly.
    now_time = datetime.datetime.now().strftime("%Y%m")
    if year + month > now_time:
        return
    url = 'https://blog.csdn.net/community/home-api/v1/get-business-list?page=1&size=50&businessType=blog&orderby=&noMore=false&year=' + year + '&month=' + month + '&username=qq_40985985'
    headers = {
        'User-Agent':
        'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0'
    }
    # A timeout keeps the script from hanging if the server does not answer.
    response = requests.get(url, headers=headers, timeout=10)
    # print(response.text)
    results = json.loads(response.text)
    results_by_month[year + '年' + month + '月'] = results


# Collect the results for every month of the chosen years.
results_by_month = {}
for i in range(2023, 2024):
    for j in range(1, 13):
        # Zero-pad the month so it matches the 'MM' format the API expects.
        getCSDNTitleUrl(str(i), '{:02d}'.format(j), results_by_month)

last_year = None
for key, value in results_by_month.items():
    data = value['data']['list']
    if len(data) == 0:
        continue
    # Print the year heading once, before the first month of that year that has posts.
    year = key[0:4]
    if year != last_year:
        print('\n# {}\n'.format(year))
        last_year = year
    print('\n## {}\n'.format(key))
    for obj in data:
        # Strip square brackets from titles so they do not break the Markdown link.
        print('- [{}]({})'.format(obj['title'].replace('[', '').replace(']', ''), obj['url']))
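
The printing step above can also be written as a pure function that returns the Markdown text, which makes it possible to check the formatting without calling CSDN at all. The following is only a minimal sketch: it assumes the response shape used above (a `data.list` array whose entries carry `title` and `url`), and `render_toc` and the sample data are hypothetical names made up for illustration. It tracks the last year it emitted, so the year heading appears exactly once even when January has no posts.

```python
def render_toc(results_by_month):
    """Build a Markdown table of contents from {'YYYY年MM月': api_response}."""
    lines = []
    last_year = None
    for key, value in results_by_month.items():
        posts = value['data']['list']
        if not posts:
            continue  # skip months without posts
        year = key[:4]
        if year != last_year:
            # First month with posts in this year: emit the year heading.
            lines.append('# {}'.format(year))
            last_year = year
        lines.append('## {}'.format(key))
        for post in posts:
            # Square brackets in a title would break the Markdown link syntax.
            title = post['title'].replace('[', '').replace(']', '')
            lines.append('- [{}]({})'.format(title, post['url']))
    return '\n'.join(lines)


# Made-up sample data in the same shape as the responses collected above.
sample = {
    '2023年01月': {'data': {'list': []}},
    '2023年02月': {'data': {'list': [
        {'title': '[Python] demo', 'url': 'https://example.com/a'},
    ]}},
}
print(render_toc(sample))
```

Because the function returns a string instead of printing, the result can be written straight to a `.md` file or compared against an expected value in a test.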