Scraping a Novel from Xinbiquge (新笔趣阁) with Python and Saving It to a TXT File

Author: 从前慢现在也慢 | 2024-03-30 06:39:58

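General approach

1: Fetch the source of the novel's index page

2: Extract the URL of each chapter

3: Fetch the content of each chapter

4: Write the chapters to a file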
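
As a small illustration of step 2, the sketch below runs a chapter-link regex against a hard-coded HTML fragment instead of a live page. The fragment, the function name `extract_chapter_links`, and its `book_path` parameter are assumptions made for this example; the markup of the real index page may differ.

```python
import re

# Hypothetical index-page fragment; the real page markup may differ.
SAMPLE_INDEX = '''
<dd><a href="/2_2634/1001.html">Chapter 1</a></dd>
<dd><a href="/2_2634/1002.html">Chapter 2</a></dd>
<dd><a href="/other/9.html">Unrelated link</a></dd>
'''

def extract_chapter_links(index_html, book_path, base_url='https://www.xxbiquge.com'):
    """Return absolute chapter URLs found under book_path, in page order."""
    # Only links that start with the book's own path count as chapters.
    pattern = r'<a href="(' + re.escape(book_path) + r'[^"]*?\.html)">'
    return [base_url + link for link in re.findall(pattern, index_html)]

links = extract_chapter_links(SAMPLE_INDEX, '/2_2634/')
print(links)
```

Anchoring the pattern on the book's path is what keeps navigation and advertising links out of the chapter list.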

Complete code


import requests
import re
# Reuse one session for every request so the connection stays open
session = requests.Session()
base_url = 'https://www.xxbiquge.com'
url = base_url + '/2_2634/'
html = session.get(url)
html.encoding = 'utf-8'

# Extract the relative URL of every chapter from the index page
chapter_links = re.findall(r'<a href="(/2_2634/.*?\.html)">.*?</a>', html.text)

# Output file: this is where I store it, change it to suit your machine
path = r'C:\Users\Administrator\PycharmProjects\untitled\title.txt'

# The with-block closes the file even if a request fails part-way through
with open(path, 'a', encoding='utf-8') as out_file:
    # Download each chapter in turn
    for link in chapter_links:
        chapter_url = base_url + link
        # Page source of the chapter
        r1 = session.get(chapter_url)
        r1.encoding = 'utf-8'

        # Chapter title
        name = re.findall(r'<meta name="keywords" content="(.*?)" />', r1.text)[0]
        print(name)

        out_file.write(name)
        out_file.write('\n')

        # Chapter body
        chapters = re.findall(r'<div id="content">(.*?)</div>', r1.text, re.S)[0]
        # Remove literal spaces first (this also turns '<br />' into '<br/>'),
        # then strip leftover script fragments
        chapters = chapters.replace(' ', '')
        chapters = chapters.replace('readx();', '')
        chapters = chapters.replace('&lt;!--go--&gt;', '')
        chapters = chapters.replace('()', '')
        # Turn line breaks into newlines, then drop any remaining tags
        text = chapters.replace('<br/>', '\n')
        text = re.sub(r'<[^>]*>', '', text)
        # Replace non-breaking-space entities with ordinary spaces
        fiction = re.sub(r'&nbsp;', ' ', text, flags=re.I)
        out_file.write(fiction)
        out_file.write('\n')
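
The clean-up part of the loop can also be tried on its own, without any network access. The sketch below is one possible standalone version; the function name `clean_chapter_html` and the sample fragment are assumptions for illustration, not part of the original script.

```python
import re

def clean_chapter_html(fragment):
    """Convert the inner HTML of a chapter's content div to plain text."""
    # Drop literal spaces from the page source. This suits Chinese text,
    # which has no spaces between words; it would glue English words together.
    text = fragment.replace(' ', '')
    # Line-break tags become newlines.
    text = re.sub(r'<br\s*/?>', '\n', text, flags=re.I)
    # Any remaining tags are removed; an unclosed '<' is left untouched.
    text = re.sub(r'<[^>]*>', '', text)
    # Non-breaking-space entities become ordinary spaces (paragraph indent).
    text = re.sub(r'&nbsp;', ' ', text, flags=re.I)
    return text

# Hypothetical fragment in the style of a chapter page.
sample = '    &nbsp;&nbsp;第一段。<br />    &nbsp;&nbsp;<b>第二段</b>。'
print(clean_chapter_html(sample))
```

Because the tag pattern only matches a complete `<...>` pair, malformed markup cannot send the function into an endless loop.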
--------------------- 
Original article by 「风韵--伟」
Original link: https://blog.csdn.net/qq_37592047/article/details/83243723
