1: Get the page source
2: Get the URL of each chapter
3: Get the content of each chapter
4: Download and save everything to a file
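Step 2 amounts to pulling the relative chapter links out of the index page and turning them into absolute URLs. The sketch below isolates that step so it can be tried on an HTML string without touching the network. It assumes the index markup matches the `<a href="/2_2634/....html">` pattern the script below relies on. The function name `extract_chapter_urls` is hypothetical. Dropping repeated links is a defensive choice, in case an index page lists a chapter more than once.

```python
import re
from urllib.parse import urljoin

# Assumption: chapter links on the index page look like <a href="/2_2634/xxx.html">...</a>,
# the same shape the main script matches (here with the dot before "html" escaped).
CHAPTER_LINK = re.compile(r'<a href="(/2_2634/[^"]*?\.html)">.*?</a>')


def extract_chapter_urls(index_html, base_url='https://www.xxbiquge.com/2_2634/'):
    """Hypothetical helper: absolute chapter URLs in page order, repeated links dropped."""
    seen = set()
    urls = []
    for path in CHAPTER_LINK.findall(index_html):
        # urljoin replaces the base path when the link starts with "/"
        full = urljoin(base_url, path)
        if full not in seen:
            seen.add(full)
            urls.append(full)
    return urls


if __name__ == '__main__':
    sample = ('<dd><a href="/2_2634/1.html">第一章</a></dd>'
              '<dd><a href="/2_2634/2.html">第二章</a></dd>'
              '<dd><a href="/2_2634/1.html">第一章</a></dd>')
    print(extract_chapter_urls(sample))
```

Because `urljoin` resolves the leading `/` against the site root, the same function keeps working if only the base URL changes.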
```python
import re

import requests

BASE_URL = 'https://www.xxbiquge.com'
url = BASE_URL + '/2_2634/'

# One session is reused for every request
session = requests.Session()
html = session.get(url, timeout=10)
html.encoding = 'utf-8'

# Get the chapter links (relative paths such as /2_2634/xxx.html)
chapter_paths = re.findall(r'<a href="(/2_2634/.*?\.html)">.*?</a>', html.text)

# Output file: this is where I save it, change it to suit your machine
path = r'C:\Users\Administrator\PycharmProjects\untitled\title.txt'

# Download every chapter; "with" closes the file even if a request fails midway
with open(path, 'a', encoding='utf-8') as out_file:
    for chapter_path in chapter_paths:
        chapter_url = BASE_URL + chapter_path
        # Page source of the chapter
        r1 = session.get(chapter_url, timeout=10)
        r1.encoding = 'utf-8'

        # Get the chapter title
        name = re.findall(r'<meta name="keywords" content="(.*?)" />', r1.text)[0]
        print(name)

        out_file.write(name)
        out_file.write('\n')

        # Get the chapter body and remove the site's script/comment residue
        chapters = re.findall(r'<div id="content">(.*?)</div>', r1.text, re.S)[0]
        chapters = chapters.replace(' ', '')
        chapters = chapters.replace('readx();', '')
        chapters = chapters.replace('& lt;!--go - - & gt;', '')
        chapters = chapters.replace('<!--go-->', '')
        chapters = chapters.replace('()', '')
        # Turn <br/> into newlines, then strip every remaining <...> tag
        s_replace = chapters.replace('<br/>', '\n')
        s_replace = re.sub(r'<[^>]*>', '', s_replace)
        # Replace the &nbsp; entity with an ordinary space
        pattern = re.compile(r'&nbsp;', re.I)
        fiction = pattern.sub(' ', s_replace)
        out_file.write(fiction)
        out_file.write('\n')
```
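Step 3 is the most fragile part, because it depends on the exact markup of the chapter page. The sketch below rests on the same assumption as the script: the body sits in `<div id="content">` and uses `<br/>` for line breaks. Under that assumption, the cleanup can be wrapped in a single function that uses the standard library's `html.unescape` in place of hand-written entity replacements. `clean_chapter` is a hypothetical name. Dropping blank lines and leading whitespace is a formatting choice, not something the original script does.

```python
import html
import re

# Assumption: the chapter text is inside <div id="content">...</div> and line breaks
# are written as <br/>, <br /> or <br>, matching what the script above expects.
CONTENT = re.compile(r'<div id="content">(.*?)</div>', re.S)
BR = re.compile(r'<br\s*/?>', re.I)
TAG = re.compile(r'<[^>]*>')


def clean_chapter(page_html):
    """Hypothetical helper: plain text of one chapter page, or '' if no content div exists."""
    match = CONTENT.search(page_html)
    if match is None:
        return ''
    body = match.group(1)
    body = body.replace('readx();', '')   # script residue, also removed by the script above
    body = BR.sub('\n', body)              # line breaks become newlines
    body = TAG.sub('', body)               # drop remaining tags and comments such as <!--go-->
    body = html.unescape(body)             # &nbsp; -> '\xa0', &lt; -> '<', and so on
    body = body.replace('\xa0', ' ')       # non-breaking spaces become ordinary spaces
    # Strip each line and skip blank ones (a formatting choice)
    lines = (line.strip() for line in body.split('\n'))
    return '\n'.join(line for line in lines if line)
```

Inside the loop, `out_file.write(clean_chapter(r1.text))` could stand in for the chain of `replace` calls. Tags are stripped before unescaping, so an escaped `&lt;` in the story text survives as a literal `<` rather than being mistaken for markup.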
---------------------
Author: original article by 「风韵--伟」
Original link: https://blog.csdn.net/qq_37592047/article/details/83243723
