The recently open-sourced ChatTTS has been getting a lot of attention, its results are genuinely good, and it runs fine on an ordinary personal computer. I've been wanting to build something with it and see whether the quality lives up to the hype. The TV series 《我的阿勒泰》 happens to be popular right now too, so why not have it read Li Juan's prose aloud and hear how it sounds?
Let's get started.
《我所能带给你们的事物》 is the first essay in the collection 《我的阿勒泰》; it is about the things the author brings back from Urumqi to her hometown. Find a copy of 我的阿勒泰.epub online, read it in with a script, and split the text into paragraphs.
import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup

# Load the EPUB file
book = epub.read_epub('a.epub')
book_text = ""

# Extract the text of the target chapter
for item in book.get_items():
    if item.get_type() == ebooklib.ITEM_DOCUMENT and item.id == "x_chapter002":
        # Parse the HTML content using BeautifulSoup
        soup = BeautifulSoup(item.get_content(), 'html.parser')
        # Extract and print the text
        book_text = soup.get_text()
        print(book_text)
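If ebooklib and BeautifulSoup are not installed, the text-extraction step itself can be sketched with the standard library's HTMLParser alone (a minimal sketch; the chapter HTML below is a made-up sample, not the actual book content):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment, similar to soup.get_text()."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags
        self.parts.append(data)

    def text(self):
        return ''.join(self.parts)

parser = TextExtractor()
parser.feed('<p>我所能带给你们的</p><p>事物</p>')
print(parser.text())  # 我所能带给你们的事物
```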
For the sake of subtitles and speech quality, the text is not only split into paragraphs but each chunk is also capped at a maximum length:
import re

def split_text(text, max_length=50):
    # Split into paragraphs
    paragraphs = text.split('\n')
    result = []
    for paragraph in paragraphs:
        # Split on punctuation, keeping the delimiters
        sentences = re.split(r'([。!?;,,.!?;])', paragraph)
        chunk = ''
        for sentence in sentences:
            if sentence:
                # Check whether adding this piece would exceed the maximum length
                if len(chunk) + len(sentence) <= max_length:
                    chunk += sentence
                else:
                    result.append(chunk.strip())
                    chunk = sentence
        if chunk:
            result.append(chunk.strip())
    return result
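The key detail in split_text is the capture group in re.split: it makes the punctuation marks come back as their own list elements instead of being discarded, so each delimiter can be re-attached to the chunk in front of it. A quick demonstration (the sample sentence is illustrative, not from the book):

```python
import re

# The capture group returns the punctuation as separate list items
sentences = re.split(r'([。!?;,,.!?;])', '去乌鲁木齐,带些什么呢?')
print(sentences)  # ['去乌鲁木齐', ',', '带些什么呢', '?', '']
```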
For each chunk, first run tone refinement on the text, then synthesize speech with a fixed female voice:
import ChatTTS
import torch
import numpy as np
from scipy.io.wavfile import write

chat = ChatTTS.Chat()
chat.load_models(compile=False)  # Set to True for better performance

for i, chunk in enumerate(chunks):
    print(chunk)
    text = chunk

    # Fix the seed so every chunk uses the same speaker embedding
    torch.manual_seed(2)
    rand_spk = chat.sample_random_speaker()
    params_infer_code = {
        'spk_emb': rand_spk,
        'temperature': 0.3,
        'top_P': 0.7,
        'top_K': 20,
    }
    params_refine_text = {'prompt': '[oral_2][laugh_0][break_6]'}

    torch.manual_seed(42)
    # First pass: refine the text only (adds oral-tone and pause tokens)
    text = chat.infer(text,
                      skip_refine_text=False,
                      refine_text_only=True,
                      params_refine_text=params_refine_text,
                      params_infer_code=params_infer_code)
    # Second pass: synthesize audio from the refined text
    wavs = chat.infer(text,
                      skip_refine_text=True,
                      params_refine_text=params_refine_text,
                      params_infer_code=params_infer_code)

    # wavs[0] is a numpy array at a 24000 Hz sample rate
    audio_data = np.array(wavs[0]).flatten()
    sample_rate = 24000

    # Save the audio data to a WAV file
    output_file = "wavs\\" + str(i) + '-output_audio.wav'
    write(output_file, sample_rate, audio_data)
    print(f"Audio saved to {output_file}")
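The scipy write call above is just one way to persist the samples; if scipy is unavailable, a 16-bit WAV can also be written with the standard library alone. A minimal sketch, assuming float samples in [-1, 1] (the test tone is a stand-in for ChatTTS output):

```python
import math
import struct
import wave

def save_wav(path, samples, sample_rate=24000):
    # Write float samples in [-1, 1] as 16-bit mono PCM
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit
        w.setframerate(sample_rate)
        frames = b''.join(
            struct.pack('<h', int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples)
        w.writeframes(frames)

# One second of a 440 Hz test tone at 24 kHz
tone = [math.sin(2 * math.pi * 440 * t / 24000) for t in range(24000)]
save_wav('test-tone.wav', tone)
```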
The audio clips end up in the wavs directory. Next, concatenate all the WAV files in that directory into a single MP3:
import os
from pydub import AudioSegment

def merge_wav_to_mp3(input_folder, output_file):
    # Collect the paths of all WAV files
    wav_files = [os.path.join(input_folder, f)
                 for f in os.listdir(input_folder) if f.endswith('.wav')]
    # Make sure there is something to merge
    if not wav_files:
        print("No WAV files found.")
        return
    # Sort by modification time so clips play in generation order
    wav_files.sort(key=os.path.getmtime)
    # Start from an empty audio segment
    combined = AudioSegment.empty()
    # Load and append each WAV file in turn
    for wav_file in wav_files:
        print(wav_file)
        audio = AudioSegment.from_wav(wav_file)
        combined += audio
    # Export the merged audio as an MP3 file
    combined.export(output_file, format='mp3')
    print(f"Merged MP3 saved to {output_file}")
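Sorting by modification time only works if the clips were generated in order and never re-rendered. A more robust sort key (an assumption on my part, not in the original) parses the numeric index out of filenames like 0-output_audio.wav:

```python
import re

def index_key(filename):
    # Pull the leading number out of "<i>-output_audio.wav";
    # unmatched names sort to the end
    m = re.match(r'(\d+)-', filename)
    return int(m.group(1)) if m else float('inf')

files = ['10-output_audio.wav', '2-output_audio.wav', '0-output_audio.wav']
print(sorted(files, key=index_key))
# ['0-output_audio.wav', '2-output_audio.wav', '10-output_audio.wav']
```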
Finally, pick an image and combine it with the MP3 to produce an MP4:
from moviepy.editor import *
# Load the audio file
audio = AudioFileClip("combined_audio.mp3")
# Load the image and give it the same duration as the audio
image = ImageClip("a.webp").set_duration(audio.duration)
# Attach the audio to the image clip
video = image.set_audio(audio)
# Export the video file
video.write_videofile("a.mp4", codec="libx264", audio_codec="aac", fps=24)
WeChat official account: 每日AI新工具