How to Generate High-Quality Videos with MuseTalk (Usage Tips)

Author: 凡人多烦事01 | 2024-05-19 20:58:21



Environment:

MuseTalk 2024.4.2

GPU: NVIDIA RTX 4070, 12 GB

Problem description:

How can MuseTalk be used to generate high-quality videos (usage tips)?


Solution:

MuseTalk was trained in latent space, where images were encoded by a frozen VAE and audio was encoded by a frozen whisper-tiny model. The architecture of the generation network was borrowed from the UNet of stable-diffusion-v1-4, with the audio embeddings fused into the image embeddings via cross-attention.
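To make the cross-attention fusion concrete, here is a minimal sketch in plain Python (no deep-learning framework) of single-head cross-attention in which image-latent tokens act as queries and audio-embedding tokens act as keys and values, i.e. the audio takes the place that text embeddings occupy in Stable Diffusion's UNet. The function names, the toy dimensions and the projection matrices `w_q`, `w_k`, `w_v` are illustrative assumptions, not MuseTalk's code; a real UNet uses multi-head attention with learned projections inside many blocks.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: one row per token


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(image_tokens: Matrix, audio_tokens: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head cross-attention: each image token attends over all audio tokens."""
    q = matmul(image_tokens, w_q)  # queries come from the image latent
    k = matmul(audio_tokens, w_k)  # keys come from the audio embedding
    v = matmul(audio_tokens, w_v)  # values come from the audio embedding
    d = len(q[0])
    scores = matmul(q, transpose(k))  # (num_image_tokens x num_audio_tokens)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    # Each output row is an audio-weighted mix, one per image token.
    return matmul(weights, v)
```

The output has one row per image token, so the audio information is injected at every spatial position of the latent while the sequence length of the audio is free to differ from the number of image tokens.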

Note that although MuseTalk uses an architecture very similar to that of Stable Diffusion, it is distinct in that it is NOT a diffusion model. Instead, MuseTalk operates by inpainting in the latent space in a single step.
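The difference from a diffusion model can be sketched as follows: instead of starting from noise and calling the UNet many times along a sampling schedule, the generator is called once on an input built from a masked target latent plus a reference latent. This is a hypothetical toy sketch: `mask_lower_half`, `concat_channels`, `single_step_inpaint` and the stand-in `toy_unet` are illustrative names, and hiding the lower half (mouth region) and concatenating the reference latent along the channel axis are assumptions about how such an inpainting input can be built, not details stated in this post. For brevity the mask is applied directly to the latent grid.

```python
from typing import Callable, List

Latent = List[List[List[float]]]  # channels x height x width
Audio = List[List[float]]          # audio tokens x feature dim
UNetFn = Callable[[Latent, Audio], Latent]


def mask_lower_half(latent: Latent) -> Latent:
    """Return a copy with the lower half of every channel zeroed (assumed mouth region)."""
    h = len(latent[0])
    return [[row[:] if y < h // 2 else [0.0] * len(row)
             for y, row in enumerate(ch)] for ch in latent]


def concat_channels(a: Latent, b: Latent) -> Latent:
    """Stack two latents along the channel axis (copies the rows)."""
    return [[row[:] for row in ch] for ch in a + b]


def single_step_inpaint(unet: UNetFn, target: Latent,
                        reference: Latent, audio: Audio) -> Latent:
    """One forward pass: no noise, no sampler loop, no timestep schedule."""
    x = concat_channels(mask_lower_half(target), reference)
    return unet(x, audio)


def toy_unet(x: Latent, audio: Audio) -> Latent:
    """Hypothetical stand-in for a trained UNet: returns the first half of the channels."""
    return [[row[:] for row in ch] for ch in x[: len(x) // 2]]
```

With a real model, `unet` would be the trained network and `audio` the audio embedding for the frame being generated; the point of the sketch is only that each output frame costs one forward pass, whereas a diffusion sampler repeats the UNet call for every denoising step.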

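Disclaimer: This article was contributed by a user and does not represent the position of the wpsshop blog (wpsshop博客). Copyright belongs to the original author, and this site assumes no legal responsibility. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/凡人多烦事01/article/detail/594627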