Just now I spent 10 minutes writing three lines of code to create a virtual anchor with a celebrity face
Let's look at the result first:
[Video: Custom digital human: virtual anchor]
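Building a basic virtual digital human is straightforward. It takes three models:
(1) First Order Motion (facial expression transfer)
(2) Text to Speech (text-to-speech synthesis)
(3) Wav2Lip (lip-sync synthesis)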
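The technical steps are as follows:
1. Feed the portrait into the First Order Motion model to transfer facial expressions, so the virtual anchor's expressions look closer to a real person's. Since the role is a news anchor, the reference expressions should of course meet a "national standard", so I chose Zimeng (梓萌) as the reference.
2. Use the Text to Speech model to convert the input text into audio.
3. Once you have the expression-transfer video and the audio, use the Wav2Lip model to merge them and adjust the lip shapes to match the audio, which makes the virtual human look much more like a real person.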
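The three steps form a simple chain: steps 1 and 2 are independent, and step 3 consumes both of their outputs. The sketch below models only that data flow in plain Python with stub stages. `AnchorPipeline` and the stage signatures are hypothetical names of mine, not part of PaddleHub. In the real notebook, each stage would wrap one of the `hub.Module` calls shown later.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; each stage takes file paths/text and returns an output path.
MotionStage = Callable[[str, str], str]   # (source_image, driving_video) -> animated video
SpeechStage = Callable[[str], str]        # text -> audio file
LipSyncStage = Callable[[str, str], str]  # (video, audio) -> lip-synced video


@dataclass
class AnchorPipeline:
    motion: MotionStage
    speech: SpeechStage
    lip_sync: LipSyncStage

    def run(self, source_image: str, driving_video: str, text: str) -> str:
        # Step 1: transfer the driving video's expressions onto the still portrait
        video = self.motion(source_image, driving_video)
        # Step 2: synthesize speech; this does not depend on step 1
        audio = self.speech(text)
        # Step 3: combine both and re-render the lips to match the audio
        return self.lip_sync(video, audio)


# Demo with stub stages that only record the call order and return fixed paths
calls = []
demo = AnchorPipeline(
    motion=lambda img, drv: calls.append("motion") or "output/FOM.mp4",
    speech=lambda text: calls.append("speech") or "wavs/1.wav",
    lip_sync=lambda video, audio: calls.append("lip_sync") or f"lip_synced({video}, {audio})",
)
result = demo.run("input_data/t5.jpeg", "input_data/zimeng.mp4", "hello")
print(result)  # lip_synced(output/FOM.mp4, wavs/1.wav)
```

Because steps 1 and 2 do not depend on each other, they could in principle run in either order. Only the lip-sync step has to wait for both.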
```python
# Upgrade PaddleHub
!pip install --upgrade paddlehub
```
```python
# Download and extract nltk_data
!wget https://paddlespeech.bj.bcebos.com/Parakeet/tools/nltk_data.tar.gz
!tar zxvf nltk_data.tar.gz
```
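```python
# Install Parakeet from source
# (assumes the Parakeet/ source directory is already in the working directory,
#  e.g. bundled with the AI Studio project)
%cd Parakeet/
!pip install -e .
%cd ..
```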
```python
# Install the three PaddleHub modules used below
!hub install first_order_motion==1.0.0
!hub install wav2lip
!hub install fastspeech2_baker==1.0.0
```
```text
Download https://bj.bcebos.com/paddlehub/paddlehub_dev/first_order_motion.tar.gz
[##################################################] 100.00%
Decompress /home/aistudio/.paddlehub/tmp/tmphcuxe0xl/first_order_motion.tar.gz
[##################################################] 100.00%
[2022-01-07 15:32:45,388] [ INFO] - Installing dependent packages from /home/aistudio/.paddlehub/tmp/tmpvtfv5cjp/first_order_motion/requirements.txt:
```
With the FOM model, feed in a portrait image and a driving video to bring the portrait to life!
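```python
import os
import cv2

src_dir = 'input_data/img/'
dst_dir = 'input_data/'

# Halve the resolution of every image in input_data/img/ and save it to input_data/
for f in os.listdir(src_dir):
    img = cv2.imread(os.path.join(src_dir, f))
    if img is None:  # skip files OpenCV cannot read (e.g. non-image files)
        continue
    h, w = img.shape[:2]
    resimg = cv2.resize(img, (w // 2, h // 2))  # cv2.resize expects (width, height)
    cv2.imwrite(os.path.join(dst_dir, f), resimg)
```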
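The cell above always halves each image. If your source photos vary a lot in size, capping the longer side at a fixed limit may be more predictable. Here is a minimal sketch of that alternative. The helper name `fit_within` and the 1024 px limit are my own assumptions, not values from the original project. It only computes the target size to pass to `cv2.resize`, which expects `(width, height)`:

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Return (new_width, new_height) whose longer side is at most max_side.

    Images that already fit are returned unchanged; the aspect ratio is preserved.
    """
    longer = max(width, height)
    if longer <= max_side:
        return width, height
    scale = max_side / longer
    # round() stays close to the exact ratio; max(1, ...) guards against a zero-size side
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000))  # (1024, 768)
print(fit_within(800, 600))    # (800, 600)
```

In the loop above, this would replace the halving line with something like `resimg = cv2.resize(img, fit_within(w, h), interpolation=cv2.INTER_AREA)`. OpenCV recommends `INTER_AREA` when shrinking images.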
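```python
import paddlehub as hub

# Animate the still portrait with the expressions of the driving video
FOM_Module = hub.Module(name="first_order_motion")
FOM_Module.generate(source_image="input_data/t5.jpeg",     # input portrait
                    driving_video="input_data/zimeng.mp4",  # driving video
                    ratio=0.4,                              # proportion of the generated face pasted back into the image
                    image_size=256,                         # face resolution used by the model
                    output_dir='./output/',                 # output folder
                    filename='FOM.mp4',                     # output file name
                    use_gpu=True)
```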
```text
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Iterable, Mapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Sized
W0108 00:26:51.097970 19449 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1
W0108 00:26:51.104054 19449 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[01/08 00:26:56] ppgan INFO: Found /home/aistudio/.cache/ppgan/GPEN-512.pdparams
1 persons have been detected
100%|██████████| 300/300 [00:36<00:00, 8.31it/s]
```
Enter what you want the virtual digital human to say, and convert it into an audio clip.
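`generate` takes a list of sentences, and the cell below passes a one-element list. For a longer script, one option is to split the text at sentence-ending punctuation before passing it in. The splitting rule and the helper name `split_sentences` are my own assumptions; the fastspeech2_baker module does not require them. The sketch relies on Python 3.7+ to split on a zero-width match:

```python
import re
from typing import List

# Zero-width split point right after Chinese or ASCII sentence-ending punctuation,
# so each sentence keeps its own punctuation mark.
_SENTENCE_END = re.compile(r'(?<=[。!?!?;;])')


def split_sentences(text: str) -> List[str]:
    """Split a script into non-empty, stripped sentences for TTS_Module.generate()."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


script = '开发者你好,欢迎使用飞桨。我是你的专属虚拟人!'
print(split_sentences(script))  # ['开发者你好,欢迎使用飞桨。', '我是你的专属虚拟人!']
```

Usage would then be `wav_files = TTS_Module.generate(split_sentences(script))`.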
```python
import paddlehub as hub

# Text for the anchor to speak ("Hello developers, welcome to PaddlePaddle, I am your personal virtual human.")
sentences = ['开发者你好,欢迎使用飞桨,我是你的专属虚拟人。']

TTS_Module = hub.Module(
    name='fastspeech2_baker',
    version='1.0.0')
wav_files = TTS_Module.generate(sentences)  # returns a list of generated .wav paths
print(f'Audio generated; output files: {wav_files}')
```

```text
[2022-01-07 15:48:35,288] [ INFO] - Load fastspeech2 params from /home/aistudio/.paddlehub/modules/fastspeech2_baker/assets/fastspeech2_nosil_baker_ckpt_0.4/snapshot_iter_76000.pdz
[2022-01-07 15:48:35,671] [ INFO] - Load vocoder params from /home/aistudio/.paddlehub/modules/fastspeech2_baker/assets/pwg_baker_ckpt_0.4/pwg_snapshot_iter_400000.pdz
Building prefix dict from the default dictionary ...
[2022-01-07 15:48:35] [DEBUG] [__init__.py:113] Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
[2022-01-07 15:48:36] [DEBUG] [__init__.py:147] Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.791 seconds.
[2022-01-07 15:48:36] [DEBUG] [__init__.py:165] Loading model cost 0.791 seconds.
Prefix dict has been built successfully.
[2022-01-07 15:48:36] [DEBUG] [__init__.py:166] Prefix dict has been built successfully.
[2022-01-07 15:48:48,064] [ INFO] - 1 wave files have been generated in /home/aistudio/wavs
Audio generated; output files: ['/home/aistudio/wavs/1.wav']
```
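The output shows that `generate` returns a list of file paths. With one input sentence, it produced a single file, `1.wav`. Wav2Lip's `audio` argument takes a single file, so if you pass several sentences and want one soundtrack, you could merge the clips first. Below is a minimal sketch using Python's standard `wave` module. It assumes every clip has the same channel count, sample width and sample rate, which should hold when all clips come from the same TTS model. The helper name `concat_wavs` and the output name `merged.wav` are hypothetical.

```python
import wave
from typing import List


def concat_wavs(paths: List[str], out_path: str) -> float:
    """Concatenate PCM .wav files in order into out_path; return its duration in seconds."""
    if not paths:
        raise ValueError("no input files")
    # Use the first clip's format for the output file
    with wave.open(paths[0], "rb") as first:
        ref = first.getparams()
    expected = (ref.nchannels, ref.sampwidth, ref.framerate)

    total_frames = 0
    with wave.open(out_path, "wb") as out:
        out.setnchannels(ref.nchannels)
        out.setsampwidth(ref.sampwidth)
        out.setframerate(ref.framerate)
        for path in paths:
            with wave.open(path, "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if fmt != expected:
                    raise ValueError(f"{path}: format {fmt} differs from {expected}")
                n = src.getnframes()
                out.writeframes(src.readframes(n))
                total_frames += n
    return total_frames / ref.framerate
```

For example, `concat_wavs(wav_files, 'wavs/merged.wav')` writes one file you could pass as `audio='wavs/merged.wav'`. The returned duration is handy for checking the speech length against the FOM video.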
Feed the animated video and the audio file you just generated into the Wav2Lip model, so the lip movements follow the spoken content.
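```python
import paddlehub as hub

W2F_Module = hub.Module(name="wav2lip")

# Re-render the mouth region of the FOM video so it matches the speech
W2F_Module.wav2lip_transfer(face='output/FOM.mp4',   # video produced by first_order_motion
                            audio='wavs/1.wav',      # audio produced by fastspeech2_baker
                            output_dir='./transfer_result/',
                            use_gpu=True)
```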
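You can swap the virtual anchor's portrait freely. Here I just grabbed two celebrity photos from Baidu. I won't post the originals, but here are some of the results: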
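Only `source_image` changes between anchors, so trying several portraits amounts to a loop around the same `generate` call used earlier. The sketch below makes three assumptions: the portraits sit in one folder with common image extensions, each result is saved as `FOM_<image name>.mp4` (my own naming scheme), and the helper name `animate_portraits` is hypothetical. The module is passed in as a parameter, so the loop can be tried with a stand-in object. With PaddleHub, you would pass `hub.Module(name="first_order_motion")`.

```python
import os
from typing import List

IMAGE_EXTS = {'.jpg', '.jpeg', '.png'}


def animate_portraits(fom_module, image_dir: str, driving_video: str,
                      output_dir: str = './output/', use_gpu: bool = True) -> List[str]:
    """Run FOM once per portrait in image_dir; return the output file names."""
    outputs = []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in IMAGE_EXTS:
            continue  # skip the driving video, audio and other non-image files
        filename = f'FOM_{stem}.mp4'
        # Same keyword arguments as the single-image FOM cell above
        fom_module.generate(source_image=os.path.join(image_dir, name),
                            driving_video=driving_video,
                            ratio=0.4,
                            image_size=256,
                            output_dir=output_dir,
                            filename=filename,
                            use_gpu=use_gpu)
        outputs.append(filename)
    return outputs
```

Each resulting video can then go through `wav2lip_transfer` with the same audio file.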
The final result is the one shown at the beginning of the article.
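The final version also applies GFGP for image super-resolution; a detailed tutorial will follow. Feel free to follow my Bilibili account for upcoming video tutorials: AI小白龙's personal space on Bilibili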
---------------------------------------------
Final words
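The code in this article runs as-is on AI Studio. Feel free to fork it: paddle虚拟数字主播播新闻 (Paddle virtual digital anchor reads the news) - PaddlePaddle AI Studio - AI learning and training community
New users who register here get a 100-point compute card: AI Studio registration
Other digital human tutorials: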