The digital-human paper covered in this post is "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", i.e. the Wav2Lip model.
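Wav2Lip is a two-stage model. Stage one trains a discriminator (the lip-sync expert) that judges whether the audio and the mouth shape are in sync. Stage two trains an encoder-decoder generator; the full setup has one generator and two discriminators, namely the pretrained sync expert and a visual quality discriminator. Stage two can also be trained GAN-style with the visual quality discriminator: this hurts sync accuracy to some degree, but the overall visual quality is slightly better.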
Loss functions:
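For reference, the losses in the paper [1] can be written roughly as below. This is a sketch reconstructed from the paper's notation as I understand it, not copied from this post: $v$ and $s$ are the video and speech embeddings produced by the sync expert, $L_g$ and $L_G$ are the generated and ground-truth frames, $D$ is the visual quality discriminator, and $s_w$, $s_g$ are the sync and adversarial weights (the paper reports 0.03 and 0.07).

```latex
P_{sync} = \frac{v \cdot s}{\max(\lVert v \rVert_2 \cdot \lVert s \rVert_2,\ \epsilon)}

E_{sync} = \frac{1}{N} \sum_{i=1}^{N} -\log\left(P_{sync}^{i}\right)

L_{recon} = \frac{1}{N} \sum_{i=1}^{N} \lVert L_g - L_G \rVert_1

L_{gen} = \mathbb{E}_{x \sim L_g}\left[\log\left(1 - D(x)\right)\right]

L_{total} = (1 - s_w - s_g) \cdot L_{recon} + s_w \cdot E_{sync} + s_g \cdot L_{gen}
```

With $s_g = 0$ the adversarial term drops out, which corresponds to training without the visual quality discriminator.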
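# Preprocess the video data
# 1. Read the video with cv2
import cv2
import face_alignment  # pip install face_alignment

# Newer face_alignment releases name this enum LandmarksType.TWO_D
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, device='cpu')  # cpu or gpu
video_path = "demo.mp4"
cap = cv2.VideoCapture(video_path)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # no more frames: stop instead of looping forever
        break
    preds = fa.get_landmarks(frame)  # facial landmarks for this frame
cap.release()

# directly extract images from a video
# .png is lossless, so the frames are sharper than .jpg
ffmpeg -y -i {./video_path/demo.mp4} -r 25 {./save_path/demo/%06d.png}
# Example:
ffmpeg -y -i D://videos//demo.mp4 -r 25 D://images//demo//%06d.png

# Preprocess the audio
# Use ffmpeg to convert the sample rate to 16000
ffmpeg -y -i {./demo.mp4} -async 1 -ac 1 -vn -acodec pcm_s16le -ar 16000 {xx.wav}

# The code below aligns the audio with the video
# Each training window takes 16 mel frames. At 80 mel frames per second this is 0.2 s,
# i.e. 5 video frames at 25 fps (not a single frame).
syncnet_mel_step_size = 16

def crop_audio_window(self, spec, start_frame):
    start_frame_num = self.get_frame_id(start_frame)  # frame index: 0, 1, 2, ..., n
    start_idx = int(80. * (start_frame_num / 25.))  # start of the mel chunk for this frame at 25 fps
    end_idx = start_idx + syncnet_mel_step_size  # chunk length is 16
    return spec[start_idx : end_idx, :]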
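To make the index arithmetic in crop_audio_window concrete, here is a small self-contained sketch. The hop size of 200 samples and the 5-frame window are my assumptions about the Wav2Lip defaults (they are not stated in this post), and `mel_window` is a hypothetical helper name used only for illustration.

```python
# Assumed values: only the 16 kHz sample rate, 25 fps and the step size of 16 come from the post.
SAMPLE_RATE = 16000          # audio sample rate after the ffmpeg conversion
HOP_SIZE = 200               # assumed STFT hop length in samples
FPS = 25                     # video frame rate
SYNCNET_T = 5                # assumed number of video frames per training window
SYNCNET_MEL_STEP_SIZE = 16   # mel frames per training window

# One mel frame is produced every HOP_SIZE samples.
MEL_FRAMES_PER_SECOND = SAMPLE_RATE / HOP_SIZE  # this is the 80. in crop_audio_window


def mel_window(start_frame_num):
    """Return (start_idx, end_idx) of the mel slice for a window starting at this video frame."""
    start_idx = int(MEL_FRAMES_PER_SECOND * (start_frame_num / FPS))
    return start_idx, start_idx + SYNCNET_MEL_STEP_SIZE


# Mel frames spanned by SYNCNET_T video frames: 80 * 5 / 25
mel_per_window = MEL_FRAMES_PER_SECOND * SYNCNET_T / FPS

print(mel_window(0), mel_window(25), mel_per_window)
```

Under these assumptions, frame 25 (one second into the video) maps to mel index 80, and a 5-frame window spans exactly 16 mel frames, which is where the step size of 16 comes from.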
# In the dataset section of color_syncnet_train.py
# H x W x 3 * T
x = np.concatenate(window, axis=2) / 255.
x = x.transpose(2, 0, 1)
x = x[:, x.shape[1]//2:] # keep only the lower half of the face (used to train the classifier that judges whether lips and audio are in sync)
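The shapes involved in this snippet are easy to lose track of, so here is a minimal sketch with dummy frames. The window length of 5 and the 96 x 96 frame size are assumptions for illustration; the top half of each dummy frame is black and the bottom half white, so it is easy to check which half survives the crop.

```python
import numpy as np

T, H, W = 5, 96, 96  # assumed window length and frame size

# Build T dummy frames: top half black (0), bottom half white (255).
window = []
for _ in range(T):
    frame = np.zeros((H, W, 3), dtype=np.uint8)
    frame[H // 2:] = 255
    window.append(frame)

x = np.concatenate(window, axis=2) / 255.  # (H, W, 3*T), scaled to [0, 1]
x = x.transpose(2, 0, 1)                   # (3*T, H, W), channels first
lower = x[:, x.shape[1] // 2:]             # (3*T, H/2, W), lower half only

print(x.shape, lower.shape)  # (15, 96, 96) (15, 48, 96)
```

Every value in `lower` is 1.0 here, which confirms that the slice keeps the bottom rows of each frame, i.e. the mouth region.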
import torch
import torch.nn as nn

# Load the model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = SyncNet()
model.load_state_dict(torch.load("ckpt_path", map_location="cpu"))
model.to(device)
model.eval()

# Load the input: pick one video and sample from it repeatedly
# (inside Dataset.__getitem__(): vidname = self.all_videos[0])
acc = 0
total = 0
with torch.no_grad():
    for x, mel, y in test_loader:
        x = x.to(device)
        mel = mel.to(device)
        y = y.to(device)  # ground truth, 0 or 1
        a, v = model(mel, x)  # audio and video embeddings
        av_sim = nn.functional.cosine_similarity(a, v)  # the more similar a and v, the higher the score
        for i in range(a.size(0)):
            pred_label = 1 if av_sim[i] > 0.5 else 0
            if pred_label == int(y[i].item()):  # compare against this sample's label
                acc += 1
            total += 1
avg_acc = acc / total  # accuracy over all evaluated samples
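The thresholding step can be checked by hand on a toy example. The sketch below uses plain Python lists instead of tensors; `cosine_similarity` and `sync_accuracy` are hypothetical helper names, the embeddings are made up, and the 0.5 threshold is simply the one used in the evaluation code above.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity of two equal-length vectors, guarded against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def sync_accuracy(audio_embs, video_embs, labels, threshold=0.5):
    """Fraction of pairs whose thresholded similarity matches the 0/1 label."""
    correct = 0
    for a, v, y in zip(audio_embs, video_embs, labels):
        pred = 1 if cosine_similarity(a, v) > threshold else 0
        correct += int(pred == y)
    return correct / len(labels)


# Made-up 2-D embeddings: similarities are about 0.995, 0.0 and 0.707.
audio = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
video = [[1.0, 0.1], [0.0, 1.0], [1.0, 1.0]]
labels = [1, 0, 0]

acc = sync_accuracy(audio, video, labels)
print(acc)
```

The third pair has similarity of about 0.707 but label 0, so it is counted as an error and the accuracy is 2/3. This also shows how sensitive the metric is to the choice of threshold.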
[1] A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild.
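My knowledge is limited, so some of my understanding here may well be off; I am recording my learning process this way to make it easier to summarize and review later.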