


Code: GitHub - sstzal/DiffTalk: [CVPR2023] The implementation for "DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation"


Problem 1. ERROR: Failed building wheel for pysptk

```
Cython.Compiler.Errors.CompileError: pysptk/_sptk.pyx
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pysptk
Failed to build pysptk
ERROR: Could not build wheels for pysptk, which is required to install pyproject.toml-based projects
```

Problem 2. ModuleNotFoundError: No module named 'ldm'

```
Traceback (most recent call last):
  File "scripts/inference.py", line 7, in <module>
    from ldm.util import instantiate_from_config
ModuleNotFoundError: No module named 'ldm'
```
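A likely cause, assuming DiffTalk keeps the latent-diffusion layout with the `ldm/` package at the repository root: `python scripts/inference.py` puts the script's own directory (`scripts/`), not the repository root, at the front of `sys.path`, so `import ldm` cannot be resolved. Running the script from the repository root with `PYTHONPATH=.` usually fixes this. The sketch below does the same thing from inside the script; the helper name `add_repo_root_to_path` is hypothetical, not part of DiffTalk.

```python
import os
import sys


def add_repo_root_to_path(script_file: str) -> str:
    """Put the repository root (the parent of scripts/) at the front of sys.path.

    Assumes the layout <repo>/scripts/inference.py and <repo>/ldm/, so that
    `import ldm` resolves no matter which directory the script is launched from.
    """
    scripts_dir = os.path.dirname(os.path.abspath(script_file))
    repo_root = os.path.dirname(scripts_dir)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root


# Hypothetical usage at the top of scripts/inference.py, before the ldm import:
#   add_repo_root_to_path(__file__)
#   from ldm.util import instantiate_from_config
```

The call has to come before the first `ldm` import, because Python resolves imports against `sys.path` at the moment the import statement runs.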

Problem 3. Installing the dependencies fails

```bash
pip install -e git+https://github.com/CompVis/taming-transformers.git@24268930bf1dce879235a7fddd0b2355b84d7ea6#egg=taming_transformers
```

```
Obtaining clip from git+https://github.com/openai/CLIP.git@d50d76daa670286dd6cacf3bcd80b5e4823fc8e1#egg=clip
Cloning https://github.com/openai/CLIP.git (to revision d50d76daa670286dd6cacf3bcd80b5e4823fc8e1) to ./src/clip
Running command git clone --filter=blob:none --quiet https://github.com/openai/CLIP.git /data/DiffTalk-main/src/clip
fatal: unable to access 'https://github.com/openai/CLIP.git/': Failed to connect to github.com port 443: Connection timed out
error: subprocess-exited-with-error
× git clone --filter=blob:none --quiet https://github.com/openai/CLIP.git /data/DiffTalk-main/src/clip did not run successfully.
│ exit code: 128
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× git clone --filter=blob:none --quiet https://github.com/openai/CLIP.git /data/DiffTalk-main/src/clip did not run successfully.
│ exit code: 128
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
```


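Retry with the Douban PyPI mirror. Note that `-i` only changes the index used for regular PyPI packages; `git+https://` dependencies are still cloned directly from GitHub, so they also need a working connection to github.com.

```bash
pip install -e git+https://github.com/CompVis/taming-transformers.git@24268930bf1dce879235a7fddd0b2355b84d7ea6#egg=taming_transformers -i https://pypi.douban.com/simple/
```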
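The log shows `git clone` timing out while connecting to github.com on port 443. Before retrying, a quick way to check whether that connection is possible at all is to open a plain TCP socket to the same host and port. This is a minimal sketch; the function name `can_connect` and the 5-second default timeout are my own choices.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, DNS failures and
        # "network is unreachable" (all are OSError subclasses).
        return False
```

For example, `can_connect("github.com", 443)` returning `False` means the `git+https://` installs will keep failing until the network (or a proxy) is fixed, regardless of which PyPI mirror is used.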


Problem 4. Downloading the HDTF dataset

On GitHub I found a download script written by someone else: Modified version of the script to download both audio and video by yukyeongleee · Pull Request #11 · MRzzm/HDTF · GitHub

It only requires installing the tqdm and youtube-dl libraries.

However, it fails with the error: `connect: network is unreachable`
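`network is unreachable` means the machine cannot reach YouTube directly. If a proxy is available, both youtube-dl and yt-dlp accept a `--proxy URL` option. The sketch below splices that option into a downloader command list; the helper `with_proxy` is hypothetical, and `socks5://127.0.0.1:1080` is only a placeholder for your own proxy address.

```python
from typing import List, Optional


def with_proxy(command: List[str], proxy: Optional[str]) -> List[str]:
    """Return a copy of a youtube-dl / yt-dlp command with `--proxy` inserted.

    The option is placed right after the executable name; both tools accept
    options before the URL. If `proxy` is empty or None, the command is
    returned unchanged (as a new list).
    """
    if not proxy:
        return list(command)
    return [command[0], "--proxy", proxy, *command[1:]]


# Placeholder example: route a download through a local SOCKS5 proxy.
cmd = ["yt-dlp", "https://youtube.com/watch?v=VIDEO_ID", "--quiet"]
print(with_proxy(cmd, "socks5://127.0.0.1:1080"))
```

The same list can then be passed to `subprocess.call`, as the download script does with its own command.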


Error 2: ERROR: Unable to extract uploader id

```
ERROR: Unable to extract uploader id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
```

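Fix: following the CSDN blog post "youtube-dl报错解决_error: unable to extract uploader id;" (fixing the youtube-dl error), switch the script from youtube-dl to yt-dlp (`pip install yt-dlp`) and modify `download_video` as follows: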


```python
import os
import subprocess
from typing import Optional


def download_video(video_id, download_path, resolution: Optional[int] = None, video_format="mp4", log_file=None):
    """
    Download video from YouTube.
    :param video_id: YouTube ID of the video.
    :param download_path: Where to save the video.
    :param resolution: Optional video height to select, e.g. 720.
    :param video_format: Format to download.
    :param log_file: Path to a log file for the downloader's stderr.
    :return: True if the download succeeded and the file exists.
    Copy-pasted from https://github.com/ytdl-org/youtube-dl
    """
    # if os.path.isfile(download_path): return True # File already exists
    # Original video-only selection:
    # video_selection = f"bestvideo[ext={video_format}]"
    # Apply the height filter to both alternatives, not only to the
    # last one in the "A/B" selector.
    height = "" if resolution is None else f"[height={resolution}]"
    # Video + audio; merging the two streams requires ffmpeg on PATH.
    video_selection = (
        f"bestvideo[ext={video_format}]{height}+bestaudio[ext=m4a]"
        f"/best[ext={video_format}]{height}"
    )
    command = [
        "yt-dlp",  # replaces youtube-dl, which fails with "Unable to extract uploader id"
        "https://youtube.com/watch?v={}".format(video_id), "--quiet", "-f",
        video_selection,
        "--output", download_path,
        "--no-continue",
    ]
    if log_file is None:
        return_code = subprocess.call(command, stderr=subprocess.DEVNULL)
    else:
        # The context manager closes the log file even if the call raises.
        with open(log_file, "a") as stderr:
            return_code = subprocess.call(command, stderr=stderr)
    success = return_code == 0
    return success and os.path.isfile(download_path)
```



As required by DiffTalk, the videos need further processing: convert them to 25 fps, then extract the audio signal and the facial landmarks.
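The two video-side steps (25 fps conversion and audio extraction) can be done with ffmpeg. Below is a minimal sketch that builds the ffmpeg argument lists and runs them. The 16 kHz mono WAV output is my assumption (a common choice for speech features), not a value taken from the DiffTalk repository; the function names and file paths are hypothetical. Facial landmark extraction is not covered here.

```python
import subprocess
from typing import List


def fps_command(src: str, dst: str, fps: int = 25) -> List[str]:
    """ffmpeg command that re-encodes `src` to a constant `fps` output frame rate."""
    return ["ffmpeg", "-y", "-i", src, "-r", str(fps), dst]


def audio_command(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """ffmpeg command that drops the video stream and writes mono 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-ar", str(sample_rate),
            "-acodec", "pcm_s16le", dst]


def preprocess(src: str, video_25fps: str, wav: str) -> None:
    """Convert to 25 fps, then extract audio from the converted video.

    Raises subprocess.CalledProcessError if either ffmpeg call fails.
    """
    subprocess.run(fps_command(src, video_25fps), check=True)
    subprocess.run(audio_command(video_25fps, wav), check=True)
```

Hypothetical usage: `preprocess("raw/clip.mp4", "video_25fps/clip.mp4", "audio/clip.wav")`. Extracting the audio from the converted file keeps both outputs derived from the same source.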





Training command (8 GPUs):

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --base configs/latent-diffusion/talking.yaml -t --gpus 0,1,2,3,4,5,6,7,
```
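To run the same training on a different number of GPUs, `CUDA_VISIBLE_DEVICES` and `--gpus` have to stay consistent. A small sketch that generates both, keeping the trailing comma of the original command; in latent-diffusion-style `main.py` the value is passed on to PyTorch Lightning, where, as far as I know, a comma-separated string is read as a list of GPU indices, so a single GPU is written `0,` rather than `0`. The function names are hypothetical.

```python
import os
import shlex
import subprocess
from typing import Dict, List, Tuple


def build_train_command(num_gpus: int,
                        config: str = "configs/latent-diffusion/talking.yaml"
                        ) -> Tuple[Dict[str, str], List[str]]:
    """Return (extra environment variables, argv) for training on GPUs 0..num_gpus-1."""
    ids = ",".join(str(i) for i in range(num_gpus))
    env = {"CUDA_VISIBLE_DEVICES": ids}
    # Keep the trailing comma from the original command, e.g. "0,1,...,7,".
    argv = ["python", "main.py", "--base", config, "-t", "--gpus", ids + ","]
    return env, argv


def run_training(num_gpus: int) -> None:
    """Print the equivalent shell command, then launch training from the repo root."""
    env, argv = build_train_command(num_gpus)
    print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(argv)}")
    subprocess.run(argv, env={**os.environ, **env}, check=True)
```

`build_train_command(8)` reproduces the command above exactly; `run_training(2)` would launch on GPUs 0 and 1.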

