云端炼丹,算力白嫖,基于云端GPU(Colab)使用So-vits库制作AI特朗普演唱《国际歌》_colab sovits 语音训练

作者：繁依Fanyi0 | 2024-02-18 00:56:57

踩

colab sovits 语音训练

人工智能AI技术早已深入到人们生活的每一个角落，君不见AI孙燕姿的歌声此起彼伏，不绝于耳，但并不是每个人都拥有一块N卡，没有GPU的日子总是不好过的，但是没关系，山人有妙计，本次我们基于Google的Colab免费云端服务器来搭建深度学习环境，制作AI特朗普，让他高唱《国际歌》。

Colab（全名Colaboratory ），它是Google公司的一款基于云端的基础免费服务器产品，可以在B端，也就是浏览器里面编写和执行Python代码，非常方便，贴心的是，Colab可以给用户分配免费的GPU进行使用，对于没有N卡的朋友来说，这已经远远超出了业界良心的范畴，简直就是在做慈善事业。

配置Colab

Colab是基于Google云盘的产品，我们可以将深度学习的Python脚本、训练好的模型、以及训练集等数据直接存放在云盘中，然后通过Colab执行即可。

首先访问Google云盘：drive.google.com

随后点击新建，选择关联更多应用：

接着安装Colab即可：

至此，云盘和Colab就关联好了，现在我们可以新建一个脚本文件my_sovits.ipynb文件，键入代码：

hello colab
1

随后，按快捷键 ctrl + 回车，即可运行代码：

这里需要注意的是，Colab使用的是基于Jupyter Notebook的ipynb格式的Python代码。

Jupyter Notebook是以网页的形式打开，可以在网页页面中直接编写代码和运行代码，代码的运行结果也会直接在代码块下显示。如在编程过程中需要编写说明文档，可在同一个页面中直接编写，便于作及时的说明和解释。

随后设置一下显卡类型：

接着运行命令，查看GPU版本：

!/usr/local/cuda/bin/nvcc --version  
  
!nvidia-smi
1
2
3

程序返回：

nvcc: NVIDIA (R) Cuda compiler driver  
Copyright (c) 2005-2022 NVIDIA Corporation  
Built on Wed_Sep_21_10:33:58_PDT_2022  
Cuda compilation tools, release 11.8, V11.8.89  
Build cuda_11.8.r11.8/compiler.31833905_0  
Tue May 16 04:49:23 2023         
+-----------------------------------------------------------------------------+  
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |  
|-------------------------------+----------------------+----------------------+  
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |  
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |  
|                               |                      |               MIG M. |  
|===============================+======================+======================|  
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |  
| N/A   65C    P8    13W /  70W |      0MiB / 15360MiB |      0%      Default |  
|                               |                      |                  N/A |  
+-------------------------------+----------------------+----------------------+  
                                                                                 
+-----------------------------------------------------------------------------+  
| Processes:                                                                  |  
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |  
|        ID   ID                                                   Usage      |  
|=============================================================================|  
|  No running processes found                                                 |  
+-----------------------------------------------------------------------------+
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25

这里建议选择Tesla T4的显卡类型，性能更突出。

至此Colab就配置好了。

配置So-vits

下面我们配置so-vits环境，可以通过pip命令安装一些基础依赖：

!pip install pyworld==0.3.2  
!pip install numpy==1.23.5
1
2

注意jupyter语言是通过叹号来运行命令。

注意，由于不是本地环境，有的时候colab会提醒：

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/  
Collecting numpy==1.23.5  
  Downloading numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)  
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 80.1 MB/s eta 0:00:00  
Installing collected packages: numpy  
  Attempting uninstall: numpy  
    Found existing installation: numpy 1.22.4  
    Uninstalling numpy-1.22.4:  
      Successfully uninstalled numpy-1.22.4  
Successfully installed numpy-1.23.5  
WARNING: The following packages were previously imported in this runtime:  
  [numpy]  
You must restart the runtime in order to use newly installed versions.
1
2
3
4
5
6
7
8
9
10
11
12
13

此时numpy库需要重启runtime才可以导入操作。

重启runtime后，需要再重新安装一次，直到系统提示依赖已经存在：

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/  
Requirement already satisfied: numpy==1.23.5 in /usr/local/lib/python3.10/dist-packages (1.23.5)
1
2

随后，克隆so-vits项目，并且安装项目的依赖：

import os  
import glob  
!git clone https://github.com/effusiveperiscope/so-vits-svc -b eff-4.0  
os.chdir('/content/so-vits-svc')  
# install requirements one-at-a-time to ignore exceptions  
!cat requirements.txt | xargs -n 1 pip install --extra-index-url https://download.pytorch.org/whl/cu117  
!pip install praat-parselmouth  
!pip install ipywidgets  
!pip install huggingface_hub  
!pip install pip==23.0.1 # fix pip version for fairseq install  
!pip install fairseq==0.12.2  
!jupyter nbextension enable --py widgetsnbextension  
existing_files = glob.glob('/content/**/*.*', recursive=True)  
!pip install --upgrade protobuf==3.9.2  
!pip uninstall -y tensorflow  
!pip install tensorflow==2.11.0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16

安装好依赖之后，定义一些前置工具方法：

os.chdir('/content/so-vits-svc') # force working-directory to so-vits-svc - this line is just for safety and is probably not required  
  
import tarfile  
import os  
from zipfile import ZipFile  
# taken from https://github.com/CookiePPP/cookietts/blob/master/CookieTTS/utils/dataset/extract_unknown.py  
def extract(path):  
    if path.endswith(".zip"):  
        with ZipFile(path, 'r') as zipObj:  
           zipObj.extractall(os.path.split(path)[0])  
    elif path.endswith(".tar.bz2"):  
        tar = tarfile.open(path, "r:bz2")  
        tar.extractall(os.path.split(path)[0])  
        tar.close()  
    elif path.endswith(".tar.gz"):  
        tar = tarfile.open(path, "r:gz")  
        tar.extractall(os.path.split(path)[0])  
        tar.close()  
    elif path.endswith(".tar"):  
        tar = tarfile.open(path, "r:")  
        tar.extractall(os.path.split(path)[0])  
        tar.close()  
    elif path.endswith(".7z"):  
        import py7zr  
        archive = py7zr.SevenZipFile(path, mode='r')  
        archive.extractall(path=os.path.split(path)[0])  
        archive.close()  
    else:  
        raise NotImplementedError(f"{path} extension not implemented.")  
  
# taken from https://github.com/CookiePPP/cookietts/tree/master/CookieTTS/_0_download/scripts  
  
# megatools download urls  
win64_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-win64.zip"  
win32_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-win32.zip"  
linux_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-linux-x86_64.tar.gz"  
# download megatools  
from sys import platform  
import os  
import urllib.request  
import subprocess  
from time import sleep  
  
if platform == "linux" or platform == "linux2":  
        dl_url = linux_url  
elif platform == "darwin":  
    raise NotImplementedError('MacOS not supported.')  
elif platform == "win32":  
        dl_url = win64_url  
else:  
    raise NotImplementedError ('Unknown Operating System.')  
  
dlname = dl_url.split("/")[-1]  
if dlname.endswith(".zip"):  
    binary_folder = dlname[:-4] # remove .zip  
elif dlname.endswith(".tar.gz"):  
    binary_folder = dlname[:-7] # remove .tar.gz  
else:  
    raise NameError('downloaded megatools has unknown archive file extension!')  
  
if not os.path.exists(binary_folder):  
    print('"megatools" not found. Downloading...')  
    if not os.path.exists(dlname):  
        urllib.request.urlretrieve(dl_url, dlname)  
    assert os.path.exists(dlname), 'failed to download.'  
    extract(dlname)  
    sleep(0.10)  
    os.unlink(dlname)  
    print("Done!")  
  
  
binary_folder = os.path.abspath(binary_folder)  
  
def megadown(download_link, filename='.', verbose=False):  
    """Use megatools binary executable to download files and folders from MEGA.nz ."""  
    filename = ' --path "'+os.path.abspath(filename)+'"' if filename else ""  
    wd_old = os.getcwd()  
    os.chdir(binary_folder)  
    try:  
        if platform == "linux" or platform == "linux2":  
            subprocess.call(f'./megatools dl{filename}{" --debug http" if verbose else ""} {download_link}', shell=True)  
        elif platform == "win32":  
            subprocess.call(f'megatools.exe dl{filename}{" --debug http" if verbose else ""} {download_link}', shell=True)  
    except:  
        os.chdir(wd_old) # don't let user stop download without going back to correct directory first  
        raise  
    os.chdir(wd_old)  
    return filename  
  
import urllib.request  
from tqdm import tqdm  
import gdown  
from os.path import exists  
  
def request_url_with_progress_bar(url, filename):  
    class DownloadProgressBar(tqdm):  
        def update_to(self, b=1, bsize=1, tsize=None):  
            if tsize is not None:  
                self.total = tsize  
            self.update(b * bsize - self.n)  
      
    def download_url(url, filename):  
        with DownloadProgressBar(unit='B', unit_scale=True,  
                                 miniters=1, desc=url.split('/')[-1]) as t:  
            filename, headers = urllib.request.urlretrieve(url, filename=filename, reporthook=t.update_to)  
            print("Downloaded to "+filename)  
    download_url(url, filename)  
  
  
def download(urls, dataset='', filenames=None, force_dl=False, username='', password='', auth_needed=False):  
    assert filenames is None or len(urls) == len(filenames), f"number of urls does not match filenames. Expected {len(filenames)} urls, containing the files listed below.\n{filenames}"  
    assert not auth_needed or (len(username) and len(password)), f"username and password needed for {dataset} Dataset"  
    if filenames is None:  
        filenames = [None,]*len(urls)  
    for i, (url, filename) in enumerate(zip(urls, filenames)):  
        print(f"Downloading File from {url}")  
        #if filename is None:  
        #    filename = url.split("/")[-1]  
        if filename and (not force_dl) and exists(filename):  
            print(f"{filename} Already Exists, Skipping.")  
            continue  
        if 'drive.google.com' in url:  
            assert 'https://drive.google.com/uc?id=' in url, 'Google Drive links should follow the format "https://drive.google.com/uc?id=1eQAnaoDBGQZldPVk-nzgYzRbcPSmnpv6".\nWhere id=XXXXXXXXXXXXXXXXX is the Google Drive Share ID.'  
            gdown.download(url, filename, quiet=False)  
        elif 'mega.nz' in url:  
            megadown(url, filename)  
        else:  
            #urllib.request.urlretrieve(url, filename=filename) # no progress bar  
            request_url_with_progress_bar(url, filename) # with progress bar  
  
import huggingface_hub  
import os  
import shutil  
  
class HFModels:  
    def __init__(self, repo = "therealvul/so-vits-svc-4.0",   
            model_dir = "hf_vul_models"):  
        self.model_repo = huggingface_hub.Repository(local_dir=model_dir,  
            clone_from=repo, skip_lfs_files=True)  
        self.repo = repo  
        self.model_dir = model_dir  
  
        self.model_folders = os.listdir(model_dir)  
        self.model_folders.remove('.git')  
        self.model_folders.remove('.gitattributes')  
  
    def list_models(self):  
        return self.model_folders  
  
    # Downloads model;  
    # copies config to target_dir and moves model to target_dir  
    def download_model(self, model_name, target_dir):  
        if not model_name in self.model_folders:  
            raise Exception(model_name + " not found")  
        model_dir = self.model_dir  
        charpath = os.path.join(model_dir,model_name)  
  
        gen_pt = next(x for x in os.listdir(charpath) if x.startswith("G_"))  
        cfg = next(x for x in os.listdir(charpath) if x.endswith("json"))  
        try:  
          clust = next(x for x in os.listdir(charpath) if x.endswith("pt"))  
        except StopIteration as e:  
          print("Note - no cluster model for "+model_name)  
          clust = None  
  
        if not os.path.exists(target_dir):  
            os.makedirs(target_dir, exist_ok=True)  
  
        gen_dir = huggingface_hub.hf_hub_download(repo_id = self.repo,  
            filename = model_name + "/" + gen_pt) # this is a symlink  
          
        if clust is not None:  
          clust_dir = huggingface_hub.hf_hub_download(repo_id = self.repo,  
              filename = model_name + "/" + clust) # this is a symlink  
          shutil.move(os.path.realpath(clust_dir), os.path.join(target_dir, clust))  
          clust_out = os.path.join(target_dir, clust)  
        else:  
          clust_out = None  
  
        shutil.copy(os.path.join(charpath,cfg),os.path.join(target_dir, cfg))  
        shutil.move(os.path.realpath(gen_dir), os.path.join(target_dir, gen_pt))  
  
        return {"config_path": os.path.join(target_dir,cfg),  
            "generator_path": os.path.join(target_dir,gen_pt),  
            "cluster_path": clust_out}  
  
# Example usage  
# vul_models = HFModels()  
# print(vul_models.list_models())  
# print("Applejack (singing)" in vul_models.list_models())  
# vul_models.download_model("Applejack (singing)","models/Applejack (singing)")  
  
    print("Finished!")
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193

这些方法可以帮助我们下载、解压和加载模型。

音色模型下载和线上推理

接着将特朗普的音色模型和配置文件进行下载，下载地址是：

https://huggingface.co/Nardicality/so-vits-svc-4.0-models/tree/main/Trump18.5k
1

随后模型文件放到项目的models文件夹，配置文件则放入config文件夹。

接着将需要转换的歌曲上传到和项目平行的目录中。

运行代码：

import os  
import glob  
import json  
import copy  
import logging  
import io  
from ipywidgets import widgets  
from pathlib import Path  
from IPython.display import Audio, display  
  
os.chdir('/content/so-vits-svc')  
  
import torch  
from inference import infer_tool  
from inference import slicer  
from inference.infer_tool import Svc  
import soundfile  
import numpy as np  
  
MODELS_DIR = "models"  
  
def get_speakers():  
  speakers = []  
  for _,dirs,_ in os.walk(MODELS_DIR):  
    for folder in dirs:  
      cur_speaker = {}  
      # Look for G_****.pth  
      g = glob.glob(os.path.join(MODELS_DIR,folder,'G_*.pth'))  
      if not len(g):  
        print("Skipping "+folder+", no G_*.pth")  
        continue  
      cur_speaker["model_path"] = g[0]  
      cur_speaker["model_folder"] = folder  
  
      # Look for *.pt (clustering model)  
      clst = glob.glob(os.path.join(MODELS_DIR,folder,'*.pt'))  
      if not len(clst):  
        print("Note: No clustering model found for "+folder)  
        cur_speaker["cluster_path"] = ""  
      else:  
        cur_speaker["cluster_path"] = clst[0]  
  
      # Look for config.json  
      cfg = glob.glob(os.path.join(MODELS_DIR,folder,'*.json'))  
      if not len(cfg):  
        print("Skipping "+folder+", no config json")  
        continue  
      cur_speaker["cfg_path"] = cfg[0]  
      with open(cur_speaker["cfg_path"]) as f:  
        try:  
          cfg_json = json.loads(f.read())  
        except Exception as e:  
          print("Malformed config json in "+folder)  
        for name, i in cfg_json["spk"].items():  
          cur_speaker["name"] = name  
          cur_speaker["id"] = i  
          if not name.startswith('.'):  
            speakers.append(copy.copy(cur_speaker))  
  
    return sorted(speakers, key=lambda x:x["name"].lower())  
  
logging.getLogger('numba').setLevel(logging.WARNING)  
chunks_dict = infer_tool.read_temp("inference/chunks_temp.json")  
existing_files = []  
slice_db = -40  
wav_format = 'wav'  
  
class InferenceGui():  
  def __init__(self):  
    self.speakers = get_speakers()  
    self.speaker_list = [x["name"] for x in self.speakers]  
    self.speaker_box = widgets.Dropdown(  
        options = self.speaker_list  
    )  
    display(self.speaker_box)  
  
    def convert_cb(btn):  
      self.convert()  
    def clean_cb(btn):  
      self.clean()  
  
    self.convert_btn = widgets.Button(description="Convert")  
    self.convert_btn.on_click(convert_cb)  
    self.clean_btn = widgets.Button(description="Delete all audio files")  
    self.clean_btn.on_click(clean_cb)  
  
    self.trans_tx = widgets.IntText(value=0, description='Transpose')  
    self.cluster_ratio_tx = widgets.FloatText(value=0.0,   
      description='Clustering Ratio')  
    self.noise_scale_tx = widgets.FloatText(value=0.4,   
      description='Noise Scale')  
    self.auto_pitch_ck = widgets.Checkbox(value=False, description=  
      'Auto pitch f0 (do not use for singing)')  
  
    display(self.trans_tx)  
    display(self.cluster_ratio_tx)  
    display(self.noise_scale_tx)  
    display(self.auto_pitch_ck)  
    display(self.convert_btn)  
    display(self.clean_btn)  
  
  def convert(self):  
    trans = int(self.trans_tx.value)  
    speaker = next(x for x in self.speakers if x["name"] ==   
          self.speaker_box.value)  
    spkpth2 = os.path.join(os.getcwd(),speaker["model_path"])  
    print(spkpth2)  
    print(os.path.exists(spkpth2))  
  
    svc_model = Svc(speaker["model_path"], speaker["cfg_path"],   
      cluster_model_path=speaker["cluster_path"])  
      
    input_filepaths = [f for f in glob.glob('/content/**/*.*', recursive=True)  
     if f not in existing_files and   
     any(f.endswith(ex) for ex in ['.wav','.flac','.mp3','.ogg','.opus'])]  
    for name in input_filepaths:  
      print("Converting "+os.path.split(name)[-1])  
      infer_tool.format_wav(name)  
  
      wav_path = str(Path(name).with_suffix('.wav'))  
      wav_name = Path(name).stem  
      chunks = slicer.cut(wav_path, db_thresh=slice_db)  
      audio_data, audio_sr = slicer.chunks2audio(wav_path, chunks)  
  
      audio = []  
      for (slice_tag, data) in audio_data:  
          print(f'#=====segment start, '  
              f'{round(len(data)/audio_sr, 3)}s======')  
            
          length = int(np.ceil(len(data) / audio_sr *  
              svc_model.target_sample))  
            
          if slice_tag:  
              print('jump empty segment')  
              _audio = np.zeros(length)  
          else:  
              # Padding "fix" for noise  
              pad_len = int(audio_sr * 0.5)  
              data = np.concatenate([np.zeros([pad_len]),  
                  data, np.zeros([pad_len])])  
              raw_path = io.BytesIO()  
              soundfile.write(raw_path, data, audio_sr, format="wav")  
              raw_path.seek(0)  
              _cluster_ratio = 0.0  
              if speaker["cluster_path"] != "":  
                _cluster_ratio = float(self.cluster_ratio_tx.value)  
              out_audio, out_sr = svc_model.infer(  
                  speaker["name"], trans, raw_path,  
                  cluster_infer_ratio = _cluster_ratio,  
                  auto_predict_f0 = bool(self.auto_pitch_ck.value),  
                  noice_scale = float(self.noise_scale_tx.value))  
              _audio = out_audio.cpu().numpy()  
              pad_len = int(svc_model.target_sample * 0.5)  
              _audio = _audio[pad_len:-pad_len]  
          audio.extend(list(infer_tool.pad_array(_audio, length)))  
            
      res_path = os.path.join('/content/',  
          f'{wav_name}_{trans}_key_'  
          f'{speaker["name"]}.{wav_format}')  
      soundfile.write(res_path, audio, svc_model.target_sample,  
          format=wav_format)  
      display(Audio(res_path, autoplay=True)) # display audio file  
    pass  
  
  def clean(self):  
     input_filepaths = [f for f in glob.glob('/content/**/*.*', recursive=True)  
     if f not in existing_files and   
     any(f.endswith(ex) for ex in ['.wav','.flac','.mp3','.ogg','.opus'])]  
     for f in input_filepaths:  
       os.remove(f)  
  
inference_gui = InferenceGui()
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172

此时系统会自动在根目录，也就是content下寻找音乐文件，包含但不限于wav、flac、mp3等等，随后根据下载的模型进行推理，推理之前会自动对文件进行背景音分离以及降噪和切片等操作。

推理结束之后，会自动播放转换后的歌曲。

结语

如果是刚开始使用Colab，默认分配的显存是15G左右，完全可以胜任大多数训练和推理任务，但是如果经常用它挂机运算，能分配到的显卡配置就会渐进式地降低，如果需要长时间并且相对稳定的GPU资源，还是需要付费订阅Colab pro服务，另外Google云盘的免费使用空间也是15G，如果模型下多了，导致云盘空间不足，运行代码也会报错，所以最好定期清理Google云盘，以此保证深度学习任务的正常运行。

声明：本文内容由网友自发贡献，不代表【wpsshop博客】立场，版权归原作者所有，本站不承担相应法律责任。如您发现有侵权的内容，请联系我们。转载请注明出处：https://www.wpsshop.cn/w/繁依Fanyi0/article/detail/103051