soundfile is commonly used for reading and writing audio files:
import soundfile as sf

# Read an existing WAV, then write the same data back out as FLAC (or WAV)
data, samplerate = sf.read('existing_file.wav')
sf.write('new_file.flac', data, samplerate)
# sf.write('new_file.wav', data, samplerate)
FLAC is a lossless compressed audio format.
librosa is a widely used audio-processing library; note that ffmpeg should be installed before installing librosa. In a Docker Ubuntu container:
apt-get update && apt-get install -y ffmpeg
pip install librosa
This time I mainly use librosa to load audio and to resample it:

import librosa

wav, sr = librosa.load(file, sr=44100)
# e.g. downsample a separated stem (see the spleeter section below) to 22.05 kHz
drums = librosa.resample(drums, orig_sr=44100, target_sr=22050, fix=True, scale=False)
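Resampling from 44100 Hz down to 22050 Hz halves the number of samples. Purely to illustrate that length relationship, here is a toy linear-interpolation resampler in NumPy; this is not librosa's implementation, which uses proper band-limited filtering:

```python
import numpy as np

def naive_resample(x, orig_sr, target_sr):
    # Toy resampler via linear interpolation; only shows how the
    # sample count scales with the rate ratio.
    n_out = int(np.ceil(len(x) * target_sr / orig_sr))
    t_out = np.arange(n_out) * (orig_sr / target_sr)
    return np.interp(t_out, np.arange(len(x)), x)

one_second = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
half_rate = naive_resample(one_second, 44100, 22050)
print(len(one_second), len(half_rate))  # 44100 22050
```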
librosa is also commonly used to extract audio features, such as the mel spectrogram and the mel-frequency cepstral coefficients (MFCCs):
librosa.feature.melspectrogram()
librosa.feature.mfcc()
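Both features are built on the mel scale, which compresses high frequencies to mimic human pitch perception. As a quick sketch, here is the classic HTK-style Hz-to-mel formula (librosa's default "slaney" variant differs slightly in its constants):

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula: equal mel steps span ever-wider Hz ranges
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

print(hz_to_mel(0))     # 0.0
print(hz_to_mel(1000))  # ~1000 (1000 Hz maps near 1000 mel by design)
```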
pydub is a very powerful audio processing and editing tool. Here I mainly use it to raise or lower the volume and to mix several stems together:

from pydub import AudioSegment
import numpy as np

# mix the stems
bass = AudioSegment.from_file('bass.wav').set_frame_rate(22050).set_channels(1)
other = AudioSegment.from_file('other.wav').set_frame_rate(22050).set_channels(1)
vocals = AudioSegment.from_file('vocals.wav').set_frame_rate(22050).set_channels(1)
NoDrum_audio = bass.overlay(other).overlay(vocals)
# convert the 16-bit samples back to a float array in [-1, 1)
nodrum_wav = np.frombuffer(NoDrum_audio.raw_data, np.int16) / 32768
# raise the volume by 5 dB
NoDrum_audio_5 = NoDrum_audio + 5
The audio must first be loaded as AudioSegment objects before overlay can be used; after mixing, np.frombuffer (as above) converts the result back into the same kind of NumPy array that librosa.load produces.
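pydub's `segment + 5` applies a 5 dB gain, which in linear amplitude is a factor of 10 ** (5 / 20) ≈ 1.78. A minimal NumPy sketch of the same arithmetic (not pydub's internals, which also clip integer samples to their valid range):

```python
import numpy as np

def apply_gain_db(samples, gain_db):
    # dB gain -> linear amplitude scale: 10 ** (dB / 20)
    return samples * (10.0 ** (gain_db / 20.0))

x = np.array([0.1, -0.2, 0.3])
louder = apply_gain_db(x, 5)     # ~1.778x amplitude, like `segment + 5`
quieter = apply_gain_db(x, -6)   # ~0.501x, roughly half amplitude
```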
madmom is a powerful library dedicated to music analysis. Here I mainly use it to extract features, to extract beats and downbeats (i.e. the musical pulse and the bar-initial strong beats) with its HMM package, and to compute several evaluation metric scores.
# Feature extraction: use madmom's parallel processor to compute log
# spectrograms and their first-order differences with three parameter
# sets ([1024, 2048, 4096] frame sizes, [3, 6, 12] bands), then stack them.
import numpy as np
from madmom.audio.signal import SignalProcessor, FramedSignalProcessor
from madmom.audio.stft import ShortTimeFourierTransformProcessor
from madmom.audio.spectrogram import (
    FilteredSpectrogramProcessor, LogarithmicSpectrogramProcessor,
    SpectrogramDifferenceProcessor)
from madmom.processors import ParallelProcessor, SequentialProcessor

def madmom_feature(wav):
    sig = SignalProcessor(num_channels=1, sample_rate=44100)
    multi = ParallelProcessor([])
    frame_sizes = [1024, 2048, 4096]
    num_bands = [3, 6, 12]
    for frame_size, num_band in zip(frame_sizes, num_bands):
        frames = FramedSignalProcessor(frame_size=frame_size, fps=100)
        stft = ShortTimeFourierTransformProcessor()  # caching FFT window
        filt = FilteredSpectrogramProcessor(
            num_bands=num_band, fmin=30, fmax=17000, norm_filters=True)
        spec = LogarithmicSpectrogramProcessor(mul=1, add=1)
        diff = SpectrogramDifferenceProcessor(
            diff_ratio=0.5, positive_diffs=True, stack_diffs=np.hstack)
        # process each frame size with spec and diff sequentially
        multi.append(SequentialProcessor((frames, stft, filt, spec, diff)))
    # stack the features and process everything sequentially
    pre_processor = SequentialProcessor((sig, multi, np.hstack))
    feature = pre_processor.process(wav)
    return feature
Next, madmom's built-in HMM module processes the activations produced by a joint beat/downbeat detection algorithm:
from madmom.features.downbeats import DBNDownBeatTrackingProcessor as DownBproc

hmm_proc = DownBproc(beats_per_bar=[3, 4], num_tempi=80,
                     transition_lambda=180,
                     observation_lambda=21,
                     threshold=0.5, fps=100)
# act holds the activations produced by a beat-detection algorithm,
# e.g. a neural network
beat_fuser_est = hmm_proc(act)
beat_pred = beat_fuser_est[:, 0]
downbeat_pred = beat_pred[beat_fuser_est[:, 1] == 1]
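The processor returns a two-column array: column 0 is the beat time in seconds, column 1 is the beat's position inside the bar, with 1 marking a downbeat. A small hypothetical example of the slicing above:

```python
import numpy as np

# Hypothetical output shaped like DBNDownBeatTrackingProcessor's result:
# [time_in_seconds, position_in_bar] per row, position 1 = downbeat
beat_fuser_est = np.array([
    [0.50, 1], [1.00, 2], [1.50, 3], [2.00, 4],
    [2.50, 1], [3.00, 2],
])
beat_pred = beat_fuser_est[:, 0]                      # all beat times
downbeat_pred = beat_pred[beat_fuser_est[:, 1] == 1]  # bar starts only
print(downbeat_pred)  # [0.5 2.5]
```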
Next, compute several evaluation metrics between the detected beats and the ground-truth beat annotations:
from madmom.evaluation.beats import BeatEvaluation

scr = BeatEvaluation(beat_pred, beat_true)
print(scr.fmeasure, scr.pscore, scr.cemgil, scr.cmlc, scr.cmlt,
      scr.amlc, scr.amlt)
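madmom's BeatEvaluation is the authoritative implementation of these metrics. Purely to illustrate what the F-measure counts, here is a toy version that matches each predicted beat to an unused reference beat within a tolerance window (the conventional ±70 ms value used here is an assumption of this sketch):

```python
import numpy as np

def beat_fmeasure(pred, ref, tol=0.07):
    # Greedy matching: each reference beat may be claimed at most once;
    # a prediction within `tol` seconds of an unclaimed reference is a hit.
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    used = np.zeros(len(ref), dtype=bool)
    hits = 0
    for p in pred:
        dist = np.abs(ref - p)
        dist[used] = np.inf
        i = int(np.argmin(dist))
        if dist[i] <= tol:
            used[i] = True
            hits += 1
    precision = hits / len(pred) if len(pred) else 0.0
    recall = hits / len(ref) if len(ref) else 0.0
    return 2 * precision * recall / (precision + recall) if hits else 0.0

print(beat_fmeasure([0.5, 1.0, 1.52], [0.5, 1.0, 1.5]))  # 1.0 (20 ms is in tolerance)
```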
spleeter is a very effective music source-separation tool; it can split a track into two, four, or five stems. It is built on TensorFlow and requires pretrained weights, which are downloaded automatically on first use. When installing it I found that it conflicts with madmom's dependencies; after several attempts, only the older 1.4.9 release worked for me. Install it with pip install spleeter==1.4.9.
The usage examples found online all invoke it from the command line; I worked out a way to call it from Python code:
import librosa
from spleeter.separator import Separator

separator = Separator('spleeter:4stems')
wav, sr = librosa.load(file, sr=44100)
wav = wav.reshape(-1, 1)  # separate() expects shape (samples, channels)
prediction = separator.separate(wav)
drums = prediction['drums'][:, 0]
bass = prediction['bass'][:, 0]
other = prediction['other'][:, 0]
vocals = prediction['vocals'][:, 0]
Note: because spleeter's pretrained weights were trained at a 44100 Hz sampling rate, the audio must also be loaded at 44100 Hz when using it.