赞
踩
pip install pyAudioAnalysis==0.2.5 (最重要的步骤)
频域信号处理
FFT变换所得的复数的含义:
下标为0的实数表示时域信号的直流部分
下标为i的复数为a+bj表示时域信号中周期为N/i个取样值的正弦波和余弦波的成分,其中a表示余弦波形的成分,b表示正弦波形的成分
复数的模的两倍为对应频率的余弦波的振幅
复数的辐角表示对应频率的余弦波的相位
import numpy as np
from scipy.fftpack import fft, ifft
import matplotlib.pyplot as plt
from matplotlib.pylab import mpl
x = np.arange(0, 2*np.pi, 2*np.pi/128)
y = 0.3*np.cos(x) + 0.5*np.cos(2*x+np.pi/4) + 0.8*np.cos(3*x-np.pi/3) + np.sin(4*x) + np.cos(x)
yf = fft(y)/len(y)
print(np.array_str(yf[:5], suppress_small=True))
for ii in range(0, 5):
print(np.abs(yf[ii]), np.rad2deg(np.angle(yf[ii])))
运行上述程序可以观察得到以上结论
合成时域信号
需要着重解释的是多个余弦信号合成任意时域信号的过程:
FFT转换得到的N个复数组成的数组A,A i A_iA
i
表示第i ii个子信号,其中i = 0 i=0i=0的子信号表示直流信号,且R e ( A i ) Re(A_i)Re(A
i
)表示直流信号的振幅。
2 × R e ( A i ) = A m p l i t u d e ( s i g n a l s i n ) 2 \times Re(A_i) = Amplitude(signal_{sin})
2×Re(A
i
)=Amplitude(signal
sin
)
2 × I m ( A i ) = A m p l i t u d e ( s i g n a l c o s ) 2 \times Im(A_i) = Amplitude(signal_{cos})
2×Im(A
i
)=Amplitude(signal
cos
)
利用前k kk个自信号合成过程用数学表达式表示:
2 × ∑ i = 1 k { R e ( A i ) c o s ( i t ) − I m ( A i ) s i n ( i t ) } + R e ( A 0 ) 2\times \sum_{i=1}^{k}\{Re(A_i)cos(it)-Im(A_i)sin(it)\}+Re(A_0)
2×
i=1
∑
k
{Re(A
i
)cos(it)−Im(A
i
)sin(it)}+Re(A
0
)
代码如下所示
- import numpy as np
- from scipy.fftpack import fft, ifft
- import matplotlib.pyplot as plt
- from matplotlib.pylab import mpl
- mpl.rcParams['font.sans-serif'] = ['SimHei'] #显示中文
- mpl.rcParams['axes.unicode_minus'] = False #显示负号
- # x = np.arange(0, 2*np.pi, 2*np.pi/128)
- # y = 0.3*np.cos(x) + 0.5*np.cos(2*x+np.pi/4) + 0.8*np.cos(3*x-np.pi/3) + np.sin(4*x) + np.cos(x)
- # yf = fft(y)/len(y)
- # print(np.array_str(yf[:5], suppress_small=True))
- # for ii in range(0, 5):
- # print(np.abs(yf[ii]), np.rad2deg(np.angle(yf[ii])))
- def triangle_wave(size):
- x = np.arange(0, 1, 1.0/size)
- y = np.where(x < 0.5, x, 0)
- y = np.where(x >= 0.5, 1-x, y)
- return x, y
- ###
- def fft_comnbine(bins, n, loops):
- length = len(bins)*loops
- data=np.zeros(length)
- index=loops*np.arange(0, 2*np.pi, (2*np.pi)/length)
-
- for k, p in enumerate(bins[:n]):
- if k != 0:
- p *= 2
- ###合成时域信号的过程
- data += np.real(p)*np.cos(k*index)
- data -= np.imag(p)*np.sin(k*index)
- return index, data
- fft_size = 256
- ###对三角波进行FFT
- x, y = triangle_wave(fft_size)
- fy = fft(y)/fft_size
- loops = 4
- y = np.tile(y, (1, loops))
- print(y.shape)
- y.shape = (fft_size*loops, )#画图python的特殊癖好
- ###
- fig, axes = plt.subplots(2, 1, figsize=(8, 6))
- eps = 1e-5
- # axes[0].plot(np.clip(20*np.log10(np.abs(fy[:20])+eps), -120, 120), "o")
- axes[0].plot(np.abs(fy[:20]), "o")
- axes[0].set_xlabel(u"频率窗口(frequency bin)")
- axes[0].set_ylabel(u"幅值(dB)")
- axes[1].plot(y, label=u"原始三角波", linewidth = 2)
- for ii in [0, 1, 3, 5, 7, 9]:
- index, data = fft_comnbine(fy, ii+1, loops)
- axes[1].plot(data, label="N=%s" % ii, alpha=0.6)
- print(index[:20])
- axes[1].legend(loc="best")
- plt.show()
理论部分后面学了信号与系统在深究吧
哈——现在的我已经学完信号与系统了,回复几个一开始学习遇到的问题。
复数的模的两倍为什么对应频率的余弦波的振幅?
ans:若x ( t ) x(t)x(t)为实信号,那么
由傅里叶变换:
f n = F { x ( t ) } f_n = \mathscr{F}\{x(t)\}f
n
=F{x(t)}
得到推导过程中运用了欧拉公式使得余弦波的振幅乘上1/2,而相位不变。
为什么周期为N的离散信号,它的傅里叶变换的周期也是N?
ans:这个就是离散信号的傅里叶变换的周期性。可以在奥本海默的信号与系统的傅里叶性质表看到。
顺便提及奈奎斯特频率:
在采样定理中,采样频率必须大于2 ω m 2 \omega_m2ω_m
,这个2 ω m 2\omega_m2ω
m
j就称作奈奎斯特频率,目的是防止采样信号的频率防止重叠,其中ω m \omega_mω
m
为原始信号的频域ω \omegaω的最大值。
利用pydub和ffmpeg处理音频
写在前面:
RuntimeWarning: Couldn’t find ffmpeg or avconv - defaulting to ffmpeg, but may not work 解决办法——ffmpeg的bin 目录添加到path变量里,注意是path变量而不仅仅是简单的加到系统变量中!!!然后重启。
一、将mp3转换为wav格式,并将歌曲划分为几个部分
说在前面:
1.将歌曲划分为几部分主要是为了将特征的时间顺序体现出来
2. wav:非压缩文件格式。
3.mp3:压缩文件格式。
代码如下:
- tail, track = os.path.split(mp3_path)
- song_name = track.split('.')
- wav_path = os.path.join(tail, 'w_session', song_name[0]+'.wav')
- sound = AudioSegment.from_file(mp3_path, format='mp3')
- sound.export(wav_path, format='wav')
获取wav文件信息:
- w = wave.open(wav_path)
- params = w.getparams()
- print(params)
- #声道数、量化位数(byte)、采样频率、采样点数
- nchannels, sampwidth, framerate, nframes = params[:4]
- t = np.arange(0, nframes)*(1/framerate)#文件时间
- strData = w.readframes(nframes)#读取音频,字符串格式
- waveData = np.fromstring(strData,dtype=np.int16)#将字符串转化为int
- #waveData = waveData*1.0/(max(abs(waveData)))#wave幅值归一化
- waveData = np.reshape(waveData,[nchannels, nframes])#双通道数
划分歌曲:
- for ii in range(nchannels):
- for jj in range(0, 4):
- end_time = start_time + chunk[jj]
- blockData = waveData[0, start_time*framerate:end_time*framerate]
- start_time = end_time
二、音频特征提取
按照处理空间区分
要提取的特征 详情请点击:
时域特征:
线性预测系数、过零率
频域特征:
Mel系数、LPC倒频谱系数、熵特征、光谱质心
时频特征:
小波系数
TOOLS:pyAudioAnalysis
下载以及安装方法:安装方法
个人感觉这个工具包满新的,github上有各种issues。issues详见
同时有一篇论文有对这个工具包有详细的描述:论文
下面摘抄一部分:
Feature Extraction
Audio Features
the audio signal is first divided into short-term windows (frames) and for each frame all 34 features are calculated. This results in a sequence of short-term feature vectors of 34 elements each. Widely accepted short-term window sizes are 20 to 100 ms.
Typical values of the mid-term segment size can be 1 to 10 seconds.
In cases of long recordings (e.g. music tracks) a long-term averaging of the mid-term features can be applied so that the whole signal is represented by an average vector of mid-term statistics.
Extract mid-term features and long-term averages in order to produce one feature vector per audio signal.
三、计算相似矩阵
论文中提到:A similarity matrix is computed based on the cosine distances of the individual feature vectors.
但是在实际操作的过程中发现不同特征的量纲不同,导致用余弦相似度来计算特征相似度不准确。例:
7 8 9
2.62281350727428e-10 0 -50.5964425626941
2.29494356256208e-11 0 -50.5964425626941
4.55467645371887e-11 0 -50.5964425626941
所以我决定计算不同特征的相对比值,然后取平均值。
- def similarity(v1, v2):
- # 计算平均相似度
- temp = []
- sim = []
- p = 0
- q = 1
-
- for ii in range(v1.shape[0]):
- for jj in range(v1.shape[1]):
- if v1[ii, jj]!=0 or v2[ii, jj]!=0 :
- temp.append((1 -
- abs(v1[ii, jj]-v2[ii, jj])/max(abs(v1[ii,
- jj]),abs(v2[ii, jj]))))
- q += 1
- sim.append(np.mean(temp[p:q]))
- p = q
- print(sim)
- return sim
此外可以尝试马氏距离——参考文章:
下文为部分摘抄。
使用场景:
1、度量两个服从同一分布并且其协方差矩阵为C的随机变量X与Y的差异程度
2、度量X与某一类的均值向量的差异程度,判别样本的归属。此时,Y为类均值向量.
马氏距离的优缺点:
优点:量纲无关,排除变量之间的相关性的干扰
缺点:不同的特征不能差别对待,可能夸大弱特征
四、减少运行代码的时间
之前将歌曲划分为等差序列的长度demo,可计算一个片段的特征就要好久。我等不下去,所以决定想法子降低复杂度。我想到两个办法:
在原来等差序列的片段上随机选取四秒片段,计算特征相似度。如果大于0.5,那么在计算完整片段的特征相似度。
将原来采样频率44.1kHz缩小四倍
- k = 4
- ii = 0
- w_decrease = [[], []]
- # 降低音频分辨率
- for kk in (0, 1):
- while ii < len(w[:, kk]):
- if ii + k < len(w[:, kk]):
- w_decrease[kk].append(np.mean(w[ii:ii+k, kk]))
- else:
- w_decrease[kk].append(np.mean(
- w[ii:len(w[:, kk])+1, kk]))
- ii = ii + k
- w = w_decrease
五、完整代码
- # -*- coding: utf-8 -*-
- """
- Created on Fri Jan 10 21:51:38 2020
- @author: yoona
- """
- import os
- import sys
- import wave
- import numpy as np
- #import struct
- from pydub import AudioSegment
- import matplotlib.pyplot as plt
- from pyAudioAnalysis import audioFeatureExtraction as afe
- import eyed3
- import random
- import math
-
- def Features(path, mode):
- x = wave.open(path)
- params = x.getparams()
- print(params)
-
- if params[0] != 2:
- raise ValueError('通道数不等于2')
-
- strData = x.readframes(params[3])
- w = np.frombuffer(strData, dtype=np.int16)
- w = np.reshape(w,[params[3], params[0]])
-
- k = 4
- ii = 0
- w_decrease = [[], []]
-
- if mode == 'second':
- # 降低音频分辨率
- for kk in (0, 1):
- while ii < len(w[:, kk]):
- if ii + k < len(w[:, kk]):
- w_decrease[kk].append(np.mean(w[ii:ii+k, kk]))
- else:
- w_decrease[kk].append(np.mean(
- w[ii:len(w[:, kk])+1, kk]))
- ii = ii + k
- w = w_decrease
-
- eigen_vector_0 = afe.mtFeatureExtraction(
- w[:, 0], params[2],30.0, 30.0, 2, 2)
- eigen_vector_1 = afe.mtFeatureExtraction(
- w[:, 1], params[2],30.0, 30.0, 2, 2)
-
- return eigen_vector_0, eigen_vector_1
-
- def read_wave(wav_path):
- w = wave.open(wav_path)
- params = w.getparams()
- # print(params)
- # 声道数、量化位数(byte)、采样频率、采样点数
- nchannels, sampwidth, framerate, nframes = params[:4]
-
- # 文件时间
- t = np.arange(0, nframes)*(1/framerate)
- strData = w.readframes(nframes)#读取音频,字符串格式
- waveData = np.frombuffer(strData, dtype=np.int16)#将字符串转化为int
- waveData = waveData*1.0/(max(abs(waveData)))#wave幅值归一化
- waveData = np.reshape(waveData,[nframes, nchannels])#双通道数
-
- # plot the wave
- plt.figure()
- plt.subplot(4,1,1)
- plt.plot(t,waveData[:, 0])
- plt.xlabel("Time(s)")
- plt.ylabel("Amplitude")
- plt.title("Ch-1 wavedata")
- plt.grid('on')#标尺,on:有,off:无
- plt.subplot(4,1,3)
- plt.plot(t,waveData[:, 1])
- plt.xlabel("Time(s)")
- plt.ylabel("Amplitude")
- plt.title("Ch-2 wavedata")
- plt.grid('on')#标尺,on:有,off:无
- plt.show()
-
- def similarity(v1, v2):
- # 计算平均相似度
- temp = []
- sim = []
- p = 0
- q = 1
-
- for ii in range(v1.shape[0]):
- for jj in range(v1.shape[1]):
- if v1[ii, jj]!=0 or v2[ii, jj]!=0 :
- temp.append((1 -
- abs(v1[ii, jj]-v2[ii, jj])/max(abs(v1[ii, jj]),abs(v2[ii, jj]))))
- q += 1
- sim.append(np.mean(temp[p:q]))
- p = q
- print(sim)
- return sim
-
- def compute_chunk_features(mp3_path):
- # =============================================================================
- # 计算相似度第一步
- # =============================================================================
- # 获取歌曲时长
- mp3Info = eyed3.load(mp3_path)
- time = int(mp3Info.info.time_secs)
- print(time)
- tail, track = os.path.split(mp3_path)
-
- # 创建两个文件夹
- dirct_1 = tail + r'\wavSession'
- dirct_2 = tail + r'\wavBlock'
- if not os.path.exists(dirct_1):
- os.makedirs(dirct_1)
- if not os.path.exists(dirct_2):
- os.makedirs(dirct_2)
-
- # 获取歌曲名字
- song_name = track.split('.')
- # 转换格式
- wav_all_path = os.path.join(tail, song_name[0]+'.wav')
- sound = AudioSegment.from_file(mp3_path, format='mp3')
- sound.export(wav_all_path, format='wav')
- read_wave(wav_all_path)
- # 划分音频
- gap = 4
- diff = time/10 - 8
- start_time = 0
- end_time = math.floor(diff)
- vector_0 = np.zeros((10, 68))
- vector_1 = np.zeros((10, 68))
- info = []#记录片段开始时间点
-
- for jj in range(5):
- wav_name = song_name[0]+str(jj)+'.wav'
- wav_path = os.path.join(tail, 'wavSession', wav_name)
- # 随机产生四秒片段
- rand_start = random.randint(start_time, end_time)
- blockData = sound[rand_start*1000:(rand_start+gap)*1000]
- ## 音频切片,时间的单位是毫秒
- # blockData = sound[start_time*1000:end_time*1000]
- blockData.export(wav_path, format='wav')
- eigVector_0, eigVector_1 = Features(wav_path, [])
- print(jj)# 标记程序运行进程
- # 得到一个片段的特征向量
- vector_0[jj, :] = np.mean(eigVector_0[0], 1)
- vector_1[jj, :] = np.mean(eigVector_1[0], 1)
- # 迭代
- diff = diff + 4
- info.append((start_time, end_time))
- start_time = end_time
- end_time = math.floor(start_time + diff)
-
- # 承上启下
- end_time = start_time
-
- for kk in range(5, 10):
- # 迭代
- diff = diff - 4
- info.append((start_time, start_time + diff))
- start_time = end_time
- end_time = math.floor(start_time + diff)
- wav_name = song_name[0]+str(kk)+'.wav'
- wav_path = os.path.join(tail, 'wavSession', wav_name)
-
- # 随机产生四秒片段
- rand_start = random.randint(start_time, end_time)
- blockData = sound[rand_start*1000:(rand_start+gap)*1000]
- # blockData = sound[start_time*1000:end_time*1000]
- blockData.export(wav_path, format='wav')
- eigVector_0, eigVector_1 = Features(wav_path, [])
- print(kk)#标记程序运行进程
- # 得到一个片段的特征向量
-
- vector_0[kk, :] = np.mean(eigVector_0[0], 1)
- vector_1[kk, :] = np.mean(eigVector_1[0], 1)
-
- return vector_0, vector_1, info# 双通道各自的特征向量
-
- def Compute_Bolck_Features(info, mp3_path):
- # =============================================================================
- # 计算相似度第二步
- # =============================================================================
- # 获取歌曲时长
- mp3Info = eyed3.load(mp3_path)
- time = int(mp3Info.info.time_secs)
- print(time)
- # 获取歌曲名字
- tail, track = os.path.split(mp3_path)
- song_name = track.split('.')
- # 转换格式
- sound = AudioSegment.from_file(mp3_path, format='mp3')
- vector_0 = np.zeros((len(info), 68))
- vector_1 = np.zeros((len(info), 68))
-
- for kk in range(len(info)):
- # 获取歌曲完整片段的特征
- wav_name = song_name[0]+str(kk)+'.wav'
- wav_path = os.path.join(tail, 'wavBlock', wav_name)
- # 截取完整片段
- blockData = sound[info[kk][0]*1000:info[kk][1]*1000]
- blockData.export(wav_path, format='wav')
- eigVector_0, eigVector_1 = Features(wav_path, 'second')
- print(kk)#标记程序运行进程
- # 得到一个片段的特征向量
- vector_0[kk, :] = np.mean(eigVector_0[0], 1)
- vector_1[kk, :] = np.mean(eigVector_1[0], 1)
- return vector_0, vector_1
-
- def file_exists(file_path):
- if os.path.splitext(file_path) == '.mp3':
- if os.path.isfile(file_path):
- return file_path
- else:
- raise TypeError('文件不存在')
- else:
- raise TypeError('文件格式错误,后缀不为.mp3')
-
- if __name__ == '__main__':
-
- #for path, dirs, files in os.walk('C:/Users/yoona/Desktop/music_test/'):
- # for f in files:
- # if not f.endwith('.mp3'):
- # continue
- # 把路径组装到一起
- #path = r'C:\Users\yoona\Desktop\musictest'
- #f = 'CARTA - Aranya (Jungle Festival Anthem).mp3'
- #mp3_path = os.path.join(path, f)
- # =============================================================================
- # sa_b:a表示歌曲的序号,b表示歌曲的通道序号
- # =============================================================================
- #s1_0, s1_1, info1= compute_chunk_features(mp3_path)
- # path_1 = file_exists(sys.argv[1])
- # path_2 = file_exists(sys.argv[2])]
-
- path_1 = r'C:\Users\yoona\Desktop\music\薛之谦 - 别.mp3'
- path_2 = r'C:\Users\yoona\Desktop\music\薛之谦 - 最好.mp3'
-
- s1_1, s1_2, info1 = compute_chunk_features(path_1)
- s2_1, s2_2, info2 = compute_chunk_features(path_2)
-
- sim_1 = similarity(s1_1, s2_1)#通道数1
- sim_2 = similarity(s1_2, s2_2)#通道数2
-
- info1_new = []
- info2_new = []
-
- for i, element in enumerate(sim_1):
- if element >= 0.5:
- info1_new.append(info1[i])
-
- if not info1_new:
- pos = np.argmax(sim_1)
- info1_new.append(info1[pos])
- s1_1, s1_2 = Compute_Bolck_Features(info1_new, path_1)
-
- for i, element in enumerate(sim_2):
- if element >= 0.5:
- info2_new.append(info2[i])
-
- if not info2_new:
- pos = np.argmax(sim_2)
- info2_new.append(info2[pos])
- s2_1, s2_2 = Compute_Bolck_Features(info2_new, path_2)
-
- sim_1 = similarity(s1_1, s2_1)#通道数1
- sim_2 = similarity(s1_2, s2_2)#通道数2
六、结果分析
第一组实验对象:
A:薛之谦 - 最好.mp3
B:薛之谦 - 别.mp3
第二组实验对象:
A:Karim Mika - Superficial Love.mp3
B: Burgess/JESSIA - Eclipse.mp3
第三组实验对象:
A: CARTA - Aranya (Jungle Festival Anthem).mp3
B: 薛之谦 - 别.mp3
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。