
A Simple Song-Similarity Comparison in Python (draft)

Comparing the similarity of two audio files with ffmpeg

pip install pyAudioAnalysis==0.2.5  (the most important step)

Frequency-domain signal processing
Meaning of the complex numbers produced by the FFT:
The real number at index 0 is the DC component of the time-domain signal.
The complex number a+bj at index i represents the sine and cosine components whose period is N/i samples; a is the cosine contribution and b the sine contribution.
Twice the modulus of the complex number is the amplitude of the cosine wave at the corresponding frequency.
The argument (angle) of the complex number is the phase of that cosine wave.
import numpy as np
from scipy.fftpack import fft

x = np.arange(0, 2*np.pi, 2*np.pi/128)
y = 0.3*np.cos(x) + 0.5*np.cos(2*x+np.pi/4) + 0.8*np.cos(3*x-np.pi/3) + np.sin(4*x) + np.cos(x)
yf = fft(y)/len(y)  # normalise so the coefficients relate directly to amplitudes
print(np.array_str(yf[:5], suppress_small=True))
for ii in range(0, 5):
    print(np.abs(yf[ii]), np.rad2deg(np.angle(yf[ii])))

Running the program above confirms the conclusions listed.
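As a concrete check of those rules (a small sketch using numpy's own FFT, which behaves the same as scipy's here): the 0.3cos(x) and cos(x) terms merge into 1.3cos(x), and twice the modulus of the corresponding coefficient recovers that amplitude.

```python
import numpy as np

x = np.arange(0, 2*np.pi, 2*np.pi/128)
y = 0.3*np.cos(x) + 0.5*np.cos(2*x+np.pi/4) + 0.8*np.cos(3*x-np.pi/3) + np.sin(4*x) + np.cos(x)
yf = np.fft.fft(y)/len(y)

print(2*np.abs(yf[1]))              # 1.3: amplitude of the merged cos(x) terms
print(np.rad2deg(np.angle(yf[2])))  # 45: phase of 0.5cos(2x+pi/4)
print(np.rad2deg(np.angle(yf[4])))  # -90: sin(4x) = cos(4x - pi/2)
```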

Synthesising the time-domain signal
The key point to explain is how an arbitrary time-domain signal is synthesised from multiple cosine signals:

The FFT yields an array A of N complex numbers, where $A_i$ represents the $i$-th sub-signal. The sub-signal for $i = 0$ is the DC component, and $Re(A_0)$ is its amplitude. For $i > 0$:

$2 \times Re(A_i) = Amplitude(signal_{cos})$

$-2 \times Im(A_i) = Amplitude(signal_{sin})$

Synthesising the signal from the first $k$ sub-signals can be written as:

$2\times \sum_{i=1}^{k}\left\{Re(A_i)\cos(it)-Im(A_i)\sin(it)\right\}+Re(A_0)$

The code is shown below:

import numpy as np
from scipy.fftpack import fft
import matplotlib.pyplot as plt
from matplotlib.pylab import mpl
mpl.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese characters
mpl.rcParams['axes.unicode_minus'] = False    # display the minus sign

def triangle_wave(size):
    x = np.arange(0, 1, 1.0/size)
    y = np.where(x < 0.5, x, 0)
    y = np.where(x >= 0.5, 1-x, y)
    return x, y

def fft_combine(bins, n, loops):
    length = len(bins)*loops
    data = np.zeros(length)
    index = loops*np.arange(0, 2*np.pi, (2*np.pi)/length)
    for k, p in enumerate(bins[:n]):
        if k != 0:
            p *= 2
        # synthesise the time-domain signal from the first n sub-signals
        data += np.real(p)*np.cos(k*index)
        data -= np.imag(p)*np.sin(k*index)
    return index, data

fft_size = 256
# FFT of the triangle wave
x, y = triangle_wave(fft_size)
fy = fft(y)/fft_size
loops = 4
y = np.tile(y, loops)  # repeat the waveform `loops` times for plotting

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
axes[0].plot(np.abs(fy[:20]), "o")
axes[0].set_xlabel(u"frequency bin")
axes[0].set_ylabel(u"magnitude")
axes[1].plot(y, label=u"original triangle wave", linewidth=2)
for ii in [0, 1, 3, 5, 7, 9]:
    index, data = fft_combine(fy, ii+1, loops)
    axes[1].plot(data, label="N=%s" % ii, alpha=0.6)
axes[1].legend(loc="best")
plt.show()


I'll leave a deeper treatment of the theory until after I've studied Signals and Systems.
Ha — I have now finished Signals and Systems, so here are answers to a few questions I ran into when I first studied this.

Why is twice the modulus of the complex number the amplitude of the cosine wave at the corresponding frequency?
Answer: if $x(t)$ is a real signal, its Fourier coefficients are $f_n = \mathscr{F}\{x(t)\}$. The derivation uses Euler's formula, which multiplies each cosine's amplitude by 1/2 while leaving the phase unchanged.
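The half-amplitude factor can be seen directly from Euler's formula:

```latex
A\cos(\omega t + \varphi)
  = \frac{A}{2}\,e^{j\varphi}\,e^{j\omega t} + \frac{A}{2}\,e^{-j\varphi}\,e^{-j\omega t}
```

Each of the two exponential components carries modulus $A/2$ and phase $\pm\varphi$, so doubling the modulus of the positive-frequency coefficient recovers the cosine's amplitude $A$, with the phase unchanged.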

Why does a discrete signal with period N have a Fourier transform that is also periodic with period N?
Answer: this is the periodicity property of the Fourier transform of discrete signals. It can be found in the table of Fourier properties in Oppenheim's Signals and Systems.

A side note on the Nyquist rate: the sampling theorem requires the sampling frequency to exceed $2\omega_m$. This value $2\omega_m$ is called the Nyquist rate; sampling above it prevents the spectral copies of the sampled signal from overlapping (aliasing). Here $\omega_m$ is the maximum frequency $\omega$ present in the original signal's spectrum.
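A minimal sketch of aliasing (the frequencies here are illustrative, not taken from this project): a 3 Hz cosine sampled at 4 Hz, below its Nyquist rate of 6 Hz, shows up at |3 − 4| = 1 Hz.

```python
import numpy as np

def dominant_freq(f0, fs, n=400):
    # sample an f0-hertz cosine at fs hertz and locate the strongest FFT bin
    t = np.arange(n) / fs
    x = np.cos(2*np.pi*f0*t)
    spec = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(n, 1/fs)[np.argmax(spec)]

print(dominant_freq(3.0, fs=10.0))  # 3.0: fs above the Nyquist rate, frequency preserved
print(dominant_freq(3.0, fs=4.0))   # 1.0: fs below the Nyquist rate, 3 Hz aliases to 1 Hz
```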

Processing audio with pydub and ffmpeg
A note up front:
If you see `RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work`, the fix is to add ffmpeg's bin directory to the PATH variable (note that it must go into PATH specifically, not merely into some other environment variable!), and then restart.
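One quick way to confirm the fix worked (a small sketch; `shutil.which` performs the same PATH lookup that the warning is complaining about):

```python
import shutil

ffmpeg_path = shutil.which("ffmpeg")  # searches the directories listed in PATH
if ffmpeg_path is None:
    print("ffmpeg is not on PATH yet; add its bin directory and restart")
else:
    print("found ffmpeg at", ffmpeg_path)
```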

I. Convert mp3 to wav and split the song into several parts
Up front:
1. Splitting the song into parts mainly preserves the temporal ordering of the features.
2. wav: an uncompressed file format.
3. mp3: a compressed file format.
The code:

import os
from pydub import AudioSegment

tail, track = os.path.split(mp3_path)
song_name = track.split('.')
wav_path = os.path.join(tail, 'w_session', song_name[0]+'.wav')
sound = AudioSegment.from_file(mp3_path, format='mp3')
sound.export(wav_path, format='wav')


Reading the wav file's metadata:

w = wave.open(wav_path)
params = w.getparams()
print(params)
# number of channels, sample width (bytes), sample rate, number of frames
nchannels, sampwidth, framerate, nframes = params[:4]
t = np.arange(0, nframes)*(1/framerate)            # time axis of the file
strData = w.readframes(nframes)                    # read the audio as raw bytes
waveData = np.frombuffer(strData, dtype=np.int16)  # bytes -> int16
#waveData = waveData*1.0/(max(abs(waveData)))      # normalise the amplitude
waveData = np.reshape(waveData, [nframes, nchannels])  # frames x channels (samples are interleaved)
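To see those reading steps in isolation, here is a self-contained round trip on a synthetic stereo file (the 440/220 Hz tones and the 8 kHz rate are arbitrary choices for illustration):

```python
import io
import wave
import numpy as np

framerate, nframes, nchannels = 8000, 1000, 2
t = np.arange(nframes) / framerate
stereo = np.vstack([np.sin(2*np.pi*440*t), np.sin(2*np.pi*220*t)]).T
pcm = (stereo * 32000).astype(np.int16)

buf = io.BytesIO()
with wave.open(buf, 'wb') as w:
    w.setnchannels(nchannels)
    w.setsampwidth(2)             # int16 samples
    w.setframerate(framerate)
    w.writeframes(pcm.tobytes())  # frames are interleaved L, R, L, R, ...

buf.seek(0)
r = wave.open(buf)
data = np.frombuffer(r.readframes(r.getnframes()), dtype=np.int16)
data = data.reshape(r.getnframes(), r.getnchannels())  # frames x channels
print(data.shape)  # (1000, 2)
```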



Splitting the song:

for ii in range(nchannels):
    for jj in range(0, 4):
        end_time = start_time + chunk[jj]
        blockData = waveData[start_time*framerate:end_time*framerate, ii]
        start_time = end_time


II. Audio feature extraction
Grouped by the domain in which they are computed.

Features to extract:
Time-domain features: linear prediction coefficients, zero-crossing rate
Frequency-domain features: Mel-frequency coefficients, LPC cepstral coefficients, entropy features, spectral centroid
Time-frequency features: wavelet coefficients
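Two of the listed features are simple enough to sketch directly in numpy (the helper names here are my own; pyAudioAnalysis has its own implementations): the zero-crossing rate and the spectral centroid, both of which rise with the pitch of the content.

```python
import numpy as np

def zero_crossing_rate(frame):
    # fraction of adjacent sample pairs whose signs differ
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

def spectral_centroid(frame, fs):
    # magnitude-weighted mean frequency of the frame's spectrum
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), 1/fs)
    return np.sum(freqs*mag) / np.sum(mag)

fs = 8000
t = np.arange(0, 0.05, 1/fs)               # one 50 ms frame
low = np.sin(2*np.pi*200*t + 0.5)          # 200 Hz tone (phase offset avoids exact zeros)
high = np.sin(2*np.pi*2000*t + 0.5)        # 2000 Hz tone
print(zero_crossing_rate(low), zero_crossing_rate(high))        # higher pitch -> higher ZCR
print(spectral_centroid(low, fs), spectral_centroid(high, fs))  # roughly 200 vs 2000 Hz
```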
TOOLS: pyAudioAnalysis
Installation: pip install pyAudioAnalysis==0.2.5, as noted at the top.
The toolkit feels fairly new to me; there are assorted issues on GitHub. There is also a paper that describes the toolkit in detail.

An excerpt:
Feature Extraction
Audio Features

The audio signal is first divided into short-term windows (frames), and for each frame all 34 features are calculated. This results in a sequence of short-term feature vectors of 34 elements each. Widely accepted short-term window sizes are 20 to 100 ms.
Typical values of the mid-term segment size can be 1 to 10 seconds.
In cases of long recordings (e.g. music tracks) a long-term averaging of the mid-term features can be applied so that the whole signal is represented by an average vector of mid-term statistics.
Extract mid-term features and long-term averages in order to produce one feature vector per audio signal.
III. Computing the similarity matrix
The paper says: "A similarity matrix is computed based on the cosine distances of the individual feature vectors."
In practice, however, the features have different units and scales, which makes cosine similarity inaccurate. For example, three consecutive feature columns (features 7, 8, 9) can look like this:

feature 7               feature 8   feature 9
2.62281350727428e-10    0           -50.5964425626941
2.29494356256208e-11    0           -50.5964425626941
4.55467645371887e-11    0           -50.5964425626941

So I decided instead to compute the relative ratio of each pair of features and then take the mean.

def similarity(v1, v2):
    # compute the average similarity per row
    temp = []
    sim = []
    p = 0
    q = 0
    for ii in range(v1.shape[0]):
        for jj in range(v1.shape[1]):
            if v1[ii, jj] != 0 or v2[ii, jj] != 0:
                temp.append(1 - abs(v1[ii, jj]-v2[ii, jj])
                              / max(abs(v1[ii, jj]), abs(v2[ii, jj])))
                q += 1
        sim.append(np.mean(temp[p:q]))
        p = q
    print(sim)
    return sim
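A tiny worked example of the ratio-based similarity (restating the function so the sketch is self-contained): for the row [1, 2] vs [1, 1], the pairwise scores are 1 and 0.5, averaging 0.75; all-zero pairs are skipped.

```python
import numpy as np

def similarity(v1, v2):
    # per-row mean of 1 - |a-b| / max(|a|, |b|), skipping all-zero pairs
    temp, sim = [], []
    p = q = 0
    for ii in range(v1.shape[0]):
        for jj in range(v1.shape[1]):
            if v1[ii, jj] != 0 or v2[ii, jj] != 0:
                temp.append(1 - abs(v1[ii, jj]-v2[ii, jj])
                              / max(abs(v1[ii, jj]), abs(v2[ii, jj])))
                q += 1
        sim.append(np.mean(temp[p:q]))
        p = q
    return sim

v1 = np.array([[1.0, 2.0], [3.0, 0.0]])
v2 = np.array([[1.0, 1.0], [3.0, 0.0]])
print([float(s) for s in similarity(v1, v2)])  # [0.75, 1.0]
```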


In addition, Mahalanobis distance is worth trying. From a reference article (excerpted in part below):

Use cases:

1. Measuring the degree of difference between two random variables X and Y that follow the same distribution and share covariance matrix C.
2. Measuring the difference between X and a class mean vector, to decide which class a sample belongs to; here Y is the class mean vector.

Pros and cons of Mahalanobis distance:
Pros: scale-invariant, and removes the interference of correlations between variables.
Cons: features cannot be weighted differently, which may exaggerate weak features.

IV. Reducing the running time
In an earlier demo I split the song into segments whose lengths formed an arithmetic sequence, and computing the features of even one segment took ages. I could not wait that long, so I looked for ways to reduce the cost. I came up with two ideas:

1. Within each original segment, pick a random four-second clip and compute the feature similarity on that; only if it exceeds 0.5, compute the similarity over the full segment.
2. Reduce the original 44.1 kHz sample rate by a factor of four.

k = 4
w_decrease = [[], []]
# downsample the audio by averaging every k samples
for kk in (0, 1):
    ii = 0  # reset the index for each channel
    while ii < len(w[:, kk]):
        if ii + k < len(w[:, kk]):
            w_decrease[kk].append(np.mean(w[ii:ii+k, kk]))
        else:
            w_decrease[kk].append(np.mean(w[ii:len(w[:, kk]), kk]))
        ii = ii + k
w = w_decrease
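The same block-averaging can be done without an explicit loop (a sketch; it assumes any tail shorter than k frames is simply dropped, whereas the loop above averages the tail too):

```python
import numpy as np

k = 4
w = np.arange(16, dtype=float).reshape(8, 2)  # toy stereo signal: 8 frames, 2 channels
trimmed = w[:len(w)//k*k]                     # drop any tail shorter than k frames
w_dec = trimmed.reshape(-1, k, w.shape[1]).mean(axis=1)  # average each block of k frames
print(w_dec)  # [[ 3.  4.]
              #  [11. 12.]]
```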



V. Complete code

# -*- coding: utf-8 -*-
"""
Created on Fri Jan 10 21:51:38 2020
@author: yoona
"""
import os
import sys
import wave
import numpy as np
from pydub import AudioSegment
import matplotlib.pyplot as plt
from pyAudioAnalysis import audioFeatureExtraction as afe
import eyed3
import random
import math


def Features(path, mode):
    x = wave.open(path)
    params = x.getparams()
    print(params)
    if params[0] != 2:
        raise ValueError('number of channels is not 2')
    strData = x.readframes(params[3])
    w = np.frombuffer(strData, dtype=np.int16)
    w = np.reshape(w, [params[3], params[0]])
    k = 4
    if mode == 'second':
        # downsample the audio by averaging every k samples
        w_decrease = [[], []]
        for kk in (0, 1):
            ii = 0
            while ii < len(w[:, kk]):
                if ii + k < len(w[:, kk]):
                    w_decrease[kk].append(np.mean(w[ii:ii+k, kk]))
                else:
                    w_decrease[kk].append(np.mean(w[ii:len(w[:, kk]), kk]))
                ii = ii + k
        w = np.array(w_decrease).T  # back to frames x channels
    eigen_vector_0 = afe.mtFeatureExtraction(
            w[:, 0], params[2], 30.0, 30.0, 2, 2)
    eigen_vector_1 = afe.mtFeatureExtraction(
            w[:, 1], params[2], 30.0, 30.0, 2, 2)
    return eigen_vector_0, eigen_vector_1


def read_wave(wav_path):
    w = wave.open(wav_path)
    params = w.getparams()
    # number of channels, sample width (bytes), sample rate, number of frames
    nchannels, sampwidth, framerate, nframes = params[:4]
    t = np.arange(0, nframes)*(1/framerate)  # time axis of the file
    strData = w.readframes(nframes)                    # raw audio bytes
    waveData = np.frombuffer(strData, dtype=np.int16)  # bytes -> int16
    waveData = waveData*1.0/(max(abs(waveData)))       # normalise the amplitude
    waveData = np.reshape(waveData, [nframes, nchannels])  # frames x channels
    # plot the wave
    plt.figure()
    plt.subplot(4, 1, 1)
    plt.plot(t, waveData[:, 0])
    plt.xlabel("Time(s)")
    plt.ylabel("Amplitude")
    plt.title("Ch-1 wavedata")
    plt.grid('on')
    plt.subplot(4, 1, 3)
    plt.plot(t, waveData[:, 1])
    plt.xlabel("Time(s)")
    plt.ylabel("Amplitude")
    plt.title("Ch-2 wavedata")
    plt.grid('on')
    plt.show()


def similarity(v1, v2):
    # compute the average similarity per row
    temp = []
    sim = []
    p = 0
    q = 0
    for ii in range(v1.shape[0]):
        for jj in range(v1.shape[1]):
            if v1[ii, jj] != 0 or v2[ii, jj] != 0:
                temp.append(1 - abs(v1[ii, jj]-v2[ii, jj])
                              / max(abs(v1[ii, jj]), abs(v2[ii, jj])))
                q += 1
        sim.append(np.mean(temp[p:q]))
        p = q
    print(sim)
    return sim


def compute_chunk_features(mp3_path):
    # =========================================================================
    # similarity computation, step one
    # =========================================================================
    # get the song duration
    mp3Info = eyed3.load(mp3_path)
    time = int(mp3Info.info.time_secs)
    print(time)
    tail, track = os.path.split(mp3_path)
    # create two folders
    dirct_1 = tail + r'\wavSession'
    dirct_2 = tail + r'\wavBlock'
    if not os.path.exists(dirct_1):
        os.makedirs(dirct_1)
    if not os.path.exists(dirct_2):
        os.makedirs(dirct_2)
    # get the song name
    song_name = track.split('.')
    # convert the format
    wav_all_path = os.path.join(tail, song_name[0]+'.wav')
    sound = AudioSegment.from_file(mp3_path, format='mp3')
    sound.export(wav_all_path, format='wav')
    read_wave(wav_all_path)
    # split the audio
    gap = 4
    diff = time/10 - 8
    start_time = 0
    end_time = math.floor(diff)
    vector_0 = np.zeros((10, 68))
    vector_1 = np.zeros((10, 68))
    info = []  # records each segment's start and end times
    for jj in range(5):
        wav_name = song_name[0]+str(jj)+'.wav'
        wav_path = os.path.join(tail, 'wavSession', wav_name)
        # pick a random four-second clip
        rand_start = random.randint(start_time, end_time)
        blockData = sound[rand_start*1000:(rand_start+gap)*1000]
        # audio slicing works in milliseconds
        # blockData = sound[start_time*1000:end_time*1000]
        blockData.export(wav_path, format='wav')
        eigVector_0, eigVector_1 = Features(wav_path, [])
        print(jj)  # progress marker
        # feature vector of one segment
        vector_0[jj, :] = np.mean(eigVector_0[0], 1)
        vector_1[jj, :] = np.mean(eigVector_1[0], 1)
        # iterate
        diff = diff + 4
        info.append((start_time, end_time))
        start_time = end_time
        end_time = math.floor(start_time + diff)
    # bridge between the two loops
    end_time = start_time
    for kk in range(5, 10):
        # iterate
        diff = diff - 4
        info.append((start_time, start_time + diff))
        start_time = end_time
        end_time = math.floor(start_time + diff)
        wav_name = song_name[0]+str(kk)+'.wav'
        wav_path = os.path.join(tail, 'wavSession', wav_name)
        # pick a random four-second clip
        rand_start = random.randint(start_time, end_time)
        blockData = sound[rand_start*1000:(rand_start+gap)*1000]
        # blockData = sound[start_time*1000:end_time*1000]
        blockData.export(wav_path, format='wav')
        eigVector_0, eigVector_1 = Features(wav_path, [])
        print(kk)  # progress marker
        # feature vector of one segment
        vector_0[kk, :] = np.mean(eigVector_0[0], 1)
        vector_1[kk, :] = np.mean(eigVector_1[0], 1)
    return vector_0, vector_1, info  # per-channel feature vectors


def Compute_Block_Features(info, mp3_path):
    # =========================================================================
    # similarity computation, step two
    # =========================================================================
    # get the song duration
    mp3Info = eyed3.load(mp3_path)
    time = int(mp3Info.info.time_secs)
    print(time)
    # get the song name
    tail, track = os.path.split(mp3_path)
    song_name = track.split('.')
    # convert the format
    sound = AudioSegment.from_file(mp3_path, format='mp3')
    vector_0 = np.zeros((len(info), 68))
    vector_1 = np.zeros((len(info), 68))
    for kk in range(len(info)):
        # features of the full segment
        wav_name = song_name[0]+str(kk)+'.wav'
        wav_path = os.path.join(tail, 'wavBlock', wav_name)
        # cut out the full segment
        blockData = sound[info[kk][0]*1000:info[kk][1]*1000]
        blockData.export(wav_path, format='wav')
        eigVector_0, eigVector_1 = Features(wav_path, 'second')
        print(kk)  # progress marker
        # feature vector of one segment
        vector_0[kk, :] = np.mean(eigVector_0[0], 1)
        vector_1[kk, :] = np.mean(eigVector_1[0], 1)
    return vector_0, vector_1


def file_exists(file_path):
    if os.path.splitext(file_path)[1] == '.mp3':
        if os.path.isfile(file_path):
            return file_path
        else:
            raise TypeError('file does not exist')
    else:
        raise TypeError('wrong file format: extension is not .mp3')


if __name__ == '__main__':
    # =========================================================================
    # sa_b: a is the song index, b is the channel index
    # =========================================================================
    # path_1 = file_exists(sys.argv[1])
    # path_2 = file_exists(sys.argv[2])
    path_1 = r'C:\Users\yoona\Desktop\music\薛之谦 - 别.mp3'
    path_2 = r'C:\Users\yoona\Desktop\music\薛之谦 - 最好.mp3'
    s1_1, s1_2, info1 = compute_chunk_features(path_1)
    s2_1, s2_2, info2 = compute_chunk_features(path_2)
    sim_1 = similarity(s1_1, s2_1)  # channel 1
    sim_2 = similarity(s1_2, s2_2)  # channel 2
    info1_new = []
    info2_new = []
    for i, element in enumerate(sim_1):
        if element >= 0.5:
            info1_new.append(info1[i])
    if not info1_new:
        pos = np.argmax(sim_1)
        info1_new.append(info1[pos])
    s1_1, s1_2 = Compute_Block_Features(info1_new, path_1)
    for i, element in enumerate(sim_2):
        if element >= 0.5:
            info2_new.append(info2[i])
    if not info2_new:
        pos = np.argmax(sim_2)
        info2_new.append(info2[pos])
    s2_1, s2_2 = Compute_Block_Features(info2_new, path_2)
    sim_1 = similarity(s1_1, s2_1)  # channel 1
    sim_2 = similarity(s1_2, s2_2)  # channel 2


VI. Analysing the results
First test pair:
A: 薛之谦 - 最好.mp3
B: 薛之谦 - 别.mp3
Second test pair:
A: Karim Mika - Superficial Love.mp3
B: Burgess/JESSIA - Eclipse.mp3
Third test pair:
A: CARTA - Aranya (Jungle Festival Anthem).mp3
B: 薛之谦 - 别.mp3
