当前位置:   article > 正文

基于python的音频信号处理_python音频信号处理

python音频信号处理

  1. 生成文件列表

采用递归方式读取指定目录下的文件列表

  1. import os
  2. def get_filelist(path, list):
  3.     list_dir = os.listdir(path)
  4.     for i in list_dir:
  5.         sub_dir = os.path.join(path, i)
  6.         if os.path.isdir(sub_dir):
  7.             get_filelist(sub_dir, list)
  8.         else:
  9.             list.append(sub_dir)
  1. 读取wav文件

  • 单通道 (matlab采用audioread实现)

  • 读取音频的方式很多,主要要利用好数据量转换函数np.fromstring或np.frombuffer

  1. import wave
  2. import matplotlib.pyplot as plt
  3. import numpy as np
  4. import os
  5. filepath = "./data/" #添加路径
  6. filelist= os.listdir(filepath) #得到文件夹下的所有文件名称
  7. f = wave.open(filepath+filelist[1],'rb')
  8. params = f.getparams()
  9. nchannels, sampwidth, framerate, nframes = params[:4]
  10. strData = f.readframes(nframes) #读取音频,字符串格式
  11. waveData = np.fromstring(strData,dtype=np.int16) #将字符串转化为int
  12. waveData = waveData*1.0/(max(abs(waveData))) #wave幅值归一化
  13. # plot the wave
  14. time = np.arange(0,nframes)*(1.0 / framerate)
  15. plt.plot(time,waveData)
  16. plt.xlabel("Time(s)")
  17. plt.ylabel("Amplitude")
  18. plt.title("Single channel wavedata")
  19. plt.grid('on')#标尺,on:有,off:无。
  20. ##另一种语音读取方式
  21. f = open(filepath+filelist[1],'rb')
  22. bufferData = f.read()
  23. waveData = np.frombuffer(bufferData, dtype=np.int16)

结果图:

  • 多通道

这里通道数为3,主要借助np.reshape一下,其他同单通道处理完全一致

  1. import wave
  2. import matplotlib.pyplot as plt
  3. import numpy as np
  4. import os
  5. filepath = "./data/" #添加路径
  6. filelist= os.listdir(filepath) #得到文件夹下的所有文件名称
  7. f = wave.open(filepath+filelist[0],'rb')
  8. params = f.getparams()
  9. nchannels, sampwidth, framerate, nframes = params[:4]
  10. strData = f.readframes(nframes) #读取音频,字符串格式
  11. waveData = np.fromstring(strData,dtype=np.int16)#将字符串转化为int
  12. waveData = waveData*1.0/(max(abs(waveData))) #wave幅值归一化
  13. waveData = np.reshape(waveData,[nframes,nchannels])
  14. f.close()
  15. # plot the wave
  16. time = np.arange(0,nframes)*(1.0 / framerate)
  17. plt.figure()
  18. plt.subplot(5,1,1)
  19. plt.plot(time,waveData[:,0])
  20. plt.xlabel("Time(s)")
  21. plt.ylabel("Amplitude")
  22. plt.title("Ch-1 wavedata")
  23. plt.grid('on')#标尺,on:有,off:无。
  24. plt.subplot(5,1,3)
  25. plt.plot(time,waveData[:,1])
  26. plt.xlabel("Time(s)")
  27. plt.ylabel("Amplitude")
  28. plt.title("Ch-2 wavedata")
  29. plt.grid('on')#标尺,on:有,off:无。
  30. plt.subplot(5,1,5)
  31. plt.plot(time,waveData[:,2])
  32. plt.xlabel("Time(s)")
  33. plt.ylabel("Amplitude")
  34. plt.title("Ch-3 wavedata")
  35. plt.grid('on')#标尺,on:有,off:无。
  36. plt.show()

  效果图:

单通道为多通道的特例,所以多通道的读取方式对任意通道wav文件都适用。需要注意的是,waveData在reshape之后,与之前的数据结构是不同的。即waveData[0]等价于reshape之前的waveData,但不影响绘图分析,只是在分析频谱时才有必要考虑这一点。

  1. 写入wav文件

matlab采用audiowrite实现

单通道数据写入

  1. import wave
  2. #import matplotlib.pyplot as plt
  3. import numpy as np
  4. import os
  5. import struct
  6. #wav文件读取
  7. filepath = "./data/" #添加路径
  8. filenlist= os.listdir(filepath) #得到文件夹下的所有文件名称
  9. f = wave.open(filepath+filelist[1],'rb')
  10. params = f.getparams()
  11. nchannels, sampwidth, framerate, nframes = params[:4]
  12. strData = f.readframes(nframes)#读取音频,字符串格式
  13. waveData = np.fromstring(strData,dtype=np.int16)#将字符串转化为int
  14. waveData = waveData*1.0/(max(abs(waveData)))#wave幅值归一化
  15. f.close()
  16. #wav文件写入
  17. outData = waveData#待写入wav的数据,这里仍然取waveData数据
  18. outfile = filepath+'out1.wav'
  19. outwave = wave.open(outfile, 'wb')#定义存储路径以及文件名
  20. nchannels = 1
  21. sampwidth = 2
  22. fs = 8000
  23. data_size = len(outData)
  24. framerate = int(fs)
  25. nframes = data_size
  26. comptype = "NONE"
  27. compname = "not compressed"
  28. outwave.setparams((nchannels, sampwidth, framerate, nframes,
  29. comptype, compname))
  30. for v in outData:
  31.     outwave.writeframes(struct.pack('h', int(v * 64000 / 2)))#outData:16位,-32767~32767,注意不要溢出
  32. outwave.close()

多通道数据写入

多通道的写入与多通道读取类似,多通道读取是将一维数据reshape为二维,多通道的写入是将二维的数据reshape为一维,其实就是一个逆向的过程:

  1. import wave
  2. #import matplotlib.pyplot as plt
  3. import numpy as np
  4. import os
  5. import struct
  6. #wav文件读取
  7. filepath = "./data/" #添加路径
  8. filelist= os.listdir(filepath) #得到文件夹下的所有文件名称
  9. f = wave.open(filepath+filelist[0],'rb')
  10. params = f.getparams()
  11. nchannels, sampwidth, framerate, nframes = params[:4]
  12. strData = f.readframes(nframes)#读取音频,字符串格式
  13. waveData = np.fromstring(strData,dtype=np.int16)#将字符串转化为int
  14. waveData = waveData*1.0/(max(abs(waveData)))#wave幅值归一化
  15. waveData = np.reshape(waveData,[nframes,nchannels])
  16. f.close()
  17. #wav文件写入
  18. outData = waveData#待写入wav的数据,这里仍然取waveData数据
  19. outData = np.reshape(outData,[nframes*nchannels,1])
  20. outfile = filepath+'out2.wav'
  21. outwave = wave.open(outfile, 'wb')#定义存储路径以及文件名
  22. nchannels = 3
  23. sampwidth = 2
  24. fs = 8000
  25. data_size = len(outData)
  26. framerate = int(fs)
  27. nframes = data_size
  28. comptype = "NONE"
  29. compname = "not compressed"
  30. outwave.setparams((nchannels, sampwidth, framerate, nframes,
  31. comptype, compname))
  32. for v in outData:
  33. outwave.writeframes(struct.pack('h', int(v * 64000 / 2)))#outData:16位,-32767~32767,注意不要溢出
  34. outwave.close()
  1. 信号加窗

通常对信号截断、分帧需要加窗,因为截断都有频域能量泄露,而窗函数可以减少截断带来的影响。

窗函数在scipy.signal信号处理工具箱中,如hamming窗:

  1. import pylab as pl
  2. import scipy.signal as signal
  3. pl.figure(figsize=(6,2))
  4. pl.plot(signal.hanning(512))
  1. 信号分帧

信号分帧的理论依据,其中x是语音信号,w是窗函数:

加窗截断类似采样,为了保证相邻帧不至于差别过大,通常帧与帧之间有帧移,其实就是插值平滑的作用。

给出示意图:

这是没有加窗的示例:

  1. import numpy as np
  2. import wave
  3. import os
  4. #import math
  5. def enframe(signal, nw, inc):
  6. '''将音频信号转化为帧。
  7. 参数含义:
  8. signal:原始音频型号
  9. nw:每一帧的长度(这里指采样点的长度,即采样频率乘以时间间隔)
  10. inc:相邻帧的间隔(同上定义)
  11. '''
  12. signal_length=len(signal) #信号总长度
  13. if signal_length<=nw: #若信号长度小于一个帧的长度,则帧数定义为1
  14. nf=1
  15. else: #否则,计算帧的总长度
  16. nf=int(np.ceil((1.0*signal_length-nw+inc)/inc))
  17. pad_length=int((nf-1)*inc+nw) #所有帧加起来总的铺平后的长度
  18. zeros=np.zeros((pad_length-signal_length,)) #不够的长度使用0填补,类似于FFT中的扩充数组操作
  19. pad_signal=np.concatenate((signal,zeros)) #填补后的信号记为pad_signal
  20. indices=np.tile(np.arange(0,nw),(nf,1))+np.tile(np.arange(0,nf*inc,inc),(nw,1)).T #相当于对所有帧的时间点进行抽取,得到nf*nw长度的矩阵
  21. indices=np.array(indices,dtype=np.int32) #将indices转化为矩阵
  22. frames=pad_signal[indices] #得到帧信号
  23. # win=np.tile(winfunc(nw),(nf,1)) #window窗函数,这里默认取1
  24. # return frames*win #返回帧信号矩阵
  25. return frames
  26. def wavread(filename):
  27. f = wave.open(filename,'rb')
  28. params = f.getparams()
  29. nchannels, sampwidth, framerate, nframes = params[:4]
  30. strData = f.readframes(nframes)#读取音频,字符串格式
  31. waveData = np.fromstring(strData,dtype=np.int16)#将字符串转化为int
  32. f.close()
  33. waveData = waveData*1.0/(max(abs(waveData)))#wave幅值归一化
  34. waveData = np.reshape(waveData,[nframes,nchannels]).T
  35. return waveData
  36. filepath = "./data/" #添加路径
  37. dirname= os.listdir(filepath) #得到文件夹下的所有文件名称
  38. filename = filepath+dirname[0]
  39. data = wavread(filename)
  40. nw = 512
  41. inc = 128
  42. Frame = enframe(data[0], nw, inc)

如果需要加窗,只需要将函数修改为:

  1. def enframe(signal, nw, inc, winfunc):
  2. '''将音频信号转化为帧。
  3. 参数含义:
  4. signal:原始音频型号
  5. nw:每一帧的长度(这里指采样点的长度,即采样频率乘以时间间隔)
  6. inc:相邻帧的间隔(同上定义)
  7. '''
  8. signal_length=len(signal) #信号总长度
  9. if signal_length<=nw: #若信号长度小于一个帧的长度,则帧数定义为1
  10. nf=1
  11. else: #否则,计算帧的总长度
  12. nf=int(np.ceil((1.0*signal_length-nw+inc)/inc))
  13. pad_length=int((nf-1)*inc+nw) #所有帧加起来总的铺平后的长度
  14. zeros=np.zeros((pad_length-signal_length,)) #不够的长度使用0填补,类似于FFT中的扩充数组操作
  15. pad_signal=np.concatenate((signal,zeros)) #填补后的信号记为pad_signal
  16. indices=np.tile(np.arange(0,nw),(nf,1))+np.tile(np.arange(0,nf*inc,inc),(nw,1)).T #相当于对所有帧的时间点进行抽取,得到nf*nw长度的矩阵
  17. indices=np.array(indices,dtype=np.int32) #将indices转化为矩阵
  18. frames=pad_signal[indices] #得到帧信号
  19. win=np.tile(winfunc,(nf,1)) #window窗函数,这里默认取1
  20. return frames*win #返回帧信号矩阵

  其中窗函数,以hamming窗为例:

  1. winfunc = signal.hamming(nw)
  2. Frame = enframe(data[0], nw, inc, winfunc)

  调用即可。

  1. 语谱图

其实得到了分帧信号,频域变换取幅值,就可以得到语谱图,如果仅仅是观察,matplotlib.pyplot有specgram指令:

  1. import wave
  2. import matplotlib.pyplot as plt
  3. import numpy as np
  4. import os
  5. filepath = "./data/" #添加路径
  6. filename= os.listdir(filepath) #得到文件夹下的所有文件名称
  7. f = wave.open(filepath+filename[0],'rb')
  8. params = f.getparams()
  9. nchannels, sampwidth, framerate, nframes = params[:4]
  10. strData = f.readframes(nframes)#读取音频,字符串格式
  11. waveData = np.fromstring(strData,dtype=np.int16)#将字符串转化为int
  12. waveData = waveData*1.0/(max(abs(waveData)))#wave幅值归一化
  13. waveData = np.reshape(waveData,[nframes,nchannels]).T
  14. f.close()
  15. # plot the wave
  16. plt.specgram(waveData[0],Fs = framerate, scale_by_freq = True, sides = 'default')
  17. plt.ylabel('Frequency(Hz)')
  18. plt.xlabel('Time(s)')
  19. plt.show()

  1. STFT和iSTFT

  1. def stft(signal, window, win_size, hop_size, last_sample=False):
  2. """Convert time-domain signal to time-frequency domain.
  3. Args:
  4. signal : multi-channel time-domain signal
  5. window : window function, see cola_hamming as an example.
  6. win_size : window size (number of samples)
  7. hop_size : hop size (number of samples)
  8. last_sample : include last sample, by default (due to legacy bug),
  9. the last sample is not included.
  10. Returns:
  11. tf : multi-channel time-frequency domain signal.
  12. """
  13. assert signal.ndim == 2
  14. w = window(win_size, hop_size)
  15. return np.array([[
  16. np.fft.fft(c[t:t + win_size] * w)
  17. for t in range(0,
  18. len(c) - win_size + (1 if last_sample else 0), hop_size)
  19. ] for c in signal])
  20. def istft(tf, hop_size):
  21. """Inverse STFT
  22. Args:
  23. tf : multi-channel time-frequency domain signal.
  24. hop_size : hop size (number of samples)
  25. Returns:
  26. signal : multi-channel time-domain signal
  27. """
  28. tf = np.asarray(tf)
  29. nch, nframe, nfbin = tf.shape
  30. signal = np.zeros((nch, (nframe - 1) * hop_size + nfbin))
  31. for t in range(nframe):
  32. signal[:, t*hop_size:t*hop_size+nfbin] += \
  33. np.real(np.fft.ifft(tf[:, t]))
  34. return signal

  1. 计算导向矢量

  1. def steering_vector(delay, win_size=0, fbins=None, fs=None):
  2. """Compute the steering vector.
  3. One and only one of the conditions are true:
  4. - win_size != 0
  5. - fbins is not None
  6. Args:
  7. delay : delay of each channel (see compute_delay),
  8. unit is second if fs is not None, otherwise sample
  9. win_size : (default 0) window (FFT) size. If zero, use fbins.
  10. fbins : (default None) center of frequency bins, as discrete value.
  11. fs : (default None) sample rate
  12. Returns:
  13. stv : steering vector, indices (cf)
  14. """
  15. assert (win_size != 0) != (fbins is not None)
  16. delay = np.asarray(delay)
  17. if fs is not None:
  18. delay *= fs # to discrete-time value
  19. if fbins is None:
  20. fbins = np.fft.fftfreq(win_size)
  21. return np.exp(-2j * math.pi * np.outer(delay, fbins))

  1. 计算延时

  1. def compute_delay(m_pos, doa, c=340, fs=None):
  2. """Compute delay of signal arrival at microphones.
  3. Args:
  4. m_pos : microphone positions, (M,3) array,
  5. M is number of microphones.
  6. doa : direction of arrival, (3,) array or (N,3) array,
  7. N is the number of sources.
  8. c : (default 340) speed of sound (m/s).
  9. fs : (default None) sample rate.
  10. Return:
  11. delay : delay with reference of arrival at first microphone.
  12. first element is always 0.
  13. unit is second if fs is None, otherwise sample.
  14. """
  15. m_pos = np.asarray(m_pos)
  16. doa = np.asarray(doa)
  17. # relative position wrt first microphone
  18. r_pos = m_pos - m_pos[0]
  19. # inner product -> different in time
  20. if doa.ndim == 1:
  21. doa /= np.sqrt(np.sum(doa**2.0)) # normalize
  22. diff = -np.einsum('ij,j->i', r_pos, doa) / c
  23. else:
  24. assert doa.ndim == 2
  25. doa /= np.sqrt(np.sum(doa**2.0, axis=1, keepdims=True)) # normalize
  26. diff = -np.einsum('ij,kj->ki', r_pos, doa) / c
  27. if fs is not None:
  28. return diff * fs
  29. else:
  30. return diff

  1. 计算协方差矩阵

  1. def cov_matrix(tf):
  2. """Covariance matrix of the multi-channel signal.
  3. Args:
  4. tf : multi-channel time-frequency domain signal.
  5. Returns:
  6. cov : covariance matrix, indexed by (ccf)
  7. """
  8. nch, nframe, nfbin = tf.shape
  9. return np.einsum('itf,jtf->ijf', tf, tf.conj()) / float(nframe)

11. VAD操作

  1. def vad_by_threshold(fs, sig, vadrate, threshold_db, neighbor_size=0):
  2. """Voice Activity Detection by threshold
  3. Args:
  4. fs : sample rate.
  5. signal : multi-channel time-domain signal.
  6. vadrate : output vad rate
  7. threshold_db : threshold in decibel
  8. neighbor_size : half size of (excluding center) neighbor area
  9. Returns:
  10. vad : VAD label (0: silence, 1: active)
  11. """
  12. nch, nsamples = sig.shape
  13. nframes = nsamples * vadrate / fs
  14. fpower = np.zeros((nch, nframes)) # power at frame level
  15. for i in range(nframes):
  16. fpower[:, i] = power(sig[:,
  17. (i * fs / vadrate):((i + 1) * fs / vadrate)])
  18. # average power in neighbor area
  19. if neighbor_size == 0:
  20. apower = fpower
  21. else:
  22. apower = np.zeros((nch, nframes))
  23. for i in range(nframes):
  24. apower[:, i] = np.mean(
  25. fpower[:,
  26. max(0, i - neighbor_size):min(nframes, i +
  27. neighbor_size + 1)],
  28. axis=1)
  29. return (apower > 10.0**(threshold_db / 10.0)).astype(int)

参考资料

https://www.cnblogs.com/xingshansi/p/6799994.html

https://lxp-never.blog.csdn.net/article/details/84967662

https://github.com/hwp/apkit

本文内容由网友自发贡献,转载请注明出处:【wpsshop博客】
推荐阅读
相关标签
  

闽ICP备14008679号