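Bell Labs
Continuous speech recognizer
Automatic music classification system
1. Feed the sound wave into the computer (decoding raw audio).
2. Convert the sound wave into numbers for storage. A sound wave is one-dimensional, so it is enough to record its height at equally spaced points in time.
3. Sampling: read N samples per second.
Nyquist theorem: sampling rate >= 2 * f_max, where f_max is the highest frequency in the sound.
Split the audio into short segments, for example 20 ms each.
4. Plot the numbers as a simple line chart.
5. Compute the energy in each frequency band to build a fingerprint for the audio snippet: draw the signal as a spectrogram, with frequency on the vertical axis and time on the horizontal axis.
6. Slice the audio snippet and identify the letters in the sound, such as vowels.
A neural network then predicts, for each letter, how likely each possible next letter is.
For classification, extract features such as pitch and MFCCs from the audio and train a classifier on them. Given unseen audio, the trained model predicts its class.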
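Steps 3 to 5 above can be sketched with NumPy alone. This is a minimal illustration, not the pipeline of any particular recognizer: the 16 kHz sampling rate and the synthetic 1 kHz tone are assumptions chosen so the numbers come out cleanly, and real systems usually use overlapping, windowed frames rather than the plain non-overlapping frames shown here.

```python
import numpy as np

SAMPLE_RATE = 16_000      # samples per second (assumed for this sketch)
FRAME_MS = 20             # frame length from step 3
TONE_HZ = 1_000           # frequency of the synthetic test tone (assumed)

# One second of a pure sine wave stands in for recorded audio.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
signal = np.sin(2 * np.pi * TONE_HZ * t)

# Split the signal into non-overlapping 20 ms frames.
frame_len = SAMPLE_RATE * FRAME_MS // 1000          # 320 samples per frame
n_frames = len(signal) // frame_len                 # 50 frames
frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

# Energy per frequency bin for every frame: rows are frames (time),
# columns are frequency bins. Transposed, this is a basic spectrogram.
spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2
freqs = np.fft.rfftfreq(frame_len, d=1 / SAMPLE_RATE)

# The strongest bin in each frame should sit at the tone frequency.
peak_hz = freqs[spectrum.argmax(axis=1)]
print(frames.shape, peak_hz[:3])
```

The tone at 1 kHz is well below half the sampling rate (8 kHz), so the Nyquist condition holds and the peak is recovered in every frame.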
Audio formats:
WMA (Windows Media Audio),
MP3,
WAV
CDs use a sampling rate of 44.1 kHz.
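To make the sampling rate concrete, the sketch below writes a short WAV file at 44.1 kHz with Python's standard `wave` module, then reads the header back. The file name, the 440 Hz tone, and the 0.1 s duration are arbitrary choices for illustration. By the Nyquist theorem, a 44.1 kHz rate can represent frequencies up to 22.05 kHz.

```python
import math
import os
import struct
import tempfile
import wave

CD_SAMPLE_RATE = 44_100   # Hz, as stated above

# Write a short mono 16-bit WAV file containing a 440 Hz tone.
path = os.path.join(tempfile.mkdtemp(), "tone.wav")   # hypothetical file name
n_samples = CD_SAMPLE_RATE // 10                      # 0.1 seconds of audio
with wave.open(path, "wb") as wav_out:
    wav_out.setnchannels(1)
    wav_out.setsampwidth(2)              # 2 bytes = 16 bits per sample
    wav_out.setframerate(CD_SAMPLE_RATE)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * 440 * i / CD_SAMPLE_RATE))
        for i in range(n_samples)
    )
    wav_out.writeframes(b"".join(struct.pack("<h", s) for s in samples))

# Read the header back and derive the duration and the Nyquist limit.
with wave.open(path, "rb") as wav_in:
    rate = wav_in.getframerate()
    duration_s = wav_in.getnframes() / rate

nyquist_hz = rate / 2
print(rate, duration_s, nyquist_hz)
```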
import pandas as pd
# Load the track metadata
tracks = pd.read_csv('D:/My life/music/echonest/fma-rock-vs-hiphop.csv')
print(tracks.shape)
#tracks[0:5]
tracks
(17734, 21)
index | track_id | bit_rate | comments | composer | date_created | date_recorded | duration | favorites | genre_top | genres | ... | information | interest | language_code | license | listens | lyricist | number | publisher | tags | title |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 135 | 256000 | 1 | NaN | 2008-11-26 01:43:26 | 2008-11-26 00:00:00 | 837 | 0 | Rock | [45, 58] | ... | NaN | 2484 | en | Attribution-NonCommercial-ShareAlike 3.0 Inter... | 1832 | NaN | 0 | NaN | [] | Father's Day |
1 | 136 | 256000 | 1 | NaN | 2008-11-26 01:43:35 | 2008-11-26 00:00:00 | 509 | 0 | Rock | [45, 58] | ... | NaN | 1948 | en | Attribution-NonCommercial-ShareAlike 3.0 Inter... | 1498 | NaN | 0 | NaN | [] | Peel Back The Mountain Sky |
2 | 151 | 192000 | 0 | NaN | 2008-11-26 01:44:55 | NaN | 192 | 0 | Rock | [25] | ... | NaN | 701 | en | Attribution-NonCommercial-ShareAlike 3.0 Inter... | 148 | NaN | 4 | NaN | [] | Untitled 04 |
3 | 152 | 192000 | 0 | NaN | 2008-11-26 01:44:58 | NaN | 193 | 0 | Rock | [25] | ... | NaN | 637 | en | Attribution-NonCommercial-ShareAlike 3.0 Inter... | 98 | NaN | 11 | NaN | [] | Untitled 11 |
4 | 153 | 256000 | 0 | Arc and Sender | 2008-11-26 01:45:00 | 2008-11-26 00:00:00 | 405 | 5 | Rock | [26] | ... | NaN | 354 | en | Attribution-NonCommercial-NoDerivatives (aka M... | 424 | NaN | 2 | NaN | [] | Hundred-Year Flood |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
17729 | 155063 | 320000 | 0 | NaN | 2017-03-24 19:40:43 | NaN | 283 | 3 | Hip-Hop | [21, 811] | ... | NaN | 1283 | NaN | Attribution | 1050 | NaN | 4 | NaN | ['old school beats', '2017 free instrumentals'... | Been On |
17730 | 155064 | 320000 | 0 | NaN | 2017-03-24 19:40:44 | NaN | 250 | 2 | Hip-Hop | [21, 811] | ... | NaN | 1077 | NaN | Attribution | 858 | NaN | 2 | NaN | ['old school beats', '2017 free instrumentals'... | Send Me |
17731 | 155065 | 320000 | 0 | NaN | 2017-03-24 19:40:45 | NaN | 219 | 3 | Hip-Hop | [21, 811] | ... | NaN | 1340 | NaN | Attribution | 1142 | NaN | 1 | NaN | ['old school beats', '2017 free instrumentals'... | The Question |
17732 | 155066 | 320000 | 0 | NaN | 2017-03-24 19:40:47 | NaN | 252 | 6 | Hip-Hop | [21, 811] | ... | NaN | 2065 | NaN | Attribution | 1474 | NaN | 3 | NaN | ['old school beats', '2017 free instrumentals'... | Roy |
17733 | 155247 | 320000 | 0 | Fleslit | 2017-03-29 01:40:28 | NaN | 211 | 3 | Hip-Hop | [21, 539, 811] | ... | NaN | 1379 | NaN | Attribution | 1025 | NaN | 0 | Fleslit | ['instrumental trap beat', 'love', 'instrument... | Love In The Sky |
17734 rows × 21 columns
# Load the Echo Nest track metrics
echonest_metrics = pd.read_json('D:/My life/music/echonest/echonest-metrics.json', precise_float = True)
print(echonest_metrics.shape)
echonest_metrics
(13129, 9)
Output:
index | track_id | acousticness | danceability | energy | instrumentalness | liveness | speechiness | tempo | valence |
---|---|---|---|---|---|---|---|---|---|
0 | 2 | 0.416675 | 0.675894 | 0.634476 | 0.010628 | 0.177647 | 0.159310 | 165.922 | 0.576661 |
1 | 3 | 0.374408 | 0.528643 | 0.817461 | 0.001851 | 0.105880 | 0.461818 | 126.957 | 0.269240 |
2 | 5 | 0.043567 | 0.745566 | 0.701470 | 0.000697 | 0.373143 | 0.124595 | 100.260 | 0.621661 |
3 | 10 | 0.951670 | 0.658179 | 0.924525 | 0.965427 | 0.115474 | 0.032985 | 111.562 | 0.963590 |
4 | 134 | 0.452217 | 0.513238 | 0.560410 | 0.019443 | 0.096567 | 0.525519 | 114.290 | 0.894072 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
13124 | 124857 | 0.007592 | 0.790364 | 0.719288 | 0.853114 | 0.720715 | 0.082550 | 141.332 | 0.890461 |
13125 | 124862 | 0.041498 | 0.843077 | 0.536496 | 0.865151 | 0.547949 | 0.074001 | 101.975 | 0.476845 |
13126 | 124863 | 0.000124 | 0.609686 | 0.895136 | 0.846624 | 0.632903 | 0.051517 | 129.996 | 0.496667 |
13127 | 124864 | 0.327576 | 0.574426 | 0.548327 | 0.452867 | 0.075928 | 0.033388 | 142.009 | 0.569274 |
13128 | 124911 | 0.993606 | 0.499339 | 0.050622 | 0.945677 | 0.095965 | 0.065189 | 119.965 | 0.204652 |
13129 rows × 9 columns
# Merge the metrics with each track's genre label (inner join on track_id)
echo_tracks = pd.merge(echonest_metrics, tracks[['track_id', 'genre_top']], on = 'track_id')
print(echo_tracks.shape)
echo_tracks
(4802, 10)
Output:
index | track_id | acousticness | danceability | energy | instrumentalness | liveness | speechiness | tempo | valence | genre_top |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | 0.416675 | 0.675894 | 0.634476 | 1.062807e-02 | 0.177647 | 0.159310 | 165.922 | 0.576661 | Hip-Hop |
1 | 3 | 0.374408 | 0.528643 | 0.817461 | 1.851103e-03 | 0.105880 | 0.461818 | 126.957 | 0.269240 | Hip-Hop |
2 | 5 | 0.043567 | 0.745566 | 0.701470 | 6.967990e-04 | 0.373143 | 0.124595 | 100.260 | 0.621661 | Hip-Hop |
3 | 134 | 0.452217 | 0.513238 | 0.560410 | 1.944269e-02 | 0.096567 | 0.525519 | 114.290 | 0.894072 | Hip-Hop |
4 | 153 | 0.988306 | 0.255661 | 0.979774 | 9.730057e-01 | 0.121342 | 0.051740 | 90.241 | 0.034018 | Rock |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4797 | 124718 | 0.412194 | 0.686825 | 0.849309 | 6.000000e-10 | 0.867543 | 0.367315 | 96.104 | 0.692414 | Hip-Hop |
4798 | 124719 | 0.054973 | 0.617535 | 0.728567 | 7.215700e-06 | 0.131438 | 0.243130 | 96.262 | 0.399720 | Hip-Hop |
4799 | 124720 | 0.010478 | 0.652483 | 0.657498 | 7.098000e-07 | 0.701523 | 0.229174 | 94.885 | 0.432240 | Hip-Hop |
4800 | 124721 | 0.067906 | 0.432421 | 0.764508 | 1.625500e-06 | 0.104412 | 0.310553 | 171.329 | 0.580087 | Hip-Hop |
4801 | 124722 | 0.153518 | 0.638660 | 0.762567 | 5.000000e-10 | 0.264847 | 0.303372 | 77.842 | 0.656612 | Hip-Hop |
4802 rows × 10 columns
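The merged table has 4802 rows even though the metrics file has 13129, because `pd.merge` performs an inner join by default and keeps only the `track_id` values present in both tables. The sketch below shows this on two tiny, made-up tables (the IDs and values are hypothetical) and uses `how="outer"` with `indicator=True` to list the rows that an inner join would discard.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
metrics = pd.DataFrame(
    {"track_id": [2, 3, 5, 10], "tempo": [165.9, 127.0, 100.3, 111.6]}
)
genres = pd.DataFrame(
    {"track_id": [2, 3, 5, 135], "genre_top": ["Hip-Hop", "Hip-Hop", "Hip-Hop", "Rock"]}
)

# Default how="inner": keep only track_ids present in both tables.
inner = pd.merge(metrics, genres, on="track_id")

# how="outer" with indicator=True records which side each row came from.
outer = pd.merge(metrics, genres, on="track_id", how="outer", indicator=True)
unmatched = outer[outer["_merge"] != "both"]

print(len(inner), len(unmatched))
```

Checking the unmatched rows this way is a quick sanity test that the drop in row count is expected rather than the result of a mistyped key.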
# Inspect the merged DataFrame
echo_tracks.info()
Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4802 entries, 0 to 4801
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   track_id          4802 non-null   int64
 1   acousticness      4802 non-null   float64
 2   danceability      4802 non-null   float64
 3   energy            4802 non-null   float64
 4   instrumentalness  4802 non-null   float64
 5   liveness          4802 non-null   float64
 6   speechiness       4802 non-null   float64
 7   tempo             4802 non-null   float64
 8   valence           4802 non-null   float64
 9   genre_top         4802 non-null   object
dtypes: float64(8), int64(1), object(1)
memory usage: 412.7+ KB
echo_tracks.describe()
Output:
statistic | track_id | acousticness | danceability | energy | instrumentalness | liveness | speechiness | tempo | valence |
---|---|---|---|---|---|---|---|---|---|
count | 4802.000000 | 4.802000e+03 | 4802.000000 | 4802.000000 | 4802.000000 | 4802.000000 | 4802.000000 | 4802.000000 | 4802.000000 |
mean | 30164.871720 | 4.870600e-01 | 0.436556 | 0.625126 | 0.604096 | 0.187997 | 0.104877 | 126.687944 | 0.453413 |
std | 28592.013796 | 3.681396e-01 | 0.183502 | 0.244051 | 0.376487 | 0.150562 | 0.145934 | 34.002473 | 0.266632 |
min | 2.000000 | 9.491000e-07 | 0.051307 | 0.000279 | 0.000000 | 0.025297 | 0.023234 | 29.093000 | 0.014392 |
25% | 7494.250000 | 8.351236e-02 | 0.296047 | 0.450757 | 0.164972 | 0.104052 | 0.036897 | 98.000750 | 0.224617 |
50% | 20723.500000 | 5.156888e-01 | 0.419447 | 0.648374 | 0.808752 | 0.123080 | 0.049594 | 124.625500 | 0.446240 |
75% | 44240.750000 | 8.555765e-01 | 0.565339 | 0.837016 | 0.915472 | 0.215151 | 0.088290 | 151.450000 | 0.666914 |
max | 124722.000000 | 9.957965e-01 | 0.961871 | 0.999768 | 0.993134 | 0.971392 | 0.966177 | 250.059000 | 0.983649 |
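Examine the pairwise relationships between the continuous variables. Avoiding strongly correlated features keeps the model simple and improves interpretability.
# Correlation matrix (numeric columns only; genre_top holds strings)
corr_metrics = echo_tracks.select_dtypes('number').corr()
corr_metrics.style.background_gradient()
Output: (correlation heatmap; the image is not preserved here)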
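Each cell of that matrix is a Pearson correlation coefficient, which is the default method of `DataFrame.corr`. As a minimal sketch of what one cell contains, the function below computes the coefficient by hand for two features. The five values per feature are made up for illustration and are not taken from the dataset.

```python
import numpy as np

# Hypothetical values for two features across five tracks.
energy = np.array([0.63, 0.82, 0.70, 0.56, 0.98])
acousticness = np.array([0.42, 0.37, 0.04, 0.45, 0.99])

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

r = pearson(energy, acousticness)
print(round(r, 4))
```

The result always lies between -1 and 1. Values near either end indicate two features that carry largely the same information.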
# Define the features: every column except the label and the ID
features = echo_tracks.drop(['genre_top', 'track_id'], axis = 1)
# Define the labels
labels = echo_tracks['genre_top']
labels
Output:
0 Hip-Hop
1 Hip-Hop
2 Hip-Hop
3 Hip-Hop
4 Rock
...
4797 Hip-Hop
4798 Hip-Hop
4799 Hip-Hop
4800 Hip-Hop
4801 Hip-Hop
Name: genre_top, Length: 4802, dtype: object
features[0:5]
Output:
index | acousticness | danceability | energy | instrumentalness | liveness | speechiness | tempo | valence |
---|---|---|---|---|---|---|---|---|
0 | 0.416675 | 0.675894 | 0.634476 | 0.010628 | 0.177647 | 0.159310 | 165.922 | 0.576661 |
1 | 0.374408 | 0.528643 | 0.817461 | 0.001851 | 0.105880 | 0.461818 | 126.957 | 0.269240 |
2 | 0.043567 | 0.745566 | 0.701470 | 0.000697 | 0.373143 | 0.124595 | 100.260 | 0.621661 |
3 | 0.452217 | 0.513238 | 0.560410 | 0.019443 | 0.096567 | 0.525519 | 114.290 | 0.894072 |
4 | 0.988306 | 0.255661 | 0.979774 | 0.973006 | 0.121342 | 0.051740 | 90.241 | 0.034018 |
# Import the standardizer
from sklearn.preprocessing import StandardScaler
# Scale the features and store the result in a new variable
scaler = StandardScaler()
scaled_train_features = scaler.fit_transform(features)
scaler
StandardScaler()
scaled_train_features[0:5]
Output:
array([[-0.19121034, 1.30442004, 0.03831594, -1.57649422, -0.06875487,
0.37303429, 1.15397908, 0.46228696],
[-0.30603598, 0.50188641, 0.78817624, -1.59980943, -0.54546309,
2.44615517, 0.00791367, -0.69081137],
[-1.20481276, 1.68413943, 0.31285194, -1.60287574, 1.22982787,
0.13513049, -0.77731688, 0.63107745],
[-0.09465518, 0.41792741, -0.26520319, -1.55307896, -0.60732615,
2.88270682, -0.36465686, 1.6528586 ],
[ 1.36170559, -0.98589622, 1.45332318, 0.97997488, -0.44275673,
-0.36415677, -1.07200261, -1.57310227]])
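Standardization matters here because the features sit on very different scales: tempo runs into the hundreds while the other columns stay between 0 and 1. The sketch below reproduces the transformation by hand on the danceability and tempo values of the first three rows of the feature table. It uses only three rows, so the scaled numbers differ from the output above, which was fitted on all 4802 tracks.

```python
import numpy as np

# Danceability and tempo for the first three tracks in the feature table.
X = np.array([
    [0.675894, 165.922],
    [0.528643, 126.957],
    [0.745566, 100.260],
])

# z = (x - mean) / std, column by column. StandardScaler uses the
# population standard deviation (ddof=0), which is also NumPy's default.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
```

After scaling, every column has mean 0 and standard deviation 1, so no single feature dominates the variance that PCA looks at.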
# Plotting
%matplotlib inline
# Import the plotting module and PCA
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
# Fit PCA on all features to get the explained variance ratios
pca = PCA()
pca.fit(scaled_train_features)
exp_variance = pca.explained_variance_ratio_
# Bar plot of the explained variance per component
# Note: the function is plt.subplots, not plt.subplot. Leaving off the "s" kept raising an error at first.
fig, ax = plt.subplots()
ax.bar(x = range(pca.n_components_), height = exp_variance)
ax.set_xlabel('Principal Component')
Text(0.5, 0, 'Principal Component')
import numpy as np
# Cumulative explained variance
cum_exp_variance = np.cumsum(exp_variance)
# Plot it, with a dashed line at 0.9
fig, ax = plt.subplots()
ax.plot(cum_exp_variance)
ax.axhline(y = 0.9, linestyle = '--')
# Chosen number of components
n_components = 7
# Run PCA with that many components and project the data onto them
pca = PCA(n_components, random_state = 10)
pca.fit(scaled_train_features)
pca_projection = pca.transform(scaled_train_features)
print(pca_projection)
print(scaled_train_features)
Output:
[[ 1.59666656 1.0500117 -0.01778555 ... -0.36832686 -0.71505324 -0.28731253]
 [ 1.58153526 1.07661327 1.04346038 ... -1.81917099 1.3884574 0.12558375]
 [ 2.01545627 1.4085176 0.24506524 ... 0.62769959 -0.45716338 -0.05285551]
 ...
 [ 1.66908628 1.84010121 2.38294303 ... 1.23664547 -0.63277253 0.60721569]
 [ 1.17001951 2.03158181 0.08689922 ... -1.45765649 -0.03590123 -0.02431674]
 [ 2.36368976 1.15900708 0.4473735 ... -0.03592518 0.82678557 -0.14947633]]
[[-0.19121034 1.30442004 0.03831594 ... 0.37303429 1.15397908 0.46228696]
 [-0.30603598 0.50188641 0.78817624 ... 2.44615517 0.00791367 -0.69081137]
 [-1.20481276 1.68413943 0.31285194 ... 0.13513049 -0.77731688 0.63107745]
 ...
 [-1.29470431 1.17682795 0.13265633 ... 0.85182206 -0.93541008 -0.07941825]
 [-1.13869115 -0.02253433 0.57117905 ... 1.40951543 1.31301348 0.47513794]
 [-0.90611434 1.10148973 0.56322452 ... 1.36030881 -1.43669053 0.76217464]]
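The number of components can also be read off the cumulative curve programmatically instead of by eye. The helper below is a hypothetical convenience, not part of scikit-learn, and the eight ratios are made-up values rather than the ones produced by this dataset. It returns the smallest number of leading components whose cumulative explained variance reaches a threshold such as the 0.9 line drawn above.

```python
import numpy as np

def components_for_threshold(explained_variance_ratio, threshold=0.9):
    """Smallest number of leading components whose cumulative ratio reaches the threshold."""
    cumulative = np.cumsum(explained_variance_ratio)
    # A small tolerance guards against floating-point error in the running sum.
    reached = cumulative >= threshold - 1e-12
    if not reached.any():
        raise ValueError("threshold is never reached")
    return int(np.argmax(reached)) + 1

# Hypothetical ratios for eight components (they sum to 1).
ratios = np.array([0.25, 0.18, 0.14, 0.12, 0.10, 0.08, 0.07, 0.06])
n = components_for_threshold(ratios)
print(n)
```

With real data, the same call would take `pca.explained_variance_ratio_` from a PCA fitted on all features.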
# Import train_test_split and the decision tree classifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# Split the data into training and test sets
train_features, test_features, train_labels, test_labels = train_test_split(pca_projection, labels, random_state = 10)
# Train the decision tree
tree = DecisionTreeClassifier(random_state = 10)
tree.fit(train_features, train_labels)
# Predict the labels of the test data
pred_labels_tree = tree.predict(test_features)
pred_labels_tree[0:100]
Output:
array(['Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock',
'Rock', 'Rock', 'Rock', 'Rock', 'Hip-Hop', 'Hip-Hop', 'Rock',
'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock',
'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Hip-Hop', 'Hip-Hop',
'Rock', 'Rock', 'Hip-Hop', 'Rock', 'Rock', 'Hip-Hop', 'Rock',
'Hip-Hop', 'Rock', 'Rock', 'Hip-Hop', 'Rock', 'Rock', 'Hip-Hop',
'Rock', 'Rock', 'Hip-Hop', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock',
'Rock', 'Rock', 'Rock', 'Rock', 'Hip-Hop', 'Rock', 'Rock', 'Rock',
'Rock', 'Rock', 'Rock', 'Rock', 'Hip-Hop', 'Rock', 'Rock', 'Rock',
'Hip-Hop', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock',
'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock',
'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock',
'Rock', 'Hip-Hop', 'Rock', 'Rock', 'Rock', 'Hip-Hop', 'Hip-Hop',
'Hip-Hop'], dtype=object)
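The call to `train_test_split` above passes no `test_size`, so scikit-learn falls back to its default of 0.25 and rounds the test count up. The short sketch below redoes that arithmetic for the 4802 merged tracks. The helper name is made up for illustration.

```python
import math

def split_sizes(n_samples, test_size=0.25):
    """Train/test sizes for a fractional test_size: the test count is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_train, n_test = split_sizes(4802)
print(n_train, n_test)   # 3601 1201
```

The 1201 test rows match the length of the prediction arrays and the support totals reported further down.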
# Compare the decision tree with logistic regression
# Import LogisticRegression
from sklearn.linear_model import LogisticRegression
# Train logistic regression and predict the test-set labels
logreg = LogisticRegression(random_state = 10)
logreg.fit(train_features, train_labels)
pred_labels_logit = logreg.predict(test_features)
# Build a classification report for each model
from sklearn.metrics import classification_report
class_rep_tree = classification_report(test_labels, pred_labels_tree)
class_rep_log = classification_report(test_labels, pred_labels_logit)
print("Decision Tree:\n", class_rep_tree)
print("Logistic Regression:\n", class_rep_log)
Output:
Decision Tree:
               precision    recall  f1-score   support
     Hip-Hop       0.68      0.66      0.67       235
        Rock       0.92      0.93      0.92       966
    accuracy                           0.87      1201
   macro avg       0.80      0.79      0.80      1201
weighted avg       0.87      0.87      0.87      1201
Logistic Regression:
               precision    recall  f1-score   support
     Hip-Hop       0.78      0.57      0.66       235
        Rock       0.90      0.96      0.93       966
    accuracy                           0.88      1201
   macro avg       0.84      0.76      0.79      1201
weighted avg       0.88      0.88      0.88      1201
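The f1-score and the two averages in the report can be recomputed from precision, recall, and support. The sketch below does this for the logistic regression rows, starting from the rounded values printed above, so the last digit could in principle differ from a computation on the unrounded scores.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rounded logistic-regression values copied from the report above.
hiphop = {"precision": 0.78, "recall": 0.57, "support": 235}
rock = {"precision": 0.90, "recall": 0.96, "support": 966}

f1_hiphop = f1(hiphop["precision"], hiphop["recall"])
f1_rock = f1(rock["precision"], rock["recall"])

# Macro average: every class counts equally.
macro_f1 = (f1_hiphop + f1_rock) / 2

# Weighted average: classes count in proportion to their support.
total = hiphop["support"] + rock["support"]
weighted_f1 = (f1_hiphop * hiphop["support"] + f1_rock * rock["support"]) / total

print(round(f1_hiphop, 2), round(f1_rock, 2), round(macro_f1, 2), round(weighted_f1, 2))
```

The weighted average is pulled toward the Rock score because Rock has about four times as many test samples as Hip-Hop, which is why the macro average gives a less flattering picture of the minority class.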
pred_labels_logit.shape
(1201,)
len(pred_labels_logit)
1201
pred_labels_logit[0:100]
Output:
array(['Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock',
'Hip-Hop', 'Rock', 'Rock', 'Rock', 'Hip-Hop', 'Rock', 'Rock',
'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock',
'Rock', 'Rock', 'Rock', 'Hip-Hop', 'Rock', 'Rock', 'Hip-Hop',
'Rock', 'Rock', 'Hip-Hop', 'Rock', 'Rock', 'Hip-Hop', 'Rock',
'Hip-Hop', 'Rock', 'Rock', 'Rock', 'Rock', 'Hip-Hop', 'Rock',
'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock',
'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock',
'Rock', 'Rock', 'Rock', 'Rock', 'Hip-Hop', 'Hip-Hop', 'Rock',
'Rock', 'Hip-Hop', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock',
'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock',
'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock', 'Rock',
'Rock', 'Rock', 'Hip-Hop', 'Rock', 'Rock', 'Rock', 'Rock',
'Hip-Hop', 'Hip-Hop'], dtype=object)
from sklearn.model_selection import KFold, cross_val_score
# Set up 10-fold cross-validation. random_state is omitted because it has no effect
# without shuffle=True, and recent scikit-learn versions raise an error for that combination.
kf = KFold(n_splits = 10)
tree = DecisionTreeClassifier(random_state = 10)
logreg = LogisticRegression(random_state = 10, solver = 'lbfgs')
# Train both models with K-fold cross-validation
tree_score = cross_val_score(estimator = tree, X = pca_projection, y = labels, cv = kf)
logit_score = cross_val_score(estimator = logreg, X = pca_projection, y = labels, cv = kf)
# Print the mean of each score array
print("Decision Tree:", np.mean(tree_score).round(4), "\nLogistic Regression:", np.mean(logit_score).round(4))
Decision Tree: 0.86
Logistic Regression: 0.8794
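To see what the ten folds look like, the sketch below mimics unshuffled K-fold splitting with NumPy: the sample indices are cut into contiguous blocks, and each block serves as the test set once. The function name is made up, and this is an illustration of the idea rather than scikit-learn's implementation.

```python
import numpy as np

def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    folds = np.array_split(np.arange(n_samples), n_splits)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, test_idx

sizes = [len(test) for _, test in kfold_indices(4802, 10)]
print(sizes)   # [481, 481, 480, 480, 480, 480, 480, 480, 480, 480]
```

Because the folds are contiguous, each one reflects the order of the rows in the table. Passing `shuffle = True` together with a `random_state` to `KFold` is the usual alternative when row order might matter.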
import gtts
import pyttsx3
gtts.__version__
'2.2.3'
engine = pyttsx3.init()
engine.say("Sweeter")
engine.say("音乐是时间的艺术")  # "Music is the art of time"
engine.runAndWait()
Install: pip install pyaudio
from playsound import playsound
playsound('D:/My life/music/some music/sodagreen/take_me_away.wav')
Goal: correctly transcribe the same text when it is spoken at different lengths and with different accents.
Install: pip install SpeechRecognition
http://github.com/Uberi/speech_recognition # readme
Import: import speech_recognition as sr
import speech_recognition as sr
print(sr.__version__)
3.8.1
r = sr.Recognizer()
r
<speech_recognition.Recognizer at 0x1fab1004490>
harvard = sr.AudioFile('D:/My life/music/some music/sodagreen/take_me_away.wav')
with harvard as source:
    audio = r.record(source)
audio
type(audio)
r.recognize_google(audio, language = 'zh-tw')  # Traditional Chinese
# e.g. English (the default language)
type(audio)
r.recognize_google(audio)
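`recognize_google` sends the audio to a web service, so it can fail either because no speech was understood or because the request itself did not go through. The library signals these cases with `sr.UnknownValueError` and `sr.RequestError`. The wrapper below is a minimal sketch of handling both. The function name and the choice to return `None` on failure are assumptions made for illustration.

```python
def transcribe_file(path, language="en-US"):
    """Return the transcript of an audio file, or None if recognition fails."""
    # Imported here so the sketch loads even where the package is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)   # read the whole file
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The service could not make out any speech in the audio.
        return None
    except sr.RequestError as err:
        # Network problem, or the web API rejected the request.
        print(f"Speech API request failed: {err}")
        return None
```

For the file used above, a call would look like `transcribe_file('D:/My life/music/some music/sodagreen/take_me_away.wav', language='zh-tw')`.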
import speech_recognition as sr
r = sr.Recognizer()
mic = sr.Microphone()
sr.Microphone.list_microphone_names()
['Microsoft Sound Mapper - Input',
'麦克风阵列 (Realtek(R) Audio)',
'Microsoft Sound Mapper - Output',
'扬声器 (Realtek(R) Audio)',
'主声音捕获驱动程序',
'麦克风阵列 (Realtek(R) Audio)',
'主声音驱动程序',
'扬声器 (Realtek(R) Audio)',
'扬声器 (Realtek(R) Audio)',
'麦克风阵列 (Realtek(R) Audio)',
'麦克风阵列 (Realtek HD Audio Mic input)',
'立体声混音 (Realtek HD Audio Stereo input)',
'Speakers (Realtek HD Audio output)']
with mic as source:
    audio = r.listen(source)
r.recognize_google(audio)