Emotion Classification on the DEAP Dataset with Four Machine Learning Methods

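In machine learning, KNN (K-Nearest Neighbors), SVM (Support Vector Machine), decision trees, and random forests are four common and widely used algorithms. This article applies all four to emotion classification on the DEAP dataset.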

Introduction

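1. KNN (K-Nearest Neighbors)

KNN is a basic method for classification and regression. For classification, it predicts by majority vote among the k training samples closest to the query point in feature space. For regression, it estimates the target as the mean (or weighted mean) of the values of the k nearest neighbours. KNN is simple and easy to understand, and it suits small datasets and basic pattern-recognition tasks.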
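The majority-vote rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used later in the article (which relies on scikit-learn); the function name `knn_predict` and the toy data are made up for the example.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbours' labels
    votes = np.bincount(y_train[nearest])
    return int(np.argmax(votes))

# Toy data (made up for illustration): two clusters in 2-D
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.15, 0.15])))  # query near the first cluster
print(knn_predict(X_train, y_train, np.array([0.95, 1.0])))   # query near the second cluster
```

Because every prediction scans the whole training set, the cost grows with the number of stored samples, which is one reason KNN is usually recommended for small datasets.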

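2. SVM (Support Vector Machine)

SVM is a powerful supervised learning algorithm for classification and regression. Its core idea is to find the optimal separating hyperplane in feature space, that is, the one that maximizes the margin between the classes. Kernel functions extend the linear SVM to non-linear problems. SVMs perform well on high-dimensional data and on complex data distributions.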
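To make the kernel idea concrete, the sketch below computes a Gram matrix with the RBF kernel, K(a, b) = exp(-gamma * ||a - b||^2), which is the kernel type selected later with `kernel='rbf'`. The helper name `rbf_kernel`, the value of `gamma`, and the sample points are assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
K = rbf_kernel(X, X, gamma=0.5)
print(np.round(K, 4))
```

Each entry measures similarity between two samples: it equals 1 for identical points and decays towards 0 as they move apart. A kernel SVM works only with such pairwise similarities, which is how it separates classes that no hyperplane in the original feature space can separate.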

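3. Decision Tree

A decision tree is a tree-structured classifier: each internal node tests a feature, each branch corresponds to one possible outcome of that test, and each leaf holds a class label. The tree is built from the training data by recursively splitting it into the purest possible subsets. Decision trees are easy to understand and interpret, and they handle both numerical and categorical data. They overfit easily, however, so pruning or similar constraints are usually needed.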
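A single step of that recursive splitting can be sketched as follows. Gini impurity is used here as the purity measure because it is a common choice (and scikit-learn's default); the article does not say which criterion it has in mind, and the function names and toy data are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 0 for a pure node, 0.5 for a 50/50 binary node."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels) / len(labels)
    return 1.0 - np.sum(p**2)

def best_threshold(feature, labels):
    """Return (threshold, weighted impurity) of the best single split on one feature."""
    best = (None, gini(labels))
    values = np.unique(feature)
    # Candidate thresholds: midpoints between consecutive distinct values
    for t in (values[:-1] + values[1:]) / 2.0:
        left, right = labels[feature <= t], labels[feature > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (float(t), float(score))
    return best

# Toy data (made up): low feature values belong to class 0, high values to class 1
feature = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
labels = np.array([0, 0, 0, 1, 1, 1])
print(best_threshold(feature, labels))
```

A full tree repeats this search over all features, applies the best split, and recurses into each child until a stopping rule such as a maximum depth is reached.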

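4. Random Forest

Random forest is an ensemble learning method built from many decision trees. Each tree is trained on a random subset of samples and features, and the final prediction is a majority vote (classification) or an average (regression) over the trees. Random forests generalize well, resist overfitting, and are suited to large, high-dimensional datasets.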
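The three ingredients named above (random sample subsets, random feature subsets, and voting) can be sketched independently of any tree learner. The helper names are hypothetical, and the square-root rule for the subset size is only one common default. Note that scikit-learn's implementation redraws the feature subset at every split rather than once per tree.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_sample(n_samples, rng):
    """Row indices drawn with replacement, as used to train one tree."""
    return rng.integers(0, n_samples, size=n_samples)

def random_feature_subset(n_features, rng):
    """A random subset of about sqrt(n_features) columns (a common default)."""
    k = max(1, int(np.sqrt(n_features)))
    return rng.choice(n_features, size=k, replace=False)

def majority_vote(tree_predictions):
    """Combine per-tree class predictions (shape: n_trees x n_samples)."""
    tree_predictions = np.asarray(tree_predictions)
    return np.array([np.bincount(col).argmax() for col in tree_predictions.T])

rows = bootstrap_sample(10, rng)
cols = random_feature_subset(160, rng)   # 160 matches the feature count used later
print(rows, cols)

# Hypothetical predictions from three trees on four samples
votes = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 1, 1, 0]]
print(majority_vote(votes))
```

Because each tree sees different rows and columns, the trees make partly independent errors, and the vote averages those errors out.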

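In summary: KNN is simple and intuitive and suits small datasets; SVM handles high-dimensional data and complex data distributions; decision trees are easy to understand and interpret but overfit easily; random forest is a powerful ensemble method suited to large, high-dimensional datasets.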

Implementation

1. Data preparation

    import pickle
    import numpy as np

    def read_data(filename):
        """Load one DEAP subject file; latin1 lets Python 3 read a Python 2 pickle."""
        with open(filename, 'rb') as f:
            return pickle.load(f, encoding='latin1')

    # Subject IDs "01" .. "32"
    files = [f"{n:02d}" for n in range(1, 33)]
    # print(files)

    labels = []
    data = []
    for i in files:
        fileph = "E:/DEAP投票/data_preprocessed_python/s" + i + ".dat"
        d = read_data(fileph)
        labels.append(d['labels'])
        data.append(d['data'])
    # print(labels)
    # print(data)
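For readers without the dataset at hand, the loading step can be tried on a synthetic stand-in. The dictionary below mimics the layout implied by the code in this article (a `data` array of trials x channels x samples and a `labels` array of trials x 4 ratings); the file name is hypothetical and the time axis is shortened to 16 samples to keep the demo small.

```python
import os
import pickle
import tempfile

import numpy as np

# Synthetic stand-in for one subject file. The time axis is shortened to 16
# samples here (an assumption for the demo); the article's reshape implies 8064.
fake_subject = {
    "data": np.zeros((40, 40, 16), dtype=np.float32),   # trials x channels x samples
    "labels": np.zeros((40, 4), dtype=np.float32),      # valence, arousal, dominance, liking
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "s01_fake.dat")   # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(fake_subject, f)
    # Same loading call one would use on a real subject file
    with open(path, "rb") as f:
        loaded = pickle.load(f, encoding="latin1")

print(loaded["data"].shape, loaded["labels"].shape)
```

Stacking 32 such subjects gives 32 x 40 = 1280 trials, which is where the 1280 in the later reshape calls comes from.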

2. Converting the data to array format

    labels = np.array(labels)
    data = np.array(data)
    print(labels.shape)
    print(data.shape)

    # Merge subjects and trials: 32 subjects x 40 trials = 1280 samples
    labels = labels.reshape(1280, 4)
    data = data.reshape(1280, 40, 8064)
    print(labels.shape)
    print(data.shape)

    # Feature extraction
    eeg_data = data[:, :32, :]   # only the first 32 channels are EEG; the rest are not
    print(eeg_data.shape)

    # PSD features
    from scipy.signal import welch
    from scipy.integrate import simpson   # replaces `simps`, which recent SciPy releases removed

    def bandpower(data, sf, band):
        """Power of `data` in the band (low, high) Hz, estimated with Welch's method."""
        low, high = np.asarray(band)
        nperseg = int((2 / low) * sf)   # window of two cycles of the lowest frequency; must be an int
        freqs, psd = welch(data, sf, nperseg=nperseg)   # power spectral density
        freq_res = freqs[1] - freqs[0]
        idx_band = np.logical_and(freqs >= low, freqs <= high)
        return simpson(psd[idx_band], dx=freq_res)      # integrate the PSD over the band

    # Band limits in Hz
    BANDS = {
        "delta": (0.5, 4),
        "theta": (4, 8),
        "alpha": (8, 12),
        "beta": (12, 30),
        "gamma": (30, 64),
    }

    def get_band_power(people, channel, band):
        return bandpower(eeg_data[people, channel], 128, BANDS[band])   # sf = 128 Hz

    print(len(eeg_data))
    print(len(eeg_data[0]))

    eeg_band = []
    for i in range(len(eeg_data)):          # 1280 samples
        for j in range(len(eeg_data[0])):   # 32 channels
            for band in ("delta", "theta", "alpha", "beta", "gamma"):
                eeg_band.append(get_band_power(i, j, band))
        # print(i)

    eeg_band = np.array(eeg_band)             # 1280 * 32 * 5 values
    eeg_band = eeg_band.reshape((1280, 160))  # 32 channels x 5 bands per sample
    print(eeg_band.shape)

    # Label data
    import pandas as pd
    df_label = pd.DataFrame({'Valence': labels[:, 0], 'Arousal': labels[:, 1],
                             'Dominance': labels[:, 2], 'Liking': labels[:, 3]})
    df_label.info()
    print(df_label.describe())

    label_name = ["valence", "arousal", "dominance", "liking"]

    # Binarize each rating into two classes (1 = high, 0 = low).
    # Valence, arousal and dominance are split at 5; liking is split at 6.
    labels_valence = (labels[:, 0] > 5).astype(int)
    labels_arousal = (labels[:, 1] > 5).astype(int)
    labels_dominance = (labels[:, 2] > 5).astype(int)
    labels_liking = (labels[:, 3] > 6).astype(int)
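The band-power idea used for the PSD features can be sanity-checked on a synthetic signal whose content is known in advance. The sketch below is a simplified variant, not the article's exact routine: it uses a fixed `nperseg` and a plain rectangle-rule sum instead of Simpson integration, and the test signal (a 10 Hz sine plus weak noise) is made up.

```python
import numpy as np
from scipy.signal import welch

def band_power(signal, sf, band, nperseg=256):
    """Approximate power in [low, high] Hz by summing the Welch PSD bins."""
    low, high = band
    freqs, psd = welch(signal, sf, nperseg=nperseg)
    freq_res = freqs[1] - freqs[0]
    in_band = (freqs >= low) & (freqs <= high)
    # Rectangle-rule integration of the PSD over the band
    return float(np.sum(psd[in_band]) * freq_res)

sf = 128                                # sampling rate used in the article
t = np.arange(0, 60, 1 / sf)            # 60 s of synthetic signal
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

alpha = band_power(signal, sf, (8, 12))    # contains the 10 Hz component
beta = band_power(signal, sf, (12, 30))    # contains only noise
print(f"alpha={alpha:.3f}, beta={beta:.3f}")
```

A unit-amplitude sine carries a mean power of 0.5, so the alpha value should come out close to that, while the beta value should be far smaller.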

3. Model building, training, testing, and tuning

    # X data
    data_x = eeg_band
    print(data_x.shape)
    # Y data
    label_y = labels_valence   # swap in another target as needed
    # label_y = labels_arousal
    # label_y = labels_dominance
    # label_y = labels_liking
    trainscores = []
    testscores = []

3.1 KNN

    from sklearn import preprocessing
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X = data_x
    # Expand the feature space
    poly = preprocessing.PolynomialFeatures(degree=2)   # degree-2 polynomial features
    X = poly.fit_transform(X)
    min_max_scaler = preprocessing.MinMaxScaler()
    X = min_max_scaler.fit_transform(X)                  # scale every feature to [0, 1]
    # X = preprocessing.scale(X)
    X = preprocessing.normalize(X, norm='l1')            # L1-normalize each sample (row)
    print(X.shape)

    # Dimensionality reduction (optional)
    # from sklearn.decomposition import PCA
    # pca = PCA(n_components=1000)
    # X = pca.fit_transform(X)
    # print(X.shape)

    X_train, X_test, y_train, y_test = train_test_split(X, label_y)

    knn = KNeighborsClassifier(n_neighbors=7)
    knn.fit(X_train, y_train)
    train_score = knn.score(X_train, y_train)
    test_score = knn.score(X_test, y_test)
    knn_pred = knn.predict(X_test)
    print("Training score:", train_score)
    print("Test score:", test_score)
    trainscores.append(train_score)
    testscores.append(test_score)

3.2 SVM

    from sklearn import preprocessing
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X = data_x
    # Expand the feature space
    poly = preprocessing.PolynomialFeatures(degree=2)
    X = poly.fit_transform(X)
    min_max_scaler = preprocessing.MinMaxScaler()
    X = min_max_scaler.fit_transform(X)
    # X = preprocessing.scale(X)
    # X = preprocessing.normalize(X, norm='l2')
    print(X.shape)

    # Dimensionality reduction (optional)
    # from sklearn.decomposition import PCA
    # pca = PCA(n_components=20)
    # X = pca.fit_transform(X)
    # print(X.shape)

    # Split the preprocessed features X, not the raw data_x
    X_train, X_test, y_train, y_test = train_test_split(X, label_y)

    svc = SVC(kernel='rbf', C=0.1)
    svc.fit(X_train, y_train)
    train_score = svc.score(X_train, y_train)
    test_score = svc.score(X_test, y_test)
    svm_pred = svc.predict(X_test)
    print("Training score:", train_score)
    print("Test score:", test_score)
    trainscores.append(train_score)
    testscores.append(test_score)

3.3 Decision tree

    from sklearn import preprocessing
    from sklearn import tree
    from sklearn.model_selection import train_test_split

    X = data_x
    # Expand the feature space
    poly = preprocessing.PolynomialFeatures(degree=2)
    X = poly.fit_transform(X)
    min_max_scaler = preprocessing.MinMaxScaler()
    X = min_max_scaler.fit_transform(X)
    # X = preprocessing.scale(X)
    X = preprocessing.normalize(X, norm='l1')   # L1-normalize each sample (row)
    print(X.shape)

    # Dimensionality reduction (optional)
    # from sklearn.decomposition import PCA
    # pca = PCA(n_components=100)
    # X = pca.fit_transform(X)
    # print(X.shape)

    # Split the preprocessed features X, not the raw data_x
    X_train, X_test, y_train, y_test = train_test_split(X, label_y)

    dtree = tree.DecisionTreeClassifier(max_depth=20, min_samples_split=4)
    dtree = dtree.fit(X_train, y_train)
    dtree_pred = dtree.predict(X_test)
    train_score = dtree.score(X_train, y_train)
    test_score = dtree.score(X_test, y_test)
    print("Training score:", train_score)
    print("Test score:", test_score)
    trainscores.append(train_score)
    testscores.append(test_score)

3.4 Random forest

    from sklearn import preprocessing
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X = data_x
    # Expand the feature space
    poly = preprocessing.PolynomialFeatures(degree=2)
    X = poly.fit_transform(X)
    min_max_scaler = preprocessing.MinMaxScaler()
    X = min_max_scaler.fit_transform(X)
    # X = preprocessing.scale(X)
    X = preprocessing.normalize(X, norm='l1')   # L1-normalize each sample (row)
    print(X.shape)

    # Dimensionality reduction (optional)
    # from sklearn.decomposition import PCA
    # pca = PCA(n_components=100)
    # X = pca.fit_transform(X)
    # print(X.shape)

    # Split the preprocessed features X, not the raw data_x
    X_train, X_test, y_train, y_test = train_test_split(X, label_y)

    rf = RandomForestClassifier(n_estimators=50, max_depth=20, min_samples_split=5)
    rf = rf.fit(X_train, y_train)
    train_score = rf.score(X_train, y_train)
    test_score = rf.score(X_test, y_test)
    rf_pred = rf.predict(X_test)
    print("Training score:", train_score)
    print("Test score:", test_score)
    trainscores.append(train_score)
    testscores.append(test_score)
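All four model sections fit the scaler on the full dataset before splitting it, so statistics from the test samples leak into the preprocessing, and each model is scored on a single random split. One possible alternative, sketched here on random stand-in data rather than the DEAP features, is to wrap the preprocessing and the classifier in a scikit-learn pipeline and score it with cross-validation. The data, the fold count, and the choice of KNN are assumptions for the demo.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Random stand-in data (an assumption for the demo): 200 samples, 160 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 160))
y_demo = rng.integers(0, 2, size=200)

# Inside cross_val_score the scaler is re-fitted on the training part of every fold
model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=7))
scores = cross_val_score(model, X_demo, y_demo, cv=5)
print(scores.mean())
```

With purely random labels the mean accuracy should hover around chance level; on real features, the spread of the five fold scores gives a rough idea of how much a single train/test split can vary.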

4. Model comparison

    import matplotlib.pyplot as plt

    model_name = ["KNN", "SVM", "Dtree", "RF"]
    plt.title('Model Score', fontsize=16)
    plt.xlabel('model', fontsize=14)
    plt.ylabel('score', fontsize=14)
    plt.grid(linestyle=':', axis='y')
    x = np.arange(4)
    a = plt.bar(x - 0.3, trainscores, 0.3, color='dodgerblue', label='train', align='center')
    b = plt.bar(x, testscores, 0.3, color='orangered', label='test', align='center')
    # Annotate each bar with its score
    for bar in list(a) + list(b):
        h = bar.get_height()
        plt.text(bar.get_x() + bar.get_width() / 2, h, '%.3f' % h, ha='center', va='bottom')
    plt.xticks(x, model_name, rotation=75)
    plt.legend(loc='lower right')
    plt.show()
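The predictions stored in `knn_pred`, `svm_pred`, `dtree_pred`, and `rf_pred` are never inspected above. Accuracy alone can hide a model that mostly predicts one class, so a confusion matrix is a useful complement to the bar chart. The sketch below uses hypothetical labels and predictions, not results from this article.

```python
import numpy as np

def confusion_matrix_binary(y_true, y_pred):
    """2x2 matrix with rows = true class, columns = predicted class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical labels and predictions, not results from the article
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix_binary(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```

In the article's setting one would pass `y_test` together with one of the stored prediction arrays; scikit-learn's `sklearn.metrics.confusion_matrix` provides the same table for any number of classes.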

4.1 Model comparison results
