
Data Mining Assignment: PCA

Data Mining Course Project

SVD Implementation

Code

Data: a small hand-written matrix (6 samples x 3 features)

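import numpy as np
from numpy import linalg as la

# Hand-written 6x3 data matrix: 6 samples, 3 features
myl = [[4, 0, 5], [0, 0, 5], [3, 5, 7], [2, 4, 6], [7, 9, 3], [2, 2, 1]]
data = np.array(myl, dtype=float)

# Full SVD: data = U @ diag(Sigma) @ VT
# Note: the columns are NOT mean-centered, so this is a plain SVD of the raw data
U, Sigma, VT = la.svd(data)
print("U:", U)
print("Sigma:", Sigma)
print("VT:", VT)

# First two right singular vectors (columns of V = VT.T), kept as 3x1 columns
c1 = VT.T[:, [0]]
c2 = VT.T[:, [1]]

print("c1:", c1)
print("c2:", c2)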

Output

D:\pychram_Project\scikitproject\env\python.exe D:/pychram_Project/scikitproject/svd.py
U: [[-2.99191541e-01 -3.67787337e-01  8.56979841e-01  1.36701914e-01
   1.21246779e-01 -8.61060593e-02]
 [-1.83789930e-01 -5.35921424e-01 -1.33474835e-01 -4.82377244e-01
  -6.49551254e-01 -8.11746283e-02]
 [-5.21800408e-01 -2.41034450e-01 -3.09819342e-01 -3.54372854e-01
   6.57971317e-01  1.23013539e-01]
 [-4.20603359e-01 -2.52514682e-01 -3.57578928e-01  7.75403548e-01
  -1.71375488e-01 -2.90901765e-02]
 [-6.32524712e-01  6.62360759e-01  9.47787388e-02 -1.47626866e-01
  -2.44102270e-01 -2.66135481e-01]
 [-1.65636110e-01  1.30144750e-01  1.22214139e-01 -5.54061813e-04
  -2.03717100e-01  9.48256170e-01]]
Sigma: [17.0250123   7.23052429  3.296737  ]
VT: [[-0.49117846 -0.60589737 -0.62580516]
 [ 0.3039244   0.55408228 -0.77499857]
 [ 0.81631714 -0.57086006 -0.08800629]]
c1: [[-0.49117846]
 [-0.60589737]
 [-0.62580516]]
c2: [[ 0.3039244 ]
 [ 0.55408228]
 [-0.77499857]]

Process finished with exit code 0
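The SVD above runs on the raw matrix. PCA, by contrast, works on mean-centered data (scikit-learn's PCA subtracts the column means before decomposing), so c1 and c2 above are right singular vectors of the uncentered matrix, not principal components in the strict sense. The following is a minimal sketch of the centered version. It assumes NumPy and scikit-learn are installed, and the names X_centered, W2 and X2D are illustrative only:

```python
import numpy as np
from sklearn.decomposition import PCA

# The same hand-written 6x3 matrix used above
X = np.array([[4, 0, 5], [0, 0, 5], [3, 5, 7],
              [2, 4, 6], [7, 9, 3], [2, 2, 1]], dtype=float)

# PCA assumes zero-mean features, so subtract the column means first
X_centered = X - X.mean(axis=0)

# Rows of Vt are the principal directions, sorted by singular value
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
W2 = Vt[:2].T            # 3x2 matrix whose columns are the first two directions
X2D = X_centered @ W2    # project each sample onto those two directions

# Cross-check against scikit-learn; each component's sign is arbitrary
X2D_sklearn = PCA(n_components=2).fit_transform(X)
print(np.allclose(np.abs(X2D), np.abs(X2D_sklearn)))
```

The absolute values are compared on purpose: a singular vector is only defined up to sign, so two implementations may return mirrored components.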


scikit-learn PCA

Code

Data: the Iris dataset bundled with scikit-learn

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X = iris.data      # 150 samples x 4 features
Y = iris.target    # class labels (not used by PCA)
print("Array shape:", X.shape)
iris_dataFrame = pd.DataFrame(X)
print("Feature matrix:\n", iris_dataFrame)

# Reduce the 4 features to 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced)

Output

Only part of the output is shown here.

D:\pychram_Project\scikitproject\env\python.exe D:/pychram_Project/scikitproject/PCA.py
Array shape: (150, 4)
Feature matrix:
        0    1    2    3
0    5.1  3.5  1.4  0.2
1    4.9  3.0  1.4  0.2
2    4.7  3.2  1.3  0.2
3    4.6  3.1  1.5  0.2
4    5.0  3.6  1.4  0.2
..   ...  ...  ...  ...
145  6.7  3.0  5.2  2.3
146  6.3  2.5  5.0  1.9
147  6.5  3.0  5.2  2.0
148  6.2  3.4  5.4  2.3
149  5.9  3.0  5.1  1.8

[150 rows x 4 columns]
[[-2.68412563  0.31939725]
 [-2.71414169 -0.17700123]
 [-2.88899057 -0.14494943]
 [-2.74534286 -0.31829898]
 [-2.72871654  0.32675451]
 [-2.28085963  0.74133045]
 [-2.82053775 -0.08946138]
 [-2.62614497  0.16338496]
 [-2.88638273 -0.57831175]
 [-2.6727558  -0.11377425]
 [-2.50694709  0.6450689 ]
 [-2.61275523  0.01472994]
 [-2.78610927 -0.235112  ]
 [-3.22380374 -0.51139459]
 [-2.64475039  1.17876464]
 [-2.38603903  1.33806233]
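To make fit_transform less of a black box, the sketch below (assuming the same Iris data and default, non-whitened PCA) inspects the fitted attributes components_ and mean_, and checks that the reduced coordinates are simply the centered data projected onto the principal axes. It also maps the 2-D points back with inverse_transform; the names X_manual and X_back are illustrative:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# components_ holds one principal axis per row (shape: 2 x 4)
print("components_:\n", pca.components_)
# mean_ is the per-feature mean that PCA subtracts before projecting
print("mean_:", pca.mean_)

# Without whitening, the transform is centering followed by a projection
X_manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(X_reduced, X_manual))

# inverse_transform maps the 2-D points back into the original 4-D space
X_back = pca.inverse_transform(X_reduced)
print("reconstruction MSE:", np.mean((X - X_back) ** 2))
```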

Explained Variance Ratio

Code

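Data: the Iris dataset from scikit-learn. This line continues the scikit-learn PCA script above, where pca was fitted with n_components=2.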

print("explained variance ratio:",pca.explained_variance_ratio_)

Output

explained variance ratio: [0.92461872 0.05306648]
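The two ratios add up to about 0.978, so the first two components keep roughly 97.8% of the variance. A minimal sketch of where these numbers come from: each squared singular value of the centered data is proportional to the variance along its axis, and explained_variance_ratio_ divides by the total variance over all components. The name ratio_manual is illustrative:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
X_centered = X - X.mean(axis=0)

# Squared singular values are proportional to the variance along each axis
s = np.linalg.svd(X_centered, compute_uv=False)
ratio_manual = s ** 2 / np.sum(s ** 2)

pca = PCA(n_components=2).fit(X)
# explained_variance_ratio_ is relative to the TOTAL variance, so it matches
# the first two entries of the manual ratio
print(ratio_manual[:2])
print(pca.explained_variance_ratio_)
print(np.allclose(ratio_manual[:2], pca.explained_variance_ratio_))
```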

Choosing the Right Number of Dimensions

Code

Data: the Iris dataset bundled with scikit-learn

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X = iris.data
Y = iris.target
print("Array shape:", X.shape)
iris_dataFrame = pd.DataFrame(X)
print("Feature matrix:\n", iris_dataFrame)

# Fit PCA with all components to see how the variance is distributed
pca = PCA()
pca.fit(X)
cumsum = np.cumsum(pca.explained_variance_ratio_)
# Smallest number of dimensions whose cumulative explained variance reaches 95%
d = np.argmax(cumsum >= 0.95) + 1
X_new = pca.transform(X)
print("Data projected onto all 4 components (no reduction):\n", X_new)
print("Minimum number of dimensions preserving 95% of the variance:\n", d)

# Shortcut: pass the fraction of variance to keep as n_components
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print("Reduced matrix:\n", X_reduced)
print("Shape of the reduced matrix:\n", X_reduced.shape)

Output

Only part of each output is shown.

D:\pychram_Project\scikitproject\env\python.exe D:/pychram_Project/scikitproject/Right_Number_of_Dimernsions.py
Array shape: (150, 4)
Feature matrix:
        0    1    2    3
0    5.1  3.5  1.4  0.2
1    4.9  3.0  1.4  0.2
2    4.7  3.2  1.3  0.2
3    4.6  3.1  1.5  0.2
4    5.0  3.6  1.4  0.2
..   ...  ...  ...  ...
145  6.7  3.0  5.2  2.3
146  6.3  2.5  5.0  1.9
147  6.5  3.0  5.2  2.0
148  6.2  3.4  5.4  2.3
149  5.9  3.0  5.1  1.8

[150 rows x 4 columns]
Data projected onto all 4 components (no reduction):
 [[-2.68412563e+00  3.19397247e-01 -2.79148276e-02 -2.26243707e-03]
 [-2.71414169e+00 -1.77001225e-01 -2.10464272e-01 -9.90265503e-02]
 [-2.88899057e+00 -1.44949426e-01  1.79002563e-02 -1.99683897e-02]
 [-2.74534286e+00 -3.18298979e-01  3.15593736e-02  7.55758166e-02]
 [-2.72871654e+00  3.26754513e-01  9.00792406e-02  6.12585926e-02]
 [-2.28085963e+00  7.41330449e-01  1.68677658e-01  2.42008576e-02]
 [-2.82053775e+00 -8.94613845e-02  2.57892158e-01  4.81431065e-02]
 [-2.62614497e+00  1.63384960e-01 -2.18793179e-02  4.52978706e-02]
 [-2.88638273e+00 -5.78311754e-01  2.07595703e-02  2.67447358e-02]
 [-2.67275580e+00 -1.13774246e-01 -1.97632725e-01  5.62954013e-02]
 [-2.50694709e+00  6.45068899e-01 -7.53180094e-02  1.50199245e-02]
 [-2.61275523e+00  1.47299392e-02  1.02150260e-01  1.56379208e-01]
 [-2.78610927e+00 -2.35112000e-01 -2.06844430e-01  7.88791149e-03]
 [-3.22380374e+00 -5.11394587e-01  6.12996725e-02  2.16798118e-02]
 [-2.64475039e+00  1.17876464e+00 -1.51627524e-01 -1.59209718e-01]
 [-2.38603903e+00  1.33806233e+00  2.77776903e-01 -6.55154587e-03]
 [-2.62352788e+00  8.10679514e-01  1.38183228e-01 -1.67734737e-01]
 [-2.64829671e+00  3.11849145e-01  2.66683156e-02 -7.76281796e-02]
 [-2.19982032e+00  8.72839039e-01 -1.20305523e-01 -2.70518681e-02]
 ...
Minimum number of dimensions preserving 95% of the variance:
 2
Reduced matrix:
 [[-2.68412563  0.31939725]
 [-2.71414169 -0.17700123]
 [-2.88899057 -0.14494943]
 [-2.74534286 -0.31829898]
 [-2.72871654  0.32675451]
 [-2.28085963  0.74133045]
 [-2.82053775 -0.08946138]
 [-2.62614497  0.16338496]
 [-2.88638273 -0.57831175]
 [-2.6727558  -0.11377425]
 [-2.50694709  0.6450689 ]
 [-2.61275523  0.01472994]
 [-2.78610927 -0.235112  ]
 [-3.22380374 -0.51139459]
 [-2.64475039  1.17876464]
 [-2.38603903  1.33806233]
 ...
Shape of the reduced matrix:
 (150, 2)
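Both approaches above should agree on the number of dimensions. A minimal sketch that checks this on the same Iris data: the manual argmax rule on the cumulative explained variance is compared with n_components_, the number of components scikit-learn actually keeps when n_components is a float between 0 and 1. The name pca95 is illustrative:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Manual rule: smallest d whose cumulative explained variance reaches 95%
cumsum = np.cumsum(PCA().fit(X).explained_variance_ratio_)
d = int(np.argmax(cumsum >= 0.95)) + 1

# A float in (0, 1) makes scikit-learn apply the same threshold itself
pca95 = PCA(n_components=0.95).fit(X)
print("manual d:", d)
print("n_components_ chosen by scikit-learn:", pca95.n_components_)
print("variance kept:", pca95.explained_variance_ratio_.sum())
```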

PCA for Compression

Code

Dataset: MNIST

import numpy as np
import torch
import torchvision
from matplotlib import pyplot as plt
from sklearn.decomposition import PCA
from torchvision import datasets, transforms

# Download (if needed) and load the MNIST training set
train_data = datasets.MNIST(root="./data_MNIST", train=True,
                            transform=transforms.ToTensor(), download=True)

# Wrap the dataset in a DataLoader that yields shuffled batches of 64 images
train_data_loader = torch.utils.data.DataLoader(
    dataset=train_data,
    batch_size=64,
    shuffle=True,
    drop_last=True
)

images, labels = next(iter(train_data_loader))
# Tile the 64 images of one batch into a single grid image
img = torchvision.utils.make_grid(images)

# make_grid returns (C, H, W); matplotlib expects (H, W, C)
img = img.numpy().transpose(1, 2, 0)
print(labels)
plt.imshow(img)
plt.show()

# PCA needs a 2-D numeric array, not the Dataset object:
# flatten each 28x28 image into 784 features scaled to [0, 1]
X = train_data.data.numpy().reshape(len(train_data), -1).astype(np.float32) / 255.0

# Compress 784 features down to 154 components, then decompress
pca = PCA(n_components=154)
X_reduced = pca.fit_transform(X)                 # shape (60000, 154)
X_recovered = pca.inverse_transform(X_reduced)   # shape (60000, 784)
print(X_reduced)
print(X_recovered)


Output

[Figure: output of the MNIST script (image not preserved)]
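The MNIST script needs PyTorch, torchvision and a download. As a self-contained stand-in (an assumption, not the author's setup), the sketch below applies the same compress/decompress idea to scikit-learn's bundled 8x8 digits dataset. Since those images have only 64 features, a fixed 154 components is impossible, so it keeps 95% of the variance instead and reports how much smaller each compressed image is and how large the reconstruction error is. The name n_kept is illustrative:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 1797 grayscale 8x8 digit images, flattened to 64 features (values 0..16)
X = load_digits().data

# Keep as many components as needed for 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
X_recovered = pca.inverse_transform(X_reduced)

n_kept = pca.n_components_
print("features: %d -> %d" % (X.shape[1], n_kept))
print("stored values per image: %.1f%% of the original" % (100 * n_kept / X.shape[1]))
# Mean squared reconstruction error, in the original pixel units
print("reconstruction MSE:", np.mean((X - X_recovered) ** 2))
```

This ratio ignores the one-off cost of storing pca.mean_ and pca.components_, which are needed to decompress.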

Randomized PCA

Code

Data: the Iris dataset bundled with scikit-learn

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Randomized PCA on the Iris dataset
iris = load_iris()
X = iris.data
Y = iris.target

# The randomized solver approximates only the first n_components; it is
# mainly useful for large datasets where n_components is much smaller than the feature count
rnd_pca = PCA(n_components=2, svd_solver="randomized")
X_reduced = rnd_pca.fit_transform(X)

print("Randomized PCA result:", X_reduced)

Output

Only part of the output is shown.

D:\pychram_Project\scikitproject\env\python.exe D:/pychram_Project/scikitproject/Randomed_PCA.py
Randomized PCA result: [[-2.68412563  0.31939725]
 [-2.71414169 -0.17700123]
 [-2.88899057 -0.14494943]
 [-2.74534286 -0.31829898]
 [-2.72871654  0.32675451]
 [-2.28085963  0.74133045]
 [-2.82053775 -0.08946138]
 [-2.62614497  0.16338496]
 [-2.88638273 -0.57831175]
 [-2.6727558  -0.11377425]
 [-2.50694709  0.6450689 ]
 [-2.61275523  0.01472994]
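The randomized output above matches the earlier full-SVD result digit for digit. A minimal sketch that checks this directly on the same Iris data: with only 4 features, the random subspace covers the whole feature space, so the approximation is exact up to rounding. A fixed random_state (an illustrative choice) makes the run reproducible:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

full = PCA(n_components=2, svd_solver="full").fit_transform(X)
# random_state fixes the random projection so runs are reproducible
rnd = PCA(n_components=2, svd_solver="randomized", random_state=42).fit_transform(X)

# Compare magnitudes, since each component's sign is arbitrary
max_gap = np.max(np.abs(np.abs(full) - np.abs(rnd)))
print("largest absolute difference:", max_gap)
```

On large, high-dimensional data the two results would generally differ slightly, which is the price paid for the speed-up.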

Incremental PCA

Code

Data: the Iris dataset bundled with scikit-learn

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import IncrementalPCA

# Incremental PCA (IPCA) on the Iris dataset
iris = load_iris()
X = iris.data
Y = iris.target

# Split the 150 samples into 50 mini-batches of 3 and feed them one at a time,
# as one would for data that does not fit in memory
n_batches = 50
inc_pca = IncrementalPCA(n_components=2)

for X_batch in np.array_split(X, n_batches):
    inc_pca.partial_fit(X_batch)

X_reduced = inc_pca.transform(X)
print("Reduced data:", X_reduced)

Output

Only part of the output is shown.

D:\pychram_Project\scikitproject\env\python.exe D:/pychram_Project/scikitproject/Incremental_PCA.py
Reduced data: [[-2.68394844  0.31969757]
 [-2.71387015 -0.18128039]
 [-2.889061   -0.14349322]
 [-2.74556086 -0.31599244]
 [-2.7287332   0.33013247]
 [-2.28078673  0.74600637]
 [-2.82094992 -0.08216709]
 [-2.62606964  0.16411573]
 [-2.88666725 -0.57640399]
 [-2.67256555 -0.11690702]
 [-2.50657841  0.64422391]
 [-2.61297559  0.01896878]
 [-2.78592941 -0.23865411]
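The loop above still holds all of X in memory. A common out-of-core pattern, sketched here as an assumption rather than the author's method, is to keep the data in a NumPy memory-mapped file and pass slices to partial_fit, so only one batch is read at a time. The file name iris.dat and the batch size of 30 are hypothetical choices:

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import IncrementalPCA

X = load_iris().data.astype(np.float32)
n_samples, n_features = X.shape
batch_size = 30  # each batch must contain at least n_components rows

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "iris.dat")  # hypothetical file name

    # Write the data to disk once, as if it were too large for RAM
    X_mm = np.memmap(path, dtype=np.float32, mode="w+", shape=X.shape)
    X_mm[:] = X
    X_mm.flush()
    del X_mm

    # Reopen read-only; slicing a memmap reads only that part of the file
    X_mm = np.memmap(path, dtype=np.float32, mode="r", shape=(n_samples, n_features))
    inc_pca = IncrementalPCA(n_components=2)
    for start in range(0, n_samples, batch_size):
        inc_pca.partial_fit(X_mm[start:start + batch_size])

    X_reduced = inc_pca.transform(X_mm)
    del X_mm  # release the file before the temporary directory is removed

print(X_reduced.shape)
print(inc_pca.explained_variance_ratio_)
```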

Kernel PCA

Code

Data: the Iris dataset bundled with scikit-learn

from sklearn.datasets import load_iris
from sklearn.decomposition import KernelPCA

# Kernel PCA on the Iris dataset
iris = load_iris()
X = iris.data
Y = iris.target

# RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)
rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
X_reduced = rbf_pca.fit_transform(X)

print("Reduced data:", X_reduced)

Output

Only part of the output is shown.

D:\pychram_Project\scikitproject\env\python.exe D:/pychram_Project/scikitproject/kernel_PCA.py
Reduced data: [[ 6.16694155e-01  9.06710794e-02]
 [ 6.15553246e-01  4.74462925e-02]
 [ 6.40676125e-01  8.31836417e-02]
 [ 6.17803788e-01  4.18640718e-02]
 [ 6.22801191e-01  9.99982328e-02]
 [ 5.46462972e-01  7.38888756e-02]
 [ 6.31275744e-01  7.84629063e-02]
 [ 6.07857958e-01  6.50243225e-02]
 [ 6.28364788e-01  4.68915553e-02]
 [ 6.10986496e-01  4.54071502e-02]
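The value gamma=0.04 above is fixed by hand. Because kernel PCA is unsupervised, one way to choose gamma is to use the labels Y indirectly: put KernelPCA in a pipeline with a simple classifier and keep the gamma that gives the best cross-validated accuracy. A minimal sketch of that idea; the classifier, the candidate gamma values and cv=3 are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

iris = load_iris()
X, Y = iris.data, iris.target

# Judge each gamma by how well a simple classifier does on the 2-D output
clf = Pipeline([
    ("kpca", KernelPCA(n_components=2, kernel="rbf")),
    ("log_reg", LogisticRegression(max_iter=1000)),
])
param_grid = {"kpca__gamma": [0.01, 0.04, 0.1, 0.5, 1.0]}  # illustrative values

search = GridSearchCV(clf, param_grid, cv=3)
search.fit(X, Y)
print("best gamma:", search.best_params_["kpca__gamma"])
print("mean CV accuracy:", search.best_score_)
```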

LLE

Code

Data: the Iris dataset bundled with scikit-learn

from sklearn.datasets import load_iris
from sklearn.manifold import LocallyLinearEmbedding

# Locally Linear Embedding (LLE) on the Iris dataset
iris = load_iris()
X = iris.data
Y = iris.target

# Each point is expressed as a weighted combination of its 10 nearest neighbours;
# LLE then looks for a 2-D layout that preserves those local weights
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_reduced = lle.fit_transform(X)
print("Data after LLE:", X_reduced)

Output

Only part of the output is shown.

D:\pychram_Project\scikitproject\env\python.exe D:/pychram_Project/scikitproject/LLE.py
Data after LLE: [[ 0.1        -0.00061889]
 [ 0.1        -0.03144328]
 [ 0.1         0.01235161]
 [ 0.1        -0.1352621 ]
 [ 0.1        -0.01827559]
 [ 0.1        -0.08228979]
 [ 0.1        -0.02115927]
 [ 0.1        -0.18097355]
 [ 0.1        -0.03001005]
 [ 0.1        -0.12610493]
 [ 0.1        -0.19048799]
 [ 0.1        -0.06305756]
 [ 0.1        -0.10067827]
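Printed coordinates alone say little about embedding quality. One way to compare the methods in this article, sketched here as an assumption rather than part of the original assignment, is scikit-learn's trustworthiness score: it is 1.0 when the nearest neighbours of each point in the 2-D embedding were also close in the original 4-D space, and lower otherwise. The choice of n_neighbors=5 for the score and random_state=42 are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding, trustworthiness

X = load_iris().data

embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "LLE (n_neighbors=10)": LocallyLinearEmbedding(
        n_components=2, n_neighbors=10, random_state=42).fit_transform(X),
}

# Higher is better: neighbours in the embedding should be true neighbours
scores = {name: trustworthiness(X, emb, n_neighbors=5)
          for name, emb in embeddings.items()}
for name, score in scores.items():
    print("%s: %.3f" % (name, score))
```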