Principal Component Analysis (PCA) is a linear dimensionality-reduction method: it projects the data onto a new, lower-dimensional subspace spanned by the directions of greatest variance, thereby reducing dimensionality and suppressing noise; the core computation is sketched right below. The example that follows then uses PyTorch together with PCA for feature extraction: PCA reduces the dimensionality of the image data, and the reduced data is used to train a simple neural network model.
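Before the full example, here is a minimal, hedged sketch of that core computation (it is not taken from the book's source files; the function name project_pca and the use of NumPy are illustrative assumptions): center the data, take the top-k principal directions via SVD, and project onto them.

import numpy as np

def project_pca(X, k):
    # Center each feature, then take the top-k right singular vectors of the
    # centered data matrix; their span is the k-dimensional subspace that
    # preserves the most variance.
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]               # top-k principal axes, shape (k, n_features)
    return X_centered @ components.T  # projected data, shape (n_samples, k)

# Tiny usage check on random data
X_demo = np.random.rand(100, 50)
print(project_pca(X_demo, 20).shape)  # (100, 20)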
Example 6-3: Using PyTorch with PCA for feature extraction to build a neural network model (source path: daima\6\zhu.py)
The implementation code of the file zhu.py is as follows.
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Load the MNIST dataset
transform = transforms.Compose([transforms.ToTensor()])
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True, transform=transform),
    batch_size=64, shuffle=True)

# Extract the data and reduce its dimensionality with PCA
X = []
y = []
for images, labels in train_loader:
    images = images.view(images.size(0), -1)  # flatten each image into a vector
    X.append(images)
    y.append(labels)
X = torch.cat(X, dim=0).numpy()
y = torch.cat(y, dim=0).numpy()

num_components = 20  # target dimensionality after reduction
pca = PCA(n_components=num_components)
X_pca = pca.fit_transform(X)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.2, random_state=42)

# Define a simple neural network model
class SimpleModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.fc(x)

# Set the model hyperparameters
input_dim = num_components
output_dim = 10  # number of classes
learning_rate = 0.01
num_epochs = 10

# Initialize the model, loss function, and optimizer
model = SimpleModel(input_dim, output_dim)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# Train the model (one full-batch gradient step per epoch)
for epoch in range(num_epochs):
    inputs = torch.tensor(X_train, dtype=torch.float32)
    labels = torch.tensor(y_train, dtype=torch.long)

    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 1 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# Evaluate the model on the test set
with torch.no_grad():
    inputs = torch.tensor(X_test, dtype=torch.float32)
    labels = torch.tensor(y_test, dtype=torch.long)
    outputs = model(inputs)
    _, predicted = torch.max(outputs.data, 1)
    accuracy = (predicted == labels).sum().item() / labels.size(0)
    print(f'Accuracy on test set: {accuracy:.2f}')
In this example, the MNIST dataset is first loaded and preprocessed. The image data is then flattened into vectors and reduced in dimensionality with PCA. Next, a simple neural network model is defined and trained on the reduced data. Finally, the model's performance is evaluated on the test set. Running the script produces output such as:
Epoch [1/10], Loss: 2.3977
Epoch [2/10], Loss: 2.3872
Epoch [3/10], Loss: 2.3768
Epoch [4/10], Loss: 2.3665
Epoch [5/10], Loss: 2.3563
Epoch [6/10], Loss: 2.3461
Epoch [7/10], Loss: 2.3360
Epoch [8/10], Loss: 2.3260
Epoch [9/10], Loss: 2.3160
Epoch [10/10], Loss: 2.3061
Accuracy on test set: 0.18
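The test accuracy is low (about 0.18) because the training loop above takes only a single full-batch gradient step per epoch on a linear model. As a hedged variation (a sketch reusing the variables defined above, not part of the book's zhu.py), the same model can instead be trained with mini-batches via a TensorDataset and DataLoader, which usually improves the result noticeably:

from torch.utils.data import TensorDataset, DataLoader

# Wrap the PCA-reduced training data for mini-batch SGD; model, criterion,
# optimizer, num_epochs, X_train, and y_train come from the code above.
train_ds = TensorDataset(torch.tensor(X_train, dtype=torch.float32),
                         torch.tensor(y_train, dtype=torch.long))
train_dl = DataLoader(train_ds, batch_size=64, shuffle=True)

for epoch in range(num_epochs):
    for xb, yb in train_dl:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')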
The following example uses TensorFlow together with PCA for feature extraction and saves the trained model.
Example 6-4: Using TensorFlow with PCA to build and save a neural network model (source path: daima\6\tzhu.py)
The implementation code of the file tzhu.py is as follows.
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 28 * 28) / 255.0  # flatten and normalize
X_test = X_test.reshape(-1, 28 * 28) / 255.0

# Reduce dimensionality with PCA
num_components = 20  # target dimensionality after reduction
pca = PCA(n_components=num_components)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# Split off a validation set
X_train_split, X_val_split, y_train_split, y_val_split = train_test_split(X_train_pca, y_train, test_size=0.1, random_state=42)

# Define the neural network model
input_layer = Input(shape=(num_components,))
x = Dense(128, activation='relu')(input_layer)
output_layer = Dense(10, activation='softmax')(x)

model = Model(inputs=input_layer, outputs=output_layer)

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
batch_size = 64
epochs = 10
history = model.fit(X_train_split, y_train_split, batch_size=batch_size, epochs=epochs, validation_data=(X_val_split, y_val_split))

# Save the model
model.save('pca_model.h5')
print("Model saved")

# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(X_test_pca, y_test, verbose=0)
print(f'Test accuracy: {test_accuracy:.4f}')

# Load the saved model
loaded_model = tf.keras.models.load_model('pca_model.h5')

# Evaluate the loaded model on the test set
loaded_test_loss, loaded_test_accuracy = loaded_model.evaluate(X_test_pca, y_test, verbose=0)
print(f'Loaded model test accuracy: {loaded_test_accuracy:.4f}')
The code above works as follows: it loads the MNIST dataset and normalizes the pixel values; flattens each image and reduces it to 20 dimensions with PCA; splits off a validation set; defines and compiles a small Keras model; trains it for 10 epochs; saves the trained model to pca_model.h5; evaluates it on the test set; and finally reloads the saved model and evaluates it again to confirm that the saved file can be used directly. Running the script produces output such as:
Epoch 1/10
844/844 [==============================] - 4s 4ms/step - loss: 0.4939 - accuracy: 0.8608 - val_loss: 0.2515 - val_accuracy: 0.9273
Epoch 2/10
844/844 [==============================] - 3s 3ms/step - loss: 0.2107 - accuracy: 0.9376 - val_loss: 0.1775 - val_accuracy: 0.9498
Epoch 3/10
844/844 [==============================] - 4s 5ms/step - loss: 0.1604 - accuracy: 0.9521 - val_loss: 0.1490 - val_accuracy: 0.9577
Epoch 4/10
844/844 [==============================] - 5s 6ms/step - loss: 0.1363 - accuracy: 0.9592 - val_loss: 0.1332 - val_accuracy: 0.9612
Epoch 5/10
844/844 [==============================] - 3s 4ms/step - loss: 0.1218 - accuracy: 0.9630 - val_loss: 0.1236 - val_accuracy: 0.9640
Epoch 6/10
844/844 [==============================] - 3s 3ms/step - loss: 0.1115 - accuracy: 0.9654 - val_loss: 0.1166 - val_accuracy: 0.9638
Epoch 7/10
844/844 [==============================] - 3s 4ms/step - loss: 0.1034 - accuracy: 0.9681 - val_loss: 0.1091 - val_accuracy: 0.9658
Epoch 8/10
844/844 [==============================] - 3s 4ms/step - loss: 0.0978 - accuracy: 0.9697 - val_loss: 0.1104 - val_accuracy: 0.9653
Epoch 9/10
844/844 [==============================] - 2s 3ms/step - loss: 0.0934 - accuracy: 0.9712 - val_loss: 0.1063 - val_accuracy: 0.9657
Epoch 10/10
844/844 [==============================] - 2s 3ms/step - loss: 0.0890 - accuracy: 0.9727 - val_loss: 0.1034 - val_accuracy: 0.9670
Model saved
Test accuracy: 0.9671
Loaded model test accuracy: 0.9671
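Note that pca_model.h5 expects 20-dimensional PCA features as input, so the fitted PCA transformer also has to be persisted in order to preprocess new samples at inference time. A minimal sketch of one way to do this (not part of the book's tzhu.py; it assumes joblib is installed and reuses pca, X_test, and the saved model from above):

import joblib

# Persist the fitted PCA transformer next to the Keras model
joblib.dump(pca, 'pca_transform.pkl')

# At inference time: load both, transform raw flattened images, then predict
pca_loaded = joblib.load('pca_transform.pkl')
model_loaded = tf.keras.models.load_model('pca_model.h5')
X_new_pca = pca_loaded.transform(X_test[:5])   # X_test holds normalized, flattened pixels
print(model_loaded.predict(X_new_pca).argmax(axis=1))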