That wraps up the introductory tutorials; next come some hands-on application tutorials.
First, a record of some basic concepts (for other readers, and for my future self):
Image semantic segmentation assigns every pixel in an image to a semantic category. Unlike conventional image classification, semantic segmentation requires a fine-grained, per-pixel classification rather than merely identifying which object categories the image contains.
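As a minimal, self-contained illustration (plain NumPy, independent of the tutorial code): given a per-pixel score map with one channel per class, the segmentation result is simply the per-pixel argmax over the class axis.

```python
import numpy as np

# Toy score map for a 2x2 image with 3 classes: shape (num_class, H, W).
scores = np.array([
    [[0.1, 0.8], [0.3, 0.2]],  # scores for class 0
    [[0.7, 0.1], [0.4, 0.6]],  # scores for class 1
    [[0.2, 0.1], [0.3, 0.2]],  # scores for class 2
])

# Semantic segmentation assigns each pixel the class with the highest score.
label_map = scores.argmax(axis=0)
print(label_map)  # [[1 0]
                  #  [1 1]]
```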
The FCN model is like a painstakingly precise painter whose goal is to "color in" a photograph so that every pixel gets a color, with each color representing a category (such as person, car, or tree). Through feature extraction, downsampling, upsampling, and skip connections, it assigns a class to every pixel, thereby achieving semantic segmentation.
The FCN workflow:
The code:
    import numpy as np
    import mindspore as ms
    import mindspore.nn as nn
    import mindspore.train as train


    class PixelAccuracy(train.Metric):
        """Pixel accuracy (PA): correctly classified pixels / total pixels."""

        def __init__(self, num_class=21):
            super(PixelAccuracy, self).__init__()
            self.num_class = num_class

        def _generate_matrix(self, gt_image, pre_image):
            # Build a (num_class, num_class) confusion matrix from one batch,
            # ignoring pixels whose label falls outside [0, num_class).
            mask = (gt_image >= 0) & (gt_image < self.num_class)
            label = self.num_class * gt_image[mask].astype('int') + pre_image[mask]
            count = np.bincount(label, minlength=self.num_class**2)
            confusion_matrix = count.reshape(self.num_class, self.num_class)
            return confusion_matrix

        def clear(self):
            self.confusion_matrix = np.zeros((self.num_class,) * 2)

        def update(self, *inputs):
            y_pred = inputs[0].asnumpy().argmax(axis=1)
            y = inputs[1].asnumpy().reshape(4, 512, 512)
            self.confusion_matrix += self._generate_matrix(y, y_pred)

        def eval(self):
            pixel_accuracy = np.diag(self.confusion_matrix).sum() / self.confusion_matrix.sum()
            return pixel_accuracy


    class PixelAccuracyClass(train.Metric):
        """Mean pixel accuracy (mPA): per-class accuracy averaged over classes."""

        def __init__(self, num_class=21):
            super(PixelAccuracyClass, self).__init__()
            self.num_class = num_class

        def _generate_matrix(self, gt_image, pre_image):
            mask = (gt_image >= 0) & (gt_image < self.num_class)
            label = self.num_class * gt_image[mask].astype('int') + pre_image[mask]
            count = np.bincount(label, minlength=self.num_class**2)
            confusion_matrix = count.reshape(self.num_class, self.num_class)
            return confusion_matrix

        def update(self, *inputs):
            y_pred = inputs[0].asnumpy().argmax(axis=1)
            y = inputs[1].asnumpy().reshape(4, 512, 512)
            self.confusion_matrix += self._generate_matrix(y, y_pred)

        def clear(self):
            self.confusion_matrix = np.zeros((self.num_class,) * 2)

        def eval(self):
            mean_pixel_accuracy = np.diag(self.confusion_matrix) / self.confusion_matrix.sum(axis=1)
            mean_pixel_accuracy = np.nanmean(mean_pixel_accuracy)
            return mean_pixel_accuracy


    class MeanIntersectionOverUnion(train.Metric):
        """Mean IoU (mIoU): per-class intersection-over-union averaged over classes."""

        def __init__(self, num_class=21):
            super(MeanIntersectionOverUnion, self).__init__()
            self.num_class = num_class

        def _generate_matrix(self, gt_image, pre_image):
            mask = (gt_image >= 0) & (gt_image < self.num_class)
            label = self.num_class * gt_image[mask].astype('int') + pre_image[mask]
            count = np.bincount(label, minlength=self.num_class**2)
            confusion_matrix = count.reshape(self.num_class, self.num_class)
            return confusion_matrix

        def update(self, *inputs):
            y_pred = inputs[0].asnumpy().argmax(axis=1)
            y = inputs[1].asnumpy().reshape(4, 512, 512)
            self.confusion_matrix += self._generate_matrix(y, y_pred)

        def clear(self):
            self.confusion_matrix = np.zeros((self.num_class,) * 2)

        def eval(self):
            mean_iou = np.diag(self.confusion_matrix) / (
                np.sum(self.confusion_matrix, axis=1) + np.sum(self.confusion_matrix, axis=0) -
                np.diag(self.confusion_matrix))
            mean_iou = np.nanmean(mean_iou)
            return mean_iou


    class FrequencyWeightedIntersectionOverUnion(train.Metric):
        """FWIoU: per-class IoU weighted by each class's pixel frequency."""

        def __init__(self, num_class=21):
            super(FrequencyWeightedIntersectionOverUnion, self).__init__()
            self.num_class = num_class

        def _generate_matrix(self, gt_image, pre_image):
            mask = (gt_image >= 0) & (gt_image < self.num_class)
            label = self.num_class * gt_image[mask].astype('int') + pre_image[mask]
            count = np.bincount(label, minlength=self.num_class**2)
            confusion_matrix = count.reshape(self.num_class, self.num_class)
            return confusion_matrix

        def update(self, *inputs):
            y_pred = inputs[0].asnumpy().argmax(axis=1)
            y = inputs[1].asnumpy().reshape(4, 512, 512)
            self.confusion_matrix += self._generate_matrix(y, y_pred)

        def clear(self):
            self.confusion_matrix = np.zeros((self.num_class,) * 2)

        def eval(self):
            freq = np.sum(self.confusion_matrix, axis=1) / np.sum(self.confusion_matrix)
            iu = np.diag(self.confusion_matrix) / (
                np.sum(self.confusion_matrix, axis=1) + np.sum(self.confusion_matrix, axis=0) -
                np.diag(self.confusion_matrix))

            frequency_weighted_iou = (freq[freq > 0] * iu[freq > 0]).sum()
            return frequency_weighted_iou
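To make the four metric formulas above tangible, here is a small NumPy check that computes all four by hand from a toy 3-class confusion matrix (rows are ground truth, columns are predictions), using exactly the expressions found in the eval() methods:

```python
import numpy as np

# Toy confusion matrix: rows = ground truth, columns = prediction.
cm = np.array([[5., 1., 0.],
               [1., 3., 0.],
               [0., 0., 2.]])

# Pixel accuracy: correct pixels / all pixels.
pa = np.diag(cm).sum() / cm.sum()
# Mean pixel accuracy: per-class accuracy averaged over classes.
mpa = np.nanmean(np.diag(cm) / cm.sum(axis=1))
# Per-class IoU: intersection / union, then averaged for mIoU.
iou = np.diag(cm) / (cm.sum(axis=1) + cm.sum(axis=0) - np.diag(cm))
miou = np.nanmean(iou)
# Frequency-weighted IoU: per-class IoU weighted by class frequency.
freq = cm.sum(axis=1) / cm.sum()
fwiou = (freq[freq > 0] * iou[freq > 0]).sum()

print(pa, mpa, miou, fwiou)
```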
    import mindspore
    from mindspore import Tensor
    import mindspore.nn as nn
    from mindspore.train import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor, Model

    device_target = "Ascend"
    mindspore.set_context(mode=mindspore.PYNATIVE_MODE, device_target=device_target)

    train_batch_size = 4
    num_classes = 21
    # Instantiate the model structure
    net = FCN8s(n_class=21)
    # Load the VGG-16 pretrained parameters
    load_vgg16()
    # Compute the learning-rate schedule
    min_lr = 0.0005
    base_lr = 0.05
    train_epochs = 1
    iters_per_epoch = dataset.get_dataset_size()
    total_step = iters_per_epoch * train_epochs

    lr_scheduler = mindspore.nn.cosine_decay_lr(min_lr,
                                                base_lr,
                                                total_step,
                                                iters_per_epoch,
                                                decay_epoch=2)
    lr = Tensor(lr_scheduler[-1])

    # Define the loss function (pixels labeled 255 are ignored)
    loss = nn.CrossEntropyLoss(ignore_index=255)
    # Define the optimizer
    optimizer = nn.Momentum(params=net.trainable_params(), learning_rate=lr, momentum=0.9, weight_decay=0.0001)
    # Define the dynamic loss scale for mixed-precision training
    scale_factor = 4
    scale_window = 3000
    loss_scale_manager = mindspore.amp.DynamicLossScaleManager(scale_factor, scale_window)
    # Build the Model
    if device_target == "Ascend":
        model = Model(net, loss_fn=loss, optimizer=optimizer, loss_scale_manager=loss_scale_manager,
                      metrics={"pixel accuracy": PixelAccuracy(), "mean pixel accuracy": PixelAccuracyClass(),
                               "mean IoU": MeanIntersectionOverUnion(),
                               "frequency weighted IoU": FrequencyWeightedIntersectionOverUnion()})
    else:
        model = Model(net, loss_fn=loss, optimizer=optimizer,
                      metrics={"pixel accuracy": PixelAccuracy(), "mean pixel accuracy": PixelAccuracyClass(),
                               "mean IoU": MeanIntersectionOverUnion(),
                               "frequency weighted IoU": FrequencyWeightedIntersectionOverUnion()})

    # Configure checkpoint saving
    time_callback = TimeMonitor(data_size=iters_per_epoch)
    loss_callback = LossMonitor()
    callbacks = [time_callback, loss_callback]
    save_steps = 330
    keep_checkpoint_max = 5
    config_ckpt = CheckpointConfig(save_checkpoint_steps=save_steps,
                                   keep_checkpoint_max=keep_checkpoint_max)
    ckpt_callback = ModelCheckpoint(prefix="FCN8s",
                                    directory="./ckpt",
                                    config=config_ckpt)
    callbacks.append(ckpt_callback)
    model.train(train_epochs, dataset, callbacks=callbacks)
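For intuition about the schedule, cosine_decay_lr can be reproduced in a few lines of plain Python. The sketch below assumes the documented per-step formula (epoch index taken as step // step_per_epoch, decayed with a cosine toward min_lr over decay_epoch epochs); it illustrates the decay shape and is not a substitute for the MindSpore implementation:

```python
import math

def cosine_decay(min_lr, max_lr, total_step, step_per_epoch, decay_epoch):
    """Cosine-decayed learning rate per step: max_lr at epoch 0,
    falling toward min_lr as the epoch index approaches decay_epoch."""
    lrs = []
    for step in range(total_step):
        epoch = step // step_per_epoch
        lr = min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * epoch / decay_epoch))
        lrs.append(lr)
    return lrs

lrs = cosine_decay(0.0005, 0.05, total_step=4, step_per_epoch=2, decay_epoch=2)
# Epoch 0 steps start at max_lr; epoch 1 steps sit at the cosine midpoint.
print(lrs)
```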
Model evaluation:
    from download import download
    from mindspore import load_checkpoint, load_param_into_net

    IMAGE_MEAN = [103.53, 116.28, 123.675]
    IMAGE_STD = [57.375, 57.120, 58.395]
    DATA_FILE = "dataset/dataset_fcn8s/mindname.mindrecord"

    # Download the pretrained weight file
    url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/FCN8s.ckpt"
    download(url, "FCN8s.ckpt", replace=True)
    net = FCN8s(n_class=num_classes)

    ckpt_file = "FCN8s.ckpt"
    param_dict = load_checkpoint(ckpt_file)
    load_param_into_net(net, param_dict)

    if device_target == "Ascend":
        model = Model(net, loss_fn=loss, optimizer=optimizer, loss_scale_manager=loss_scale_manager,
                      metrics={"pixel accuracy": PixelAccuracy(), "mean pixel accuracy": PixelAccuracyClass(),
                               "mean IoU": MeanIntersectionOverUnion(),
                               "frequency weighted IoU": FrequencyWeightedIntersectionOverUnion()})
    else:
        model = Model(net, loss_fn=loss, optimizer=optimizer,
                      metrics={"pixel accuracy": PixelAccuracy(), "mean pixel accuracy": PixelAccuracyClass(),
                               "mean IoU": MeanIntersectionOverUnion(),
                               "frequency weighted IoU": FrequencyWeightedIntersectionOverUnion()})

    # Instantiate the evaluation Dataset (crop_size, max_scale, min_scale and
    # ignore_label come from the data-processing section of the tutorial)
    dataset = SegDataset(image_mean=IMAGE_MEAN,
                         image_std=IMAGE_STD,
                         data_file=DATA_FILE,
                         batch_size=train_batch_size,
                         crop_size=crop_size,
                         max_scale=max_scale,
                         min_scale=min_scale,
                         ignore_label=ignore_label,
                         num_classes=num_classes,
                         num_readers=2,
                         num_parallel_calls=4)
    dataset_eval = dataset.get_dataset()
    model.eval(dataset_eval)
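The normalization that SegDataset presumably applies with IMAGE_MEAN and IMAGE_STD is the usual per-channel standardization, (pixel - mean) / std. As a standalone NumPy sketch of that step (an assumption about SegDataset's internals, shown here only to make the constants concrete):

```python
import numpy as np

IMAGE_MEAN = [103.53, 116.28, 123.675]
IMAGE_STD = [57.375, 57.120, 58.395]

# A fake image, shape (H, W, C), values in [0, 255].
img = np.full((2, 2, 3), 128.0)
# Per-channel standardization: subtract the mean, divide by the std.
normalized = (img - np.array(IMAGE_MEAN)) / np.array(IMAGE_STD)
print(normalized.shape)  # (2, 2, 3)
```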
Finally, use the trained network to visualize its inference results.
    import cv2
    import matplotlib.pyplot as plt

    net = FCN8s(n_class=num_classes)
    # Load the trained weights
    ckpt_file = "FCN8s.ckpt"
    param_dict = load_checkpoint(ckpt_file)
    load_param_into_net(net, param_dict)
    eval_batch_size = 4
    img_lst = []
    mask_lst = []
    res_lst = []
    # Visualize inference (top row: input images; bottom row: predicted masks)
    plt.figure(figsize=(8, 5))
    show_data = next(dataset_eval.create_dict_iterator())
    show_images = show_data["data"].asnumpy()
    mask_images = show_data["label"].reshape([4, 512, 512])
    show_images = np.clip(show_images, 0, 1)
    for i in range(eval_batch_size):
        img_lst.append(show_images[i])
        mask_lst.append(mask_images[i])
    res = net(show_data["data"]).asnumpy().argmax(axis=1)
    for i in range(eval_batch_size):
        plt.subplot(2, 4, i + 1)
        # Transpose (C, H, W) to (H, W, C) for matplotlib
        plt.imshow(img_lst[i].transpose(1, 2, 0))
        plt.axis("off")
        plt.subplots_adjust(wspace=0.05, hspace=0.02)
        plt.subplot(2, 4, i + 5)
        plt.imshow(res[i])
        plt.axis("off")
        plt.subplots_adjust(wspace=0.05, hspace=0.02)
    plt.show()
A summary of what was learned in this section:
- Introduced the fully convolutional network (FCN), a framework for image semantic segmentation and the pioneering application of deep learning to that task.
- Explained the concept of semantic segmentation (classifying every pixel in an image) and showed some examples.
- Described the FCN architecture, including convolutionalization, upsampling, and skip connections.
- Provided code for data processing, network construction, the loss function and evaluation metrics, model training, and inference.
In terms of the underlying principles:
- By replacing fully connected layers with fully convolutional layers, FCN accepts input images of arbitrary size and outputs a segmentation map with the same spatial size as the input.
- Convolutional layers extract image features, pooling layers reduce the feature-map resolution, upsampling layers restore it, and skip connections combine deep global information with shallow local detail.
- During training, a cross-entropy loss measures the difference between the network output and the ground-truth labels, and backpropagation updates the network parameters.
- During inference, the input image is fed through the trained FCN to obtain the segmentation result.
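The shape arithmetic behind the FCN-8s skip structure can be sketched in plain NumPy. Nearest-neighbor upsampling stands in for the learned transposed convolution, and the channel/score dimension is collapsed to one, so this illustrates only how the 1/32, 1/16, and 1/8 feature maps are fused and restored to the input size:

```python
import numpy as np

def upsample(x, factor):
    # Nearest-neighbor stand-in for a learned transposed convolution.
    return x.repeat(factor, axis=-2).repeat(factor, axis=-1)

# Feature maps produced from a 512x512 input (single channel for clarity).
pool3 = np.zeros((1, 64, 64))   # 1/8 of the input resolution
pool4 = np.zeros((1, 32, 32))   # 1/16
pool5 = np.zeros((1, 16, 16))   # 1/32: deepest, most global features

fuse4 = upsample(pool5, 2) + pool4   # 2x up, add pool4 -> back to 1/16
fuse3 = upsample(fuse4, 2) + pool3   # 2x up, add pool3 -> back to 1/8
out = upsample(fuse3, 8)             # 8x up -> per-pixel output at 512x512
print(out.shape)  # (1, 512, 512)
```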
The code covers:
1. Data preprocessing: normalize the input images so they share the same size and value range.
   - Data loading: mix the PASCAL VOC2012 dataset with the SBD dataset and load them with MindSpore's Dataset class.
   - Training-set visualization: run the code to inspect the loaded dataset images.
2. Network construction: walk through the FCN pipeline of convolution, pooling, and deconvolution.
   - Network structure: build the FCN-8s network with MindSpore's nn module, including convolution, pooling, and deconvolution layers.
   - Pretrained weights: load the partial VGG-16 pretrained weights to improve model performance.
3. Loss function and evaluation metrics:
   - Loss: cross-entropy between the FCN output and the ground-truth mask.
   - Metrics: the custom PixelAccuracy, PixelAccuracyClass, MeanIntersectionOverUnion, and FrequencyWeightedIntersectionOverUnion classes for assessing model performance.
4. Model training: instantiate the loss function and optimizer, compile the network with the Model API, and train FCN-8s.
   - Model evaluation: evaluate the trained model on the test set using the metrics above.
5. Model inference: use the trained network to visualize its predictions.
In short: FCN's key contributions are the use of fully convolutional layers, acceptance of arbitrary input sizes, and greater efficiency; its skip structure fuses deep global information with shallow local detail; and the full workflow runs from data preprocessing through network construction, training, evaluation, and visualized inference.
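The cross-entropy-with-ignore_index behavior from item 3 can also be checked in isolation. This NumPy sketch (an illustration, not MindSpore's implementation) takes per-pixel logits, drops pixels labeled 255, and averages the negative log-probability of the true class:

```python
import numpy as np

def cross_entropy_ignore(logits, labels, ignore_index=255):
    """Mean cross-entropy over pixels, skipping those labeled ignore_index."""
    keep = labels != ignore_index
    logits, labels = logits[keep], labels[keep]
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.0],    # pixel 1: favors class 0
                   [0.0, 2.0],    # pixel 2: favors class 1
                   [9.9, 9.9]])   # pixel 3: labeled 255, excluded from the loss
labels = np.array([0, 1, 255])
loss = cross_entropy_ignore(logits, labels)
print(loss)
```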