
Training Your Own Classification Dataset with TensorFlow 2.x

1. Import the libraries

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
import os
import pathlib
import random
import matplotlib.pyplot as plt

2. Read the files
The data is a collection of human and horse images stored under a humanandhorse parent directory, with each class's images in its own subdirectory.
(Figure: the humanandhorse directory with its horses and humans subfolders)

data_root = pathlib.Path('E:/tensorflowdataset/humanandhorse')
print(data_root)
for item in data_root.iterdir():
  print(item)

E:\tensorflowdataset\humanandhorse
E:\tensorflowdataset\humanandhorse\horses
E:\tensorflowdataset\humanandhorse\humans

Use the glob method to collect the image paths into a list, then count how many images there are in total.

all_image_paths = list(data_root.glob('*/*'))
print(all_image_paths[:10])
all_image_paths = [str(path) for path in all_image_paths]
print(all_image_paths[:10])
image_count = len(all_image_paths)
print(image_count)    

[WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-000.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-105.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-122.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-127.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-170.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-204.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-224.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-241.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-264.png'), WindowsPath('E:/tensorflowdataset/humanandhorse/horses/horse1-276.png')]
['E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-000.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-105.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-122.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-127.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-170.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-204.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-224.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-241.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-264.png', 'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-276.png']
256
3. Display some of the images
label = image_path.split('\\')[-2] takes the second-to-last directory component of each path as the image's label.

import matplotlib.pyplot as plt
from PIL import Image

plt.figure('image show')
for n in range(3):
	image_path = random.choice(all_image_paths)
	label = image_path.split('\\')[-2]
	image = Image.open(image_path)
	print(image.size)
 
	plt.subplot(1, 3, n+1)
	plt.title(label)
	plt.imshow(image)
plt.show()


(Figure: three randomly chosen sample images, titled with their labels)
4. Set up the labels
First, find out which labels exist:

label_names = sorted(item.name for item in data_root.glob('*/') if item.is_dir())
print(label_names) 

['horses', 'humans']

Then assign each label an index in sorted order:

label_to_index = dict((name, index) for index, name in enumerate(label_names))
print(label_to_index)

{'horses': 0, 'humans': 1}
Determine the label of every image:

all_image_labels = [label_to_index[pathlib.Path(path).parent.name]
                    for path in all_image_paths]

print(all_image_labels)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
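To sanity-check the mapping, it can help to count how many images ended up in each class. A minimal sketch (collections.Counter is from the standard library; everything else reuses the variables defined above):

from collections import Counter

label_counts = Counter(all_image_labels)  # maps label index -> number of images
for name, index in label_to_index.items():
    print(name, '->', label_counts[index])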
5. Preprocess the data
Read each image file with tf.io.read_file, decode it into an image tensor, resize it to 300×300, and scale every pixel value into the [0, 1] range (which makes training easier).


def preprocess_image(img_raw):
    # The images in this dataset are PNG files, so decode them as PNG with 3 channels.
    img_tensor = tf.image.decode_png(contents=img_raw, channels=3)  # can be passed to plt.imshow(img_tensor)
    img_final = tf.image.resize(images=img_tensor, size=[300, 300])
    img_final /= 255.0  # normalize to the [0, 1] range
    return img_final

def load_and_preprocess_image(path):
    img_raw = tf.io.read_file(path)  # raw file bytes, can't be passed to plt.imshow(img_raw)
    return preprocess_image(img_raw)

def load_and_preprocess_from_path_label(path, label):
    return load_and_preprocess_image(path), label
 
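To eyeball what the preprocessing produces for a single file, a minimal sketch (it reuses all_image_paths and the functions above; picking index 0 is arbitrary):

sample = load_and_preprocess_image(all_image_paths[0])
print(sample.shape, sample.dtype)  # expected: (300, 300, 3), float32
print(float(tf.reduce_min(sample)), float(tf.reduce_max(sample)))  # values should fall in [0, 1]
plt.imshow(sample)
plt.show()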

6. Build the dataset
Pair each image with its label, one to one.
The ds returned by tf.data.Dataset.from_tensor_slices exposes many useful methods for manipulating the dataset, such as shuffle, batch and repeat, which makes it convenient to feed into the model later (a short chaining sketch follows the printed output below).

ds = tf.data.Dataset.from_tensor_slices((all_image_paths, all_image_labels))
for item_x, item_y in ds:
    print(item_x.numpy(), item_y.numpy())

b'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-000.png' 0
b'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-105.png' 0
b'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-122.png' 0
b'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-127.png' 0
b'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-170.png' 0
b'E:\\tensorflowdataset\\humanandhorse\\horses\\horse1-204.png' 0
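As mentioned above, the dataset returned by from_tensor_slices can be chained with shuffle, batch and repeat. A minimal sketch of how that usually looks (the buffer size and batch size here are illustrative choices, not the values used in the rest of this post):

shuffled_ds = (ds.shuffle(buffer_size=image_count)  # shuffle within a buffer covering all paths
                 .batch(32)                         # group 32 (path, label) pairs per element
                 .repeat())                         # iterate over the data indefinitely
print(shuffled_ds)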

When calling the training method model.fit(), the expected arguments are model.fit(x, y, batch_size, epochs).

If x is given as a Dataset object, then y and batch_size must not be supplied; in that case the Dataset is expected to yield batched elements, where each batch is a (features, labels) tuple.

So we want the Dataset to hold (features, label) pairs. This is where tf.data.Dataset.from_tensors() and tf.data.Dataset.from_tensor_slices() come in: use tf.data.Dataset.from_tensors() when memory holds a single feature-label pair, and tf.data.Dataset.from_tensor_slices() when memory holds (many feature vectors, many labels) that should be sliced into individual examples.
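A minimal sketch of that distinction (the toy arrays below are made up purely for illustration):

import numpy as np

features = np.zeros((4, 2), dtype=np.float32)  # 4 examples with 2 features each
labels = np.array([0, 1, 0, 1])

# from_tensors keeps everything as ONE element: a single ((4, 2), (4,)) pair.
print(tf.data.Dataset.from_tensors((features, labels)).element_spec)

# from_tensor_slices slices along the first axis: 4 elements of shape ((2,), ()).
print(tf.data.Dataset.from_tensor_slices((features, labels)).element_spec)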

image_label_ds = ds.map(load_and_preprocess_from_path_label)
image_label_ds = image_label_ds.batch(1)

Here image_label_ds is a batched dataset whose elements are pairs of a 1×300×300×3 tensor and its label. Its first element looks like this:
[[[[1. 1. 1.]
   [1. 1. 1.]
   [1. 1. 1.]
   ...
   [1. 1. 1.]
   [1. 1. 1.]
   [1. 1. 1.]]]] [0]
The batch() method groups the dataset into batches, adding a new leading (batch) dimension to each element.
Without batching, training below fails with: expected conv2d_10_input to have 4 dimensions, but got array with shape (300, 300, 3), i.e. the dimensions do not match.
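One way to see the extra batch dimension is to compare element_spec before and after batch(); a minimal sketch reusing the datasets built above:

unbatched_ds = ds.map(load_and_preprocess_from_path_label)
print(unbatched_ds.element_spec)     # image spec (300, 300, 3), label spec ()
print(image_label_ds.element_spec)   # image spec (None, 300, 300, 3), label spec (None,)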
7. Build the model and train it

model = tf.keras.models.Sequential([
    # Note the input shape is the desired size of the image 300x300 with 3 bytes color
    # This is the first convolution
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The third convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fourth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fifth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    # Only 1 output neuron. It will contain a value from 0 to 1, where 0 represents one class ('horses') and 1 the other ('humans')
    tf.keras.layers.Dense(1, activation='sigmoid')
])
from tensorflow.keras.optimizers import RMSprop
model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),  # 'lr' is a deprecated alias in TF 2.x
              metrics=['acc'])
history = model.fit(
    image_label_ds,
    steps_per_epoch=8,
    epochs=15,
)
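To double-check the architecture, for example when chasing the kind of shape-mismatch error mentioned in step 6, model.summary() prints every layer's output shape and parameter count:

model.summary()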
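After fit() returns, the history object holds the per-epoch values of the metrics configured in compile() ('loss' and 'acc' here). A minimal sketch for plotting them with the matplotlib import from earlier:

acc = history.history['acc']
loss = history.history['loss']
epochs_range = range(1, len(acc) + 1)

plt.figure('training curves')
plt.plot(epochs_range, acc, label='accuracy')
plt.plot(epochs_range, loss, label='loss')
plt.xlabel('epoch')
plt.legend()
plt.show()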