1. mat -> ndarray
Data processing often involves MATLAB, which usually saves data in the .mat format, so the first snippet converts a .mat file to an ndarray.
# Read data in .mat format;
# the .mat file contains a trainFeatures matrix.
import tensorflow as tf
import os
import numpy as np
import scipy.io  # for loading .mat files
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # suppress TF startup warnings
# --------------------load data-----------------------------------------
train = 'imageTrainData.mat'
trainData = scipy.io.loadmat(train)['trainFeatures'].ravel()  # load data
trainData = np.reshape(trainData, [featureNum, trainNum])  # reshape to 2-D; featureNum and trainNum must be set beforehand
trainData = np.transpose(trainData)  # transpose to (samples, features)
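The training loop below also feeds a trainLabel matrix. As a minimal sketch, assuming the same .mat file also stores a trainLabels vector of 0-based class indices and that classNum is the number of classes (both names are assumptions here), it can be loaded and one-hot encoded like this:
# Sketch: 'trainLabels' and classNum are assumed, not part of the original post.
rawLabel = scipy.io.loadmat(train)['trainLabels'].ravel()
trainLabel = np.zeros([trainNum, classNum])            # one-hot label matrix
trainLabel[np.arange(trainNum), rawLabel.astype(int)] = 1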
During training, the whole dataset can be fed at once:
for i in range(20000):
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x: trainData, y_: trainLabel, keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: trainData, y_: trainLabel, keep_prob: 0.5})
print("test accuracy %g" % accuracy.eval(feed_dict={
    x: testData, y_: testLabel, keep_prob: 1.0}))
2. ndarray -> batch
When training more complex models, random feeding helps prevent overfitting, so the data must be split into multiple batches. The word2vec tutorial gives a good example (a generic batcher for the dense data from section 1 is sketched after the full example below).
# Function to generate a training batch for the skip-gram model.
import collections
import random

data_index = 0

def generate_batch(batch_size, num_skips, skip_window):
    global data_index
    assert batch_size % num_skips == 0
    assert num_skips <= 2 * skip_window
    batch = np.ndarray(shape=(batch_size,), dtype=np.int32)
    labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
    span = 2 * skip_window + 1  # [ skip_window target skip_window ]
    buffer = collections.deque(maxlen=span)
    # Fill the sliding window with the first `span` word ids.
    for _ in range(span):
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    for i in range(batch_size // num_skips):
        target = skip_window  # target label at the center of the buffer
        targets_to_avoid = [skip_window]
        for j in range(num_skips):
            # Sample a context word not yet used for this center word.
            while target in targets_to_avoid:
                target = random.randint(0, span - 1)
            targets_to_avoid.append(target)
            batch[i * num_skips + j] = buffer[skip_window]
            labels[i * num_skips + j, 0] = buffer[target]
        # Slide the window one word to the right.
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    # Backtrack a little bit to avoid skipping words in the end of a batch
    data_index = (data_index + len(data) - span) % len(data)
    return batch, labels
# Invocation:
batch, labels = generate_batch(batch_size=8, num_skips=2, skip_window=1)
# During training, a fresh batch is generated at each step:
for step in range(num_steps):
    batch_inputs, batch_labels = generate_batch(
        batch_size, num_skips, skip_window)
    feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels}
    # We perform one update step by evaluating the optimizer op (including it
    # in the list of returned values for session.run()).
    _, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)
    average_loss += loss_val
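The skip-gram generator is task specific; for the dense feature matrix from section 1, a generic shuffle-and-slice batcher does the same job. This is just a sketch, assuming trainData and trainLabel are the ndarrays built above:
# Sketch: a generic random mini-batch sampler for dense ndarray data.
def next_batch(data, labels, batch_size):
    # Pick batch_size random rows without replacement.
    idx = np.random.choice(len(data), batch_size, replace=False)
    return data[idx], labels[idx]

batch_x, batch_y = next_batch(trainData, trainLabel, 50)
train_step.run(feed_dict={x: batch_x, y_: batch_y, keep_prob: 0.5})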
3. txt -> ndarray
It is worth mentioning that for text classification, TF makes it very convenient to read a zip-compressed text file into a list of word strings.
import zipfile
import numpy as np
import tensorflow as tf

# Read the data into a list of strings.
def read_data(filename):
    """Extract the first file enclosed in a zip file as a list of words."""
    with zipfile.ZipFile(filename) as f:
        data = tf.compat.as_str(f.read(f.namelist()[0])).split()
    return data
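As a usage sketch, with the text8.zip corpus from the word2vec tutorial (the filename is an assumption here):
words = read_data('text8.zip')  # 'text8.zip' is an assumed corpus filename
print('Data size', len(words))  # a plain Python list of word strings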
The three approaches above are the ones I use myself. TF itself also provides CSV reading, multi-input reading, batching, and more; for CSV reading see:
https://www.tensorflow.org/versions/master/tutorials/estimators/index.html#loading-abalone-csv-data-into-tensorflow-datasets
https://www.tensorflow.org/guide/estimators#loading-abalone-csv-data-into-tensorflow-datasets
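As a rough sketch of the queue-based CSV pipeline from that era of TF (the filename and record_defaults below are assumptions for illustration, not taken from the linked docs):
filename_queue = tf.train.string_input_producer(['data.csv'])  # assumed filename
reader = tf.TextLineReader()
_, value = reader.read(filename_queue)
# Parse each line into typed columns; record_defaults fixes each column's type.
col1, col2, label = tf.decode_csv(value, record_defaults=[[0.0], [0.0], [0]])
features = tf.stack([col1, col2])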