
A Few Quick Ways to Read Your Own Dataset into TensorFlow

1. mat -> ndarray
MATLAB is often used for data processing, and MATLAB data is usually saved in .mat format, so first here is code for converting a .mat file to an ndarray.

# Read data saved in .mat format
# The .mat file contains a matrix named trainFeatures
import tensorflow as tf
import os
import numpy as np
import scipy.io  # for loading .mat files
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # silence TensorFlow INFO/WARNING logs

# --------------------load data-----------------------------------------------------------
train = 'imageTrainData.mat'
featureNum = 1024  # example value: the feature dimension of your data
trainNum = 5000    # example value: the number of training samples
trainData = scipy.io.loadmat(train)['trainFeatures'].ravel()  # load and flatten
trainData = np.reshape(trainData, [featureNum, trainNum])  # reshape to a 2-D array
trainData = np.transpose(trainData)  # transpose so each row is one sample
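
Labels stored in a .mat file can be loaded the same way. A minimal sketch follows; the file name imageTrainLabel.mat, the variable name trainLabels, and the assumption of 0-based integer class ids are all hypothetical:

# Hypothetical label file/variable names -- adjust to your own .mat contents
label = 'imageTrainLabel.mat'
rawLabel = scipy.io.loadmat(label)['trainLabels'].ravel()  # 0-based integer class ids
numClasses = int(rawLabel.max()) + 1
trainLabel = np.eye(numClasses)[rawLabel.astype(np.int64)]  # one-hot encode for feeding y_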
 
During training, you can then feed the entire dataset at once:

for i in range(20000):
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x: trainData, y_: trainLabel, keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
        print("test accuracy %g" % accuracy.eval(feed_dict={
            x: testData, y_: testLabel, keep_prob: 1.0}))
    train_step.run(feed_dict={x: trainData, y_: trainLabel, keep_prob: 0.5})
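
The loop above assumes a TF 1.x graph has already been built in the usual MNIST-tutorial style; x, y_, keep_prob, accuracy, and train_step are not defined in this snippet. A minimal sketch of the missing pieces (the shapes and the optimizer choice are assumptions):

# Placeholders the loop feeds (shapes are illustrative)
x = tf.placeholder(tf.float32, [None, featureNum])
y_ = tf.placeholder(tf.float32, [None, numClasses])
keep_prob = tf.placeholder(tf.float32)
# ... build a model producing logits y_conv, then, e.g.:
# train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# correct = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
# accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))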
 
2. ndarray -> batch
When training more complex models, random feeding is used to help prevent overfitting, which means the data must be split into multiple batches. The word2vec tutorial gives an example (a generic sampler for plain arrays follows after it):

import collections
import random

# Function to generate a training batch for the skip-gram model.
# Assumes module-level `data` (a list of word ids) and an integer cursor
# `data_index`, as in the word2vec tutorial.
def generate_batch(batch_size, num_skips, skip_window):
  global data_index
  assert batch_size % num_skips == 0
  assert num_skips <= 2 * skip_window
  batch = np.ndarray(shape=(batch_size), dtype=np.int32)
  labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
  span = 2 * skip_window + 1  # [ skip_window target skip_window ]
  buffer = collections.deque(maxlen=span)
  for _ in range(span):
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  for i in range(batch_size // num_skips):
    target = skip_window  # target label at the center of the buffer
    targets_to_avoid = [skip_window]
    for j in range(num_skips):
      while target in targets_to_avoid:
        target = random.randint(0, span - 1)
      targets_to_avoid.append(target)
      batch[i * num_skips + j] = buffer[skip_window]
      labels[i * num_skips + j, 0] = buffer[target]
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  # Backtrack a little bit to avoid skipping words in the end of a batch
  data_index = (data_index + len(data) - span) % len(data)
  return batch, labels
# Example call:
batch, labels = generate_batch(batch_size=8, num_skips=2, skip_window=1)

# During training, simply generate a new batch at each step:
  for step in range(num_steps):
    batch_inputs, batch_labels = generate_batch(
        batch_size, num_skips, skip_window)
    feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels}

    # We perform one update step by evaluating the optimizer op (including it
    # in the list of returned values for session.run())
    _, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)
    average_loss += loss_val
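
For plain (data, label) ndarrays like those from section 1, a generic random mini-batch sampler is usually all you need. A minimal sketch; the next_batch helper below is my own, not part of the tutorial:

import numpy as np

def next_batch(data, labels, batch_size):
    """Sample a random mini-batch from two aligned ndarrays."""
    idx = np.random.choice(data.shape[0], batch_size, replace=False)
    return data[idx], labels[idx]

# usage in the training loop, e.g.:
# batch_xs, batch_ys = next_batch(trainData, trainLabel, 100)
# train_step.run(feed_dict={x: batch_xs, y_: batch_ys, keep_prob: 0.5})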
 
3. txt -> ndarray
It is worth mentioning that for text classification, zipfile plus tf.compat.as_str makes it very easy to read a zip-compressed text file into a list of word strings.

import zipfile
import tensorflow as tf

# Read the data into a list of strings.
def read_data(filename):
  """Extract the first file enclosed in a zip file as a list of words."""
  with zipfile.ZipFile(filename) as f:
    data = tf.compat.as_str(f.read(f.namelist()[0])).split()
  return data 
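
For example, applied to the text8 corpus that the word2vec tutorial uses (assuming text8.zip has already been downloaded to the working directory):

words = read_data('text8.zip')
print('Data size:', len(words))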
The three methods above are the ones I use myself. TensorFlow itself also provides CSV reading, multi-input reading, batching, and more. For CSV reading, see:

https://www.tensorflow.org/versions/master/tutorials/estimators/index.html#loading-abalone-csv-data-into-tensorflow-datasets

https://www.tensorflow.org/guide/estimators#loading-abalone-csv-data-into-tensorflow-datasets
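
As a quick taste of the built-in path, here is a minimal sketch of reading a CSV with the tf.data API (TF 1.4+); the file name features.csv and the two-column layout are assumptions for illustration:

import tensorflow as tf

# Hypothetical CSV: one float feature column, one integer label column
record_defaults = [[0.0], [0]]  # per-column types/defaults for tf.decode_csv

dataset = tf.data.TextLineDataset('features.csv')
dataset = dataset.map(lambda line: tf.decode_csv(line, record_defaults))
dataset = dataset.shuffle(1000).batch(32)

iterator = dataset.make_one_shot_iterator()
features, labels = iterator.get_next()  # tensors to evaluate in a session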

 
