赞
踩
张量、变量共同点:具有形状、类型、值等3个属性。
不同点:变量可被TensorFlow的自动求导机制求导,常被用于机器学习模型的参数。
tensorflow定义的数据格式,一种二进制文件格式,用于保存和读取图像和文本数据。tfrecord文件包含了tf.train.Example protobuf数据。It is designed for use with TensorFlow and is used throughout the higher-level APIS such as TFX.
tf.train.Example的数据结构是一个字典称为Features
,其内部结构可从proto文件看出:
message Example {
Features features = 1;
};
message Features{
map<string, Feature> featrue = 1;
};
message Feature{
oneof kind{
BytesList bytes_list = 1;
FloatList float_list = 2;
Int64List int64_list = 3;
}
};
数据类型Feature有3个,Int64、Bytes、Float;Int64存储bool、Enum、uint32、int32、int64、uint64,Bytes存储字符串、二进制,Float存储float(float32)和double(float64)。
文件格式即把数据参考字典结构做二进制数据的protobuf序列化,称为string。
def serialize_example(f1, f2, f3, f4):
fts = {
"feature0": _int64_feature(f1),
"feature1": _int64_feature(f2),
"feature2": _bytes_feature(f3),
"feature3": _float_feature(f4),
}
m = tf.train.Example(features=tf.train.Features(feature=fts))
return m.SerializeToString()
ps = serialize_example(3, True, b"goal", 0.999)
ex_proto = tf.train.Example.FromString(ps)
tf.train.Feature
是被tf.train.Example
兼容的。
import tensorflow as tf
def _bytes_feature(x):
if isinstance(x, type(tf.constant(0))):
x = x.numpy()
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[x]))
# Write the `tf.train.Example` observations to the file.
with tf.io.TFRecordWriter(filename) as writer:
for i in range(n_observations):
example = serialize_example(feature0[i], feature1[i], feature2[i], feature3[i])
writer.write(example)
fn = "./Waymo.tfrecord" rd = tf.data.TFRecordDataset(fn) # 数据格式 feature_description = { 'feature0': tf.io.FixedLenFeature([], tf.int64, default_value=0), 'feature1': tf.io.FixedLenFeature([], tf.int64, default_value=0), 'feature2': tf.io.FixedLenFeature([], tf.string, default_value=''), 'feature3': tf.io.FixedLenFeature([], tf.float32, default_value=0.0), } def _parse_function(example_proto): # Parse the input `tf.train.Example` proto using the dictionary above. return tf.io.parse_single_example(example_proto, feature_description) parsed_dataset = raw_dataset.map(_parse_function) for parsed_record in parsed_dataset.take(10): print(repr(parsed_record))
采用tfrecord的数据协议,Dataset结构需参考
https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/dataset.proto
使用Python库waymo-open-dataset
#与tensorflow版本对应,如tf为2.3.0
pip3 install waymo-open-dataset-tf-2-3-0 --user
fn = [ "/data/Waymo_training_segment-10023947602400723454_1120_000_1140_000_with_camera_labels.tfrecord" ] dataset = tf.data.TFRecordDataset(fn) for data in dataset.take(1000): frame = open_dataset.Frame() frame.ParseFromString(bytearray(data.numpy())) # plt.figure(figsize=(25, 20)) # for index, image in enumerate(frame.images): # show_camera_image(image, frame.camera_labels, [3, 3, index+1]) # plt.show() ts = frame.timestamp_micros st_img = frame.images[0] for labels in frame.camera_labels: if labels.name == st_img.name: for label in labels.labels: x = int(label.box.center_x - 0.5 * label.box.length) y = int(label.box.center_y - 0.5 * label.box.width) width = int(label.box.length) height = int(label.box.width)
重复造轮子:用tf.io实现读取数据集。
https://stackoverflow.com/questions/61166864/tensorflow-python-framework-ops-eagertensor-object-has-no-attribute-in-graph
Waymo Open Dataset文件解析格式,如何确定字典结构
raw_image_dataset = tf.data.TFRecordDataset('images.tfrecords') # Create a dictionary describing the features. image_feature_description = { 'height': tf.io.FixedLenFeature([], tf.int64), 'width': tf.io.FixedLenFeature([], tf.int64), 'depth': tf.io.FixedLenFeature([], tf.int64), 'label': tf.io.FixedLenFeature([], tf.int64), 'image_raw': tf.io.FixedLenFeature([], tf.string), } def _parse_image_function(example_proto): # Parse the input tf.train.Example proto using the dictionary above. return tf.io.parse_single_example(example_proto, image_feature_description) parsed_image_dataset = raw_image_dataset.map(_parse_image_function)
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。