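【What is a sequence?】
A sequence is a run of historical data. Time-series forecasting uses the historical data (blue) to predict the present value (black). For example, if [0, 2, 4, 6, 8, 10, 12, 14, 16, 18] predicts 20, then [10, 12, 14, 16, 18, 20, 22, 24, 26, 28] predicts 30.
The prediction can also be non-adjacent, for example using yesterday's data to predict the day after tomorrow. In the weather forecast example below, the first 720 data points are used to predict the label at position 720+72. With a sampling rate of 1, the sequence length is 720.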
Building a time-series dataset means generating sequences from data, with each sequence paired with one target label (the value to predict).
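The pairing of sequences and targets can be sketched in plain Python, without TensorFlow. The helper `make_pairs` and its `horizon` parameter are hypothetical names introduced here for illustration, not part of Keras. The sketch only shows how each window of history is matched with one value to predict, including the non-adjacent case.

```python
def make_pairs(series, sequence_length, horizon=0):
    """Pair each window of `sequence_length` values with one target.

    `horizon` is the number of extra steps between the end of the window
    and the target (0 means the very next value). Hypothetical helper.
    """
    pairs = []
    last_start = len(series) - sequence_length - horizon - 1
    for start in range(last_start + 1):
        window = series[start:start + sequence_length]
        target = series[start + sequence_length + horizon]
        pairs.append((window, target))
    return pairs


series = list(range(0, 40, 2))  # [0, 2, 4, ..., 38]

# Adjacent prediction: the target is the value right after the window.
adjacent = make_pairs(series, sequence_length=10)
print(adjacent[0])     # ([0, 2, 4, 6, 8, 10, 12, 14, 16, 18], 20)
print(adjacent[5][1])  # 30

# Non-adjacent prediction: skip two steps after the window.
skipping = make_pairs(series, sequence_length=10, horizon=2)
print(skipping[0][1])  # 24
```

With `sequence_length=720` and `horizon=72`, the target for the window starting at index 0 would sit at index 720+72, which matches the weather forecast setup described above.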
tf.keras.preprocessing.timeseries_dataset_from_array(
data,
targets,
sequence_length,
sequence_stride=1,
sampling_rate=1,
batch_size=128,
shuffle=False,
seed=None,
start_index=None,
end_index=None,
)
data: the x data; each element is called a timestep.
targets: the y labels. To process only the data and no labels, pass targets=None.
sequence_length: the length of one output sequence, i.e. the number of timesteps it contains.
sequence_stride: the number of timesteps between the starts of consecutive sequences. For stride s, output samples start at index data[i], data[i + s], data[i + 2 * s], etc.
sampling_rate: the sampling rate of timesteps within one sequence. For rate r, timesteps data[i], data[i + r], data[i + 2 * r], ... are used to create a sample sequence of sequence_length timesteps.
batch_size: the return value is a tf.data.Dataset, so a batch size has to be set.

data = np.array([i for i in range(100)])  # [0,1,...,99]
timeseries = keras.preprocessing.timeseries_dataset_from_array(
    data, targets=None, sequence_length=10, sampling_rate=2, sequence_stride=10
)
print(list(timeseries))
'''
[<tf.Tensor: shape=(9, 10), dtype=int32, numpy=
array([[ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18],
       [10, 12, 14, 16, 18, 20, 22, 24, 26, 28],
       [20, 22, 24, 26, 28, 30, 32, 34, 36, 38],
       [30, 32, 34, 36, 38, 40, 42, 44, 46, 48],
       [40, 42, 44, 46, 48, 50, 52, 54, 56, 58],
       [50, 52, 54, 56, 58, 60, 62, 64, 66, 68],
       [60, 62, 64, 66, 68, 70, 72, 74, 76, 78],
       [70, 72, 74, 76, 78, 80, 82, 84, 86, 88],
       [80, 82, 84, 86, 88, 90, 92, 94, 96, 98]])>]
'''
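Note: the 99 in [0,1,...,99] is where the tail of the sliding window has to stop, not where its head stops. This is why the last window above is [80, ..., 98] and no window starts at 90.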
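The roles of sequence_stride and sampling_rate, and the rule that the window's tail must stay inside the data, can be checked with a small pure-Python model of the indexing. This is a sketch, not the Keras implementation: `window_indices` is a hypothetical helper, and it assumes every window whose tail still fits is kept. The real function may treat the final positions slightly differently depending on the Keras version.

```python
def window_indices(n, sequence_length, sequence_stride=1, sampling_rate=1):
    """Return one list of indices per window that fits inside range(n)."""
    # Distance from the head (first index) to the tail (last index) of a window.
    span = (sequence_length - 1) * sampling_rate
    windows = []
    # A head is valid only if head + span <= n - 1, i.e. head < n - span.
    for head in range(0, n - span, sequence_stride):
        windows.append([head + k * sampling_rate for k in range(sequence_length)])
    return windows


windows = window_indices(100, sequence_length=10, sequence_stride=10, sampling_rate=2)
print(len(windows))  # 9
print(windows[0])    # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(windows[-1])   # [80, 82, 84, 86, 88, 90, 92, 94, 96, 98]
```

A window starting at 90 would need index 108 for its tail, so it is dropped. That matches the shape (9, 10) printed above.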
So processing data and targets separately looks like this:

import numpy as np
from tensorflow import keras

data = np.array([i for i in range(20)])     # [0,1,...,19]
targets = np.array([i for i in range(11)])  # [0,1,...,10]

# Windows of 10 timesteps over the data.
data_timeseries = keras.preprocessing.timeseries_dataset_from_array(
    data=data, targets=None, sequence_length=10
)
# Windows of 1 timestep over the targets.
targets_timeseries = keras.preprocessing.timeseries_dataset_from_array(
    data=targets, targets=None, sequence_length=1
)
print(list(data_timeseries))
print(list(targets_timeseries))
'''
[<tf.Tensor: shape=(11, 10), dtype=int32, numpy=
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       [ 7,  8,  9, 10, 11, 12, 13, 14, 15, 16],
       [ 8,  9, 10, 11, 12, 13, 14, 15, 16, 17],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])>]
[<tf.Tensor: shape=(11, 1), dtype=int32, numpy=
array([[ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10]])>]
'''
To pass data and targets together, the targets need to be padded:

import numpy as np
from tensorflow import keras

data = np.array([i for i in range(20)])  # [0,1,...,19]
targets = np.array([i for i in range(11)])
# Pad targets with zeros so that it has the same length as data.
targets_app = np.zeros(data.size - targets.size, dtype=int)
targets = np.append(targets, targets_app).reshape(data.shape)
print(targets)
# [ 0  1  2  3  4  5  6  7  8  9 10  0  0  0  0  0  0  0  0  0]

timeseries = keras.preprocessing.timeseries_dataset_from_array(
    data, targets=targets, sequence_length=10
)
print(list(timeseries))
'''
[(<tf.Tensor: shape=(11, 10), dtype=int32, numpy=
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       [ 7,  8,  9, 10, 11, 12, 13, 14, 15, 16],
       [ 8,  9, 10, 11, 12, 13, 14, 15, 16, 17],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])>,
  <tf.Tensor: shape=(11,), dtype=float64, numpy=array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])>)]
'''
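To see why the padded zeros never appear in the output, the alignment can be modelled in plain Python. This sketch assumes that the target for a window is the element of targets at the window's start index, which is what the output above suggests. `pair_with_targets` is a hypothetical helper, not a Keras function.

```python
def pair_with_targets(data, targets, sequence_length):
    """Pair the window starting at index i with targets[i]."""
    n_windows = len(data) - sequence_length + 1
    return [(data[i:i + sequence_length], targets[i]) for i in range(n_windows)]


data = list(range(20))               # [0, 1, ..., 19]
targets = list(range(11)) + [0] * 9  # 11 real labels, then 9 padding zeros

pairs = pair_with_targets(data, targets, sequence_length=10)
print(len(pairs))             # 11
print([t for _, t in pairs])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Only 11 windows fit into 20 timesteps, so only targets[0] through targets[10] are ever read. The remaining entries exist purely to make the lengths match.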
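A few points to note:
targets must have the same length as data (20 here, at least in the TensorFlow version used for this post), as in targets = np.array([i for i in range(20)]). This holds even though not that many targets are needed: the output only contains [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], yet [0, ..., 19] has to be passed in.

input_shape = (sequence_length, len(selected_features_index))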
inputs = keras.layers.Input(shape=input_shape)
lstm_out = keras.layers.LSTM(32)(inputs)
outputs = keras.layers.Dense(1)(lstm_out)
model = keras.Model(inputs=inputs, outputs=outputs)
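Each input x is one sequence. A sequence contains sequence_length timesteps, and each timestep contains len(selected_features_index) features.
In other words, each input x is a series of weather readings over time, sequence_length of them, and each reading contains len(selected_features_index) measured values.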
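The resulting shape can be illustrated with nested lists. The numbers below are made up for the sketch, and `make_batch` and `shape` are hypothetical helpers. `n_features` stands in for len(selected_features_index). Note that the shape passed to keras.layers.Input leaves out the batch dimension, so it covers only the last two axes.

```python
def make_batch(rows, sequence_length):
    """Stack every window of `sequence_length` consecutive rows into a batch."""
    n_windows = len(rows) - sequence_length + 1
    return [rows[i:i + sequence_length] for i in range(n_windows)]


def shape(nested):
    """Return the dimensions of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


n_features = 3       # stands in for len(selected_features_index)
sequence_length = 4  # illustrative value only
# 10 timesteps, each holding 3 made-up readings.
rows = [[float(t), t * 0.5, t * 2.0] for t in range(10)]

batch = make_batch(rows, sequence_length)
print(shape(batch))  # (7, 4, 3), i.e. (batch, sequence_length, n_features)
```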