Today I finished yesterday's basic task, but not all of the stretch goals. Tomorrow's basic task is to finish the natural-language chapter; since that isn't a focus of my future plans, I won't study it as thoroughly as the image material. The stretch goal is to finish neural style transfer. Keep at it!
Today I mainly practiced time series. I recommend downloading the dataset from Kaggle, which is much faster: [Jena Climate] - LSTM | Kaggle (kaggle.com)
Handling the tabular data with pandas is very convenient:
import os
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

fname = os.path.join('jena_climate_2009_2016.csv')
data = pd.read_csv(fname, header=[0], index_col=[0])  # read the data
temperature = np.array(data.iloc[:, 1])  # the temperature column
raw_data = np.array(data)                # all features
plt.plot(range(len(temperature)), temperature)
plt.show()
The full temperature series shows clear periodicity; you can count eight yearly cycles.
plt.plot(range(1440),temperature[:1440])
plt.show()
Temperature over the first ten days, 1440 data points in total (one sample every ten minutes); the last four days show a clear daily cycle.
Split the dataset: 50% training, 25% validation, 25% test, partitioned in chronological order. Since we predict the future from the past, the order must not be shuffled, otherwise there is a risk of information leakage.
num_train_samples = int(0.5 * len(raw_data))
num_val_samples = int(0.25 * len(raw_data))
num_test_samples = len(raw_data) - num_train_samples - num_val_samples
Standardize the data:
mean = raw_data[:num_train_samples].mean(axis=0)
raw_data -= mean
std = raw_data[:num_train_samples].std(axis=0)
raw_data /= std
# Standardize using statistics computed on the training split only
In a time series, samples in the dataset are highly redundant: sample N and sample N+1 differ by only one row. Storing them explicitly wastes a lot of memory (which is what I used to do...). Instead, we can generate samples on the fly and keep only the original arrays raw_data (inputs) and temperature (targets). Below is an example using timeseries_dataset_from_array:
import numpy as np
import tensorflow.keras as keras

int_sequence = np.arange(10)  # an ordered integer array 0-9
dummy_dataset = keras.utils.timeseries_dataset_from_array(
    data=int_sequence[:-3], targets=int_sequence[3:],
    sequence_length=3, batch_size=2)
# Samples are drawn from 0-6; the window [n, n+2] predicts data[n+3]
for inputs, targets in dummy_dataset:
    for i in range(inputs.shape[0]):
        print([int(x) for x in inputs[i]], int(targets[i]))
[0, 1, 2] 3
[1, 2, 3] 4
[2, 3, 4] 5
[3, 4, 5] 6
[4, 5, 6] 7
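As an aside, the memory point above can also be seen with plain NumPy: sliding_window_view builds the same kind of overlapping windows as zero-copy views, without materializing each sample (a minimal sketch of the idea, not part of the book's code):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

raw = np.arange(10.0)
windows = sliding_window_view(raw, window_shape=3)
print(windows.shape)                   # (8, 3): windows [0,1,2], [1,2,3], ...
print(np.shares_memory(raw, windows))  # True: views into raw, nothing was copied
```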
With the example above, it's roughly clear how to use timeseries_dataset_from_array:
sampling_rate = 6      # sample one point out of every 6 (i.e. hourly)
sequence_length = 120  # each input covers 120 hours
delay = sampling_rate * (sequence_length + 24 - 1)  # predict the temperature 24 h ahead
# Input 0 corresponds to temperature index 858, i.e. 6*(120+24-1); the -1 is because
# indexing starts at 0. To compute delay: if input 0 corresponds to target n, delay = n.
batch_size = 256
train_dataset = keras.utils.timeseries_dataset_from_array(
    data=raw_data[:-delay],
    targets=temperature[delay:],
    sampling_rate=sampling_rate,
    sequence_length=sequence_length,
    shuffle=True,
    batch_size=batch_size,
    start_index=0,
    end_index=num_train_samples)
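The delay arithmetic in the comments above can be sanity-checked in a few lines (my own verification, using the same numbers):

```python
sampling_rate = 6      # one raw row every 10 minutes, so 6 rows per hour
sequence_length = 120  # 120 hourly points per input window
delay = sampling_rate * (sequence_length + 24 - 1)

# Window 0 samples raw rows 0, 6, ..., 6*119; its last input row is 714.
last_input_row = sampling_rate * (sequence_length - 1)  # 714
# The target sits 24 hours (24*6 raw rows) after that last input row.
target_row = last_input_row + 24 * sampling_rate        # 858
print(delay, target_row)  # 858 858
```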
The validation and test sets are similar; just mind start_index and end_index:
val_dataset = keras.utils.timeseries_dataset_from_array(
    data=raw_data[:-delay],
    targets=temperature[delay:],
    sampling_rate=sampling_rate,
    sequence_length=sequence_length,
    shuffle=True,
    batch_size=batch_size,
    start_index=num_train_samples,
    end_index=num_train_samples + num_val_samples)
test_dataset = keras.utils.timeseries_dataset_from_array(
    data=raw_data[:-delay],
    targets=temperature[delay:],
    sampling_rate=sampling_rate,
    sequence_length=sequence_length,
    shuffle=True,
    batch_size=batch_size,
    start_index=num_train_samples + num_val_samples)
A quick check:
for samples, targets in train_dataset:
    print('samples shape:', samples.shape)
    print('targets shape:', targets.shape)
    break
samples shape: (256, 120, 14)
targets shape: (256,)
This utility really does make building the dataset convenient; when I did this by hand it took me a day or two, and with this tool it's done quickly. But when memory allows, don't feed it to the model directly, because it increases training time. I tested this: an epoch that should take 1 s ends up taking 4-5 s, since the input batches have to be prepared on the fly, which wastes quite a lot of time. You can use the tool to build the dataset, then pull all the data out into plain arrays, which is both convenient and fast.
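A minimal sketch of that "pull everything into arrays" step. The helper name dataset_to_arrays is my own; here a list of NumPy batch pairs stands in for the tf.data dataset, but the same loop works on train_dataset directly:

```python
import numpy as np

def dataset_to_arrays(dataset):
    """Concatenate an iterable of (samples, targets) batches into two arrays."""
    xs, ys = [], []
    for samples, targets in dataset:
        xs.append(np.asarray(samples))
        ys.append(np.asarray(targets))
    return np.concatenate(xs), np.concatenate(ys)

# Stand-in for train_dataset: three batches of 4 samples, each 120 steps x 14 features
fake_dataset = [(np.zeros((4, 120, 14)), np.zeros(4)) for _ in range(3)]
x_train, y_train = dataset_to_arrays(fake_dataset)
print(x_train.shape, y_train.shape)  # (12, 120, 14) (12,)
```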
Next, compute a common-sense baseline MAE: assume the temperature 24 hours from now equals the current temperature.
def evaluate_naive_method(dataset):
    total_abs_err = 0.
    samples_seen = 0
    for samples, targets in dataset:
        preds = samples[:, -1, 1] * std[1] + mean[1]
        # Predict the temperature at the last timestep of each sample:
        # -1 is the last timestep, 1 is the temperature feature.
        # Multiply by std and add mean to undo the standardization.
        total_abs_err += np.sum(np.abs(preds - targets))  # total absolute error
        samples_seen += samples.shape[0]                  # total sample count
    return total_abs_err / samples_seen                   # mean absolute error
print(f'Train MAE: {evaluate_naive_method(train_dataset):.4f}')
print(f'Validation MAE: {evaluate_naive_method(val_dataset):.4f}')
print(f'Test MAE: {evaluate_naive_method(test_dataset):.4f}')
Compute the MSE as well while we're at it:
def evaluate_mse(dataset):
    total_sq_err = 0.
    samples_seen = 0
    for samples, targets in dataset:
        preds = samples[:, -1, 1] * std[1] + mean[1]
        # Same naive prediction as above, de-standardized
        total_sq_err += np.sum(np.square(preds - targets))  # total squared error
        samples_seen += samples.shape[0]                    # total sample count
    return total_sq_err / samples_seen                      # mean squared error

print(f'Train MSE: {evaluate_mse(train_dataset):.4f}')
print(f'Validation MSE: {evaluate_mse(val_dataset):.4f}')
print(f'Test MSE: {evaluate_mse(test_dataset):.4f}')
Train MAE: 2.6912
Validation MAE: 2.4425
Test MAE: 2.6207
Train MSE: 12.1946
Validation MSE: 10.1015
Test MSE: 11.7235
Any model we build must at least beat this baseline; otherwise it isn't doing anything useful.
Start with a simple model: a densely connected network:
from tensorflow.keras.layers import Input, Flatten, Dense, Dropout

inputs = Input(shape=(sequence_length, raw_data.shape[-1]))
x = Flatten()(inputs)
x = Dense(2048, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(16, activation='relu')(x)
outputs = Dense(1)(x)  # no activation function
model = keras.Model(inputs, outputs)
callbacks = [keras.callbacks.EarlyStopping(patience=20, restore_best_weights=False),
             keras.callbacks.TensorBoard('log2')]
model.compile(loss='mse', metrics=['mae'])
history = model.fit(train_dataset, epochs=10, validation_data=val_dataset,
                    callbacks=callbacks)
print(f'Test MAE: {model.evaluate(test_dataset)[1]:.4f}')
This dense network fails: it doesn't beat the baseline. It comes close on the validation set, but is far worse on the test set.
Epoch 1/10 - 6s 5ms/step - loss: 14.1056 - mae: 2.8096 - val_loss: 11.8406 - val_mae: 2.6959
Epoch 2/10 - 5s 6ms/step - loss: 8.0565 - mae: 2.2023 - val_loss: 11.1052 - val_mae: 2.6097
Epoch 3/10 - 5s 6ms/step - loss: 6.5892 - mae: 1.9864 - val_loss: 16.3483 - val_mae: 3.2352
Epoch 4/10 - 5s 6ms/step - loss: 5.7363 - mae: 1.8514 - val_loss: 12.4075 - val_mae: 2.7540
Epoch 5/10 - 5s 6ms/step - loss: 5.0634 - mae: 1.7385 - val_loss: 12.2462 - val_mae: 2.7425
Epoch 6/10 - 5s 6ms/step - loss: 4.5455 - mae: 1.6456 - val_loss: 12.4611 - val_mae: 2.7509
Epoch 7/10 - 5s 6ms/step - loss: 4.2195 - mae: 1.5850 - val_loss: 12.6044 - val_mae: 2.7832
Epoch 8/10 - 5s 6ms/step - loss: 3.9093 - mae: 1.5246 - val_loss: 11.5965 - val_mae: 2.6590
Epoch 9/10 - 5s 6ms/step - loss: 3.6119 - mae: 1.4667 - val_loss: 13.4965 - val_mae: 2.8742
Epoch 10/10 - 5s 6ms/step - loss: 3.4307 - mae: 1.4282 - val_loss: 12.2868 - val_mae: 2.7464
405/405 - 1s 3ms/step - loss: 33464.5742 - mae: 17.3457
Test MAE: 17.3457
The book then tries a 1D convnet, built as follows. This particular architecture actually has problems, and it performs below the baseline:
from tensorflow.keras.layers import Conv1D, MaxPooling1D, GlobalAvgPool1D, Dense

inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
x = Conv1D(8, 24, activation='relu')(inputs)
x = MaxPooling1D()(x)
x = Conv1D(8, 12, activation='relu')(x)
x = MaxPooling1D()(x)
x = Conv1D(8, 6, activation='relu')(x)
x = GlobalAvgPool1D()(x)
outputs = Dense(1)(x)
model = keras.Model(inputs, outputs)
callbacks = [keras.callbacks.EarlyStopping(patience=20, restore_best_weights=False),
             keras.callbacks.TensorBoard('log3')]
model.compile(loss='mse', metrics=['mae'])
history = model.fit(train_dataset, epochs=15, validation_data=val_dataset,
                    callbacks=callbacks)
print(f'Test MAE: {model.evaluate(test_dataset)[1]:.4f}')
The results:
Epoch 1/15 - 9s 7ms/step - loss: 22.3153 - mae: 3.6854 - val_loss: 16.7212 - val_mae: 3.2195
Epoch 2/15 - 6s 7ms/step - loss: 15.7827 - mae: 3.1373 - val_loss: 15.7907 - val_mae: 3.1145
Epoch 3/15 - 5s 7ms/step - loss: 14.1812 - mae: 2.9732 - val_loss: 17.8576 - val_mae: 3.3215
Epoch 4/15 - 6s 7ms/step - loss: 13.1539 - mae: 2.8636 - val_loss: 17.6693 - val_mae: 3.3219
Epoch 5/15 - 6s 7ms/step - loss: 12.4092 - mae: 2.7829 - val_loss: 14.1953 - val_mae: 2.9481
Epoch 6/15 - 6s 7ms/step - loss: 11.8168 - mae: 2.7178 - val_loss: 15.2743 - val_mae: 3.0338
Epoch 7/15 - 6s 7ms/step - loss: 11.3287 - mae: 2.6560 - val_loss: 14.8101 - val_mae: 2.9883
Epoch 8/15 - 6s 7ms/step - loss: 10.9874 - mae: 2.6156 - val_loss: 15.7618 - val_mae: 3.0926
Epoch 9/15 - 6s 7ms/step - loss: 10.6877 - mae: 2.5811 - val_loss: 16.2249 - val_mae: 3.1194
Epoch 10/15 - 5s 7ms/step - loss: 10.4238 - mae: 2.5487 - val_loss: 15.9130 - val_mae: 3.1177
Epoch 11/15 - 6s 7ms/step - loss: 10.1893 - mae: 2.5197 - val_loss: 14.8439 - val_mae: 3.0112
Epoch 12/15 - 5s 7ms/step - loss: 10.0088 - mae: 2.4984 - val_loss: 15.9239 - val_mae: 3.1553
Epoch 13/15 - 6s 7ms/step - loss: 9.8089 - mae: 2.4716 - val_loss: 15.2804 - val_mae: 3.0627
Epoch 14/15 - 6s 7ms/step - loss: 9.6700 - mae: 2.4567 - val_loss: 15.5689 - val_mae: 3.0812
Epoch 15/15 - 5s 7ms/step - loss: 9.5251 - mae: 2.4370 - val_loss: 15.4453 - val_mae: 3.0701
405/405 - 2s 4ms/step - loss: 35801.5000 - mae: 15.0808
Test MAE: 15.0808
After my changes the results improved, especially on the test set:
inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
x = Conv1D(8, 3, activation='relu')(inputs)
x = Conv1D(8, 3, activation='relu')(x)
x = Conv1D(8, 3, activation='relu')(x)
x = x[:, -1, :]  # keep only the last timestep
outputs = Dense(1)(x)
model = keras.Model(inputs, outputs)
callbacks = [keras.callbacks.EarlyStopping(patience=20, restore_best_weights=False),
             keras.callbacks.TensorBoard('log3')]
model.compile(loss='mse', metrics=['mae'])
history = model.fit(train_dataset, epochs=15, validation_data=val_dataset,
                    callbacks=callbacks)
print(f'Test MAE: {model.evaluate(test_dataset)[1]:.4f}')
Epoch 1/15 - 8s 6ms/step - loss: 20.7546 - mae: 3.3272 - val_loss: 9.5602 - val_mae: 2.4193
Epoch 2/15 - 5s 6ms/step - loss: 10.8946 - mae: 2.5747 - val_loss: 9.3926 - val_mae: 2.3896
Epoch 3/15 - 5s 6ms/step - loss: 10.6686 - mae: 2.5451 - val_loss: 9.6082 - val_mae: 2.4107
Epoch 4/15 - 5s 6ms/step - loss: 10.5398 - mae: 2.5305 - val_loss: 9.3037 - val_mae: 2.3733
Epoch 5/15 - 5s 6ms/step - loss: 10.4401 - mae: 2.5183 - val_loss: 9.3937 - val_mae: 2.4011
Epoch 6/15 - 5s 6ms/step - loss: 10.3718 - mae: 2.5101 - val_loss: 9.0525 - val_mae: 2.3397
Epoch 7/15 - 5s 6ms/step - loss: 10.3138 - mae: 2.5027 - val_loss: 9.2140 - val_mae: 2.3710
Epoch 8/15 - 5s 6ms/step - loss: 10.2631 - mae: 2.4962 - val_loss: 8.9608 - val_mae: 2.3236
Epoch 9/15 - 5s 6ms/step - loss: 10.2240 - mae: 2.4901 - val_loss: 9.2397 - val_mae: 2.3639
Epoch 10/15 - 5s 7ms/step - loss: 10.1816 - mae: 2.4838 - val_loss: 8.9042 - val_mae: 2.3149
Epoch 11/15 - 5s 6ms/step - loss: 10.1380 - mae: 2.4775 - val_loss: 8.9440 - val_mae: 2.3243
Epoch 12/15 - 5s 6ms/step - loss: 10.1112 - mae: 2.4736 - val_loss: 9.0232 - val_mae: 2.3320
Epoch 13/15 - 5s 6ms/step - loss: 10.0765 - mae: 2.4690 - val_loss: 8.8785 - val_mae: 2.3134
Epoch 14/15 - 5s 6ms/step - loss: 10.0503 - mae: 2.4652 - val_loss: 8.8700 - val_mae: 2.3159
Epoch 15/15 - 5s 6ms/step - loss: 10.0250 - mae: 2.4614 - val_loss: 9.0228 - val_mae: 2.3249
405/405 - 2s 4ms/step - loss: 3431.3291 - mae: 3.7295
Test MAE: 3.7295
I built it by habit; I won't dig here into exactly why this network is more stable and overfits less. If you happen to be reading this and are interested, feel free to discuss it with me.
Next, RNNs. Build the simplest possible RNN model:
from tensorflow.keras.layers import Input, LSTM, Dense

inputs = Input(shape=(sequence_length, raw_data.shape[-1]))
x = LSTM(16)(inputs)
outputs = Dense(1)(x)
model = keras.Model(inputs, outputs)
callbacks = [keras.callbacks.EarlyStopping(patience=20, restore_best_weights=False),
             keras.callbacks.TensorBoard('log4')]
model.compile(loss='mse', metrics=['mae'])
history = model.fit(train_dataset, epochs=12, validation_data=val_dataset,
                    callbacks=callbacks)
print(f'Test MAE: {model.evaluate(test_dataset)[1]:.4f}')
Epoch 1/12 - 19s 12ms/step - loss: 44.2629 - mae: 4.8449 - val_loss: 13.0754 - val_mae: 2.7540
Epoch 2/12 - 9s 10ms/step - loss: 11.3814 - mae: 2.6197 - val_loss: 9.9275 - val_mae: 2.4705
Epoch 3/12 - 9s 11ms/step - loss: 9.9833 - mae: 2.4671 - val_loss: 9.4718 - val_mae: 2.4007
Epoch 4/12 - 9s 11ms/step - loss: 9.5256 - mae: 2.4123 - val_loss: 9.7278 - val_mae: 2.4292
Epoch 5/12 - 9s 11ms/step - loss: 9.1955 - mae: 2.3755 - val_loss: 9.7076 - val_mae: 2.4281
Epoch 6/12 - 10s 12ms/step - loss: 8.9652 - mae: 2.3462 - val_loss: 9.8820 - val_mae: 2.4488
Epoch 7/12 - 9s 11ms/step - loss: 8.7773 - mae: 2.3238 - val_loss: 9.8793 - val_mae: 2.4484
Epoch 8/12 - 9s 11ms/step - loss: 8.6154 - mae: 2.3033 - val_loss: 10.2244 - val_mae: 2.4705
Epoch 9/12 - 9s 11ms/step - loss: 8.4448 - mae: 2.2836 - val_loss: 9.9520 - val_mae: 2.4346
Epoch 10/12 - 9s 11ms/step - loss: 8.2629 - mae: 2.2578 - val_loss: 10.0459 - val_mae: 2.4407
Epoch 11/12 - 9s 11ms/step - loss: 8.1347 - mae: 2.2373 - val_loss: 10.0667 - val_mae: 2.4374
Epoch 12/12 - 9s 11ms/step - loss: 8.0236 - mae: 2.2213 - val_loss: 10.0322 - val_mae: 2.4461
405/405 - 2s 5ms/step - loss: 11.5939 - mae: 2.6364
Test MAE: 2.6364
The best test-set result so far! RNNs really do have an edge on time series.
Next, a quick review of some RNN details:
An RNN can accept inputs of arbitrary sequence length, which I had never used before. It's simple: pass None for the timestep dimension of the input layer.
from tensorflow.keras.layers import SimpleRNN

num_features = 14
inputs = keras.Input(shape=(None, num_features))
outputs = SimpleRNN(16)(inputs)
model = keras.Model(inputs, outputs)
model.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_1 (InputLayer)        [(None, None, 14)]        0
 simple_rnn (SimpleRNN)      (None, 16)                496
=================================================================
Total params: 496
Trainable params: 496
Non-trainable params: 0
_________________________________________________________________
The LSTM above overfit, so this time use dropout:
# x_train/y_train etc. are plain arrays extracted from the datasets, as described above
inputs = Input(shape=x_train.shape[1:])
x = LSTM(32, recurrent_dropout=0.25)(inputs)
x = Dropout(0.5)(x)
outputs = Dense(1)(x)
model = keras.Model(inputs, outputs)
callbacks = [keras.callbacks.EarlyStopping('val_mae', patience=20, restore_best_weights=True),
             keras.callbacks.TensorBoard('log7')]
model.compile(keras.optimizers.RMSprop(0.005), loss='mse', metrics=['mae'])
history = model.fit(x_train, y_train, epochs=50, validation_data=(x_val, y_val),
                    callbacks=callbacks, batch_size=2048)
print(f'Test MAE: {model.evaluate(test_dataset)[1]:.4f}')
Test MAE: 2.5383
The result improves again: it's now better than the 2.62 baseline, and overfitting is much reduced.
RNNs are slow to train, so it's worth speeding them up where possible. One trick is to unroll the RNN's internal for loop, eliminating the loop entirely, which can speed up training a lot. (I never knew this before!! It cost me a lot of extra time.) It trades memory for speed, so it may consume a lot of memory; in my tests it cut training time by more than 50%. Usage is simple: just set unroll=True.
x=LSTM(32,recurrent_dropout=0.25,unroll=True)(inputs)
Next, try a stacked GRU model with two layers:
inputs = Input(shape=x_train.shape[1:])
x = GRU(32, recurrent_dropout=0.5, return_sequences=True, unroll=True)(inputs)
x = GRU(32, recurrent_dropout=0.5, unroll=True)(x)
x = Dropout(0.5)(x)
outputs = Dense(1)(x)
model = keras.Model(inputs, outputs)
callbacks = [keras.callbacks.EarlyStopping('val_mae', patience=20, restore_best_weights=True),
             keras.callbacks.TensorBoard('log8')]
model.compile(keras.optimizers.RMSprop(0.005), loss='mse', metrics=['mae'])
history = model.fit(x_train, y_train, epochs=50, validation_data=(x_val, y_val),
                    callbacks=callbacks, batch_size=2048)
print(f'Test MAE: {model.evaluate(test_dataset)[1]:.4f}')
The model improves further:
Test MAE: 2.4276
Using a bidirectional LSTM:
inputs = Input(shape=x_train.shape[1:])
x = Bidirectional(LSTM(16))(inputs)
outputs = Dense(1)(x)
model = keras.Model(inputs, outputs)
callbacks = [keras.callbacks.EarlyStopping('val_mae', patience=20, restore_best_weights=True),
             keras.callbacks.TensorBoard('log8')]
model.compile(keras.optimizers.RMSprop(0.005), loss='mse', metrics=['mae'])
history = model.fit(x_train, y_train, epochs=50, validation_data=(x_val, y_val),
                    callbacks=callbacks, batch_size=2048)
print(f'Test MAE: {model.evaluate(test_dataset)[1]:.4f}')
It overfits quite badly, though the final result is acceptable. A single run doesn't prove much, but for this task the bidirectional LSTM brings in extra information that isn't useful, so in theory it should perform somewhat worse:
Test MAE: 2.5377
I also learned a little about text data processing.
I used Keras's built-in TextVectorization layer to vectorize text:
text_vectorization = TextVectorization(output_mode='int')
# This layer returns word sequences encoded as integer indices.
# By default it standardizes text by lowercasing and stripping punctuation,
# then splits on whitespace (you can also supply custom functions; see p.280 of the book).
dataset = [
    'I write, erase, rewrite', 'Erase again, and then', 'A poppy blooms.',
]  # the corpus
text_vectorization.adapt(dataset)  # calling adapt builds the index
print(text_vectorization.get_vocabulary())
['', '[UNK]', 'erase', 'write', 'then', 'rewrite', 'poppy', 'i', 'blooms', 'and', 'again', 'a']
Index 0 means "not a word" (e.g. padding); index 1 means an unknown word (one never seen before, i.e. not in the corpus). Vocabulary entries are sorted by frequency.
Then encode an example sentence:
vocabulary=text_vectorization.get_vocabulary()
test_sentence='I write, rewrite, and still rewrite again'
encode_sentence=text_vectorization(test_sentence)
Out[3]: <tf.Tensor: shape=(7,), dtype=int64, numpy=array([ 7, 3, 5, 9, 1, 5, 10], dtype=int64)>
Then decode it again:
inverse_vocab=dict(enumerate(vocabulary))
decode_sentence=' '.join(inverse_vocab[int(i)] for i in encode_sentence)
Out[7]: 'i write rewrite and [UNK] rewrite again'
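TextVectorization's default preprocessing (lowercase, strip punctuation, split on whitespace) can be mimicked in plain Python. This is my own re-implementation for illustration only, not the layer's actual TF-ops code; a real custom standardize callable for the layer would have to operate on TF string tensors:

```python
import string

def my_standardize(text):
    """Lowercase and strip ASCII punctuation, like the layer's default."""
    return text.lower().translate(str.maketrans('', '', string.punctuation))

def my_split(text):
    """Split on whitespace, like the layer's default."""
    return text.split()

tokens = my_split(my_standardize('I write, erase, rewrite'))
print(tokens)  # ['i', 'write', 'erase', 'rewrite']
```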