The Jordan RNN was proposed in 1986: SERIAL ORDER: A PARALLEL DISTRIBUTED PROCESSING APPROACH
The Elman RNN was proposed in 1990: Finding Structure in Time
Original LSTM paper: Long Short-Term Memory
Original GRU paper: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Why do we need recurrent neural networks when we already have ordinary neural networks?
In an ordinary (feedforward) neural network, information flows in only one direction. This restriction makes the network easier to train, but it also limits the model's capability to some extent. In many real tasks, the output of the network depends not only on the input at the current moment but also on the inputs and outputs over some preceding period of time. Moreover, feedforward networks struggle with temporal data such as video, speech and text: the length of such data is generally not fixed, whereas a feedforward network requires input and output dimensions that are fixed and cannot change arbitrarily. Handling this class of time-dependent problems therefore requires a more capable model.
A recurrent neural network (RNN) is a class of neural networks with short-term memory. In an RNN, a neuron can receive information not only from other neurons but also from itself, forming a network structure that contains cycles. In other words, a neuron's output can act directly on the neuron itself at the next time step.
An RNN (Recurrent Neural Network) generally takes sequential data as input, captures the relationships between elements of the sequence through its internal structure, and usually produces output in sequential form as well.
The outputs of DNNs (fully connected networks) and CNNs (convolutional networks) depend only on the current input and ignore inputs at other time steps, which works well for recognizing single objects such as cats, dogs or handwritten digits.
Because the RNN structure exploits the relationships within a sequence, it handles naturally sequential inputs such as human language and speech very well, and it is widely used across NLP tasks such as text classification, sentiment analysis, intent recognition and machine translation.
Let us take a user intent recognition example for a brief analysis:
Classification of RNN models:

By input/output structure:
- "N vs N" RNN
- "N vs 1" RNN
- "1 vs N" RNN
- "N vs M" RNN

By internal structure:
- Traditional RNN
- LSTM
- Bi-LSTM
- GRU
- Bi-GRU
"N vs N" RNN: this is the most basic RNN structure; its defining feature is that the input and output sequences have equal length. Because of this restriction its range of application is fairly narrow; it can be used, for example, to generate verse lines of matching length. A minimal sketch is given below.
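As a minimal sketch of the "N vs N" idea (PyTorch; the vocabulary size, tag set and layer sizes below are made up purely for illustration), an output layer is applied at every time step, so the output sequence is exactly as long as the input sequence:

import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim, num_tags = 100, 16, 32, 5   # hypothetical sizes

embedding = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
per_step_head = nn.Linear(hidden_dim, num_tags)        # applied at every time step

tokens = torch.randint(0, vocab_size, (1, 7))           # one sequence of length 7
outputs, _ = rnn(embedding(tokens))                     # [1, 7, hidden_dim]: one hidden state per step
tag_scores = per_step_head(outputs)                     # [1, 7, num_tags]: one output per input step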
"N vs 1" RNN: sometimes the input is a sequence but the required output is a single value rather than a sequence. How should this be modelled? We simply apply a linear transformation to the hidden output h of the last time step; in most cases, to make the result more interpretable, a sigmoid or softmax is applied afterwards. This structure is frequently used for text classification; a sketch follows.
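A minimal sketch of the "N vs 1" pattern for text classification (PyTorch; all sizes are made up): only the final hidden state is passed through a linear layer and then a softmax:

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embed_dim, hidden_dim, num_classes = 100, 16, 32, 3   # hypothetical sizes

embedding = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
classifier = nn.Linear(hidden_dim, num_classes)

tokens = torch.randint(0, vocab_size, (4, 10))     # batch of 4 sequences, each of length 10
_, h_n = rnn(embedding(tokens))                    # h_n: [1, 4, hidden_dim], the final hidden state
probs = F.softmax(classifier(h_n[-1]), dim=-1)     # [4, num_classes]: one prediction per sequence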
"1 vs N" RNN: what if the input is not a sequence but the output should be? The most common approach is to let that single input act on the output at every time step. This structure can be used, for example, to generate a textual description from an image.
"N vs M" RNN: this structure places no restriction on the lengths of the input and output. It consists of two parts, an encoder and a decoder, both of which are internally some kind of RNN; it is also called the seq2seq architecture. The input first passes through the encoder, which finally emits a hidden (context) variable c; the most common practice is then to let c act on every step of the decoding so that the input information is used effectively.
The seq2seq architecture was first proposed for machine translation. Because its input and output lengths are unconstrained, it has become the most widely used RNN structure, with extensive practical applications in machine translation, reading comprehension, text summarization and many other areas. A rough sketch of the encoder-decoder idea is given below.
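A rough sketch of the encoder-decoder ("N vs M" / seq2seq) idea (PyTorch; the GRU units and all sizes here are assumptions made for illustration). One common variant, used in this sketch, passes the context variable c to the decoder only as its initial hidden state rather than injecting it at every step:

import torch
import torch.nn as nn

src_vocab, tgt_vocab, embed_dim, hidden_dim = 100, 120, 16, 32   # hypothetical sizes

src_emb = nn.Embedding(src_vocab, embed_dim)
tgt_emb = nn.Embedding(tgt_vocab, embed_dim)
encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
generator = nn.Linear(hidden_dim, tgt_vocab)

src = torch.randint(0, src_vocab, (1, 9))    # source sequence of length 9
tgt = torch.randint(0, tgt_vocab, (1, 6))    # target sequence of length 6 (lengths may differ)

_, c = encoder(src_emb(src))                 # c: [1, 1, hidden_dim], the context variable
dec_out, _ = decoder(tgt_emb(tgt), c)        # c conditions the decoding as the initial hidden state
logits = generator(dec_out)                  # [1, 6, tgt_vocab]: a score over target words per step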
From the simplified diagram we can see that, compared with an ordinary neural network, an RNN has one extra loop. This loop means that the output of a time step is fed back as part of the input at the next time step. Unrolling the RNN along the time axis gives the figure below:
For any model, the four most important parts are: the inputs, the outputs, the parameters, and the operations that relate them.
With this basic RNN unit in hand, the whole RNN network follows naturally.
Internal structure analysis: focus on the block in the middle. It has two inputs, $h_{t-1}$ and $x_t$, i.e. the hidden output of the previous time step and the input of the current time step. After entering the RNN cell they are "fused"; from the structure this fusion is a concatenation, producing a new tensor $[h_{t-1}, x_t]$. This new tensor then passes through a fully connected (linear) layer whose activation function is tanh, finally yielding this time step's output $h_t = \tanh(W[h_{t-1}, x_t] + b)$, which enters the cell at the next time step together with $x_{t+1}$, and so on.
Traditional RNN structure:
(Figures: Feed Forward Neural Network | Recurrent Neural Network | Passing Hidden State to next time step)
The traditional RNN cell works exactly as described above: its two inputs, $h_{t-1}$ (the previous time step's hidden output) and $x_t$ (the current time step's input), are "fused" by concatenation into a new tensor $[x_t, h_{t-1}]$, which then passes through a fully connected (linear) layer with tanh as the activation function. The result, $h_t = \tanh(W[x_t, h_{t-1}] + b)$, is both this time step's output and the hidden state that enters the cell at the next time step together with $x_{t+1}$, and so on. A minimal hand-written sketch of one such step is given below.
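To make the "concatenate, pass through a linear layer, apply tanh" description concrete, here is a minimal hand-written sketch of a single traditional-RNN time step (the sizes are arbitrary and the weights are random, purely for illustration):

import torch

input_size, hidden_size = 5, 6                            # arbitrary sizes
W = torch.randn(input_size + hidden_size, hidden_size)    # weights of the fused linear layer
b = torch.randn(hidden_size)                              # bias of the fused linear layer

x_t = torch.randn(1, input_size)                          # input at time step t
h_prev = torch.zeros(1, hidden_size)                      # h_{t-1}, the previous hidden state

concat = torch.cat([x_t, h_prev], dim=1)                  # [x_t, h_{t-1}] -> shape [1, input_size + hidden_size]
h_t = torch.tanh(concat @ W + b)                          # h_t = tanh(W [x_t, h_{t-1}] + b)
# h_t is this step's output and also the hidden state fed into the next step together with x_{t+1}.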
# Import packages
import torch
import torch.nn as nn

# Instantiate an RNN object
# input_size: feature dimension of the input tensor x (the word-embedding dimension)
# hidden_size: feature dimension of the hidden tensor h (number of hidden units)
# num_layers: number of stacked recurrent layers
rnn = nn.RNN(input_size=5, hidden_size=6, num_layers=1)

# Initialize an input tensor.
# With the default batch_first=False the expected shape is [seq_len, batch_size, input_size]:
# here seq_len=1, batch_size=3, input_size=5; input_size must match the rnn object's input_size.
input = torch.randn(1, 3, 5)

# Initialize the initial hidden-state tensor, shape [num_layers, batch_size, hidden_size]:
# num_layers=1 must match the rnn object's num_layers, and hidden_size=6 must match its hidden_size.
h0 = torch.randn(1, 3, 6)

print("input.shape = {0}\ninput = \n{1}".format(input.shape, input))

# Run the RNN to compute the outputs
output, hn = rnn(input, h0)
print("\noutput.shape = {0}\noutput = \n{1}".format(output.shape, output))
print("\nhn.shape = {0}\nhn = \n{1}".format(hn.shape, hn))
Output:
input.shape = torch.Size([1, 3, 5])
input = 
tensor([[[-0.9448,  0.4040, -1.5022, -1.3403, -0.9938],
         [-0.5331,  0.0470, -0.3628,  0.3317, -0.0419],
         [-0.9932, -0.1746, -1.2205,  0.8281,  0.3448]]])

output.shape = torch.Size([1, 3, 6])
output = 
tensor([[[-0.5093,  0.0121,  0.7261,  0.7477, -0.8238, -0.0444],
         [ 0.0887,  0.2913,  0.2533,  0.4546, -0.0650,  0.1601],
         [-0.5954,  0.4542,  0.1506,  0.5844, -0.3318, -0.6880]]], grad_fn=<StackBackward0>)

hn.shape = torch.Size([1, 3, 6])
hn = 
tensor([[[-0.5093,  0.0121,  0.7261,  0.7477, -0.8238, -0.0444],
         [ 0.0887,  0.2913,  0.2533,  0.4546, -0.0650,  0.1601],
         [-0.5954,  0.4542,  0.1506,  0.5844, -0.3318, -0.6880]]], grad_fn=<StackBackward0>)
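As a sanity check, the recurrence can be reproduced by hand from the layer's own parameters (weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0), continuing the example above. PyTorch computes $h_t = \tanh(x_t W_{ih}^T + b_{ih} + h_{t-1} W_{hh}^T + b_{hh})$, and since the sequence length here is 1, hn equals the single output step:

# Continuation of the nn.RNN example above: recompute the single step by hand.
manual_h1 = torch.tanh(
    input[0] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
    + h0[0] @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
)
print(torch.allclose(manual_h1, hn[0]))   # expected: True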
The main constructor parameters of nn.RNN (input_size, hidden_size, num_layers) and the main call arguments and return values of an nn.RNN instance (input, h0 -> output, hn) are explained in the comments of the example above. A further sketch showing how batch_first and stacked layers change the expected shapes follows.
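A short sketch (arbitrary sizes) of how batch_first and num_layers affect the shapes:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=6, num_layers=2, batch_first=True)

x = torch.randn(4, 3, 5)      # batch_first=True -> [batch=4, seq_len=3, input_size=5]
h0 = torch.zeros(2, 4, 6)     # [num_layers=2, batch=4, hidden_size=6] (this shape is NOT affected by batch_first)
output, hn = rnn(x, h0)
print(output.shape)           # torch.Size([4, 3, 6]): hidden state of the last layer at every time step
print(hn.shape)               # torch.Size([2, 4, 6]): final hidden state of every layer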
The Jordan network was proposed in 1986: SERIAL ORDER: A PARALLEL DISTRIBUTED PROCESSING APPROACH.
The Elman network was proposed in 1990 and is what is usually meant by "RNN": Finding Structure in Time.
If gradients vanish during training, the weights cannot be updated and training ultimately fails; exploding gradients make the parameter updates excessively large and, in extreme cases, the values overflow (NaN). A toy numerical illustration is given below.
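A toy numerical illustration (not the full RNN derivation): back-propagating through many time steps multiplies the gradient by a similar factor at every step, so a factor below 1 drives it towards zero while a factor above 1 blows it up; clipping the gradient norm is one common mitigation for the exploding case:

import torch

grad = torch.tensor(1.0)
for factor in (0.5, 1.5):
    g = grad
    for _ in range(50):           # 50 "time steps"
        g = g * factor
    print(factor, g.item())       # 0.5 -> ~8.9e-16 (vanishing), 1.5 -> ~6.4e+08 (exploding)

# One common remedy for exploding gradients is clipping the global gradient norm, e.g. in PyTorch:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)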
A complete TensorFlow 2 example: sentiment classification on the IMDB dataset, implemented with layers.SimpleRNNCell and an explicit Python loop over the time steps.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
import time

tf.random.set_seed(22)
np.random.seed(22)
assert tf.__version__.startswith('2.')

batch_size = 500        # number of sentences per training batch
total_words = 10000     # keep only the most frequent words
max_review_len = 80     # unified sentence length: shorter sentences are padded, longer ones truncated
embedding_len = 100     # dimension of the vector each word is converted to

# 1. Load the dataset
(X_train, Y_train), (X_val, Y_val) = keras.datasets.imdb.load_data(num_words=total_words)
print('X_train[0] = {0},\nY_train[0] = {1}'.format(X_train[0], Y_train[0]))
print('X_train.shpae = {0},Y_train.shpae = {1}------------type(X_train) = {2},type(Y_train) = {3}'.format(X_train.shape, Y_train.shape, type(X_train), type(Y_train)))

# 2. Preprocessing
# 2.1 Pad/truncate every sentence to the same length
X_train = keras.preprocessing.sequence.pad_sequences(X_train, maxlen=max_review_len)  # [b, 80]
X_val = keras.preprocessing.sequence.pad_sequences(X_val, maxlen=max_review_len)
print('X_train.shpae = {0},Y_train.shpae = {1},tf.reduce_max(Y_train) = {2},tf.reduce_min(Y_train) = {3}'.format(X_train.shape, Y_train.shape, tf.reduce_max(Y_train), tf.reduce_min(Y_train)))

# 2.2 Wrap the data in batched tf.data.Dataset objects
db_train = tf.data.Dataset.from_tensor_slices((X_train, Y_train))
db_train = db_train.shuffle(1000).batch(batch_size, drop_remainder=True)  # drop_remainder=True drops the last incomplete batch
db_val = tf.data.Dataset.from_tensor_slices((X_val, Y_val))
db_val = db_val.batch(batch_size, drop_remainder=True)
print('db_train = {0},len(db_train) = {1}'.format(db_train, len(db_train)))


class MyRNN(keras.Model):
    def __init__(self, output_dim):
        super(MyRNN, self).__init__()
        # ----- memory cells -----
        # [b, 64]: each holds the previous hidden state h_{t-1}, used as one of the inputs when computing h_t.
        # Chaining several cells gives the network depth.
        self.memoryCell01 = [tf.zeros([batch_size, output_dim])]   # initialize memoryCell01, shape [b, 64]
        self.memoryCell02 = [tf.zeros([batch_size, output_dim])]   # initialize memoryCell02, shape [b, 64]
        # ----- Embedding -----
        # Turns each sentence (80 word indices) into word embeddings: [b, 80] => [b, 80, 100]
        # input_dim: vocabulary size; input_length: unified sentence length (80); output_dim: embedding dimension (100)
        self.embedding = layers.Embedding(input_dim=total_words, input_length=max_review_len, output_dim=embedding_len)
        # ----- RNN cell layers -----
        # [b, 80, 100] => [b, 64]
        # output_dim: dimensionality of the hidden state; dropout mitigates overfitting
        self.rnn_cell01 = layers.SimpleRNNCell(output_dim, dropout=0.2)
        self.rnn_cell02 = layers.SimpleRNNCell(output_dim, dropout=0.2)
        # ----- fully connected output layer -----
        # [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)

    def call(self, inputs, training=None):
        """
        net(x), net(x, training=True): train mode; net(x, training=False): test mode
        :param inputs: [b, 80], one batch of sentences
        """
        # Embedding: [b, 80] => [b, 80, 100]
        wordEmbeddings = self.embedding(inputs)
        print('\nwordEmbeddings.shape = {0}, wordEmbeddings = {1}'.format(wordEmbeddings.shape, wordEmbeddings))
        # RNN cell computation: [b, 80, 100] => [b, 64]; each sentence is reduced from [80, 100] to [64]
        memoryCell01 = self.memoryCell01
        memoryCell02 = self.memoryCell02
        wordEmbedding_index = 0
        for wordEmbedding in tf.unstack(wordEmbeddings, axis=1):
            # wordEmbedding: [b, 100]; the 80 words of each sentence are unrolled along the time axis
            # hidden states out01/out02: [b, 64]
            # h_t = tanh(x_t·W_xh + h_{t-1}·W_hh + b), with x_t = wordEmbedding, h_{t-1} = memoryCell01, h_t = out01
            out01, memoryCell01_current = self.rnn_cell01(wordEmbedding, memoryCell01, training=training)  # training=True enables dropout
            memoryCell01 = memoryCell01_current   # h_t replaces the old h_{t-1} for the next word
            # Feed out01 into the second cell to increase the representational power of the RNN layer
            out02, memoryCell02_current = self.rnn_cell02(out01, memoryCell02, training=training)
            memoryCell02 = memoryCell02_current
            if wordEmbedding_index == 0:
                print('wordEmbedding.shape = {0}, wordEmbedding = {1}'.format(wordEmbedding.shape, wordEmbedding))
                print('out01.shape = {0}, out01 = {1}'.format(out01.shape, out01))
                print('out02.shape = {0}, out02 = {1}'.format(out02.shape, out02))
            wordEmbedding_index += 1
        # Fully connected layer: [b, 64] => [b, 1]
        out_logit = self.outlayer(out02)   # out02 is the extracted semantic representation of each sentence
        print('out_logit.shape = {0}, out_logit = {1}'.format(out_logit.shape, out_logit))
        out_prob = tf.sigmoid(out_logit)   # p(y is pos|x)
        print('out_prob.shape = {0}, out_prob = {1}, {2}'.format(out_prob.shape, out_prob, '\n'))
        return out_prob


def main():
    output_dim = 64   # dimension of the hidden state: [b, 100] => [b, 64]
    epochs = 4
    t0 = time.time()
    network = MyRNN(output_dim)
    # No need for from_logits=True: MyRNN() already applies the activation (out_prob = tf.sigmoid(out_logit)).
    # metrics=['accuracy'] reports accuracy on the evaluation data.
    network.compile(optimizer=keras.optimizers.Adam(0.001), loss=tf.losses.BinaryCrossentropy(), metrics=['accuracy'])
    print('\n***********************************************************training network: start***********************************************************')
    network.fit(db_train, epochs=epochs, validation_data=db_val)
    print('***********************************************************training network: end***********************************************************')
    print('\n***********************************************************evaluating network (already evaluated during training): start***********************************************************')
    network.evaluate(db_val)
    print('***********************************************************evaluating network (already evaluated during training): end***********************************************************')
    t1 = time.time()
    print('total time cost:', t1 - t0)


if __name__ == '__main__':
    main()
Output:
X_train[0] = [1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32],
Y_train[0] = 1
X_train.shpae = (25000,),Y_train.shpae = (25000,)------------type(X_train) = <class 'numpy.ndarray'>,type(Y_train) = <class 'numpy.ndarray'>
X_train.shpae = (25000, 80),Y_train.shpae = (25000,),tf.reduce_max(Y_train) = 1,tf.reduce_min(Y_train) = 0
db_train = <BatchDataset shapes: ((500, 80), (500,)), types: (tf.int32, tf.int64)>,len(db_train) = 50

***********************************************************training network: start***********************************************************
Epoch 1/4

wordEmbeddings.shape = (500, 80, 100), wordEmbeddings = Tensor("my_rnn/embedding/embedding_lookup/Identity:0", shape=(500, 80, 100), dtype=float32)
wordEmbedding.shape = (500, 100), wordEmbedding = Tensor("my_rnn/unstack:0", shape=(500, 100), dtype=float32)
out01.shape = (500, 64), out01 = Tensor("my_rnn/simple_rnn_cell/Tanh:0", shape=(500, 64), dtype=float32)
out02.shape = (500, 64), out02 = Tensor("my_rnn/simple_rnn_cell_1/Tanh:0", shape=(500, 64), dtype=float32)
out_logit.shape = (500, 1), out_logit = Tensor("my_rnn/dense/BiasAdd:0", shape=(500, 1), dtype=float32)
out_prob.shape = (500, 1), out_prob = Tensor("my_rnn/Sigmoid:0", shape=(500, 1), dtype=float32),

wordEmbeddings.shape = (500, 80, 100), wordEmbeddings = Tensor("my_rnn/embedding/embedding_lookup/Identity:0", shape=(500, 80, 100), dtype=float32)
wordEmbedding.shape = (500, 100), wordEmbedding = Tensor("my_rnn/unstack:0", shape=(500, 100), dtype=float32)
out01.shape = (500, 64), out01 = Tensor("my_rnn/simple_rnn_cell/Tanh:0", shape=(500, 64), dtype=float32)
out02.shape = (500, 64), out02 = Tensor("my_rnn/simple_rnn_cell_1/Tanh:0", shape=(500, 64), dtype=float32)
out_logit.shape = (500, 1), out_logit = Tensor("my_rnn/dense/BiasAdd:0", shape=(500, 1), dtype=float32)
out_prob.shape = (500, 1), out_prob = Tensor("my_rnn/Sigmoid:0", shape=(500, 1), dtype=float32),

50/50 [==============================] - ETA: 0s - loss: 0.6942 - accuracy: 0.5303

wordEmbeddings.shape = (500, 80, 100), wordEmbeddings = Tensor("my_rnn/embedding/embedding_lookup/Identity:0", shape=(500, 80, 100), dtype=float32)
wordEmbedding.shape = (500, 100), wordEmbedding = Tensor("my_rnn/unstack:0", shape=(500, 100), dtype=float32)
out01.shape = (500, 64), out01 = Tensor("my_rnn/simple_rnn_cell/Tanh:0", shape=(500, 64), dtype=float32)
out02.shape = (500, 64), out02 = Tensor("my_rnn/simple_rnn_cell_1/Tanh:0", shape=(500, 64), dtype=float32)
out_logit.shape = (500, 1), out_logit = Tensor("my_rnn/dense/BiasAdd:0", shape=(500, 1), dtype=float32)
out_prob.shape = (500, 1), out_prob = Tensor("my_rnn/Sigmoid:0", shape=(500, 1), dtype=float32),

50/50 [==============================] - 11s 125ms/step - loss: 0.6938 - accuracy: 0.5309 - val_loss: 0.5607 - val_accuracy: 0.7175
Epoch 2/4
50/50 [==============================] - 5s 98ms/step - loss: 0.4480 - accuracy: 0.7937 - val_loss: 0.4222 - val_accuracy: 0.8073
Epoch 3/4
50/50 [==============================] - 5s 99ms/step - loss: 0.2625 - accuracy: 0.8933 - val_loss: 0.4523 - val_accuracy: 0.8001
Epoch 4/4
50/50 [==============================] - 5s 98ms/step - loss: 0.1500 - accuracy: 0.9448 - val_loss: 0.5610 - val_accuracy: 0.8037
***********************************************************training network: end***********************************************************

***********************************************************evaluating network (already evaluated during training): start***********************************************************
50/50 [==============================] - 1s 23ms/step - loss: 0.5610 - accuracy: 0.8037
***********************************************************evaluating network (already evaluated during training): end***********************************************************
total time cost: 26.676692247390747
The same model rewritten with the higher-level layers.SimpleRNN wrapped in keras.Sequential, which handles the loop over time steps internally.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
import time

tf.random.set_seed(22)
np.random.seed(22)
assert tf.__version__.startswith('2.')

batch_size = 500        # number of sentences per training batch
total_words = 10000     # keep only the most frequent words
max_review_len = 80     # unified sentence length: shorter sentences are padded, longer ones truncated
embedding_len = 100     # dimension of the vector each word is converted to

# 1. Load the dataset
(X_train, Y_train), (X_val, Y_val) = keras.datasets.imdb.load_data(num_words=total_words)
print('X_train[0] = {0},\nY_train[0] = {1}'.format(X_train[0], Y_train[0]))
print('X_train.shpae = {0},Y_train.shpae = {1}------------type(X_train) = {2},type(Y_train) = {3}'.format(X_train.shape, Y_train.shape, type(X_train), type(Y_train)))

# 2. Preprocessing
# 2.1 Pad/truncate every sentence to the same length
X_train = keras.preprocessing.sequence.pad_sequences(X_train, maxlen=max_review_len)  # [b, 80]
X_val = keras.preprocessing.sequence.pad_sequences(X_val, maxlen=max_review_len)
print('X_train.shpae = {0},Y_train.shpae = {1},tf.reduce_max(Y_train) = {2},tf.reduce_min(Y_train) = {3}'.format(X_train.shape, Y_train.shape, tf.reduce_max(Y_train), tf.reduce_min(Y_train)))

# 2.2 Wrap the data in batched tf.data.Dataset objects
db_train = tf.data.Dataset.from_tensor_slices((X_train, Y_train))
db_train = db_train.shuffle(1000).batch(batch_size, drop_remainder=True)  # drop_remainder=True drops the last incomplete batch
db_val = tf.data.Dataset.from_tensor_slices((X_val, Y_val))
db_val = db_val.batch(batch_size, drop_remainder=True)
print('db_train = {0},len(db_train) = {1}'.format(db_train, len(db_train)))


class MyRNN(keras.Model):
    def __init__(self, output_dim):
        super(MyRNN, self).__init__()
        # ----- Embedding -----
        # Transform text to its embedding representation: [b, 80] => [b, 80, 100]
        # input_dim: vocabulary size; input_length: unified sentence length (80); output_dim: embedding dimension (100)
        self.embedding = layers.Embedding(input_dim=total_words, input_length=max_review_len, output_dim=embedding_len)
        # ----- RNN structure: stacked SimpleRNN layers -----
        # [b, 80, 100] => [b, 64]
        # output_dim: dimensionality of the hidden state; dropout mitigates overfitting
        # return_sequences: whether to return the full output sequence or only the last output
        # unroll: if True the network is unrolled instead of using a symbolic loop; unrolling can speed up
        #         an RNN but is more memory-intensive and only suitable for short sequences
        self.rnn = keras.Sequential([
            layers.SimpleRNN(output_dim, dropout=0.5, return_sequences=True, unroll=True),
            layers.SimpleRNN(output_dim, dropout=0.5, unroll=True)
        ])
        # ----- fully connected output layer -----
        # [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)

    def call(self, inputs, training=None):
        """
        net(x), net(x, training=True): train mode; net(x, training=False): test mode
        :param inputs: [b, 80], one batch of sentences
        """
        # Embedding: [b, 80] => [b, 80, 100]
        x_wordEmbeddings = self.embedding(inputs)
        print('\nx_wordEmbeddings.shape = {0}, x_wordEmbeddings = {1}'.format(x_wordEmbeddings.shape, x_wordEmbeddings))
        # RNN computation: [b, 80, 100] => [b, 64]
        out = self.rnn(x_wordEmbeddings)
        print('out.shape = {0}, out = {1}'.format(out.shape, out))
        out_logit = self.outlayer(out)   # hidden state => 0/1 score, [b, 64] => [b, 1]
        print('out_logit.shape = {0}, out_logit = {1}'.format(out_logit.shape, out_logit))
        out_prob = tf.sigmoid(out_logit)   # p(y is pos|x)
        print('out_prob.shape = {0}, out_prob = {1}, {2}'.format(out_prob.shape, out_prob, '\n'))
        return out_prob


def main():
    output_dim = 64   # dimension of the hidden state: [b, 100] => [b, 64]
    epochs = 4
    t0 = time.time()
    network = MyRNN(output_dim)
    # No need for from_logits=True: MyRNN() already applies the activation (out_prob = tf.sigmoid(out_logit)).
    # metrics=['accuracy'] reports accuracy on the evaluation data.
    network.compile(optimizer=keras.optimizers.Adam(0.001), loss=tf.losses.BinaryCrossentropy(), metrics=['accuracy'])
    print('\n***********************************************************training network: start***********************************************************')
    network.fit(db_train, epochs=epochs, validation_data=db_val)
    print('***********************************************************training network: end***********************************************************')
    print('\n***********************************************************evaluating network (already evaluated during training): start***********************************************************')
    network.evaluate(db_val)
    print('***********************************************************evaluating network (already evaluated during training): end***********************************************************')
    t1 = time.time()
    print('total time cost:', t1 - t0)


if __name__ == '__main__':
    main()
Output:
X_train[0] = [1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32],
Y_train[0] = 1
X_train.shpae = (25000,),Y_train.shpae = (25000,)------------type(X_train) = <class 'numpy.ndarray'>,type(Y_train) = <class 'numpy.ndarray'>
X_train.shpae = (25000, 80),Y_train.shpae = (25000,),tf.reduce_max(Y_train) = 1,tf.reduce_min(Y_train) = 0
db_train = <BatchDataset shapes: ((500, 80), (500,)), types: (tf.int32, tf.int64)>,len(db_train) = 50

***********************************************************training network: start***********************************************************
Epoch 1/4

x_wordEmbeddings.shape = (500, 80, 100), x_wordEmbeddings = Tensor("my_rnn/embedding/embedding_lookup/Identity:0", shape=(500, 80, 100), dtype=float32)
out.shape = (500, 64), out = Tensor("my_rnn/sequential/simple_rnn_1/simple_rnn_cell_1/Tanh_79:0", shape=(500, 64), dtype=float32)
out_logit.shape = (500, 1), out_logit = Tensor("my_rnn/dense/BiasAdd:0", shape=(500, 1), dtype=float32)
out_prob.shape = (500, 1), out_prob = Tensor("my_rnn/Sigmoid:0", shape=(500, 1), dtype=float32),

x_wordEmbeddings.shape = (500, 80, 100), x_wordEmbeddings = Tensor("my_rnn/embedding/embedding_lookup/Identity:0", shape=(500, 80, 100), dtype=float32)
out.shape = (500, 64), out = Tensor("my_rnn/sequential/simple_rnn_1/simple_rnn_cell_1/Tanh_79:0", shape=(500, 64), dtype=float32)
out_logit.shape = (500, 1), out_logit = Tensor("my_rnn/dense/BiasAdd:0", shape=(500, 1), dtype=float32)
out_prob.shape = (500, 1), out_prob = Tensor("my_rnn/Sigmoid:0", shape=(500, 1), dtype=float32),

50/50 [==============================] - ETA: 0s - loss: 0.7086 - accuracy: 0.5031

x_wordEmbeddings.shape = (500, 80, 100), x_wordEmbeddings = Tensor("my_rnn/embedding/embedding_lookup/Identity:0", shape=(500, 80, 100), dtype=float32)
out.shape = (500, 64), out = Tensor("my_rnn/sequential/simple_rnn_1/simple_rnn_cell_1/Tanh_79:0", shape=(500, 64), dtype=float32)
out_logit.shape = (500, 1), out_logit = Tensor("my_rnn/dense/BiasAdd:0", shape=(500, 1), dtype=float32)
out_prob.shape = (500, 1), out_prob = Tensor("my_rnn/Sigmoid:0", shape=(500, 1), dtype=float32),

50/50 [==============================] - 11s 129ms/step - loss: 0.7084 - accuracy: 0.5034 - val_loss: 0.6804 - val_accuracy: 0.5906
Epoch 2/4
50/50 [==============================] - 5s 94ms/step - loss: 0.6384 - accuracy: 0.6291 - val_loss: 0.4407 - val_accuracy: 0.7966
Epoch 3/4
50/50 [==============================] - 5s 95ms/step - loss: 0.4024 - accuracy: 0.8191 - val_loss: 0.4072 - val_accuracy: 0.8284
Epoch 4/4
50/50 [==============================] - 5s 94ms/step - loss: 0.2899 - accuracy: 0.8829 - val_loss: 0.4479 - val_accuracy: 0.8289
***********************************************************training network: end***********************************************************

***********************************************************evaluating network (already evaluated during training): start***********************************************************
50/50 [==============================] - 1s 24ms/step - loss: 0.4479 - accuracy: 0.8289
***********************************************************evaluating network (already evaluated during training): end***********************************************************
total time cost: 26.05630612373352