Reference: Zhihu - a detailed walkthrough of tf.nn.dynamic_rnn
A quick note up front: implementing an RNN-family architecture in TensorFlow basically comes down to defining a cell and then calling an RNN function to get the outputs. Moreover, whatever type of cell you define is essentially the type of RNN you get.
1. TensorFlow's definition of the RNN function
- tf.nn.dynamic_rnn(
-     cell,                      # the RNN cell
-     inputs,                    # the sequence input
-     sequence_length=None,      # length of each sequence, i.e. the number of time steps
-     initial_state=None,        # initial state of the RNN
-     dtype=None,                # data type
-     parallel_iterations=None,  # number of iterations to run in parallel
-     swap_memory=False,         # allow swapping tensors between GPU and CPU memory to save GPU memory
-     time_major=False,          # determines the shape of inputs and outputs
-     scope=None                 # variable scope, defaults to "rnn"
- )
Here, cell determines the type of RNN: with a plain RNN cell you get the original RNN structure, with an LSTM cell you get an LSTM, and with a GRU cell you get a GRU.
inputs is the actual input data. sequence_length is the length of each input sequence, i.e. the corresponding number of time steps.
time_major determines the shape of the inputs and outputs. What does that mean?
Normally our input (the inputs here) and output (the output discussed below) have shape `[batch_size, max_time, embedding_size]`. If time_major is True, the function treats the input and output shapes as `[max_time, batch_size, embedding_size]`; if it is False, it treats them as `[batch_size, max_time, embedding_size]`.
The default value of this parameter is False.
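As a minimal sketch (assuming TensorFlow 1.x and the same kind of data as in the examples below, not code from the original post), feeding time-major data just means swapping the batch and time dimensions before the call:
- # Sketch: the same kind of call with time_major=True (hypothetical, TF 1.x assumed).
- import tensorflow as tf
- import numpy as np
-
- X = np.random.randn(3, 6, 4)                     # batch-major: [batch_size, max_time, embedding_size]
- X_time_major = tf.transpose(X, perm=[1, 0, 2])   # time-major:  [max_time, batch_size, embedding_size]
-
- cell = tf.nn.rnn_cell.BasicRNNCell(num_units=5)
- o_tm, s_tm = tf.nn.dynamic_rnn(cell=cell, inputs=X_time_major,
-                                sequence_length=[6, 4, 6], dtype=tf.float64,
-                                time_major=True)
- # o_tm has shape [max_time, batch_size, num_units]; transpose back to batch-major if needed.
- o_batch_major = tf.transpose(o_tm, perm=[1, 0, 2])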
This function returns two values: the RNN's output at every time step, and the RNN's final state.
The returned state is a bit subtle, though: its exact form depends on the type of cell.
Why? In an LSTM the cell state and the output (hidden state) are not the same thing, while in a GRU they are identical, so the shape of the returned state differs slightly.
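Because of this, a small helper can be handy when code has to handle either kind of cell. A sketch (the helper name is just for illustration, not an API; assumes the tf import from the examples below):
- # Sketch: get the final hidden state regardless of cell type (hypothetical helper name).
- def last_hidden_state(state):
-     # LSTM cells return an LSTMStateTuple(c=..., h=...); plain RNN / GRU cells return a single tensor.
-     if isinstance(state, tf.nn.rnn_cell.LSTMStateTuple):
-         return state.h
-     return state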
2. A simple implementation of a vanilla RNN
Without further ado, here is the code:
- import tensorflow as tf
- import numpy as np
-
- # Input: 3 samples, 6 time steps, 4 features per step
- X = np.random.randn(3, 6, 4)
- X[1, 4:] = 0
-
- # Sequence lengths
- X_lengths = [6, 4, 6]
-
- # Number of hidden units; determines the last dimension of the output
- rnn_hidden_size = 5
-
- # Define the RNN cell
- cell = tf.nn.rnn_cell.BasicRNNCell(num_units=rnn_hidden_size)
-
- # Run the RNN
- o, s = tf.nn.dynamic_rnn(cell=cell, inputs=X, sequence_length=X_lengths, dtype=tf.float64)
-
- sess = tf.Session()
- sess.run(tf.global_variables_initializer())
-
- print(sess.run([tf.shape(o), o]))
- print(sess.run([tf.shape(s), s]))
The whole thing is quite simple; as stated at the beginning, you define a cell, call an RNN function, and you're done.
Let's look at the output:
- [array([3, 6, 5]), array([[[ 0.17310878, 0.82633802, -0.43270981, 0.06905287,
- -0.44548788],
- [ 0.76207901, 0.14140098, -0.95680809, 0.72005081,
- -0.15649403],
- [ 0.79388277, -0.97675279, -0.33644186, 0.95320421,
- -0.60705681],
- [ 0.00563942, -0.02027826, 0.89590012, -0.22456675,
- -0.40984772],
- [-0.02721506, -0.87714997, -0.43034662, -0.93520363,
- 0.94834008],
- [ 0.25531945, 0.93336422, -0.92178408, 0.30199629,
- -0.92172056]],
-
- [[ 0.28433064, -0.89897588, 0.4130407 , 0.55888719,
- -0.40204589],
- [ 0.41459254, 0.3597689 , 0.9548185 , -0.00866829,
- 0.50680063],
- [-0.06959048, -0.4649923 , 0.94124415, 0.08926017,
- 0.33270379],
- [ 0.69817465, 0.95005181, 0.70850335, 0.5241701 ,
- -0.53791173],
- [ 0. , 0. , 0. , 0. ,
- 0. ],
- [ 0. , 0. , 0. , 0. ,
- 0. ]],
-
- [[-0.12966264, -0.32701574, -0.74199627, -0.49359511,
- -0.32056881],
- [-0.44894617, -0.6809439 , 0.63751225, 0.11421618,
- 0.12798053],
- [-0.13901253, 0.86462562, -0.49524682, -0.77128572,
- -0.71333543],
- [ 0.16060172, -0.47568445, -0.54749102, 0.39206036,
- 0.46851311],
- [-0.94127998, -0.37428214, 0.9176711 , -0.75276436,
- 0.52876751],
- [ 0.60046028, -0.76555278, 0.69193852, 0.76096789,
- -0.72530337]]])]
- [array([3, 5]), array([[ 0.25531945, 0.93336422, -0.92178408, 0.30199629, -0.92172056],
- [ 0.69817465, 0.95005181, 0.70850335, 0.5241701 , -0.53791173],
- [ 0.60046028, -0.76555278, 0.69193852, 0.76096789, -0.72530337]])]
As you can see, the RNN call returns two items. The first is o (the output), the hidden-state output at every time step; this is the same whether the cell is a plain RNN, an LSTM, or a GRU. What does differ is s (the state), the final state. Nothing special shows up here yet; we will come back to it with the LSTM.
Let's analyze the model's output:
For output, the shape is [3, 6, 5].
The 3 can be read as the batch_size;
6 is the maximum number of time steps, i.e. the longest sequence length. Note the two rows of zeros in the second sample: we specified in sequence_length that its sequence only runs to step 4;
5 is the dimension of the hidden state. Why 5? Because when defining the RNN cell we passed num_units=5, and that value determines the output dimension.
For state, the shape is [3, 5].
Look closely and you will see that state is simply each sample's last output in output. For the second sample, whose sequence length is 4, the final state is the output at time step 4.
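To convince yourself of this, here is a small sketch (reusing o, s, X_lengths and sess from the code above; the intermediate names are illustrative) that gathers, for each sample, the output at index sequence_length - 1 and compares it with state:
- # Sketch: state should equal the output at each sample's last valid time step.
- last_step = tf.constant(X_lengths) - 1                              # [5, 3, 5]
- indices = tf.stack([tf.range(len(X_lengths)), last_step], axis=1)   # [[0, 5], [1, 3], [2, 5]]
- last_outputs = tf.gather_nd(o, indices)                             # shape [3, 5]
- print(sess.run(tf.reduce_all(tf.equal(last_outputs, s))))           # should print True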
3. Implementing an LSTM model
Again, the code first:
- import tensorflow as tf
- import numpy as np
-
- # Input: 3 samples, 6 time steps, 4 features per step
- X = np.random.randn(3, 6, 4)
- X[1, 4:] = 0
-
- # Sequence lengths
- X_lengths = [6, 4, 6]
-
- # Number of hidden units; determines the last dimension of the output
- rnn_hidden_size = 5
-
- # Define the LSTM cell
- cell = tf.nn.rnn_cell.LSTMCell(num_units=rnn_hidden_size)
-
- # Run the RNN
- o, s = tf.nn.dynamic_rnn(cell=cell, inputs=X, sequence_length=X_lengths, dtype=tf.float64)
-
- sess = tf.Session()
- sess.run(tf.global_variables_initializer())
-
- print(sess.run([tf.shape(o), o]))
- print(sess.run([tf.shape(s), s]))
Compare this carefully with the RNN code above: the only change is the definition of the cell.
Let's see how the output differs:
- [array([3, 6, 5]), array([[[ 0.0970524 , 0.04119286, -0.08474181, -0.18085976,
- -0.07741971],
- [-0.02223423, -0.10350601, -0.08190117, -0.09554744,
- 0.00527808],
- [ 0.09914153, -0.04312684, -0.06642732, -0.14581045,
- -0.08470683],
- [ 0.0531387 , -0.05552092, -0.12808912, -0.15655323,
- -0.02803473],
- [ 0.18570773, 0.00828065, 0.02929196, -0.02115647,
- -0.20298142],
- [ 0.097577 , 0.00784962, -0.18960612, -0.1939976 ,
- -0.02686524]],
-
- [[ 0.05894009, 0.0371513 , -0.147196 , -0.12490672,
- 0.00890823],
- [-0.13846855, 0.0048299 , -0.27920325, -0.10103866,
- 0.19092917],
- [-0.0160551 , 0.04513064, -0.28024583, -0.06436632,
- 0.12552706],
- [-0.06964844, -0.08109376, -0.04003272, 0.13113396,
- 0.0881404 ],
- [ 0. , 0. , 0. , 0. ,
- 0. ],
- [ 0. , 0. , 0. , 0. ,
- 0. ]],
-
- [[-0.07211149, 0.01887877, -0.12521735, -0.05033309,
- 0.10820924],
- [-0.08311677, -0.00736387, -0.17241497, -0.09276841,
- 0.11316439],
- [ 0.10335096, 0.00665376, -0.02476282, -0.21087149,
- -0.11191987],
- [ 0.09959326, 0.06205221, -0.01933063, -0.07346646,
- -0.11060355],
- [-0.04239206, -0.20163064, -0.02140169, -0.02752217,
- 0.04012931],
- [-0.10801504, -0.04490684, -0.02402958, 0.0872379 ,
- 0.08214646]]])]
- [array([2, 3, 5]), LSTMStateTuple(c=array([[ 0.18660455, 0.01673364, -0.43402739, -0.31572953, -0.05726592],
- [-0.18487706, -0.12826426, -0.07106543, 0.46722466, 0.19727486],
- [-0.17872193, -0.10419721, -0.04430988, 0.18415518, 0.12666562]]), h=array([[ 0.097577 , 0.00784962, -0.18960612, -0.1939976 , -0.02686524],
- [-0.06964844, -0.08109376, -0.04003272, 0.13113396, 0.0881404 ],
- [-0.10801504, -0.04490684, -0.02402958, 0.0872379 , 0.08214646]]))]
The first item is output, with shape [3, 6, 5] as before: the LSTM's hidden-layer output at every time step.
The state, however, differs from the plain RNN. Here its shape is [2, 3, 5].
Compared with the RNN there is an extra [3, 5]. The reason is that, unlike the other RNN variants, an LSTM's cell state and its output are not the same thing.
Concretely, an output gate maps the cell state to the hidden state (the output), roughly h = o_gate * tanh(c).
So the state has two parts: c (the cell state) holds each sample's cell state at its last time step, and h (the hidden state) holds each sample's hidden state at its last time step.
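In code the two parts are simply attributes of the returned LSTMStateTuple; a quick sketch reusing s from the example above:
- # Sketch: s is an LSTMStateTuple, so its two parts are accessed by name.
- cell_state = s.c        # final cell state,   shape [batch_size, num_units]
- hidden_state = s.h      # final hidden state, shape [batch_size, num_units]
- # s.h is what a plain RNN / GRU would return as its state; s.c is internal to the LSTM.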
4. Implementing a GRU model (with Dropout)
Code first:
- import tensorflow as tf
- import numpy as np
-
- # Input: 3 samples, 6 time steps, 4 features per step
- X = np.random.randn(3, 6, 4)
- X[1, 4:] = 0
-
- # Sequence lengths
- X_lengths = [6, 4, 6]
-
- # Number of hidden units; determines the last dimension of the output
- rnn_hidden_size = 5
-
- # Define the GRU cell and wrap it with dropout
- cell = tf.nn.rnn_cell.GRUCell(num_units=rnn_hidden_size)
- cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=1.0, output_keep_prob=0.8)
-
- # Run the RNN
- o, s = tf.nn.dynamic_rnn(cell=cell, inputs=X, sequence_length=X_lengths, dtype=tf.float64)
-
- sess = tf.Session()
- sess.run(tf.global_variables_initializer())
-
- print(sess.run([tf.shape(o), o]))
- print(sess.run([tf.shape(s), s]))
Compared once more with the earlier code, there are two changes. One is the cell definition, which should be familiar by now;
the other is that a dropout wrapper is added around the cell.
The DropoutWrapper takes three main arguments here: the cell to wrap, the keep probability for the inputs, and the keep probability for the outputs (output_keep_prob=0.8 means each output unit is kept with probability 0.8). For RNN-style architectures, dropout is applied to the inputs and outputs, not to the recurrent connections within a layer. (A sketch of how the keep probability is usually switched between training and evaluation follows at the end of this section.)
Let's look at the output:
- [array([3, 6, 5]), array([[[-0.00384183, -0.2193063 , 0.0603394 , -0.09774706,
- 0.1135665 ],
- [-0.07668639, -0.39645098, 0.05858838, 0.15644397,
- 0.18005827],
- [-0.06259512, -0.63629247, 0.26543947, -0.10871268,
- 0.22657906],
- [ 0. , -0.57908255, 0.1493446 , 0.07591829,
- 0.27544994],
- [ 0.57788612, 0.29002321, -0.14544183, 0.45116937,
- -0. ],
- [ 0.53739782, 0.21176507, -0.0245097 , 0.08893773,
- -0. ]],
-
- [[-0.21511515, -0.11745295, 0.14412874, -0.06509311,
- 0.0728543 ],
- [-0.12518276, -0.41010908, 0.22555463, -0.26639758,
- 0. ],
- [-0.24114207, -0. , 0.3797578 , -0. ,
- 0.17446607],
- [-0. , -0.83807384, 0.57518991, -0.65002891,
- 0. ],
- [ 0. , 0. , 0. , 0. ,
- 0. ],
- [ 0. , 0. , 0. , 0. ,
- 0. ]],
-
- [[-0.10412225, -0.19353175, 0. , -0.06415311,
- 0.12416278],
- [-0.00095912, -0.08306708, 0.29790358, -0.24395395,
- -0.04457892],
- [ 0.10402355, -0.06345214, 0.09992852, -0.04574931,
- 0.1022426 ],
- [ 0.0832184 , -0.37382437, 0.57320086, -0. ,
- 0. ],
- [ 0. , 0.22174214, 0.15993414, 0. ,
- 0. ],
- [ 0.28724911, 0.07586199, 0.09271125, 0.29748913,
- 0.21664836]]])]
- [array([3, 5]), array([[ 0.42991826, 0.16941206, -0.01960776, 0.07115018, -0.32511025],
- [-0.09532494, -0.67045908, 0.46015193, -0.52002314, 0.08326451],
- [ 0.22979929, 0.06068959, 0.074169 , 0.23799131, 0.17331869]])]
As before, output holds the hidden-layer output at every time step.
state is back to the [3, 5] shape, because a GRU has no separate output gate: the cell state and the hidden state are the same value.
You can also see that some entries in the output have been zeroed out by dropout.
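One practical note, sketched below under the assumption of the same cell definition as above (the placeholder name is illustrative, not from the code above): the keep probability is usually fed in at run time, so dropout is active during training and disabled (keep probability 1.0) at evaluation.
- # Sketch: make the keep probability a placeholder so it can differ between training and evaluation.
- keep_prob = tf.placeholder(tf.float64, shape=[], name="keep_prob")
- cell = tf.nn.rnn_cell.GRUCell(num_units=rnn_hidden_size)
- cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=1.0, output_keep_prob=keep_prob)
- # At run time: feed_dict={keep_prob: 0.8} while training, feed_dict={keep_prob: 1.0} at evaluation.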
5. Implementing a bidirectional LSTM
The code:
- import tensorflow as tf
- import numpy as np
-
- # Input: 3 samples, 6 time steps, 4 features per step
- X = np.random.randn(3, 6, 4)
- X[1, 4:] = 0
-
- # Sequence lengths
- X_lengths = [6, 4, 6]
-
- # Number of hidden units; determines the last dimension of the output
- rnn_hidden_size = 5
-
- # Define the forward and backward cells
- f_cell = tf.nn.rnn_cell.LSTMCell(num_units=rnn_hidden_size)
- b_cell = tf.nn.rnn_cell.LSTMCell(num_units=rnn_hidden_size)
-
- # Call the bidirectional RNN function
- o, s = tf.nn.bidirectional_dynamic_rnn(cell_fw=f_cell, cell_bw=b_cell, inputs=X, sequence_length=X_lengths, dtype=tf.float64)
-
- sess = tf.Session()
- sess.run(tf.global_variables_initializer())
-
- print(sess.run([tf.shape(o), o]))
- print(sess.run([tf.shape(s), s]))
A bidirectional LSTM adds a second LSTM running in the reverse direction, to capture information that depends on what comes after a time step and would otherwise be missed.
So there are two RNN cells here; we use LSTM cells, and as you can imagine, swapping in GRU cells would turn this into a bidirectional GRU.
Dropout is not added here; if you want it, wrap the forward and backward cells separately with a DropoutWrapper, as shown in section 4 and sketched just below.
Let's look at the output:
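For reference, a minimal sketch of that (deliberately not part of the run below, so the output stays comparable to the plain LSTM section):
- # Sketch: wrap both directions with dropout, exactly as in the GRU example.
- f_cell = tf.nn.rnn_cell.DropoutWrapper(f_cell, input_keep_prob=1.0, output_keep_prob=0.8)
- b_cell = tf.nn.rnn_cell.DropoutWrapper(b_cell, input_keep_prob=1.0, output_keep_prob=0.8)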
- [array([2, 3, 6, 5]), (array([[[ 0.00263425, -0.10848654, -0.02888396, -0.14557681,
- -0.01867951],
- [ 0.16308206, -0.0863213 , -0.058109 , -0.34334372,
- -0.06744579],
- [ 0.16662425, -0.23343837, -0.04667022, -0.21088134,
- -0.05725204],
- [-0.00078778, 0.054576 , -0.11021806, -0.1319716 ,
- 0.11758325],
- [ 0.06200103, -0.07955454, -0.10945861, -0.17705271,
- 0.06469144],
- [ 0.03694782, 0.06173401, -0.07308133, -0.13233934,
- 0.00837825]],
-
- [[ 0.02614167, -0.27913968, -0.01494904, 0.01519353,
- 0.00592987],
- [-0.11470788, -0.08984734, -0.04153365, -0.05076508,
- 0.02336249],
- [ 0.00544636, 0.01089489, -0.07369071, -0.0196202 ,
- 0.0541166 ],
- [-0.06484562, -0.10282789, -0.02678329, -0.12910339,
- -0.07545008],
- [ 0. , 0. , 0. , 0. ,
- 0. ],
- [ 0. , 0. , 0. , 0. ,
- 0. ]],
-
- [[-0.18624838, 0.11949917, 0.02117578, -0.00978313,
- 0.02452981],
- [-0.15181061, 0.24886629, 0.0115148 , -0.01600807,
- 0.07623213],
- [-0.0149377 , 0.29534634, -0.02139397, -0.19715693,
- 0.02354277],
- [-0.22288016, 0.20719902, -0.01278846, -0.24238076,
- -0.01606708],
- [-0.47796464, 0.24820449, -0.02491049, -0.27709986,
- 0.02486796],
- [-0.26716896, 0.10445698, 0.19896813, -0.19703691,
- -0.24232671]]]), array([[[ 4.10128322e-02, -1.51443969e-01, 1.40390125e-01,
- 3.07225130e-02, -1.31468052e-01],
- [ 4.95621797e-02, -1.38678298e-02, 1.02490178e-01,
- 2.87854200e-03, -7.47962853e-02],
- [ 1.33340940e-01, -2.09927857e-01, -1.20134344e-01,
- 1.28247271e-01, -1.12649095e-01],
- [ 1.45445855e-01, -1.04384322e-01, -5.10252886e-02,
- 1.41260135e-01, -1.05015442e-01],
- [-1.79240752e-02, -6.74667093e-02, 7.90996834e-02,
- -4.50249934e-02, -3.64879099e-04],
- [-1.86599105e-01, 9.32626292e-02, 3.97538513e-02,
- -9.79307484e-02, 2.85643007e-02]],
-
- [[ 6.20878725e-02, -3.94152368e-01, 2.82636679e-02,
- 7.36741134e-02, -6.22825883e-02],
- [-2.06465763e-02, -1.15915828e-01, 9.68041335e-02,
- 8.66617924e-03, -5.96928433e-02],
- [-5.18831623e-02, -1.39275191e-01, 1.05362934e-01,
- 1.36670387e-02, -6.52792768e-02],
- [ 2.48354288e-02, -2.72565766e-01, 1.62997637e-01,
- -1.63169607e-03, 2.63353057e-02],
- [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
- 0.00000000e+00, 0.00000000e+00],
- [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
- 0.00000000e+00, 0.00000000e+00]],
-
- [[-4.73881441e-01, 1.60115004e-01, 1.14253624e-01,
- -8.33266211e-02, -3.14134248e-02],
- [-3.02000780e-01, 8.38983607e-02, 9.43078470e-02,
- -1.07013195e-02, -8.76889852e-02],
- [-4.71119928e-02, -3.33859912e-02, 1.43407943e-01,
- -3.39898348e-02, -2.57795278e-02],
- [-2.29723652e-02, -6.25440340e-02, 1.13618231e-01,
- -1.08651666e-01, 8.59181685e-02],
- [-4.48911677e-02, -1.13751661e-02, 5.12618004e-02,
- -1.04067697e-01, 5.27669741e-02],
- [-8.21400131e-02, 3.94837680e-02, 1.43507504e-01,
- -1.51388314e-01, 1.09737389e-01]]]))]
- [array([2, 2, 3, 5]), (LSTMStateTuple(c=array([[ 0.06742202, 0.17478356, -0.12198078, -0.23708507, 0.01693249],
- [-0.10441755, -0.17200567, -0.15135097, -0.32268767, -0.16683725],
- [-0.66716078, 0.32491265, 0.39393026, -0.45806153, -0.41935517]]), h=array([[ 0.03694782, 0.06173401, -0.07308133, -0.13233934, 0.00837825],
- [-0.06484562, -0.10282789, -0.02678329, -0.12910339, -0.07545008],
- [-0.26716896, 0.10445698, 0.19896813, -0.19703691, -0.24232671]])), LSTMStateTuple(c=array([[ 0.12004201, -0.29230004, 0.34609941, 0.05830209, -0.29338321],
- [ 0.41384662, -0.85269306, 0.05049701, 0.0979963 , -0.08568259],
- [-0.65500626, 0.32725095, 0.23971899, -0.28581693, -0.09397309]]), h=array([[ 0.04101283, -0.15144397, 0.14039013, 0.03072251, -0.13146805],
- [ 0.06208787, -0.39415237, 0.02826367, 0.07367411, -0.06228259],
- [-0.47388144, 0.160115 , 0.11425362, -0.08332662, -0.03141342]])))]
The shape of output becomes [2, 3, 6, 5]. Compared with the single LSTM, the two LSTMs' outputs ([3, 6, 5] each) are stacked into one; the meaning is unchanged: output[0] is the forward LSTM's hidden output at every time step, and output[1] is the backward LSTM's.
The shape of state becomes [2, 2, 3, 5]. Compared with the single LSTM, the two LSTMs' states ([2, 3, 5] each) are stacked into one; as with output, state[0] is the forward state and state[1] is the backward state.
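For downstream layers the two directions are usually concatenated along the feature axis; a small sketch reusing o and s from the code above:
- # Sketch: concatenate forward and backward results for use in later layers.
- outputs_concat = tf.concat(o, axis=2)                       # shape [3, 6, 10]
- final_state_concat = tf.concat([s[0].h, s[1].h], axis=1)    # shape [3, 10]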