[Torch Notes] How to Use torch.nn.LSTM

Author: 小丑西瓜9 | 2024-06-14 07:15:39

1 Basic Principles

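LSTM (long short-term memory) is a variant of the RNN. Its advantage is that it can learn long-term dependencies, which in effect gives the network a memory.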

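The key to the LSTM is the cell state, drawn as the horizontal line running across the top of the standard LSTM diagram. The cell state works like a conveyor belt: it runs straight down the entire chain with only a few minor linear interactions, so information can easily flow along it unchanged.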

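An LSTM can remove information from, or add information to, the cell state through carefully designed structures called "gates". A gate is a way of letting information through selectively. It consists of a sigmoid neural network layer and an elementwise multiplication. The sigmoid layer outputs values between 0 and 1 that control how much of each component passes through: 0 lets nothing through, and 1 lets everything through.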
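
The gating mechanism described above can be illustrated with a minimal sketch. The tensor values and the names `cell_state` and `gate_logits` are made up for illustration; a real LSTM computes the gate pre-activations from $x_t$ and $h_{t-1}$.

```python
import torch

# Hypothetical cell state with three components.
cell_state = torch.tensor([2.0, -1.0, 0.5])

# Hypothetical gate pre-activations: very negative, zero, very positive.
gate_logits = torch.tensor([-20.0, 0.0, 20.0])

# The sigmoid maps each pre-activation into the range (0, 1).
gate = torch.sigmoid(gate_logits)

# Elementwise multiplication: each gate value scales one component.
gated = gate * cell_state

print(gate)   # close to 0, exactly 0.5, close to 1
print(gated)  # first component blocked, second halved, third passed through
```

A gate value near 0 blocks the corresponding component, a value near 1 passes it through almost unchanged, and intermediate values scale it down.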

An LSTM has three gates that protect and control the cell state.

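The first is the forget gate, which decides what information to discard from the cell state. $f_t = 0$ means discard completely; $f_t = 1$ means keep completely.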

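Next, the LSTM decides what new information to store in the cell state. This has two parts. First, a sigmoid layer called the "input gate layer" decides which values to update. Then a tanh layer creates a vector of new candidate values, $\tilde{C}_t$, that can be added to the state.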

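Then the cell state is updated: the old state is scaled by the forget gate, and the candidate values, scaled by the input gate, are added to it.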

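Finally, the output $h_t$ has to be determined. This output is based on the current cell state, but is a filtered version of it. First, a sigmoid layer decides which parts of the cell state will be output. Then the cell state is passed through tanh (which squashes the values to between -1 and 1) and multiplied by the output of the sigmoid gate, so that only the selected parts are output.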

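The whole forward pass can therefore be written as the following equations, where $g_t$ is the candidate vector written $\tilde{C}_t$ above and $c_t$ is the cell state: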
$\begin{aligned} i_{t} &=\sigma\left(W_{i i} x_{t}+b_{i i}+W_{h i} h_{t-1}+b_{h i}\right) \\ f_{t} &=\sigma\left(W_{i f} x_{t}+b_{i f}+W_{h f} h_{t-1}+b_{h f}\right) \\ g_{t} &=\tanh \left(W_{i g} x_{t}+b_{i g}+W_{h g} h_{t-1}+b_{h g}\right) \\ o_{t} &=\sigma\left(W_{i o} x_{t}+b_{i o}+W_{h o} h_{t-1}+b_{h o}\right) \\ c_{t} &=f_{t} \odot c_{t-1}+i_{t} \odot g_{t} \\ h_{t} &=o_{t} \odot \tanh \left(c_{t}\right) \end{aligned}$
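
To make the equations concrete, here is a minimal sketch that computes one time step by hand and compares it with `torch.nn.LSTMCell`. It relies on the documented layout in which PyTorch stacks the gate parameters in the order i, f, g, o along the first dimension of `weight_ih` and `weight_hh`. The sizes and the helper name `manual_lstm_step` are illustrative choices, not part of the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_size, hidden_size, batch = 4, 3, 2
cell = nn.LSTMCell(input_size, hidden_size)

x_t = torch.randn(batch, input_size)
h_prev = torch.randn(batch, hidden_size)
c_prev = torch.randn(batch, hidden_size)


def manual_lstm_step(x_t, h_prev, c_prev, cell):
    """One LSTM step written out term by term from the equations."""
    # All four gate pre-activations at once: W_i* x_t + b_i* + W_h* h_{t-1} + b_h*
    gates = (x_t @ cell.weight_ih.T + cell.bias_ih
             + h_prev @ cell.weight_hh.T + cell.bias_hh)
    # Split into the i, f, g, o blocks (each of width hidden_size).
    i, f, g, o = gates.chunk(4, dim=1)
    i = torch.sigmoid(i)          # input gate
    f = torch.sigmoid(f)          # forget gate
    g = torch.tanh(g)             # candidate values
    o = torch.sigmoid(o)          # output gate
    c_t = f * c_prev + i * g      # cell state update
    h_t = o * torch.tanh(c_t)     # hidden state / output
    return h_t, c_t


with torch.no_grad():
    h_manual, c_manual = manual_lstm_step(x_t, h_prev, c_prev, cell)
    h_ref, c_ref = cell(x_t, (h_prev, c_prev))

print(torch.allclose(h_manual, h_ref, atol=1e-5))
print(torch.allclose(c_manual, c_ref, atol=1e-5))
```

If both comparisons print `True`, the hand-written step agrees with the library cell up to floating-point tolerance.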

2 torch.nn.LSTM

2.1 Parameters

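input_size: the number of expected features in the input $x$;
hidden_size: the number of features in the hidden state $h$;
num_layers: the number of LSTM layers. For example, num_layers=2 stacks two LSTMs to form a stacked LSTM, where the second LSTM takes the outputs of the first and computes the final result. Default: 1;
bias: if False, the layer does not use the bias weights b_ih and b_hh. Default: True;
batch_first: if True, the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). Note that this does not apply to the hidden or cell states. Default: False;
dropout: if non-zero, adds a Dropout layer on the outputs of each LSTM layer except the last, with dropout probability equal to dropout. Default: 0;
bidirectional: if True, the module becomes a bidirectional LSTM. Default: False;
proj_size: if greater than 0, uses an LSTM with projections of the corresponding size. Default: 0.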
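
As a rough illustration of how these constructor arguments affect the module, the sketch below builds a two-layer bidirectional LSTM and lists its parameter shapes. The specific sizes (10 input features, 20 hidden units) are arbitrary values chosen for the example.

```python
import torch.nn as nn

lstm = nn.LSTM(
    input_size=10,
    hidden_size=20,
    num_layers=2,
    bias=True,
    batch_first=True,
    dropout=0.1,        # applied between layers, not after the last one
    bidirectional=True,
)

# Collect the shape of every learnable parameter by name.
param_shapes = {name: tuple(p.shape) for name, p in lstm.named_parameters()}
for name, shape in param_shapes.items():
    print(name, shape)
```

Each weight has 4 * hidden_size = 80 rows, one block per gate. The second layer's `weight_ih_l1` has 40 input columns because, in a bidirectional LSTM, it receives the concatenated forward and backward outputs of the first layer. Parameters for the backward direction carry the suffix `_reverse`.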

2.2 Inputs: input, (h_0, c_0)

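input: the features of the input sequence. For unbatched input, the tensor has shape $(L, H_{in})$; when batch_first=False, the shape is $(L, N, H_{in})$; when batch_first=True, the shape is $(N, L, H_{in})$;
h_0: the initial hidden state for each element in the input sequence. For unbatched input, the tensor has shape $(D * \text{num\_layers}, H_{out})$; otherwise $(D * \text{num\_layers}, N, H_{out})$. Defaults to zeros if not provided;
c_0: the initial cell state for each element in the input sequence. For unbatched input, the tensor has shape $(D * \text{num\_layers}, H_{cell})$; otherwise $(D * \text{num\_layers}, N, H_{cell})$. Defaults to zeros if not provided.
Here $N$ is the batch size, $L$ is the sequence length, $D = 2$ if bidirectional=True and $1$ otherwise, $H_{in}$ is input_size, $H_{cell}$ is hidden_size, and $H_{out}$ is proj_size if proj_size > 0, otherwise hidden_size.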
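
The input shapes listed above can be checked with a short sketch. The sizes are arbitrary example values, and the unidirectional, projection-free setup (so $D = 1$ and $H_{out} = H_{cell}$ = hidden_size) is an assumption made to keep the example small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
L, N, H_in, H_out, num_layers = 5, 3, 10, 20, 2

# Default layout (batch_first=False): input is (L, N, H_in).
lstm = nn.LSTM(H_in, H_out, num_layers=num_layers)
x = torch.randn(L, N, H_in)

# Explicit zero initial states, shaped (D * num_layers, N, H_out) with D = 1.
h_0 = torch.zeros(num_layers, N, H_out)
c_0 = torch.zeros(num_layers, N, H_out)

out_explicit, _ = lstm(x, (h_0, c_0))
out_default, _ = lstm(x)  # omitted states default to zeros
same = torch.allclose(out_explicit, out_default)
print(same)

# batch_first=True changes only the input/output layout to (N, L, H_in);
# h_0 and c_0 keep the (D * num_layers, N, H_out) layout.
lstm_bf = nn.LSTM(H_in, H_out, num_layers=num_layers, batch_first=True)
x_bf = torch.randn(N, L, H_in)
out_bf, (h_bf, c_bf) = lstm_bf(x_bf, (h_0, c_0))
print(out_bf.shape, h_bf.shape, c_bf.shape)
```

Passing explicit zero states and omitting the states should give the same output, and the state tensors keep the same layout regardless of batch_first.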

2.3 Outputs: output, (h_n, c_n)

output: the output features h_t from the last layer of the LSTM for each time step $t$. For unbatched input, the shape is $(L, D * H_{out})$; when batch_first=False, the shape is $(L, N, D * H_{out})$; when batch_first=True, the shape is $(N, L, D * H_{out})$;
h_n: the final hidden state for each element in the sequence. For unbatched input, the tensor has shape $(D * \text{num\_layers}, H_{out})$; otherwise $(D * \text{num\_layers}, N, H_{out})$;
c_n: the final cell state for each element in the sequence. For unbatched input, the tensor has shape $(D * \text{num\_layers}, H_{cell})$; otherwise $(D * \text{num\_layers}, N, H_{cell})$.
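
The relationship between output and h_n is easiest to see in code. The following is a minimal sketch with arbitrary example sizes: it checks that, for a unidirectional LSTM, the last time step of output equals the last layer's entry in h_n, and then shows how the shapes change with bidirectional=True.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
L, N, H_in, H_out, num_layers = 5, 3, 10, 20, 2
x = torch.randn(L, N, H_in)

# Unidirectional: D = 1.
lstm = nn.LSTM(H_in, H_out, num_layers=num_layers)
output, (h_n, c_n) = lstm(x)
# output holds the last layer's h_t for every t, so its final time step
# is the last layer's final hidden state.
last_matches = torch.allclose(output[-1], h_n[-1])
print(output.shape, h_n.shape, c_n.shape, last_matches)

# Bidirectional: D = 2, so output has 2 * H_out features per step and
# h_n / c_n have 2 * num_layers entries along the first dimension.
bi = nn.LSTM(H_in, H_out, num_layers=num_layers, bidirectional=True)
out_bi, (h_bi, c_bi) = bi(x)
print(out_bi.shape, h_bi.shape, c_bi.shape)

# For the last layer, h_bi[-2] is the forward direction (final at t = L - 1)
# and h_bi[-1] is the backward direction (final at t = 0).
fwd_matches = torch.allclose(out_bi[-1, :, :H_out], h_bi[-2])
bwd_matches = torch.allclose(out_bi[0, :, H_out:], h_bi[-1])
print(fwd_matches, bwd_matches)
```

In the bidirectional case the backward direction finishes at the first time step, which is why `out_bi[-1]` alone does not contain both final states.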

Reference

Official documentation: https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#torch.nn.LSTM
Introduction to LSTM: https://www.jianshu.com/p/9dc9f41f0b29
