Python Deep Learning with TensorFlow (8): Natural Language Processing Basics

RNN Models

Data whose meaning depends on ordering is called sequence data. Sequence data can be handled with recurrent neural networks (RNNs), which have been applied successfully to many sequential problems such as natural language processing, speech recognition, image captioning, and machine translation. RNN models come in the following varieties:

![[RNN.jpg]]

The SRNN Model

For a fully connected layer with $X \in \mathcal{R}^{n \times d}$, $W_X \in \mathcal{R}^{d \times h}$, and $B \in \mathcal{R}^{1 \times h}$, we have $H = f(XW_X + B)$ with $H \in \mathcal{R}^{n \times h}$; for a single row $X_i \in \mathcal{R}^{1 \times d}$, this becomes $H_i = f(X_i W_X + B)$.

To let the hidden state at the current position incorporate information from the previous position, we can change the function to $H_i = f(X_i W_X + B + H_{i-1})$.

Adding the previous state directly like this is not flexible enough; it is better to weight the previous position's information as well, giving $H_i = f(X_i W_X + H_{i-1} W_H + B)$ with $W_H \in \mathcal{R}^{h \times h}$.

Finally, this can be written compactly as $H_i = f([X_i, H_{i-1}]W + B)$, where $[X_i, H_{i-1}]$ concatenates the input with the previous hidden state and $W$ stacks $W_X$ and $W_H$.

This is the core of the RNN; the output is generated as $O_i = H_i W_O + B_O$.

Without $W_H$, writing $T_i = X_i W_X + B$, unrolling the recurrence gives $H_t = f(T_t + f(T_{t-1} + \dots + f(T_1 + H_0)))$.

Clearly this is just repeated addition passed through a nonlinearity, which is not flexible enough; and if the activation function is removed, it reduces to simply adding the previous position's value to the current one.

Since every position has a corresponding hidden unit, the final hidden unit aggregates information from all preceding positions, and each output unit is determined by its corresponding hidden unit. This supports both many-to-one tasks (use only the last hidden unit) and many-to-many tasks (use all of the hidden units).
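A minimal NumPy sketch of this recurrence (the shapes and random initialization here are illustrative, not from the original text):

import numpy as np

n, d, h = 5, 8, 16                 # sequence length, input dim, hidden dim
X = np.random.randn(n, d)          # one input sequence

W_X = np.random.randn(d, h) * 0.1  # input-to-hidden weights
W_H = np.random.randn(h, h) * 0.1  # hidden-to-hidden weights
B = np.zeros((1, h))

H = np.zeros((1, h))               # H_0
hidden_states = []
for i in range(n):
    # H_i = f(X_i W_X + H_{i-1} W_H + B), with f = tanh
    H = np.tanh(X[i:i+1] @ W_X + H @ W_H + B)
    hidden_states.append(H)
# hidden_states[-1] serves many-to-one tasks; the full list serves many-to-many tasks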

Shortcomings of the SRNN

With $W_H$ added, the unrolled form becomes instead $H_t = f(T_t + f(T_{t-1} + \dots + f(T_1 + H_0 W_H) W_H \dots) W_H)$.

Notice that $W_H$ is multiplied in over and over. Since it is a square matrix, assume it is diagonalizable: $W_H = P^{-1} \Lambda P$. Multiplying $n$ copies of $W_H$ then gives $W_H^n = P^{-1} \Lambda^n P$.

As $n$ grows, the diagonal entries of $\Lambda$ that are greater than 1 blow up, while those smaller than 1 shrink toward 0. This causes the short-memory problem of RNNs; to extend the memory, the LSTM and GRU models were proposed.
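A quick numerical illustration of this effect, using a toy diagonalizable $W_H$ (the matrix and its eigenvalues are chosen purely for the example):

import numpy as np

Lam = np.diag([1.2, 0.8])               # eigenvalues 1.2 and 0.8
P = np.array([[1.0, 1.0], [0.0, 1.0]])
W_H = np.linalg.inv(P) @ Lam @ P        # W_H = P^{-1} Λ P

for n in [1, 10, 50]:
    vals = np.linalg.eigvals(np.linalg.matrix_power(W_H, n))
    print(n, np.sort(vals.real))        # 1.2**n explodes while 0.8**n vanishes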

LSTM

LSTM (Long Short-Term Memory) networks were first proposed by Hochreiter and Schmidhuber in 1997. They can effectively capture long-term dependencies in the data and mitigate vanishing or exploding gradients.

LSTM adds a number of gating elements inside the SRNN cell. These gates preserve the important features in the data and can effectively extend the model's memory length. The drawback is equally clear: the structure is complex, the computation heavy, and training correspondingly slow.

$$
\begin{aligned}
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) \\
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o) \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
h_t &= o_t \circ \tanh(c_t) \\
y_t &= W_{hy} h_t + b_y
\end{aligned}
$$
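A single time step of these equations in plain NumPy (a sketch of the peephole variant written above, where gates also see the cell state; parameter shapes and random initialization are illustrative):

import numpy as np

d, h = 8, 16
sigma = lambda z: 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
shapes = {'xf': (d, h), 'hf': (h, h), 'cf': (h, h),
          'xi': (d, h), 'hi': (h, h), 'ci': (h, h),
          'xo': (d, h), 'ho': (h, h), 'co': (h, h),
          'xc': (d, h), 'hc': (h, h)}
W = {k: rng.standard_normal(s) * 0.1 for k, s in shapes.items()}
b_f = b_i = b_o = b_c = np.zeros(h)

def lstm_step(x_t, h_prev, c_prev):
    f_t = sigma(x_t @ W['xf'] + h_prev @ W['hf'] + c_prev @ W['cf'] + b_f)  # forget gate
    i_t = sigma(x_t @ W['xi'] + h_prev @ W['hi'] + c_prev @ W['ci'] + b_i)  # input gate
    c_t = f_t * c_prev + i_t * np.tanh(x_t @ W['xc'] + h_prev @ W['hc'] + b_c)
    # the output gate reads the current cell state c_t, so it is computed after c_t
    o_t = sigma(x_t @ W['xo'] + h_prev @ W['ho'] + c_t @ W['co'] + b_o)
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

h_t, c_t = lstm_step(rng.standard_normal(d), np.zeros(h), np.zeros(h))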

GRU

GRU (Gated Recurrent Unit) refines the LSTM, trading away a little memory capacity for faster training. Compared with the LSTM it drops the separate memory cell, so its effective memory length is somewhat shorter, but it still far exceeds that of a plain RNN, and forgetting problems remain relatively unlikely.

$$
\begin{aligned}
z_t &= \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z) \\
r_t &= \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh(W_{xh} x_t + W_{hh}(r_t \circ h_{t-1}) + b_h) \\
h_t &= (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t \\
y_t &= W_{hy} h_t + b_y
\end{aligned}
$$
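The same kind of sketch for one GRU step (again with illustrative shapes and initialization):

import numpy as np

d, h = 8, 16
sigma = lambda z: 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
W_xz, W_xr, W_xh = (rng.standard_normal((d, h)) * 0.1 for _ in range(3))
W_hz, W_hr, W_hh = (rng.standard_normal((h, h)) * 0.1 for _ in range(3))
b_z = b_r = b_h = np.zeros(h)

def gru_step(x_t, h_prev):
    z_t = sigma(x_t @ W_xz + h_prev @ W_hz + b_z)                # update gate
    r_t = sigma(x_t @ W_xr + h_prev @ W_hr + b_r)                # reset gate
    h_tilde = np.tanh(x_t @ W_xh + (r_t * h_prev) @ W_hh + b_h)  # candidate state
    return (1 - z_t) * h_prev + z_t * h_tilde                    # h_t

h_t = gru_step(rng.standard_normal(d), np.zeros(h))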

Bi-RNN

An RNN propagates information in only one direction; the Bi-RNN addresses this by combining two unidirectional RNNs, one running forward over the sequence and one running backward.

The output of a Bi-RNN is obtained by concatenating the output of the forward pass with that of the backward pass.
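In Keras this concatenation is what the Bidirectional wrapper does with its default merge_mode='concat', so the output dimension doubles. A quick check:

import tensorflow as tf

x = tf.random.normal((2, 10, 8))   # (batch, time steps, features)
bi = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16))
print(bi(x).shape)                 # (2, 32): 16 forward units + 16 backward units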

Text Classification with LSTM

Here we use reviews from the Internet Movie Database (IMDb) for a binary classification task: predicting whether a review is positive or negative.

Loading the data:

import numpy as np
import tensorflow as tf

maxlen = 200          # pad/truncate every review to 200 tokens
max_features = 20000  # keep only the 20,000 most frequent words

(x_train, y_train), (x_val, y_val) = tf.keras.datasets.imdb.load_data(num_words=max_features)

Building the dataset:

x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_val = tf.keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)

train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(64)
val_data = tf.data.Dataset.from_tensor_slices((x_val, y_val)).batch(64)
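To sanity-check what these integer sequences encode, you can map them back to words. A small sketch (by default the Keras loader reserves indices 0-2 for padding/start/unknown tokens, so the word index is offset by 3):

word_index = tf.keras.datasets.imdb.get_word_index()
id_to_word = {i + 3: w for w, i in word_index.items()}
id_to_word.update({0: '<pad>', 1: '<start>', 2: '<unk>'})
print(' '.join(id_to_word.get(i, '<unk>') for i in x_train[0][:30]))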

Building the model:

class CustomModel(tf.keras.Model):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.embedding = tf.keras.layers.Embedding(max_features, 128)
        # the first Bi-LSTM returns the full sequence so the second can consume it
        self.lstm_1 = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True))
        # the second Bi-LSTM returns only its final output
        self.lstm_2 = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))
        self.dense_final = tf.keras.layers.Dense(1, activation='sigmoid')

    def call(self, x):
        x = self.embedding(x)
        x = self.lstm_1(x)
        x = self.lstm_2(x)
        x = self.dense_final(x)
        return x
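A quick sanity check on a dummy batch (illustrative; the model emits one sigmoid probability per review):

m = CustomModel()
print(m(tf.zeros((2, maxlen), dtype=tf.int32)).shape)  # (2, 1)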

Configuring the model:

model = CustomModel()
model.compile(
    loss=tf.keras.losses.binary_crossentropy,
    optimizer='adam',
    metrics=['accuracy']
)

Training:

model.fit(train_data, epochs=4, validation_data=val_data)

Training results:

Epoch 1/4
391/391 [==============================] - 43s 100ms/step - loss: 0.3848 - accuracy: 0.8215 - val_loss: 0.3382 - val_accuracy: 0.8538
Epoch 2/4
391/391 [==============================] - 38s 98ms/step - loss: 0.1943 - accuracy: 0.9288 - val_loss: 0.3826 - val_accuracy: 0.8511
Epoch 3/4
391/391 [==============================] - 38s 98ms/step - loss: 0.1680 - accuracy: 0.9397 - val_loss: 0.3751 - val_accuracy: 0.8548
Epoch 4/4
391/391 [==============================] - 38s 98ms/step - loss: 0.1076 - accuracy: 0.9629 - val_loss: 0.5110 - val_accuracy: 0.8176

Image Classification by Combining CNN and RNN

Here we again use the CIFAR-10 classification data.

Loading the data:

import matplotlib.pyplot as plt
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
# x_train.shape, y_train.shape, x_test.shape, y_test.shape
# ((50000, 32, 32, 3), (50000, 1), (10000, 32, 32, 3), (10000, 1))

index_name = {
    0:'airplane',
    1:'automobile',
    2:'bird',
    3:'cat',
    4:'deer',
    5:'dog',
    6:'frog',
    7:'horse',
    8:'ship',
    9:'truck'
}

Building the dataset:

x_train = x_train / 255.0   # scale pixel values to [0, 1]
x_test = x_test / 255.0

y_train = tf.keras.utils.to_categorical(y_train)   # one-hot encode the 10 classes
y_test = tf.keras.utils.to_categorical(y_test)
train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(64)
test_data = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(64)
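A quick look at one training image and its label, using the matplotlib import from above (illustrative):

plt.imshow(x_train[0])                           # pixel values are already in [0, 1]
plt.title(index_name[int(y_train[0].argmax())])  # recover the class name from the one-hot label
plt.show()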

CNN first, then RNN

Building the model:

class CustomModel(tf.keras.Model):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.conv = tf.keras.layers.Conv2D(32, 3, padding='same')
        self.bn = tf.keras.layers.BatchNormalization(axis=-1)
        self.max_pool = tf.keras.layers.MaxPooling2D(strides=2, padding='same')
        # after conv + 2x2 pooling the feature map is (16, 16, 32);
        # Reshape(-1, 32) turns it into a 256-step sequence of 32-dim vectors
        self.reshape = tf.keras.layers.Reshape(target_shape=(-1, 32))
        self.lstm = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(50, return_sequences=False))
        # with return_sequences=False the LSTM output is already flat, so Flatten is a no-op here
        self.flatten = tf.keras.layers.Flatten()
        self.dense_final = tf.keras.layers.Dense(10, activation='softmax')

    def call(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.max_pool(x)
        x = self.reshape(x)
        x = self.lstm(x)
        x = self.flatten(x)
        x = self.dense_final(x)
        return x
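To see how the image becomes a sequence for the LSTM, one can trace shapes through the layers on a dummy batch (illustrative):

m = CustomModel()
x = tf.zeros((2, 32, 32, 3))
x = m.max_pool(m.bn(m.conv(x)))
print(x.shape)              # (2, 16, 16, 32)
print(m.reshape(x).shape)   # (2, 256, 32): 256 time steps of 32 features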

Compiling and training the model:

model = CustomModel()
model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    optimizer='adam',
    metrics=['accuracy']
)
model.fit(train_data, epochs=3, validation_data=test_data)

Training results:

Epoch 1/3
782/782 [==============================] - 36s 40ms/step - loss: 1.7976 - accuracy: 0.3488 - val_loss: 2.3082 - val_accuracy: 0.2340
Epoch 2/3
782/782 [==============================] - 30s 39ms/step - loss: 1.5557 - accuracy: 0.4350 - val_loss: 1.5993 - val_accuracy: 0.4222
Epoch 3/3
782/782 [==============================] - 30s 39ms/step - loss: 1.4037 - accuracy: 0.4898 - val_loss: 1.5229 - val_accuracy: 0.4519

CNN and RNN in parallel

class CustomModel(tf.keras.Model):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.flatten = tf.keras.layers.Flatten()

        # CNN branch
        self.conv = tf.keras.layers.Conv2D(32, 3, padding='same')
        self.bn = tf.keras.layers.BatchNormalization(axis=-1)
        self.max_pool = tf.keras.layers.MaxPooling2D(strides=2, padding='same')
        self.dense_final_1 = tf.keras.layers.Dense(60, activation='relu')

        # RNN branch: reshape the raw (32, 32, 3) image into a 96-step sequence of 32-dim vectors
        self.reshape = tf.keras.layers.Reshape(target_shape=[-1, 32])
        self.lstm = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(50, return_sequences=False))
        self.dense_final_2 = tf.keras.layers.Dense(60, activation='relu')

        # concatenate the two 60-dim branch outputs before the classifier
        self.concat = tf.keras.layers.Concatenate()
        self.dense_final = tf.keras.layers.Dense(10, activation='softmax')

    def call(self, x):
        x1 = self.conv(x)
        x1 = self.bn(x1)
        x1 = self.max_pool(x1)
        x1 = self.flatten(x1)
        x1 = self.dense_final_1(x1)
        
        x2 = self.reshape(x)
        x2 = self.lstm(x2)
        x2 = self.flatten(x2)
        x2 = self.dense_final_2(x2)

        x = self.concat([x1, x2])
        x = self.dense_final(x)
        return x
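The two branches can be traced the same way on a dummy batch (illustrative); each produces a 60-dim vector, concatenated to 120 features before the classifier:

m = CustomModel()
x = tf.zeros((2, 32, 32, 3))
x1 = m.dense_final_1(m.flatten(m.max_pool(m.bn(m.conv(x)))))
x2 = m.dense_final_2(m.lstm(m.reshape(x)))
print(x1.shape, x2.shape)   # (2, 60) (2, 60)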

Compiling and training the model:

model = CustomModel()
model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    optimizer='adam',
    metrics=['accuracy']
)

model.fit(train_data, epochs=3, validation_data=test_data)

Training results:

Epoch 1/3
782/782 [==============================] - 19s 22ms/step - loss: 1.5181 - accuracy: 0.4642 - val_loss: 1.4310 - val_accuracy: 0.4992
Epoch 2/3
782/782 [==============================] - 16s 21ms/step - loss: 1.1988 - accuracy: 0.5753 - val_loss: 1.4605 - val_accuracy: 0.5062
Epoch 3/3
782/782 [==============================] - 16s 21ms/step - loss: 1.0663 - accuracy: 0.6258 - val_loss: 1.5351 - val_accuracy: 0.5000