
Coursera Natural Language Processing Specialization, Course 04: Natural Language Processing with Attention Models, Week 02 Notes

Natural Language Processing with Attention Models

Course Certificate

[Image: course certificate]

These are study notes for the course Natural Language Processing with Attention Models. If any content infringes on your rights, please contact me for removal.


Text Summarization

Compare RNNs and other sequential models to the more modern Transformer architecture, then create a tool that generates text summaries.

Learning Objectives


  • Describe the three basic types of attention
  • Name the two types of layers in a Transformer
  • Define three main matrices in attention
  • Interpret the math behind scaled dot product attention, causal attention, and multi-head attention
  • Use articles and their summaries to create input features for training a text summarizer
  • Build a Transformer decoder model (GPT-2)

Transformers vs RNNs

[Image: an RNN translating the English sentence "How are you?" into the French "Comment allez-vous?"]

In the image above, you can see a typical RNN used to translate the English sentence "How are you?" into its French equivalent, "Comment allez-vous?". One of the biggest issues with these RNNs is that they rely on sequential computation: in order to process the word "you", the model must first work through "How" and "are". Two other issues with RNNs are:

  • Loss of information: for example, it is harder to keep track of whether the subject is singular or plural the further you move away from the subject.
  • Vanishing gradient: when you back-propagate, the gradients can become very small, and as a result your model will not learn much.

In contrast, transformers are based on attention and don't require any sequential computation per layer; only a single step is needed. Additionally, the number of gradient steps that need to be taken from the last output back to the first input of a transformer is just one. For RNNs, the number of steps grows with the sequence length. Finally, transformers don't suffer from the vanishing gradient problems that are tied to sequence length.
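This contrast can be sketched in a few lines of NumPy. The sketch below is illustrative only (not the course's code): an RNN must loop over time steps because each hidden state depends on the previous one, while scaled dot-product attention processes every position of the sequence in a single matrix operation.

```python
import numpy as np

def rnn_step(h, x, Wh, Wx):
    # One sequential RNN step: h_t depends on h_{t-1},
    # so positions must be processed one after another.
    return np.tanh(h @ Wh + x @ Wx)

def scaled_dot_product_attention(Q, K, V):
    # All positions attend to all others at once:
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d = 3, 4  # e.g. the three words "How", "are", "you"
X = rng.standard_normal((seq_len, d))

# RNN: a Python loop over time steps -- inherently sequential.
Wh, Wx = rng.standard_normal((d, d)), rng.standard_normal((d, d))
h = np.zeros(d)
for x in X:
    h = rnn_step(h, x, Wh, Wx)

# Attention: the whole sequence is handled in one parallel step.
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4)
```

Note that the attention output has one row per input position, produced without any loop over the sequence; this is what "only a single step is needed" refers to.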

We are going to talk more about how the attention component works with transformers, so don't worry about it for now.
