These are study notes for the course Natural Language Processing with Attention Models. If anything here infringes on a copyright, please get in touch and it will be removed.
Compare RNNs and other sequential models to the more modern Transformer architecture, then create a tool that generates text summaries.
In the image above, you can see a typical RNN used to translate the English sentence "How are you?" into its French equivalent, "Comment allez-vous?". One of the biggest issues with these RNNs is that they rely on sequential computation: for your code to process the word "you", it first has to go through "How" and "are" (see the sketch after this list). Two other issues with RNNs are:

- Loss of information: the hidden state struggles to keep track of words that appeared far back in the sequence.
- Vanishing gradients: gradients have to flow back through every step of the sequence, so they shrink as sequences get longer.
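A minimal NumPy sketch of that sequential dependency is below. All names, dimensions, and weights here are hypothetical illustrations, not the course's actual code; the point is only that step t cannot run until step t-1 has produced its hidden state.

```python
import numpy as np

# Hypothetical toy dimensions, chosen only for illustration.
hidden_size, embed_size = 16, 8
Wh = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights
Wx = np.random.randn(hidden_size, embed_size) * 0.1   # input-to-hidden weights

def rnn_forward(embeddings):
    """Process tokens one at a time: step t needs the hidden state from step t-1."""
    h = np.zeros(hidden_size)
    for x_t in embeddings:  # "How", "are", "you" must be visited in order
        h = np.tanh(Wh @ h + Wx @ x_t)
    return h

sentence = np.random.randn(3, embed_size)  # stand-in embeddings for "How are you?"
final_state = rnn_forward(sentence)
```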
In contrast, transformers are based on attention and don't require any sequential computation per layer; a single step processes the whole sequence. Additionally, the number of gradient steps between the last output and the first input of a transformer is just one, whereas for RNNs that number grows with the length of the sequence. As a result, transformers don't suffer from the vanishing-gradient problems tied to sequence length.
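To make the "single step" claim concrete, here is a minimal NumPy sketch of scaled dot-product attention, the standard formula softmax(QK^T / sqrt(d_k))V. The sizes and random inputs are hypothetical; note that every position attends to every other position in one matrix multiplication, with no loop over time steps.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """All positions attend to all others in one matrix step; no recurrence."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V

seq_len, d_model = 3, 8                          # toy sizes for illustration
Q = K = V = np.random.randn(seq_len, d_model)    # self-attention over one sentence
out = scaled_dot_product_attention(Q, K, V)      # computed in one parallel step
```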
We are going to talk more about how the attention component works with transformers later on, so don't worry about the details for now.