
CS224N (2019) Assignment 4

Assignment 4 is mainly about the NMT (neural machine translation) task.

(1) Pad sentences of different lengths so that they all end up with the same length (utils.py).

    def pad_sents(sents, pad_token):
        sents_padded = []
        ### YOUR CODE HERE (~6 Lines)
        # Find the length of the longest sentence in the batch.
        max_len = len(sents[0])
        for sentence in sents[1:]:
            flag = len(sentence)
            if flag > max_len:
                max_len = flag
        # Append pad_token to every shorter sentence until it reaches max_len.
        for sentence in sents:
            if len(sentence) < max_len:
                for i in range(len(sentence), max_len):
                    sentence.append(pad_token)
            sents_padded.append(sentence)
        ### END YOUR CODE
        return sents_padded

Since no test is provided for this function, I made up a small sentence list to test it myself.

    l = [['i','want','hate','you'], ['i','think','you','are','bad'], ['i','like','you']]
    print(pad_sents(l, '0'))   # sents, pad_token

    # output:
    # [['i', 'want', 'hate', 'you', '0'], ['i', 'think', 'you', 'are', 'bad'], ['i', 'like', 'you', '0', '0']]

(2) Initialize the source and target embeddings with nn.Embedding (model_embeddings.py).

    class ModelEmbeddings(nn.Module):
        def __init__(self, embed_size, vocab):
            super(ModelEmbeddings, self).__init__()
            self.embed_size = embed_size

            # default values
            self.source = None
            self.target = None

            src_pad_token_idx = vocab.src['<pad>']
            tgt_pad_token_idx = vocab.tgt['<pad>']

            ### YOUR CODE HERE (~2 Lines)
            self.source = nn.Embedding(len(vocab.src), self.embed_size, padding_idx=src_pad_token_idx)
            self.target = nn.Embedding(len(vocab.tgt), self.embed_size, padding_idx=tgt_pad_token_idx)
            ### END YOUR CODE
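As a quick sanity check (the vocabulary size, embedding dimension, and indices below are made up for illustration), nn.Embedding's third positional argument is padding_idx, so the '<pad>' row stays the zero vector and receives no gradient updates:

    import torch
    import torch.nn as nn

    # Toy setup (hypothetical sizes): a vocabulary of 10 tokens, 4-dimensional
    # embeddings, with index 0 reserved for '<pad>'.
    emb = nn.Embedding(10, 4, padding_idx=0)

    ids = torch.tensor([[3, 5, 0, 0]])   # one padded sentence of word indices
    vecs = emb(ids)                      # -> shape (1, 4, 4)

    print(vecs.shape)
    print(vecs[0, 2])   # the '<pad>' positions map to the all-zero vector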

(3) Build the NMT network structure: the encoder is a Bi-LSTM, the decoder is an LSTM cell, with a multiplicative attention mechanism on top.

    class NMT(nn.Module):
        def __init__(self, embed_size, hidden_size, vocab, dropout_rate=0.2):
            super(NMT, self).__init__()
            self.model_embeddings = ModelEmbeddings(embed_size, vocab)   # initialize the ModelEmbeddings class
            self.hidden_size = hidden_size
            self.dropout_rate = dropout_rate
            self.vocab = vocab

            # default values
            self.encoder = None
            self.decoder = None
            self.h_projection = None
            self.c_projection = None
            self.att_projection = None
            self.combined_output_projection = None
            self.target_vocab_projection = None
            self.dropout = None

            ### YOUR CODE HERE (~8 Lines)
            self.encoder = nn.LSTM(embed_size, hidden_size, bidirectional=True)
            self.decoder = nn.LSTMCell(hidden_size + embed_size, hidden_size, bias=True)
            self.h_projection = nn.Linear(hidden_size * 2, hidden_size, bias=False)
            self.c_projection = nn.Linear(hidden_size * 2, hidden_size, bias=False)
            self.att_projection = nn.Linear(hidden_size * 2, hidden_size, bias=False)
            self.combined_output_projection = nn.Linear(hidden_size * 3, hidden_size, bias=False)
            self.target_vocab_projection = nn.Linear(hidden_size, len(vocab.tgt), bias=False)
            self.dropout = nn.Dropout(dropout_rate)
            ### END YOUR CODE

This part is fairly straightforward, since the dimensions of each projection layer are already spelled out in the assignment PDF. A quick shape check is sketched below.
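As an illustration of those dimensions (embed_size, hidden_size, and the tensor sizes below are arbitrary, not the assignment's values), the bidirectional LSTM produces 2*hidden_size features per position, which is why h_projection and c_projection map 2h back down to h:

    import torch
    import torch.nn as nn

    embed_size, hidden_size = 6, 8            # arbitrary sizes for illustration
    src_len, batch = 5, 3

    encoder = nn.LSTM(embed_size, hidden_size, bidirectional=True)
    h_projection = nn.Linear(hidden_size * 2, hidden_size, bias=False)

    x = torch.randn(src_len, batch, embed_size)        # (src_len, b, e)
    enc_hiddens, (last_hidden, last_cell) = encoder(x)

    print(enc_hiddens.shape)   # (src_len, b, 2h): forward and backward states concatenated
    print(last_hidden.shape)   # (2, b, h): one final state per direction

    # Concatenate the two directions' final hidden states and project 2h -> h,
    # which is what h_projection does when building the decoder's initial state.
    init_dec_hidden = h_projection(torch.cat((last_hidden[0], last_hidden[1]), dim=1))
    print(init_dec_hidden.shape)   # (b, h)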

(d) Build the encode step.

    input: the padded source sentences
    output: every hidden state h (mainly used for attention), plus the final h and c (mainly used to initialize the decoder)

    def encode(self, source_padded: torch.Tensor, source_lengths: List[int]) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        enc_hiddens, dec_init_state = None, None
        ### YOUR CODE HERE
        source_embeddings = self.model_embeddings.source(source_padded)   # -> (src_len, b, e)

        # pack_padded_sequence / pad_packed_sequence come from torch.nn.utils.rnn.
        # Packing drops the pad positions so the LSTM only processes real tokens.
        # batch_first=False (the default) keeps the (src_len, b, e) layout;
        # enforce_sorted=False means the batch need not be pre-sorted by length
        # (with the default True, sentences must be sorted by decreasing source_lengths).
        # Reference: https://www.cnblogs.com/sbj123456789/p/9834018.html
        X = pack_padded_sequence(source_embeddings, source_lengths, batch_first=False, enforce_sorted=False)

        enc_hiddens, (last_hidden, last_cell) = self.encoder(X)   # run the Bi-LSTM encoder
        enc_hiddens = pad_packed_sequence(enc_hiddens, batch_first=True)
        enc_hiddens = enc_hiddens[0]   # -> (b, src_len, h*2)

        # Concatenate the forward and backward final states, then project 2h -> h
        # to get the decoder's initial hidden and cell states.
        init_decoder_hidden = self.h_projection(torch.cat((last_hidden[0], last_hidden[1]), 1))
        init_decoder_cell = self.c_projection(torch.cat((last_cell[0], last_cell[1]), 1))
        dec_init_state = (init_decoder_hidden, init_decoder_cell)
        ### END YOUR CODE
        return enc_hiddens, dec_init_state
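To make the packing comments above more concrete, here is a toy example (the tensors and lengths are made up): packing keeps only the real time steps so the LSTM never sees padding, and unpacking with batch_first=True restores a (b, src_len, *) tensor:

    import torch
    from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

    # Two "embedded" sentences of lengths 3 and 2, padded to length 3.
    x = torch.randn(3, 2, 4)   # (src_len, b, e), batch_first=False
    lengths = [3, 2]

    packed = pack_padded_sequence(x, lengths, enforce_sorted=False)
    print(packed.data.shape)   # (5, 4): only the 3 + 2 real time steps survive

    unpacked, out_lengths = pad_packed_sequence(packed, batch_first=True)
    print(unpacked.shape)      # (2, 3, 4): back to (b, src_len, e)
    print(out_lengths)         # tensor([3, 2])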

(e) Build the decode step.

(1) The attention mechanism computes its scores multiplicatively,

    $e_{t,i} = (\mathbf{h}_t^{\mathrm{dec}})^\top \mathbf{W}_{\mathrm{attProj}} \mathbf{h}_i^{\mathrm{enc}}$

so the $\mathbf{W}_{\mathrm{attProj}} \mathbf{h}_i^{\mathrm{enc}}$ part can be computed once in advance and then reused as each time step's $\mathbf{h}_t^{\mathrm{dec}}$ arrives; this is exactly what the self.att_projection layer is for. A minimal sketch of this precomputation follows.
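The sketch below uses dummy tensors (the sizes are illustrative, and this is not the assignment's step function): project enc_hiddens once, then each decoder step only needs a batched dot product against the current decoder hidden state:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    b, src_len, h = 3, 5, 8                          # illustrative sizes
    att_projection = nn.Linear(2 * h, h, bias=False)

    enc_hiddens = torch.randn(b, src_len, 2 * h)     # encoder states, (b, src_len, 2h)
    enc_hiddens_proj = att_projection(enc_hiddens)   # precomputed once: (b, src_len, h)

    # At each decoder time step, only the dot products with the new h_dec are needed.
    dec_hidden = torch.randn(b, h)                   # current decoder hidden state
    e_t = torch.bmm(enc_hiddens_proj, dec_hidden.unsqueeze(2)).squeeze(2)   # (b, src_len) scores
    alpha_t = F.softmax(e_t, dim=1)                                         # attention weights
    a_t = torch.bmm(alpha_t.unsqueeze(1), enc_hiddens).squeeze(1)           # (b, 2h) context vector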

(2) The decode implementation:

    def decode(self, enc_hiddens: torch.Tensor, enc_masks: torch.Tensor,
               dec_init_state: Tuple[torch.Tensor, torch.Tensor], target_padded: torch.Tensor) -> torch.Tensor:
        # Chop off the <END> token for the max-length sentences.
        target_padded = target_padded[:-1]

        # Initialize the decoder state (hidden and cell)
        dec_state = dec_init_state

        # Initialize the previous combined output vector o_{t-1} as zeros
        batch_size = enc_hiddens.size(0)   # number of sentences in the batch
        o_prev = torch.zeros(batch_size, self.hidden_size, device=self.device)

        combined_outputs = []
        ### YOUR CODE HERE
        # Precompute the attention projection of the encoder states (reused at every step).
        enc_hiddens_proj = self.att_projection(enc_hiddens)   # -> (b, src_len, h)

        Y = self.model_embeddings.target(target_padded)   # -> (tgt_len, b, e)
        for y_t in torch.split(Y, 1, dim=0):              # y_t: (1, b, e)
            y_t = torch.squeeze(y_t, dim=0)               # -> (b, e)
            Ybar_t = torch.cat((y_t, o_prev), dim=1)      # (b, e) + (b, h) -> (b, e + h)
            # step returns (dec_state, combined output o_t, attention scores e_t);
            # the decoder state must be carried over to the next iteration.
            dec_state, o_t, _ = self.step(Ybar_t, dec_state, enc_hiddens, enc_hiddens_proj, enc_masks)
            combined_outputs.append(o_t)
            o_prev = o_t
        combined_outputs = torch.stack(combined_outputs, dim=0)   # -> (tgt_len, b, h)
        ### END YOUR CODE
        return combined_outputs
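For context on where combined_outputs go next (this part is outside the excerpt, so treat it as a sketch rather than the assignment's code): the target_vocab_projection layer maps each combined output to a score per target word, and a log-softmax turns those scores into log-probabilities:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    tgt_len, b, h, vocab_size = 7, 3, 8, 50          # illustrative sizes
    target_vocab_projection = nn.Linear(h, vocab_size, bias=False)

    combined_outputs = torch.randn(tgt_len, b, h)    # shaped like what decode() returns
    P = F.log_softmax(target_vocab_projection(combined_outputs), dim=-1)
    print(P.shape)   # (tgt_len, b, vocab_size): per-step log-probabilities over the target vocab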