In the previous tutorial we covered NNLM. Although NNLM takes into account the words that precede a word, it has no way to account for the words that follow it, and it is computationally expensive. We can instead use CBOW, one of the two models in Word2vec.
Goal: predict the center word $w(t)$ from the words around it.
Objective function: $J = \sum_{w \in \text{corpus}} P(w \mid \text{context}(w))$
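In practice this objective is maximized in log space; a common restatement (the log form and the softmax expression below are standard Word2vec notation, not taken from the original) is:

```latex
% Maximize the log-probability of every center word given its context;
% the probability is a softmax over the whole vocabulary.
J = \sum_{w \in \text{corpus}} \log P\bigl(w \mid \text{context}(w)\bigr),
\qquad
P\bigl(w \mid \text{context}(w)\bigr)
  = \frac{\exp\left(u_w^{\top} h\right)}{\sum_{v=1}^{V} \exp\left(u_v^{\top} h\right)}
```

where $h$ is the averaged hidden vector produced by the projection step below and $u_w$ is the output-weight column belonging to word $w$.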
Input: the one-hot vectors of the context words. Assume the word vector space has dimension $V$ (the vocabulary size) and there are $C$ context words, so the input matrix has dimension $C \times V$.
PROJECTION: each context word's one-hot vector is multiplied by the input weight matrix $W$ ($V \times N$); the results are summed and averaged to give the hidden-layer vector of dimension $1 \times N$.
Output: multiply the hidden vector by the output weight matrix $W'$ ($N \times V$) to obtain an output vector of dimension $1 \times V$, i.e. the probability vector over the vocabulary.
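To make the three steps concrete, here is a minimal NumPy sketch of one CBOW forward pass (the names `W` and `W_prime` and the toy sizes are illustrative assumptions, not from the original):

```python
import numpy as np

V, N, C = 5, 3, 4                 # vocab size, hidden dim, number of context words (toy values)
rng = np.random.default_rng(0)

X = np.zeros((C, V))              # input: C x V matrix of one-hot context vectors
for row, word_idx in enumerate([0, 2, 3, 4]):
    X[row, word_idx] = 1.0

W = rng.normal(size=(V, N))       # input weight matrix  (V x N)
W_prime = rng.normal(size=(N, V)) # output weight matrix (N x V)

h = (X @ W).mean(axis=0, keepdims=True)        # PROJECTION: average of the C embeddings -> 1 x N
scores = h @ W_prime                           # output scores -> 1 x V
probs = np.exp(scores) / np.exp(scores).sum()  # softmax -> probability vector over the vocab
print(probs.shape)                             # (1, 5)
```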
Here is a complete PyTorch example:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

corpus = """We are about to study the idea of a computational process.
Computational processes are abstract beings that inhabit computers.
As they evolve, processes manipulate other abstract things called data.
The evolution of a process is directed by a pattern of rules
called a program. People create programs to direct processes. In effect,
we conjure the spirits of the computer with our spells."""

# Model hyperparameters
window_size = 2
embedding_dim = 100
hidden_dim = 128

# Data preprocessing
sentences = corpus.split()                             # tokenize on whitespace
words = list(set(sentences))                           # vocabulary
word_dict = {word: i for i, word in enumerate(words)}  # word -> index

# Build (context, target) training pairs
data = []
for i in range(window_size, len(sentences) - window_size):
    context = [sentences[i - 2], sentences[i - 1],
               sentences[i + 1], sentences[i + 2]]
    target = sentences[i]
    data.append((context, target))
print(data[:5])

# Turn a list of context words into a tensor of word indices
def make_content_vector(content, word_to_ix):
    idx = [word_to_ix[w] for w in content]
    return torch.LongTensor(idx)

# CBOW model: embed the context words, concatenate their embeddings,
# and map them through two linear layers to a distribution over the vocab
class CBOW(nn.Module):
    def __init__(self, vocab_size, n_dim, window_size, hidden_dim):
        super(CBOW, self).__init__()
        self.embedding = nn.Embedding(vocab_size, n_dim)
        self.linear1 = nn.Linear(2 * n_dim * window_size, hidden_dim)
        self.linear2 = nn.Linear(hidden_dim, vocab_size)

    def forward(self, X):
        embeds = self.embedding(X).view(1, -1)  # concatenate the 2*window_size embeddings
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs

# Training
model = CBOW(len(word_dict), embedding_dim, window_size, hidden_dim)
if torch.cuda.is_available():
    model = model.cuda()
criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

for epoch in range(500):
    total_loss = 0
    for context, target in data:
        content_vector = make_content_vector(context, word_dict)
        target = torch.tensor([word_dict[target]], dtype=torch.long)
        if torch.cuda.is_available():
            content_vector = content_vector.cuda()
            target = target.cuda()
        optimizer.zero_grad()
        log_probs = model(content_vector)
        loss = criterion(log_probs, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    if (epoch + 1) % 100 == 0:
        print('Epoch:', '%03d' % (epoch + 1), 'cost =', '{:.6f}'.format(total_loss))
```
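Once training finishes, the learned word vectors are the rows of `model.embedding.weight`. Here is a small sketch of how they might be queried and how the model might be used to fill in a center word (the helpers `get_vector` and `predict_center` are my own additions, not part of the original):

```python
# Look up the trained embedding of a word.
def get_vector(word):
    idx = torch.LongTensor([word_dict[word]])
    if torch.cuda.is_available():
        idx = idx.cuda()
    return model.embedding(idx).detach()

# Predict the center word for a context of 2*window_size words,
# given in the same order used during training: [i-2, i-1, i+1, i+2].
def predict_center(context_words):
    vec = make_content_vector(context_words, word_dict)
    if torch.cuda.is_available():
        vec = vec.cuda()
    log_probs = model(vec)
    return words[log_probs.argmax(dim=1).item()]

print(get_vector("process").shape)                        # torch.Size([1, 100])
print(predict_center(["are", "about", "study", "the"]))   # expected: "to"
```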