1. Explanation
To address this, the transformer adds a vector to each input embedding. These vectors follow a specific pattern that the model learns, which helps it determine the position of each word, or the distance between different words in the sequence. The intuition here is that adding these values to the embeddings provides meaningful distances between the embedding vectors once they’re projected into Q/K/V vectors and during dot-product attention.
In the implementation below, the two signals are interwoven: sine values fill the even embedding dimensions and cosine values fill the odd ones.
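Concretely, the values being added are the sinusoidal position encodings from the original Transformer paper ("Attention Is All You Need"):

PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

Here pos is the token position, i indexes the embedding dimension, and d_model is the embedding size, so each dimension corresponds to a sinusoid of a different wavelength. This is exactly what the code below computes.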
2. Code
The following code is from https://www.tensorflow.org/tutorials/text/transformer.
import numpy as np
import matplotlib.pyplot as plt

# Code from https://www.tensorflow.org/tutorials/text/transformer
def get_angles(pos, i, d_model):
    angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
    return pos * angle_rates

def positional_encoding(position, d_model):
    angle_rads = get_angles(np.arange(position)[:, np.newaxis],
                            np.arange(d_model)[np.newaxis, :],
                            d_model)

    # apply sin to even indices in the array; 2i
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])

    # apply cos to odd indices in the array; 2i+1
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])

    pos_encoding = angle_rads[np.newaxis, ...]

    return pos_encoding

# visualisation
tokens = 10
dimensions = 64

pos_encoding = positional_encoding(tokens, dimensions)
print(pos_encoding.shape)

plt.figure(figsize=(12, 8))
plt.pcolormesh(pos_encoding[0], cmap='viridis')
plt.xlabel('Embedding Dimensions')
plt.xlim((0, dimensions))
plt.ylim((tokens, 0))
plt.ylabel('Token Position')
plt.colorbar()
plt.show()
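The explanation in section 1 says these vectors are added to the input embeddings. As a minimal sketch of that step (the token_embeddings array below is a made-up stand-in, not part of the tutorial code), the combination is an element-wise addition with the same shape as the encoding:

# hypothetical token embeddings with the same shape as pos_encoding: (1, tokens, dimensions)
token_embeddings = np.random.randn(1, tokens, dimensions).astype(np.float32)

# the model's actual input is the sum of token embeddings and position encodings
inputs_with_position = token_embeddings + pos_encoding
print(inputs_with_position.shape)  # (1, 10, 64)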