KGE-CL: Contrastive Learning of Knowledge Graph Embeddings (Reading Notes)

Author: 不正经 | 2024-06-02 15:14:50

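For the knowledge graph embedding (KGE) task, the authors address the semantic similarity between related entities and entity-relation pairs that appear in different triples. They propose a simple and efficient contrastive learning framework for knowledge graph embeddings and train it as an additional constraint alongside the KGE model. This reduces the semantic distance between related entities and entity-relation pairs in different triples, and thereby improves the performance of the embeddings.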

0. Preface

Topic

data mining; topic mining

problems in previous work

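Most previous knowledge graph embedding models ignore the semantic similarity between related entities and entity-relation pairs in different triples.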

motivation

challenge

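1. What problem are the authors trying to solve?

They propose a simple and efficient contrastive learning framework for knowledge graph embeddings. It reduces the semantic distance between related entities and entity-relation pairs in different triples, and thereby improves the performance of the embeddings.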

2. What are the key elements of this paper?

Contrastive learning; knowledge graph embedding

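3. What in the paper can be put to use?

The authors' approach of using contrastive learning to add an extra constraint to the loss function is worth borrowing.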

4. Which references do you want to follow up on?

5. What problems remain?

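The method amounts to attaching one extra constraint to an existing KGE model.
Contrastive learning seems to require a large amount of data preprocessing (finding the positive instance pairs and the negative instance pairs).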

0.1 Knowledge to learn

Contrastive Learning

1 Background

Contrastive Learning

The key idea of contrastive learning is to pull semantically close pairs together and push negative pairs apart.

2 Model

Positive sample selection

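For a head entity, the positives are the head entities that share the same relation and tail entity; positives for a tail entity are chosen analogously.
For an entity-relation pair, the positives are the entity-relation pairs that share the same remaining entity.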
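The selection rule above can be sketched as a simple grouping over the training triples. This is a minimal illustration in plain Python, not the authors' preprocessing code: the function names `build_positive_index` and `positives_for`, the dictionary keys, and the toy triples are all hypothetical.

```python
from collections import defaultdict

def build_positive_index(triples):
    """Group entities and entity-relation pairs by the context they share."""
    heads_by_rt = defaultdict(set)  # (relation, tail) -> heads seen with it
    tails_by_hr = defaultdict(set)  # (head, relation) -> tails seen with it
    hr_by_t = defaultdict(set)      # tail -> (head, relation) pairs
    rt_by_h = defaultdict(set)      # head -> (relation, tail) pairs
    for h, r, t in triples:
        heads_by_rt[(r, t)].add(h)
        tails_by_hr[(h, r)].add(t)
        hr_by_t[t].add((h, r))
        rt_by_h[h].add((r, t))
    return heads_by_rt, tails_by_hr, hr_by_t, rt_by_h

def positives_for(triple, index):
    """Return the positive sets for one triple, excluding the triple's own items."""
    h, r, t = triple
    heads_by_rt, tails_by_hr, hr_by_t, rt_by_h = index
    return {
        "head": heads_by_rt[(r, t)] - {h},      # heads sharing relation and tail
        "tail": tails_by_hr[(h, r)] - {t},      # tails sharing head and relation
        "head_rel": hr_by_t[t] - {(h, r)},      # (head, relation) pairs sharing the tail
        "rel_tail": rt_by_h[h] - {(r, t)},      # (relation, tail) pairs sharing the head
    }

# Toy triples, invented for illustration only.
toy_triples = [
    ("paris", "capital_of", "france"),
    ("lyon", "city_in", "france"),
    ("paris", "city_in", "france"),
    ("paris", "city_in", "europe"),
]
toy_index = build_positive_index(toy_triples)
toy_positives = positives_for(("lyon", "city_in", "france"), toy_index)
```

In this toy data, `paris` is the only positive for the head `lyon`, because both occur with the relation `city_in` and the tail `france`. One pass over the triples builds the index, which suggests the preprocessing cost is linear in the number of triples under this reading of the rule.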

Encoder

A 2-layer MLP is used.
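For readers who want a concrete picture, a 2-layer MLP encoder could look like the sketch below. The notes only state the depth, so the ReLU activation, the layer sizes, the zero biases, and the uniform initialization are all assumptions made for illustration; `make_mlp` and `encode` are hypothetical names.

```python
import random

def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Create parameters for a 2-layer MLP (sizes and init are assumptions)."""
    rng = random.Random(seed)

    def init(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": init(hidden_dim, in_dim), "b1": [0.0] * hidden_dim,
        "W2": init(out_dim, hidden_dim), "b2": [0.0] * out_dim,
    }

def linear(W, b, x):
    """Compute W x + b for a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def encode(params, x):
    """Map an embedding x to its representation z used in the contrastive loss."""
    hidden = [max(0.0, v) for v in linear(params["W1"], params["b1"], x)]  # ReLU (assumed)
    return linear(params["W2"], params["b2"], hidden)

params = make_mlp(in_dim=4, hidden_dim=8, out_dim=3)
z = encode(params, [1.0, 0.0, -1.0, 0.5])
```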

Contrastive loss

Following existing frameworks, the following function is used:
$\mathrm{CL}\left(\mathbf{z}_{i}, \mathbf{z}_{i}^{+}\right)=\frac{-1}{|P(i)|} \log \frac{\sum_{z_{i}^{+} \in P(i)} e^{\operatorname{sim}\left(\mathbf{z}_{i}, \mathbf{z}_{i}^{+}\right) / \tau}}{\sum_{z_{j} \in N(i)} e^{\operatorname{sim}\left(\mathbf{z}_{i}, \mathbf{z}_{j}\right) / \tau}}$

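Here $z_{i}$ is an instance and $z_{i}^{+}$ ranges over all of its positive samples; boldface denotes their representation vectors. sim is cosine similarity, P(i) is the set of all positives in the minibatch, and N(i) is the set of all negatives in the batch.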
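To make the formula concrete, here is a small sketch that evaluates CL exactly as it is written in these notes: the sum over positives sits inside the logarithm, and N(i) contains only negatives. It uses plain Python lists; the function names and the default temperature `tau=0.5` are assumptions, not values taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(z, positives, negatives, tau=0.5):
    """CL(z, z+) as written above: -1/|P| * log(sum_pos / sum_neg)."""
    pos = sum(math.exp(cosine(z, p) / tau) for p in positives)
    neg = sum(math.exp(cosine(z, n) / tau) for n in negatives)
    return -math.log(pos / neg) / len(positives)

# The positive is aligned with z and the negative is opposite to it.
aligned = contrastive_loss([1.0, 0.0], [[1.0, 0.0]], [[-1.0, 0.0]])
# Swapping the roles gives a much larger loss.
misaligned = contrastive_loss([1.0, 0.0], [[-1.0, 0.0]], [[1.0, 0.0]])
```

With `tau=0.5`, the aligned case evaluates to approximately -4 and the misaligned case to approximately +4. The loss is lower when positives are close to the instance and negatives are far from it.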

The total contrastive loss is:

$\begin{aligned} \mathcal{L}_{c}\left(h_{i}, r_{j}, t_{k}\right) &=\mathrm{CL}\left(\mathbf{h}_{i}, \mathbf{h}_{i}^{+}\right)+\mathrm{CL}\left(\mathbf{t}_{k}, \mathbf{t}_{k}^{+}\right) \\ &+\mathrm{CL}\left(\mathbf{h}_{i} \mathbf{R}_{j},\left(\mathbf{h}_{i} \mathbf{R}_{j}\right)^{+}\right) \\ &+\mathrm{CL}\left(\mathbf{R}_{j} \overline{\mathbf{t}}_{k},\left(\mathbf{R}_{j} \overline{\mathbf{t}}_{k}\right)^{+}\right) \end{aligned}$

Gradient of the loss

$\begin{array}{l}\mathbf{h}_{i}^{t+1}=\mathbf{h}_{i}^{t}-\eta \frac{\partial \mathrm{CL}\left(\mathbf{h}_{i}, \mathbf{h}_{i}^{+}\right)}{\partial \mathbf{h}_{i}} \\ =\mathbf{h}_{i}^{t}+\frac{\eta \sum_{h_{i}^{+} \in P(i)} \mathbf{h}_{i}^{+}}{\tau|P(i)|}-\frac{\eta \sum_{h_{j} \in N(i)}\left(e^{\left(\mathbf{h}_{i} \cdot \mathbf{h}_{j} / \tau\right)} \mathbf{h}_{j}\right)}{\tau|P(i)| \sum_{h_{j} \in N(i)} e^{\left(\mathbf{h}_{i} \cdot \mathbf{h}_{j} / \tau\right)}}\end{array}$

The gradient-descent update shows that an instance is moved toward all of its positive instances and away from its negative samples.
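The update rule can be checked on a tiny example. The sketch below applies one step of the formula as written in these notes, using the dot product for similarity as the derivation does. The vectors, the learning rate `eta`, and the temperature `tau` are invented for illustration, and `cl_update` is a hypothetical name.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cl_update(h, positives, negatives, eta=0.1, tau=0.5):
    """One step of the update above: pull toward positives, push from negatives."""
    n_pos = len(positives)
    dim = len(h)
    # Pull term: sum of positives divided by tau * |P(i)|.
    pull = [sum(p[d] for p in positives) / (tau * n_pos) for d in range(dim)]
    # Push term: negatives weighted by exp(h . h_j / tau), normalized as in the formula.
    weights = [math.exp(dot(h, n) / tau) for n in negatives]
    total = sum(weights)
    push = [
        sum(w * n[d] for w, n in zip(weights, negatives)) / (tau * n_pos * total)
        for d in range(dim)
    ]
    return [h[d] + eta * pull[d] - eta * push[d] for d in range(dim)]

h_old = [0.0, 1.0]
positives = [[1.0, 0.0], [1.0, 0.0]]
negatives = [[-1.0, 0.0]]
h_new = cl_update(h_old, positives, negatives)
```

In this example `h_new` is approximately `[0.3, 1.0]`. Its dot product with the positive direction rises from 0 to about 0.3, and its dot product with the negative falls from 0 to about -0.3, which matches the reading of the update given above.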

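Weighted contrastive loss

The authors find that the individual contrastive loss terms affect different knowledge graphs differently, so they introduce a weighting hyperparameter for each term: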

$\begin{aligned} \mathcal{L}_{c}^{w}\left(h_{i}, r_{j}, t_{k}\right) &=\alpha_{h} \mathrm{CL}\left(\mathbf{h}_{i}, \mathbf{h}_{i}^{+}\right)+\alpha_{t} \mathrm{CL}\left(\mathbf{t}_{k}, \mathbf{t}_{k}^{+}\right) \\ &+\alpha_{h r} \mathrm{CL}\left(\mathbf{h}_{i} \mathbf{R}_{j},\left(\mathbf{h}_{i} \mathbf{R}_{j}\right)^{+}\right) \\ &+\alpha_{t r} \mathrm{CL}\left(\mathbf{R}_{j} \overline{\mathbf{t}}_{k},\left(\mathbf{R}_{j} \overline{\mathbf{t}}_{k}\right)^{+}\right) \end{aligned}$
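As a small illustration, the weighted loss is a weighted sum of the four CL terms. The dictionary keys and the example values below are hypothetical; the notes do not give the weights actually used.

```python
def weighted_contrastive_loss(cl_terms, alphas):
    """Combine the four contrastive terms with per-term weights alpha."""
    keys = ("h", "t", "hr", "tr")  # head, tail, head-relation, relation-tail
    return sum(alphas[k] * cl_terms[k] for k in keys)

# Example values are made up; setting a weight to 0 switches that term off.
cl_terms = {"h": 1.0, "t": 2.0, "hr": 3.0, "tr": 4.0}
alphas = {"h": 0.5, "t": 0.0, "hr": 1.0, "tr": 0.25}
loss_cw = weighted_contrastive_loss(cl_terms, alphas)
```

Setting all four weights to 1 recovers the unweighted total contrastive loss.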

Training objective

$\mathcal{L}\left(h_{i}, r_{j}, t_{k}\right)=\mathcal{L}_{s}+\mathcal{L}_{r}+\mathcal{L}_{c}^{w}$

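Here $\mathcal{L}_{s}$ is the loss that measures the discrepancy between the scoring function output $f\left(h_{i}, r_{j}, t_{k}\right)$ and the label $X_{ijk}$ (that is, the original KGE loss; the authors use the full multiclass log-loss here), and $\mathcal{L}_{r}$ is the regularization term.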
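The sketch below shows how the three parts could be added up for one triple. It makes several assumptions: the scoring function is taken to be the RESCAL bilinear form $\mathbf{h}^{\top} \mathbf{R} \mathbf{t}$ (RESCAL is the base model named in the experiments), the multiclass log-loss is read as a softmax cross-entropy over all candidate tails, and the regularizer and contrastive values are passed in as plain numbers. All function names are hypothetical.

```python
import math

def rescal_score(h, R, t):
    """Bilinear score h^T R t (RESCAL form, assumed here)."""
    Rt = [sum(R[i][j] * t[j] for j in range(len(t))) for i in range(len(h))]
    return sum(h[i] * Rt[i] for i in range(len(h)))

def multiclass_log_loss(h, R, candidate_tails, true_index):
    """Softmax cross-entropy of the true tail against all candidate tails."""
    scores = [rescal_score(h, R, t) for t in candidate_tails]
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[true_index]

def total_loss(l_s, l_r, l_cw):
    """Training objective: L = L_s + L_r + L_c^w."""
    return l_s + l_r + l_cw

h = [1.0, 0.0]
R = [[1.0, 0.0], [0.0, 1.0]]
tails = [[1.0, 0.0], [0.0, 1.0]]
l_s = multiclass_log_loss(h, R, tails, true_index=0)
loss = total_loss(l_s, l_r=0.5, l_cw=0.25)  # l_r and l_cw are placeholder values
```

The contrastive term enters only as an additive constraint, so the base KGE loss and regularizer can stay unchanged.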

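3 Experiments

The standard benchmark results are good.

The authors also compare RESCAL-DURA with their proposed RESCAL-CL on each relation of WN18RR, demonstrating that the model's gains generalize across relations.

The authors additionally provide a t-SNE visualization:

It shows that because RESCAL-DURA cannot capture the semantic similarity of pairs that share the same entity, the pairs with the same tail entity remain widely spread, whereas the distribution for the authors' RESCAL-CL is more compact.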
