
A Simple Bayesian Gaussian Tensor Factorization Model and Its Code Implementation


1. A simple Bayesian Gaussian tensor factorization model

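Let us start with a minimal model. Each entry $y_{ijt}$ of the tensor $\mathcal{Y} \in \mathbb{R}^{m \times n \times f}$ is Gaussian, with a mean given by a sum of $r$ rank-one terms and a noise precision $\tau$:
$$y_{ijt} \sim \mathcal{N}\left(\sum_{s=1}^r u_{is} v_{js} x_{ts},\ \tau^{-1}\right)$$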

  • Priors on the model parameters:
    $u_i, v_j, x_t \sim \mathcal{N}(0, [\mathrm{diag}(\lambda)]^{-1}),\ \forall i,j,t$
    $\tau \sim \mathrm{Gamma}(\alpha_0, \beta_0)$
  • Prior on the hyperparameters:
    $\lambda_s \sim \mathrm{Gamma}(\alpha_0, \beta_0),\ s = 1, 2, \dots, r$
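To make the generative story concrete, here is a minimal NumPy sketch that draws one synthetic tensor from the model above. The function name `sample_model`, the default values of `alpha0` and `beta0`, and the tensor sizes are illustrative assumptions, not part of the original post. The Gamma distribution is read in its shape-rate form, so NumPy's `scale` argument is set to `1 / beta0`.

```python
import numpy as np

def sample_model(m, n, f, r, alpha0=2.0, beta0=1.0, seed=0):
    """Draw one synthetic tensor from the generative model (illustrative only)."""
    rng = np.random.default_rng(seed)
    # NumPy's gamma takes shape and *scale*, so scale = 1 / rate.
    lam = rng.gamma(shape=alpha0, scale=1.0 / beta0, size=r)  # lambda_s, s = 1..r
    tau = rng.gamma(shape=alpha0, scale=1.0 / beta0)          # noise precision
    std = 1.0 / np.sqrt(lam)                                  # prior std of each column s
    U = rng.normal(size=(m, r)) * std                         # rows are u_i
    V = rng.normal(size=(n, r)) * std                         # rows are v_j
    X = rng.normal(size=(f, r)) * std                         # rows are x_t
    mean = np.einsum('is,js,ts->ijt', U, V, X)                # sum_s u_is v_js x_ts
    Y = mean + rng.normal(size=mean.shape) / np.sqrt(tau)     # add N(0, 1/tau) noise
    return Y, U, V, X, lam, tau

Y, U, V, X, lam, tau = sample_model(m=4, n=5, f=6, r=2)
print(Y.shape)
```

The `einsum` call computes every mean $\sum_s u_{is} v_{js} x_{ts}$ at once, which avoids three nested Python loops.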
2.2.1 Posterior distribution of the model parameter $u_i$

For the model parameter $u_i$, the likelihood comes from the observed entries of $\mathcal{Y}_{:jt}$, that is, the entries $y_{ijt}$ with $i$ fixed and $(j,t)$ ranging over the observed positions (this is also what the index set $:,j,t$ under the sums and products below means):
$$\mathcal{L}(\mathcal{Y}_{:jt} \mid u_i, V, X, \tau) \propto \prod_{:,j,t} e^{-\frac{1}{2}\tau (y_{ijt} - u_i^T w_{jt})^2}$$
This is a product of many Gaussian densities. Here $u_i^T w_{jt} = \sum_{s=1}^r u_{is} v_{js} x_{ts}$ and $w_{jt} = v_j \circledast x_t$, where $\circledast$ denotes the element-wise (Hadamard) product. Expanding the square and dropping the factor that does not depend on $u_i$:
$$\propto \prod_{:,j,t} e^{-\frac{1}{2}\tau (y_{ijt} - u_i^T w_{jt})(y_{ijt} - u_i^T w_{jt})^T}$$
$$\propto e^{-\frac{1}{2} u_i^T \left(\tau \sum_{:,j,t} w_{jt} w_{jt}^T\right) u_i + u_i^T \left(\tau \sum_{:,j,t} y_{ijt} w_{jt}\right)}$$
The cross term $-\frac{1}{2}\tau \cdot (-2\, y_{ijt}\, u_i^T w_{jt})$ contributes $\tau\, y_{ijt}\, u_i^T w_{jt}$, so the linear term carries no factor of $\frac{1}{2}$. As a function of $u_i$, this is the kernel of a multivariate normal distribution.
Since $u_i \sim \mathcal{N}(0, [\mathrm{diag}(\lambda)]^{-1})$, the prior is
$$p(u_i \mid \lambda) \propto e^{-\frac{1}{2} u_i^T \mathrm{diag}(\lambda)\, u_i}$$
By Bayes' rule, $\text{posterior} \propto \text{prior} \times \text{likelihood}$:
$$p(u_i \mid V, X, \tau, \mathcal{Y}_{:jt}, \lambda) \propto p(u_i \mid \lambda) \times \mathcal{L}(\mathcal{Y}_{:jt} \mid u_i, V, X, \tau)$$
$$\propto e^{-\frac{1}{2} u_i^T \mathrm{diag}(\lambda)\, u_i}\, e^{-\frac{1}{2} u_i^T \left(\tau \sum_{:,j,t} w_{jt} w_{jt}^T\right) u_i + u_i^T \left(\tau \sum_{:,j,t} y_{ijt} w_{jt}\right)}$$
$$\propto e^{-\frac{1}{2} u_i^T \left[\mathrm{diag}(\lambda) + \tau \sum_{:,j,t} w_{jt} w_{jt}^T\right] u_i}\, e^{u_i^T \left(\tau \sum_{:,j,t} y_{ijt} w_{jt}\right)}$$
Define
$$\widetilde{\Lambda}_u = \mathrm{diag}(\lambda) + \tau \sum_{:,j,t} w_{jt} w_{jt}^T$$
Then
$$\propto e^{-\frac{1}{2} u_i^T \widetilde{\Lambda}_u u_i + u_i^T \left(\tau \sum_{:,j,t} y_{ijt} w_{jt}\right)}$$
$$\propto e^{-\frac{1}{2} (u_i - \widetilde{u}_u)^T \widetilde{\Lambda}_u (u_i - \widetilde{u}_u)}$$
where $\widetilde{u}_u = \tau \widetilde{\Lambda}_u^{-1} \sum_{:,j,t} y_{ijt} w_{jt}$, obtained by completing the square so that $\widetilde{\Lambda}_u \widetilde{u}_u = \tau \sum_{:,j,t} y_{ijt} w_{jt}$.
That is: $u_i \sim \mathcal{N}(\widetilde{u}_u, \widetilde{\Lambda}_u^{-1})$
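The conditional for $u_i$ can be turned into a Gibbs sampling step. The sketch below is one possible NumPy implementation under stated assumptions: the boolean array `mask` (True where $y_{ijt}$ is observed) and the function name `sample_u_i` are hypothetical, and the draw uses a Cholesky factor of the precision matrix rather than an explicit inverse.

```python
import numpy as np

def sample_u_i(Y, mask, i, V, X, lam, tau, rng):
    """Draw u_i from N(mu, Lam^{-1}) given V, X, lam and tau."""
    r = V.shape[1]
    js, ts = np.nonzero(mask[i])          # observed (j, t) pairs for this i
    W = V[js] * X[ts]                     # each row is w_jt = v_j * x_t (element-wise)
    Lam = np.diag(lam) + tau * W.T @ W    # posterior precision matrix
    b = tau * W.T @ Y[i, js, ts]          # tau * sum_{j,t} y_ijt w_jt
    mu = np.linalg.solve(Lam, b)          # posterior mean
    # With Lam = L L^T, the vector mu + L^{-T} z has covariance Lam^{-1}.
    L = np.linalg.cholesky(Lam)
    z = rng.standard_normal(r)
    return mu + np.linalg.solve(L.T, z), mu, Lam

# Toy check on noise-free data: with a very weak prior and a very large tau,
# the posterior mean should sit close to the true u_0.
rng = np.random.default_rng(0)
m, n, f, r = 3, 4, 5, 2
U_true = rng.normal(size=(m, r))
V = rng.normal(size=(n, r))
X = rng.normal(size=(f, r))
Y = np.einsum('is,js,ts->ijt', U_true, V, X)
mask = np.ones(Y.shape, dtype=bool)
u0, mu0, Lam0 = sample_u_i(Y, mask, 0, V, X, lam=np.full(r, 1e-6), tau=1e6, rng=rng)
print(np.abs(mu0 - U_true[0]).max())
```

Solving linear systems with `np.linalg.solve` instead of forming $\widetilde{\Lambda}_u^{-1}$ explicitly is a numerical-stability choice, not something the derivation requires.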

2.2.2 Posterior distribution of the model parameter $v_j$

The derivation is analogous to that of $u_i$, now with $w_{it} = u_i \circledast x_t$ and sums over the observed $(i,t)$ for fixed $j$:
$$\widetilde{\Lambda}_v = \mathrm{diag}(\lambda) + \tau \sum_{i,:,t} w_{it} w_{it}^T$$
and $\widetilde{v}_v = \tau \widetilde{\Lambda}_v^{-1} \sum_{i,:,t} y_{ijt} w_{it}$
That is: $v_j \sim \mathcal{N}(\widetilde{v}_v, \widetilde{\Lambda}_v^{-1})$

2.2.3 Posterior distribution of the model parameter $x_t$

The derivation is analogous to that of $u_i$, now with $w_{ij} = u_i \circledast v_j$ and sums over the observed $(i,j)$ for fixed $t$:
$$\widetilde{\Lambda}_x = \mathrm{diag}(\lambda) + \tau \sum_{i,j,:} w_{ij} w_{ij}^T$$
and $\widetilde{x}_x = \tau \widetilde{\Lambda}_x^{-1} \sum_{i,j,:} y_{ijt} w_{ij}$
That is: $x_t \sim \mathcal{N}(\widetilde{x}_x, \widetilde{\Lambda}_x^{-1})$
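Because the three factor updates share the same form, one routine can serve all of them once the tensor axes are permuted so that the mode being updated comes first. The following is a sketch of that idea, not the author's code: `sample_mode` and `mask` are hypothetical names, and the toy data are noise-free so the result is easy to inspect.

```python
import numpy as np

def sample_mode(Y, mask, A, B, lam, tau, rng):
    """Resample every row of the factor for axis 0 of Y (shape (p, q, s)).

    A (q, r) and B (s, r) are the factor matrices of the other two axes.
    """
    p, r = Y.shape[0], A.shape[1]
    out = np.empty((p, r))
    for k in range(p):
        a, b = np.nonzero(mask[k])            # observed positions in slice k
        W = A[a] * B[b]                       # rows are element-wise products
        Lam = np.diag(lam) + tau * W.T @ W    # posterior precision
        mu = np.linalg.solve(Lam, tau * W.T @ Y[k, a, b])
        L = np.linalg.cholesky(Lam)           # Lam = L L^T
        out[k] = mu + np.linalg.solve(L.T, rng.standard_normal(r))
    return out

rng = np.random.default_rng(1)
m, n, f, r = 4, 5, 6, 2
U, V_true, X = (rng.normal(size=(d, r)) for d in (m, n, f))
Y = np.einsum('is,js,ts->ijt', U, V_true, X)   # noise-free toy tensor
mask = rng.random(Y.shape) < 0.8               # roughly 20% of entries missing
lam, tau = np.full(r, 1e-6), 1e8

# v_j: move axis j to the front, so the other factors are (U, X).
V_new = sample_mode(np.moveaxis(Y, 1, 0), np.moveaxis(mask, 1, 0), U, X, lam, tau, rng)
# x_t: move axis t to the front, so the other factors are (U, V).
X_new = sample_mode(np.moveaxis(Y, 2, 0), np.moveaxis(mask, 2, 0), U, V_true, lam, tau, rng)
print(np.abs(V_new - V_true).max(), np.abs(X_new - X).max())
```

Calling `sample_mode(Y, mask, V, X, lam, tau, rng)` without any axis permutation would give the $u_i$ update.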

2.2.4 Posterior distribution of the model parameter $\tau$

The prior is
$$p(\tau \mid \alpha_0, \beta_0) = \frac{(\beta_0)^{\alpha_0}}{\Gamma(\alpha_0)}\, \tau^{\alpha_0 - 1} e^{-\beta_0 \tau}$$
For the model parameter $\tau$, the likelihood comes from the observed entries of $\mathcal{Y}$. Let $\Omega$ denote the set of observed indices $(i,j,t)$ and $|\Omega|$ its size (for a fully observed tensor, $|\Omega| = mnf$):
$$\mathcal{L}(\mathcal{Y} \mid \tau, U, V, X) \propto \prod_{(i,j,t) \in \Omega} \tau^{1/2} e^{-\frac{1}{2}\tau \left(y_{ijt} - \sum_{s=1}^r u_{is} v_{js} x_{ts}\right)^2}$$
$$\propto \tau^{\frac{1}{2}|\Omega|}\, e^{-\frac{1}{2}\tau \sum_{(i,j,t) \in \Omega} \left(y_{ijt} - \sum_{s=1}^r u_{is} v_{js} x_{ts}\right)^2}$$
Therefore:
$$p(\tau \mid -) \propto \mathcal{L}(\mathcal{Y} \mid \tau, U, V, X) \times p(\tau \mid \alpha_0, \beta_0)$$
$$\propto \tau^{\frac{1}{2}|\Omega|}\, e^{-\frac{1}{2}\tau \sum_{(i,j,t) \in \Omega} \left(y_{ijt} - \sum_{s=1}^r u_{is} v_{js} x_{ts}\right)^2}\, \tau^{\alpha_0 - 1} e^{-\beta_0 \tau}$$
$$\propto \tau^{\frac{1}{2}|\Omega| + \alpha_0 - 1}\, e^{-\tau \left[\beta_0 + \frac{1}{2} \sum_{(i,j,t) \in \Omega} \left(y_{ijt} - \sum_{s=1}^r u_{is} v_{js} x_{ts}\right)^2\right]}$$

In the Bayesian network of the tensor factorization, $\tau$ can therefore be resampled from $\tau \sim \mathrm{Gamma}(\widetilde{\alpha}, \widetilde{\beta})$, where:
$$\widetilde{\alpha} = \alpha_0 + \frac{1}{2} \sum_{(i,j,t) \in \Omega} 1(y_{ijt} \neq 0)$$
$$\widetilde{\beta} = \beta_0 + \frac{1}{2} \sum_{(i,j,t) \in \Omega} \left(y_{ijt} - \sum_{s=1}^r u_{is} v_{js} x_{ts}\right)^2$$
The indicator $1(y_{ijt} \neq 0)$ counts the observed entries (missing entries appear to be coded as 0 here), so the sum in $\widetilde{\alpha}$ equals $|\Omega|$.
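The update for $\tau$ needs only the residuals on the observed entries. Below is a minimal sketch, assuming an explicit boolean `mask` for the observed set $\Omega$ instead of a zero-coded tensor. The name `sample_tau` and the toy values are illustrative. NumPy parameterizes the Gamma distribution by shape and scale, so the rate $\widetilde{\beta}$ enters as `scale = 1 / beta`.

```python
import numpy as np

def sample_tau(Y, mask, U, V, X, alpha0, beta0, rng):
    """Draw tau from Gamma(alpha, beta) in the shape-rate parameterization."""
    fitted = np.einsum('is,js,ts->ijt', U, V, X)   # sum_s u_is v_js x_ts
    resid = (Y - fitted)[mask]                     # residuals on observed entries only
    alpha = alpha0 + 0.5 * resid.size              # alpha0 + |Omega| / 2
    beta = beta0 + 0.5 * np.sum(resid ** 2)        # beta0 + half the squared error
    return rng.gamma(shape=alpha, scale=1.0 / beta), alpha, beta

rng = np.random.default_rng(0)
Y = np.zeros((2, 2, 2))
mask = np.ones(Y.shape, dtype=bool)
mask[0, 0, 0] = False                  # one missing entry, so |Omega| = 7
U = V = X = np.ones((2, 1))            # fitted value is 1 everywhere, residual is -1
tau, alpha, beta = sample_tau(Y, mask, U, V, X, alpha0=1.0, beta0=1.0, rng=rng)
print(alpha, beta)                     # 4.5 4.5
```

In this toy case each of the 7 observed residuals equals $-1$, so both posterior parameters are $1 + 7/2 = 4.5$.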

2.2.5 Posterior distribution of the hyperparameter $\lambda$

The prior is
$$p(\lambda_s \mid \alpha_0, \beta_0) = \frac{(\beta_0)^{\alpha_0}}{\Gamma(\alpha_0)}\, (\lambda_s)^{\alpha_0 - 1} e^{-\beta_0 \lambda_s}$$
For the hyperparameter $\lambda_s$, the likelihood comes from $U, V, X$.
Although the hyperparameter $\lambda$ and the parameter $\tau$ are both assumed to follow Gamma distributions, $\lambda$ differs in that it is a vector: it parameterizes the precision matrix $\mathrm{diag}(\lambda)$ (the inverse of the covariance matrix) of the multivariate normal prior. Taking $u_i$ as an example, first write out the multivariate normal density:
$$p(u_i \mid \lambda) = \frac{|\mathrm{diag}(\lambda)|^{1/2}}{(2\pi)^{r/2}}\, e^{-\frac{1}{2} u_i^T \mathrm{diag}(\lambda)\, u_i}$$
Because the precision matrix is diagonal, this density factorizes over components, so for any $\lambda_s$, $s = 1, 2, \dots, r$:
$$p(u_{is} \mid \lambda_s) \propto (\lambda_s)^{1/2} e^{-\frac{1}{2} \lambda_s u_{is}^2}$$
Therefore
$$\mathcal{L}(U, V, X \mid \lambda_s) \propto \prod_{i=1}^m (\lambda_s)^{1/2} e^{-\frac{1}{2} \lambda_s u_{is}^2} \prod_{j=1}^n (\lambda_s)^{1/2} e^{-\frac{1}{2} \lambda_s v_{js}^2} \prod_{t=1}^f (\lambda_s)^{1/2} e^{-\frac{1}{2} \lambda_s x_{ts}^2}$$
$$\propto (\lambda_s)^{\frac{1}{2}(m+n+f)}\, e^{-\frac{1}{2} \lambda_s \left[\sum_{i=1}^m u_{is}^2 + \sum_{j=1}^n v_{js}^2 + \sum_{t=1}^f x_{ts}^2\right]}$$
So
$$p(\lambda_s \mid U, V, X, \alpha_0, \beta_0) \propto p(\lambda_s \mid \alpha_0, \beta_0) \times \mathcal{L}(U, V, X \mid \lambda_s)$$
$$\propto (\lambda_s)^{\frac{1}{2}(m+n+f)}\, e^{-\frac{1}{2} \lambda_s \left[\sum_{i=1}^m u_{is}^2 + \sum_{j=1}^n v_{js}^2 + \sum_{t=1}^f x_{ts}^2\right]}\, (\lambda_s)^{\alpha_0 - 1} e^{-\beta_0 \lambda_s}$$
$$\propto (\lambda_s)^{\frac{1}{2}(m+n+f) + \alpha_0 - 1}\, e^{-\lambda_s \left[\frac{1}{2}\left(\sum_{i=1}^m u_{is}^2 + \sum_{j=1}^n v_{js}^2 + \sum_{t=1}^f x_{ts}^2\right) + \beta_0\right]}$$

Hence the hyperparameter is resampled from $\lambda_s \sim \mathrm{Gamma}(\widetilde{\alpha}, \widetilde{\beta}),\ s = 1, 2, \dots, r$, with
$$\widetilde{\alpha} = \alpha_0 + \frac{1}{2}(m+n+f)$$
$$\widetilde{\beta} = \beta_0 + \frac{1}{2}\left(\sum_{i=1}^m u_{is}^2 + \sum_{j=1}^n v_{js}^2 + \sum_{t=1}^f x_{ts}^2\right)$$
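The update for $\lambda$ can be vectorized over $s$, since each $\lambda_s$ depends only on column $s$ of the three factor matrices. This is a sketch under the shape-rate reading of the Gamma distribution. The function name `sample_lambda` and the toy factor matrices are assumptions made for illustration.

```python
import numpy as np

def sample_lambda(U, V, X, alpha0, beta0, rng):
    """Draw all lambda_s at once from their Gamma conditionals."""
    m, n, f = U.shape[0], V.shape[0], X.shape[0]
    alpha = alpha0 + 0.5 * (m + n + f)
    # Column-wise sums of squares, one value per component s (shape (r,)).
    sq = (U ** 2).sum(axis=0) + (V ** 2).sum(axis=0) + (X ** 2).sum(axis=0)
    beta = beta0 + 0.5 * sq
    # NumPy uses shape/scale, so the rate beta enters as scale = 1 / beta.
    return rng.gamma(shape=alpha, scale=1.0 / beta), alpha, beta

rng = np.random.default_rng(0)
U, V, X = np.ones((2, 3)), np.ones((3, 3)), np.ones((4, 3))   # m=2, n=3, f=4, r=3
lam, alpha, beta = sample_lambda(U, V, X, alpha0=1.0, beta0=1.0, rng=rng)
print(alpha, beta)   # 5.5 and [5.5 5.5 5.5]
```

With all-ones factors, each column contributes $2 + 3 + 4 = 9$ to the sum of squares, giving $\widetilde{\alpha} = \widetilde{\beta} = 1 + 9/2 = 5.5$.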

The code is available on GitHub; if you find it helpful, a star would be appreciated.
