1. Relationship between the forward and backward probabilities
(1) Forward probability:
$$\alpha_{t}(i)=P\left(y_{1}, y_{2}, \cdots, y_{t}, q_{t}=i | \lambda\right)$$
(2) Backward probability:
$$\beta_{t}(i)=P\left(y_{t+1}, y_{t+2}, \cdots, y_{T} | q_{t}=i, \lambda\right)$$
(3) Relationship:
$$\begin{aligned}
P\left(i_{t}=q_{i}, O | \lambda\right) &= P\left(O | i_{t}=q_{i}, \lambda\right) P\left(i_{t}=q_{i} | \lambda\right) \\
&= P\left(o_{1}, \cdots, o_{t}, o_{t+1}, \cdots, o_{T} | i_{t}=q_{i}, \lambda\right) P\left(i_{t}=q_{i} | \lambda\right) \\
&= P\left(o_{1}, \cdots, o_{t} | i_{t}=q_{i}, \lambda\right) P\left(o_{t+1}, \cdots, o_{T} | i_{t}=q_{i}, \lambda\right) P\left(i_{t}=q_{i} | \lambda\right) \\
&= P\left(o_{1}, \cdots, o_{t}, i_{t}=q_{i} | \lambda\right) P\left(o_{t+1}, \cdots, o_{T} | i_{t}=q_{i}, \lambda\right) \\
&= \alpha_{t}(i) \beta_{t}(i)
\end{aligned}$$
(The third equality holds because, given the state at time t, the observations before and after time t are conditionally independent.)
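As an illustration (not part of the original text), here is a minimal NumPy sketch of the forward and backward recursions for a discrete HMM. The names are assumptions: `pi` is the initial distribution, `A` the N×N transition matrix, `B` the N×M emission matrix, and `obs` a sequence of observation indices. The product `alpha[t] * beta[t]` then gives the joint probabilities P(q_t = i, O | λ) from the relation above.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward pass: alpha[t, i] = P(o_1..o_t, q_t = i | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # sum over previous states, then emit o_t
    return alpha

def backward(A, B, obs):
    """Backward pass: beta[t, i] = P(o_{t+1}..o_T | q_t = i, lambda)."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                                 # initialization
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta
```

Summing `alpha[t] * beta[t]` over the states at any time t recovers the same value P(O | λ), which is the normalizer used in the next section.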
2. Probability of a single state
Given the model λ and the observation sequence O, the probability of being in state q_i at time t is denoted:
$$\gamma_{t}(i)=P\left(i_{t}=q_{i} | O, \lambda\right)$$
By the definitions of the forward and backward probabilities:
$$P\left(i_{t}=q_{i}, O | \lambda\right)=\alpha_{t}(i) \beta_{t}(i)$$
$$\gamma_{t}(i)=P\left(i_{t}=q_{i} | O, \lambda\right)=\frac{P\left(i_{t}=q_{i}, O | \lambda\right)}{P(O | \lambda)}$$
$$\gamma_{t}(i)=\frac{\alpha_{t}(i) \beta_{t}(i)}{P(O | \lambda)}=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{t}(j) \beta_{t}(j)}$$
The meaning of γ: at each time t, choose the state $\hat{i}_{t}^{*}$ that is most likely at that time, which yields a state sequence $I^{*}=\left\{i_{1}^{*}, i_{2}^{*}, \cdots, i_{T}^{*}\right\}$; take it as the predicted result.
Given the model and the observation sequence, the probability of being in state q_i at time t is:
$$\gamma_{t}(i)=\frac{\alpha_{t}(i) \beta_{t}(i)}{P(O | \lambda)}=\frac{\alpha_{t}(i) \beta_{t}(i)}{\sum_{j=1}^{N} \alpha_{t}(j) \beta_{t}(j)}$$
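A minimal sketch of computing γ from the α and β matrices of the earlier sketch (the function name `gamma_matrix` is my own); normalizing each time step implements the division by P(O | λ).

```python
def gamma_matrix(alpha, beta):
    """gamma[t, i] = P(i_t = q_i | O, lambda)."""
    prod = alpha * beta                            # alpha_t(i) * beta_t(i)
    return prod / prod.sum(axis=1, keepdims=True)  # divide by P(O | lambda)
```

Taking `gamma_matrix(alpha, beta).argmax(axis=1)` gives the per-step most likely states, i.e. the sequence I* described above.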
3. Probability of two states
$$\begin{aligned}
\xi_{t}(i, j) &= P\left(i_{t}=q_{i}, i_{t+1}=q_{j} | O, \lambda\right) \\
&= \frac{P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O | \lambda\right)}{P(O | \lambda)} \\
&= \frac{P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O | \lambda\right)}{\sum_{i=1}^{N} \sum_{j=1}^{N} P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O | \lambda\right)}
\end{aligned}$$
$$P\left(i_{t}=q_{i}, i_{t+1}=q_{j}, O | \lambda\right)=\alpha_{t}(i)\, a_{i j}\, b_{j}\left(o_{t+1}\right) \beta_{t+1}(j)$$
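A corresponding sketch for ξ, reusing the `forward`/`backward` outputs and the assumed `A`, `B`, `obs` arrays; each time slice is normalized by its total, which equals P(O | λ).

```python
def xi_tensor(alpha, beta, A, B, obs):
    """xi[t, i, j] = P(i_t = q_i, i_{t+1} = q_j | O, lambda), for t = 0..T-2."""
    T, N = alpha.shape
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # numerator: alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        num = alpha[t][:, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :]
        xi[t] = num / num.sum()                   # normalize by P(O | lambda)
    return xi
```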
4. Expectations
Expected number of times state i occurs under observation O:
$$\sum_{t=1}^{T} \gamma_{t}(i)$$
Expected number of transitions from state i to state j under observation O:
$$\sum_{t=1}^{T-1} \xi_{t}(i, j)$$
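With the `gamma_matrix` and `xi_tensor` sketches above, these expected counts are simply sums over the time axis:

```python
gamma = gamma_matrix(alpha, beta)
xi = xi_tensor(alpha, beta, A, B, obs)
expected_visits = gamma.sum(axis=0)       # sum_t gamma_t(i): length-N vector
expected_transitions = xi.sum(axis=0)     # sum_{t=1}^{T-1} xi_t(i, j): N x N matrix
```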
5. Learning algorithms:
If the training data contains both the observation sequences and the state sequences, learning the HMM is straightforward and is a supervised learning problem; if the training data contains only observation sequences, learning requires the EM algorithm and is an unsupervised learning problem.
Suppose the training data consists of S observation sequences of equal length together with the corresponding state sequences $\left\{\left(O_{1}, I_{1}\right),\left(O_{2}, I_{2}\right), \ldots,\left(O_{S}, I_{S}\right)\right\}$. Then we can directly apply the conclusion of Bernoulli's law of large numbers, "the limit of frequency is probability," to obtain the HMM parameter estimates.
(1) Supervised learning:
Initial probability:
$$\hat{\pi}_{i}=\frac{\left|q_{i}\right|}{\sum_{i}\left|q_{i}\right|}$$
Transition probability:
$$\hat{a}_{i j}=\frac{\left|q_{i j}\right|}{\sum_{j=1}^{N}\left|q_{i j}\right|}$$
Observation probability:
$$\hat{b}_{i k}=\frac{\left|s_{i k}\right|}{\sum_{k=1}^{M}\left|s_{i k}\right|}$$
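A minimal counting sketch of these frequency estimates, assuming hypothetical inputs `state_seqs` and `obs_seqs` (lists of integer-encoded sequences) and known alphabet sizes N (states) and M (observation symbols); states that never occur would need smoothing to avoid division by zero.

```python
def supervised_estimate(state_seqs, obs_seqs, N, M):
    """Frequency-based estimates of (pi, A, B) from paired state/observation sequences."""
    pi = np.zeros(N)
    A = np.zeros((N, N))
    B = np.zeros((N, M))
    for states, obs in zip(state_seqs, obs_seqs):
        pi[states[0]] += 1                        # initial-state counts |q_i|
        for s, s_next in zip(states[:-1], states[1:]):
            A[s, s_next] += 1                     # transition counts |q_ij|
        for s, o in zip(states, obs):
            B[s, o] += 1                          # emission counts |s_ik|
    return (pi / pi.sum(),
            A / A.sum(axis=1, keepdims=True),
            B / B.sum(axis=1, keepdims=True))
```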
(2) Baum-Welch algorithm
Write all the observed data as $\mathrm{O}=\left(o_{1}, o_{2}, \ldots, o_{T}\right)$ and all the hidden data as $\mathrm{I}=\left(i_{1}, i_{2}, \ldots, i_{T}\right)$. The complete data is $(\mathrm{O}, \mathrm{I})=\left(o_{1}, o_{2}, \ldots, o_{T}, i_{1}, i_{2}, \ldots, i_{T}\right)$, and the log-likelihood of the complete data is $\ln P(\mathrm{O}, \mathrm{I} | \lambda)$.
Suppose $\bar{\lambda}$ is the current estimate of the HMM parameters and $\lambda$ is the parameter to be maximized.
$$Q(\lambda, \bar{\lambda})=\sum_{I}\left(\ln P(O, I | \lambda)\right) P(I | O, \bar{\lambda})=\sum_{I} \ln P(O, I | \lambda) \frac{P(O, I | \bar{\lambda})}{P(O | \bar{\lambda})} \propto \sum_{I} \ln P(O, I | \lambda) P(O, I | \bar{\lambda})$$
EM procedure:
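As a hedged sketch only (assuming the `forward`, `backward`, `gamma_matrix`, and `xi_tensor` helpers above and discrete observations), one EM iteration can be written with the standard Baum-Welch re-estimation formulas: π_i = γ_1(i), a_ij is the ratio of expected transitions to expected visits, and b_j(k) is the expected count of emitting symbol v_k divided by the expected visits to state j.

```python
def baum_welch_step(pi, A, B, obs):
    """One EM iteration: E-step computes gamma and xi, M-step re-estimates (pi, A, B)."""
    obs = np.asarray(obs)
    alpha = forward(pi, A, B, obs)
    beta = backward(A, B, obs)
    gamma = gamma_matrix(alpha, beta)
    xi = xi_tensor(alpha, beta, A, B, obs)
    new_pi = gamma[0]                                           # pi_i = gamma_1(i)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]    # expected transitions / visits
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0)               # expected emissions of symbol k
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```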