Naive Bayes is a classification method based on Bayes' theorem and the assumption of conditional independence among features.
Suppose the input space $\mathcal{X} \subseteq \mathbf{R}^n$ is a set of $n$-dimensional vectors and the output space $\mathcal{Y}=\{c_1,c_2,\cdots,c_K\}$ is the set of class labels. $X$ is a random vector defined on the input space $\mathcal{X}$, and $Y$ is a random variable defined on the output space $\mathcal{Y}$.
$P(X,Y)$ is the joint probability distribution of $X$ and $Y$. The training data set

$$T=\{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}$$

is generated independently and identically distributed according to $P(X,Y)$.
The prior probability distribution is

$$P(Y=c_k),\quad k=1,2,\cdots,K$$
The conditional probability distribution is

$$P(X=x|Y=c_k)=P(X^{(1)}=x^{(1)},X^{(2)}=x^{(2)},\cdots,X^{(n)}=x^{(n)}|Y=c_k),\quad k=1,2,\cdots,K$$
Naive Bayes imposes a conditional independence assumption on this conditional distribution:

$$P(X=x|Y=c_k)=P(X^{(1)}=x^{(1)},X^{(2)}=x^{(2)},\cdots,X^{(n)}=x^{(n)}|Y=c_k)=\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)$$

That is, the features used for classification are assumed to be conditionally independent given the class.
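One way to see what the assumption buys: without it, estimating $P(X=x|Y=c_k)$ needs a parameter for every combination of feature values within each class, i.e. $K(\prod_j S_j - 1)$ free parameters; with it, only one small table per feature per class, i.e. $K\sum_j (S_j-1)$. A quick count as a sketch (the feature-value sizes $S_j$ below are made-up numbers, not from the text):

```python
from math import prod

S = [3, 3, 4]   # S_j: number of values of feature j (hypothetical)
K = 2           # number of classes

# Full joint conditional: one table per class over all value combinations
full_params = K * (prod(S) - 1)
# Naive Bayes: one 1-D table per feature per class
naive_params = K * sum(s - 1 for s in S)

print(full_params, naive_params)   # 70 vs 14
```

The gap grows exponentially with the number of features, which is exactly why the naive assumption makes estimation tractable.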
The posterior probability distribution is

$$P(Y=c_k|X=x)=\frac{P(X=x|Y=c_k)P(Y=c_k)}{\sum\limits_{k}P(X=x|Y=c_k)P(Y=c_k)}$$
By the conditional independence assumption,

$$P(Y=c_k|X=x)=\frac{P(X=x|Y=c_k)P(Y=c_k)}{\sum\limits_{k}P(X=x|Y=c_k)P(Y=c_k)}=\frac{P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)}{\sum\limits_{k}P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)},\quad k=1,2,\cdots,K$$
The naive Bayes classifier can then be written as

$$y=f(x)=\arg\max_{c_k}\frac{P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)}{\sum\limits_{k}P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)}$$
Further, since the denominator is the same for every $c_k$, it can be dropped:

$$y=f(x)=\arg\max_{c_k}P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)$$
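Multiplying many probabilities can underflow in floating point, so in practice the argmax is usually taken over log-probabilities, which gives the same class because $\log$ is monotone. A minimal sketch with hypothetical estimated probabilities (the numbers and class names are made-up for illustration):

```python
import math

# Hypothetical estimates for two classes and three features:
# likelihood[c][j] plays the role of P(X^(j)=x^(j) | Y=c)
prior = {'c1': 0.4, 'c2': 0.6}
likelihood = {
    'c1': [0.5, 0.2, 0.9],
    'c2': [0.1, 0.4, 0.3],
}

def predict_log(prior, likelihood):
    # argmax over log P(Y=c) + sum_j log P(X^(j)=x^(j) | Y=c)
    scores = {c: math.log(prior[c]) + sum(math.log(p) for p in likelihood[c])
              for c in prior}
    return max(scores, key=scores.get)

print(predict_log(prior, likelihood))   # 'c1': 0.4*0.5*0.2*0.9 > 0.6*0.1*0.4*0.3
```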
In naive Bayes, learning means estimating $P(Y=c_k)$ and $P(X^{(j)}=x^{(j)}|Y=c_k)$.
Applying maximum likelihood estimation, the prior is estimated as

$$P(Y=c_k)=\frac{\sum\limits_{i=1}^{N}I(y_i=c_k)}{N},\quad k=1,2,\cdots,K$$
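This estimate is simply the relative frequency of each label. A sketch using the label list from the worked example later in this post:

```python
from collections import Counter

# Labels from the worked example (N = 15 samples, classes 1 and -1)
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

counts = Counter(y)
prior = {c_k: n / len(y) for c_k, n in counts.items()}
print(prior)   # P(Y=1) = 9/15, P(Y=-1) = 6/15
```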
Let the set of possible values of the $j$-th feature $x^{(j)}$ be $\{a_{j1},a_{j2},\cdots,a_{jS_j}\}$. The maximum likelihood estimate of the conditional probability $P(X^{(j)}=x^{(j)}|Y=c_k)$ is

$$P(X^{(j)}=a_{jl}|Y=c_k)=\frac{\sum\limits_{i=1}^{N} I(x^{(j)}_{i}=a_{jl},y_i=c_k)}{\sum\limits_{i=1}^{N}I(y_i=c_k)}$$
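The estimate is a ratio of two indicator sums, i.e. a within-class relative frequency. A sketch on the same small data set used in the example later in this post:

```python
# Feature X1 and labels Y from the worked example
x1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
y  = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

def cond_mle(xs, ys, a, c):
    # sum_i I(x_i = a, y_i = c) / sum_i I(y_i = c)
    num = sum(1 for xi, yi in zip(xs, ys) if xi == a and yi == c)
    den = sum(1 for yi in ys if yi == c)
    return num / den

print(cond_mle(x1, y, 2, 1))   # P(X1=2 | Y=1) = 3/9
```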
Input: training data $T=\{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}$, where $x_i=(x_i^{(1)},x_i^{(2)},\cdots,x_i^{(n)})^T$, $x_i^{(j)}$ is the $j$-th feature of the $i$-th sample, $x_i^{(j)}\in \{a_{j1},a_{j2},\cdots,a_{jS_j}\}$, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, $j=1,2,\cdots,n$, $l=1,2,\cdots,S_j$, $y_i\in \{c_1,c_2,\cdots,c_K\}$; an instance $x$.
Output: the class of the instance $x$.
(1) Compute the prior and conditional probabilities:

$$P(Y=c_k)=\frac{\sum\limits_{i=1}^{N}I(y_i=c_k)}{N},\quad k=1,2,\cdots,K$$

$$P(X^{(j)}=a_{jl}|Y=c_k)=\frac{\sum\limits_{i=1}^{N} I(x^{(j)}_{i}=a_{jl},y_i=c_k)}{\sum\limits_{i=1}^{N}I(y_i=c_k)},\quad j=1,2,\cdots,n;\ l=1,2,\cdots,S_j;\ k=1,2,\cdots,K$$
(2) For the given instance $x=(x^{(1)},x^{(2)},\cdots,x^{(n)})^T$, compute

$$P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k),\quad k=1,2,\cdots,K$$
(3) Determine the class of the instance $x$:

$$y=\arg\max_{c_k}P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)$$
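The three steps can be sketched as a minimal pure-Python implementation. The data below is the small example used later in this post (features X1, X2 and labels Y); the function names `fit`/`predict` are my own:

```python
from collections import Counter, defaultdict
from math import prod

def fit(X, y):
    # Step (1): estimate P(Y=c_k) and P(X^(j)=a_jl | Y=c_k) by counting
    N = len(y)
    prior = {c: n / N for c, n in Counter(y).items()}
    cond = {c: defaultdict(float) for c in prior}   # missing value -> 0.0
    for c in prior:
        rows = [X[i] for i in range(N) if y[i] == c]
        for j in range(len(X[0])):
            for a, n in Counter(r[j] for r in rows).items():
                cond[c][(j, a)] = n / len(rows)
    return prior, cond

def predict(x, prior, cond):
    # Steps (2)-(3): score P(Y=c_k) * prod_j P(X^(j)=x^(j)|Y=c_k), take argmax
    scores = {c: prior[c] * prod(cond[c][(j, a)] for j, a in enumerate(x))
              for c in prior}
    return max(scores, key=scores.get)

X = [[1, 'S'], [1, 'M'], [1, 'M'], [1, 'S'], [1, 'S'],
     [2, 'S'], [2, 'M'], [2, 'M'], [2, 'L'], [2, 'L'],
     [3, 'L'], [3, 'M'], [3, 'M'], [3, 'L'], [3, 'L']]
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

prior, cond = fit(X, y)
print(predict([2, 'S'], prior, cond))   # -1
```

For $x=(2,\text{S})^T$ the scores are $\frac{6}{15}\cdot\frac{2}{6}\cdot\frac{3}{6}\approx 0.067$ for class $-1$ versus $\frac{9}{15}\cdot\frac{3}{9}\cdot\frac{1}{9}\approx 0.022$ for class $1$, so class $-1$ is chosen.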
With maximum likelihood estimation, some estimated probabilities may be 0, which distorts the computation of the posterior probability and biases the classification. The remedy is Bayesian estimation:
$$P_{\lambda}(X^{(j)}=a_{jl}|Y=c_k)=\frac{\sum\limits_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k)+\lambda}{\sum\limits_{i=1}^{N}I(y_i=c_k)+S_j\lambda}$$
where $\lambda\geq 0$.
When $\lambda=0$, this reduces to the maximum likelihood estimate.
When $\lambda=1$, it is called Laplace smoothing.
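The point of smoothing is that a feature value never observed in a class no longer gets probability 0 (a single zero factor would zero out the entire product). A numeric sketch with made-up counts:

```python
# Suppose feature j takes S_j = 3 values and class c_k has 6 training samples,
# with per-value counts (0, 2, 4) -- hypothetical numbers.
counts = [0, 2, 4]
S_j = len(counts)
n_k = sum(counts)

mle     = [c / n_k for c in counts]                 # plain MLE
laplace = [(c + 1) / (n_k + S_j) for c in counts]   # lambda = 1

print(mle)       # first entry is 0.0 -- kills the whole product
print(laplace)   # [1/9, 3/9, 5/9] -- strictly positive, still sums to 1
```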
The Bayesian estimate of the prior probability is

$$P_{\lambda}(Y=c_k)=\frac{\sum\limits_{i=1}^{N}I(y_i=c_k)+\lambda}{N+K\lambda}$$
The following implements the method on a small example, first with maximum likelihood estimation; Laplace smoothing is applied afterwards.
## Naive Bayes
import pandas as pd

x1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
x2 = ['S', 'M', 'M', 'S', 'S', 'S', 'M', 'M', 'L', 'L', 'L', 'M', 'M', 'L', 'L']
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
data = {'X1': x1, 'X2': x2, 'Y': y}
df = pd.DataFrame(data)
A1 = {1, 2, 3}
A2 = {'S', 'M', 'L'}
C = {1, -1}

def priorPro(y):
    '''Prior probability P(Y=c_k)'''
    C = y.unique()
    pro_y = {}
    for c_k in C:
        pro_y[c_k] = sum(y == c_k) / len(y)
    return pro_y

def conditionalPro(x, y):
    '''Conditional probability P(X=x|Y=c_k)'''
    a = list(x.unique())
    c = list(y.unique())
    inter = pd.concat([x, y], axis=1)
    conditionalpro = {}
    for c_k in c:
        subpro = {}
        num = len(inter[inter.iloc[:, 1] == c_k])   # count of samples in class c_k
        for a_j in a:
            num1 = len(inter[(inter.iloc[:, 0] == a_j) & (inter.iloc[:, 1] == c_k)])
            subpro[a_j] = num1 / num
        conditionalpro[c_k] = subpro
    return pd.DataFrame(conditionalpro)
Collecting the results:
priorpro=priorPro(df['Y'])
a1=conditionalPro(df['X1'],df['Y'])
a2=conditionalPro(df['X2'],df['Y'])
a1['变量']=1
a2['变量']=2
conPro=pd.concat([a1,a2])
conpro=conPro.reset_index()
conpro.rename(columns={'index':'X_value'},inplace=True)
def pred(x):
    '''Predict the class of instance x'''
    postpros = {}
    for c_k in list(C):
        postpro = priorpro[c_k]
        for i in range(len(x)):
            postpro *= conpro.loc[(conpro['X_value'] == x[i]) & (conpro['变量'] == i + 1), c_k].values[0]
        postpros[c_k] = postpro
    # the class with the largest score is the prediction
    return max(postpros, key=postpros.get), postpros
x_sample = [2, 'S']
pred(x_sample)   # predicted class: -1 (6/15 * 2/6 * 3/6 ≈ 0.067 vs 9/15 * 3/9 * 1/9 ≈ 0.022)
Now introduce $\lambda$:
## Laplace smoothing
def priorPro_lap(y, lam=1):
    '''Prior probability P(Y=c_k) with Laplace smoothing'''
    C = y.unique()
    pro_y = {}
    for c_k in C:
        pro_y[c_k] = (sum(y == c_k) + lam) / (len(y) + len(C) * lam)
    return pro_y

def conditionalPro_lap(x, y, lam=1):
    '''Conditional probability P(X=x|Y=c_k) with Laplace smoothing'''
    a = list(x.unique())
    c = list(y.unique())
    inter = pd.concat([x, y], axis=1)
    conditionalpro = {}
    for c_k in c:
        subpro = {}
        num = len(inter[inter.iloc[:, 1] == c_k])   # count of samples in class c_k
        for a_j in a:
            num1 = len(inter[(inter.iloc[:, 0] == a_j) & (inter.iloc[:, 1] == c_k)])
            subpro[a_j] = (num1 + lam) / (num + len(a) * lam)
        conditionalpro[c_k] = subpro
    return pd.DataFrame(conditionalpro)

priorpro_lap = priorPro_lap(df['Y'], lam=1)
a1 = conditionalPro_lap(df['X1'], df['Y'])
a2 = conditionalPro_lap(df['X2'], df['Y'])
a1['变量'] = 1
a2['变量'] = 2
conPro = pd.concat([a1, a2])
conpro = conPro.reset_index()
conpro.rename(columns={'index': 'value'}, inplace=True)

def pred(x):
    '''Predict the class of instance x (smoothed probabilities)'''
    postpros = {}
    for c_k in list(C):
        postpro = priorpro_lap[c_k]
        for i in range(len(x)):
            postpro *= conpro.loc[(conpro['value'] == x[i]) & (conpro['变量'] == i + 1), c_k].values[0]
        postpros[c_k] = postpro
    return max(postpros, key=postpros.get), postpros
The class $c$ maximizing the probability $P(Y=c|X=x)$ is the predicted value of $Y$ for $X$. By Bayes' theorem,

$$P(Y=c|X=x)=\frac{P(X=x,Y=c)}{P(X=x)}=\frac{P(X=x|Y=c)P(Y=c)}{\sum\limits_{c}P(X=x|Y=c)P(Y=c)}\quad\text{(Bayes' theorem)}$$

$$=\frac{P(Y=c)\prod\limits_{j}P(X^{(j)}=x^{(j)}|Y=c)}{\sum\limits_{c}P(Y=c)\prod\limits_{j}P(X^{(j)}=x^{(j)}|Y=c)}\quad\text{(conditional independence of features)}$$