
Machine Learning | Vectorization

Preface

Below, we vectorize the basic concepts of the linear regression model, the cost function, and the gradient descent algorithm. The discussion covers the most general form of each concept; after all, that is how mathematicians like to do things.

1. Linear Regression Model

The most general form of the linear regression model is:

$$h_\theta(x)=\theta_0x_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n$$

$$\theta=\begin{bmatrix}\theta_0\\\theta_1\\\vdots\\\theta_n\end{bmatrix},\qquad y=\begin{bmatrix}y^{(1)}\\y^{(2)}\\\vdots\\y^{(m)}\end{bmatrix}$$

As for how to represent $x_1, x_2, \cdots, x_n$ as a vector, there are, as far as I know, two common conventions, discussed below:


The first convention:

$$X=\begin{bmatrix}x_0\\x_1\\\vdots\\x_n\end{bmatrix}$$

$$h_\theta(x)=\theta^TX=X^T\theta$$

This choice of $X$ keeps the expression for $h_\theta(x)$ somewhat simpler.
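As a quick illustration, here is a minimal NumPy sketch of this convention for a single sample (the numeric values are made up):

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])  # theta_0, theta_1, theta_2 (hypothetical values)
x = np.array([1.0, 0.5, -1.0])     # x_0 = 1 (bias), then the feature values

h = theta @ x                      # theta^T X: a scalar prediction for one sample
print(h)                           # 1.0*1.0 + 2.0*0.5 + 3.0*(-1.0) = -1.0
```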


The second convention is more practical (e.g., for programming assignments):

$$X=\begin{bmatrix}x_0^{(1)}&x_1^{(1)}&\cdots&x_n^{(1)}\\x_0^{(2)}&x_1^{(2)}&\cdots&x_n^{(2)}\\\vdots&\vdots&&\vdots\\x_0^{(m)}&x_1^{(m)}&\cdots&x_n^{(m)}\end{bmatrix}$$

This convention can be read as: columns are features, rows are training samples (with each $x_0^{(i)}$ fixed to $1$ for the intercept term).

Under this convention, $h_\theta(x)$ needs to be rewritten:

$$h_\theta(x^{(i)})=\theta_0x_0^{(i)}+\theta_1x_1^{(i)}+\theta_2x_2^{(i)}+\cdots+\theta_nx_n^{(i)}$$

Then let

$$h_\theta(x)=\begin{bmatrix}h_\theta(x^{(1)})\\h_\theta(x^{(2)})\\\vdots\\h_\theta(x^{(m)})\end{bmatrix}$$

so that $h_\theta(x)=X\theta$.


In the vectorized derivations that follow, I will use the second convention throughout.
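To make the second convention concrete, here is a minimal NumPy sketch (the sizes m, n and the random values are hypothetical) that computes all m predictions with a single matrix-vector product:

```python
import numpy as np

m, n = 5, 3                        # hypothetical: 5 samples, 3 features
rng = np.random.default_rng(0)

# Design matrix: one row per sample, one column per feature,
# with a leading column of ones for the bias feature x_0.
X = np.hstack([np.ones((m, 1)), rng.random((m, n))])  # shape (m, n+1)
theta = rng.random(n + 1)                             # shape (n+1,)

h = X @ theta                      # h_theta(x) = X @ theta, shape (m,)
print(h)
```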

2. Cost Function

The general form of the cost function:

$$J(\theta)=\frac{1}{2m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$

where $\theta=\begin{bmatrix}\theta_0\\\theta_1\\\vdots\\\theta_n\end{bmatrix}$

Vectorizing the cost function gives:

$$J(\theta)=\frac{1}{2m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2=\frac{1}{2m}(X\theta-y)\cdot(X\theta-y)$$

Note that the operation between the two $(X\theta-y)$ factors here is a dot product (equivalently, $(X\theta-y)^T(X\theta-y)$ in matrix notation).

The derivation is as follows:

$$X\theta-y=h_\theta(x)-y=\begin{bmatrix}h_\theta(x^{(1)})\\h_\theta(x^{(2)})\\\vdots\\h_\theta(x^{(m)})\end{bmatrix}-\begin{bmatrix}y^{(1)}\\y^{(2)}\\\vdots\\y^{(m)}\end{bmatrix}=\begin{bmatrix}h_\theta(x^{(1)})-y^{(1)}\\h_\theta(x^{(2)})-y^{(2)}\\\vdots\\h_\theta(x^{(m)})-y^{(m)}\end{bmatrix}$$

Thus,

$$(X\theta-y)\cdot(X\theta-y)=\begin{bmatrix}h_\theta(x^{(1)})-y^{(1)}\\h_\theta(x^{(2)})-y^{(2)}\\\vdots\\h_\theta(x^{(m)})-y^{(m)}\end{bmatrix}\cdot\begin{bmatrix}h_\theta(x^{(1)})-y^{(1)}\\h_\theta(x^{(2)})-y^{(2)}\\\vdots\\h_\theta(x^{(m)})-y^{(m)}\end{bmatrix}$$

$$=\left(h_\theta(x^{(1)})-y^{(1)}\right)^2+\left(h_\theta(x^{(2)})-y^{(2)}\right)^2+\cdots+\left(h_\theta(x^{(m)})-y^{(m)}\right)^2$$

$$=\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
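To sanity-check this identity numerically, here is a tiny NumPy sketch with made-up residual values:

```python
import numpy as np

r = np.array([0.5, -1.0, 2.0])            # hypothetical residuals h(x^(i)) - y^(i)
assert np.isclose(r @ r, np.sum(r ** 2))  # dot product equals the sum of squares
```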

In summary:

$$J(\theta)=\frac{1}{2m}(X\theta-y)\cdot(X\theta-y)$$

…Do you begin to sense the beauty of mathematics?
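A minimal NumPy sketch of this vectorized cost (the function name compute_cost is my own; X, y, theta follow the shapes defined above):

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = 1/(2m) * (X @ theta - y) . (X @ theta - y)."""
    m = len(y)
    residual = X @ theta - y               # shape (m,)
    return (residual @ residual) / (2 * m)
```

The dot product `residual @ residual` replaces the explicit loop over the m squared errors.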

3. Gradient Descent Algorithm

The general form of the gradient descent update rule is:

$$\theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

where $0\leq j\leq n$.

Vectorizing this update gives:

$$\theta:=\theta-\alpha\frac{1}{m}X^T(X\theta-y)$$

where

$$\theta=\begin{bmatrix}\theta_0\\\theta_1\\\vdots\\\theta_n\end{bmatrix},\qquad X=\begin{bmatrix}x_0^{(1)}&x_1^{(1)}&\cdots&x_n^{(1)}\\x_0^{(2)}&x_1^{(2)}&\cdots&x_n^{(2)}\\\vdots&\vdots&&\vdots\\x_0^{(m)}&x_1^{(m)}&\cdots&x_n^{(m)}\end{bmatrix},\qquad y=\begin{bmatrix}y^{(1)}\\y^{(2)}\\\vdots\\y^{(m)}\end{bmatrix}$$

The derivation is as follows:

Write the update as $\theta:=\theta-\alpha\frac{1}{m}\delta$. Clearly,

$$\delta=\begin{bmatrix}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}\\\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_1^{(i)}\\\vdots\\\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_n^{(i)}\end{bmatrix}$$

Writing out each sum:

$$\delta=\begin{bmatrix}(h_\theta(x^{(1)})-y^{(1)})x_0^{(1)}+(h_\theta(x^{(2)})-y^{(2)})x_0^{(2)}+\cdots+(h_\theta(x^{(m)})-y^{(m)})x_0^{(m)}\\(h_\theta(x^{(1)})-y^{(1)})x_1^{(1)}+(h_\theta(x^{(2)})-y^{(2)})x_1^{(2)}+\cdots+(h_\theta(x^{(m)})-y^{(m)})x_1^{(m)}\\\vdots\\(h_\theta(x^{(1)})-y^{(1)})x_n^{(1)}+(h_\theta(x^{(2)})-y^{(2)})x_n^{(2)}+\cdots+(h_\theta(x^{(m)})-y^{(m)})x_n^{(m)}\end{bmatrix}$$

$$=\begin{bmatrix}x_0^{(1)}&x_0^{(2)}&\cdots&x_0^{(m)}\\x_1^{(1)}&x_1^{(2)}&\cdots&x_1^{(m)}\\\vdots&\vdots&&\vdots\\x_n^{(1)}&x_n^{(2)}&\cdots&x_n^{(m)}\end{bmatrix}\begin{bmatrix}h_\theta(x^{(1)})-y^{(1)}\\h_\theta(x^{(2)})-y^{(2)}\\\vdots\\h_\theta(x^{(m)})-y^{(m)}\end{bmatrix}$$

Clearly,

$$\begin{bmatrix}x_0^{(1)}&x_0^{(2)}&\cdots&x_0^{(m)}\\x_1^{(1)}&x_1^{(2)}&\cdots&x_1^{(m)}\\\vdots&\vdots&&\vdots\\x_n^{(1)}&x_n^{(2)}&\cdots&x_n^{(m)}\end{bmatrix}=X^T$$

and, as shown above,

$$X\theta-y=\begin{bmatrix}h_\theta(x^{(1)})-y^{(1)}\\h_\theta(x^{(2)})-y^{(2)}\\\vdots\\h_\theta(x^{(m)})-y^{(m)}\end{bmatrix}$$

Therefore, $\delta=X^T(X\theta-y)$.
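Again, a tiny NumPy sketch (hypothetical sizes and random data) confirming that the loop form and `X.T @ (X @ theta - y)` agree:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_plus_1 = 4, 3                 # hypothetical: 4 samples, n + 1 = 3 columns
X = rng.random((m, n_plus_1))
theta = rng.random(n_plus_1)
y = rng.random(m)

r = X @ theta - y                  # residual vector, shape (m,)
# Loop form: delta_j = sum_i (h(x^(i)) - y^(i)) * x_j^(i)
delta_loop = np.array([np.sum(r * X[:, j]) for j in range(n_plus_1)])
assert np.allclose(delta_loop, X.T @ r)
```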

In summary:

$$\theta:=\theta-\alpha\frac{1}{m}X^T(X\theta-y)$$
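Putting it together, a minimal sketch of the vectorized update loop in NumPy (gradient_descent, alpha, and num_iters are hypothetical names and hyperparameters):

```python
import numpy as np

def gradient_descent(X, y, theta, alpha=0.01, num_iters=1000):
    """Repeat theta := theta - alpha * (1/m) * X^T (X theta - y)."""
    m = len(y)
    for _ in range(num_iters):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
    return theta
```

Note that the whole parameter vector is updated simultaneously; there is no loop over $j$.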



(End)