赞
踩
主要讲的是无约束情况下的最小值问题。涉及到如下:
我们之前在高等数学中学过关于f(x)的泰勒展开如下:
定义:
lim
x
→
a
h
k
(
x
)
=
0
\lim\limits_{x\to a}h_k(x)=0
x→alimhk(x)=0
f
(
x
)
=
f
(
a
)
+
f
′
(
a
)
(
x
−
a
)
+
f
′
′
(
a
)
2
!
(
x
−
a
)
2
+
⋯
+
f
(
k
)
(
a
)
k
!
(
x
−
a
)
k
+
h
k
(
x
)
(
x
−
a
)
k
f(x)=f(a)+f′(a)(x−a)+f″
f(x)=f(a)+f′(a)(x−a)+2!f′′(a)(x−a)2+⋯+k!f(k)(a)(x−a)k+hk(x)(x−a)k
hessian matrix
具有对称性假设有一个m维度向量函数
f
(
x
)
=
[
f
1
(
x
)
f
2
(
x
)
⋯
f
m
(
x
)
]
T
f(x)=^T
f(x)=[f1(x)f2(x)⋯fm(x)]T[列向量],其中
x
=
[
x
1
x
2
⋯
x
n
]
T
x=^T
x=[x1x2⋯xn]T是一个n维输入向量,雅可比矩阵J是一个
m
×
n
m\times n
m×n的矩阵,其元素由函数的偏导数组成:雅可比矩阵第i行第j列表示的是
f
i
(
x
)
f_i(x)
fi(x)对
x
i
x_i
xi的偏导
J
i
j
=
∂
f
i
(
x
)
∂
x
j
Jij=∂xj∂fi(x)
本质上就是函数值 f i ( x ) f_i(x) fi(x)对 x i x_i xi的每个元素求导:
第一步假设 f i ( x ) f_i(x) fi(x)是常数, ∂ f i ( x ) ∂ X \frac{\partial f_i(x)}{\partial X} ∂X∂fi(x)为分子布局,遵循标量不变,向量拉伸原则
XY拉伸术,分子布局,X
横向拉,Y纵向拉,可得如下:
∂
f
i
(
x
)
∂
X
=
[
∂
f
i
(
x
)
∂
x
1
∂
f
i
(
x
)
∂
x
2
⋯
∂
f
i
(
x
)
∂
x
n
]
∂X∂fi(x)=[∂x1∂fi(x)∂x2∂fi(x)⋯∂xn∂fi(x)]
第二步假设 f ( x ) f(x) f(x)为向量, ∂ f ( x ) ∂ X \frac{\partial f(x)}{\partial X} ∂X∂f(x)为分子布局,遵循标量不变,向量拉伸原则
XY拉伸术,分子布局,X
横向拉,Y 纵向拉,可得如下:
J
=
[
∂
f
1
(
x
)
∂
x
1
∂
f
1
(
x
)
∂
x
2
⋯
∂
f
1
(
x
)
∂
x
n
∂
f
2
(
x
)
∂
x
1
∂
f
2
(
x
)
∂
x
2
⋯
∂
f
2
(
x
)
∂
x
n
⋮
⋮
⋯
⋮
∂
f
m
(
x
)
∂
x
1
∂
f
m
(
x
)
∂
x
2
⋯
∂
f
m
(
x
)
∂
x
n
]
J=
∂x1∂f1(x)∂x1∂f2(x)⋮ ∂x1∂fm(x)∂x2∂f1(x)∂x2∂f2(x)⋮∂x2∂fm(x)⋯⋯⋯⋯∂xn∂f1(x)∂xn∂f2(x)⋮∂xn∂fm(x)
泰勒公式1阶展开可得:
f
(
x
+
Δ
x
)
=
f
(
x
)
+
f
′
(
x
)
Δ
x
f(x+Δx)=f(x)+f′(x)Δx
转换成雅可比矩阵可得:
f
(
x
+
Δ
x
)
=
f
(
x
)
+
J
j
k
Δ
x
;
J
j
k
=
∂
f
j
(
x
)
∂
x
k
f(x+Δx)=f(x)+JjkΔx;Jjk=∂xk∂fj(x)
我们已经知道了函数的二阶泰勒展开表示如下:
F
(
x
+
Δ
x
)
≈
F
(
x
)
+
(
Δ
x
)
T
∇
F
(
x
)
+
1
2
(
Δ
x
)
T
H
j
k
(
Δ
x
)
F(x+Δx)≈F(x)+(Δx)T∇F(x)+21(Δx)THjk(Δx)
Python代码如下:
def newton_raphson(f, f_prime, x0, tol=1e-10, max_iter=100): x = x0 for i in range(max_iter): fx = f(x) fpx = f_prime(x) # Newton-Raphson iteration x_new = x - fx / fpx print(f"Iteration {i + 1}: x = {x_new}") if abs(x_new - x) < tol: return x_new x = x_new raise ValueError("Newton-Raphson method did not converge") # Define the function and its first derivative f = lambda x: x ** 2 - 9 f_prime = lambda x: 2 * x # Initial guesses initial_guesses = [2, -2] # Find the roots for x0 in initial_guesses: root = newton_raphson(f, f_prime, x0) print(f"The root starting from {x0} is: {root}")
Iteration 1: x = 3.25
Iteration 2: x = 3.0096153846153846
Iteration 3: x = 3.000015360039322
Iteration 4: x = 3.0000000000393214
Iteration 5: x = 3.0
The root starting from 2 is: 3.0
Iteration 1: x = -3.25
Iteration 2: x = -3.0096153846153846
Iteration 3: x = -3.000015360039322
Iteration 4: x = -3.0000000000393214
Iteration 5: x = -3.0
The root starting from -2 is: -3.0
对于无约束问题的梯度下降,我们一般有两种方法:
运用泰勒一阶信息,迭代方向为负梯度方向:
运用泰勒二阶信息,迭代方向为牛顿方向:迭代步长为 α 1 = 1 \alpha_1=1 α1=1
hessian matrix->
H
j
k
H_{jk}
Hjk可逆:我们定义凸函数为
f
(
x
)
f(x)
f(x),凸集为
K
\mathrm{K}
K,我们的目的是为了求得凸函数
f
(
x
)
f(x)
f(x)的最小值
min
x
∈
K
f
(
x
)
,
K
:
A
x
=
b
x∈Kminf(x),K:Ax=b
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。