This article is the second part of the reference answers to 2020人工神经网络第一次作业 (the first assignment of the 2020 Artificial Neural Networks course).
The original problem asks for a neural network that classifies the three classes of patterns shown in the figure below. The expected outputs for the three classes are

$$\left( {1, - 1, - 1} \right)^T ,\ \left( { - 1,1, - 1} \right)^T ,\ \left( { - 1, - 1,1} \right)^T$$

respectively.
▲ Distribution of the three classes of samples in the coordinate system

From the positions of the three classes of samples given in the problem, their data are as listed in the table below:
| Sample | x1 | x2 | Class | Expected output |
|---|---|---|---|---|
| 1 | 0.25 | 0.25 | 1 | (1, -1, -1) |
| 2 | 0.75 | 0.125 | 1 | (1, -1, -1) |
| 3 | 0.25 | 0.75 | 1 | (1, -1, -1) |
| 4 | 0.5 | 0.125 | 2 | (-1, 1, -1) |
| 5 | 0.75 | 0.25 | 2 | (-1, 1, -1) |
| 6 | 0.25 | 0.75 | 2 | (-1, 1, -1) |
| 7 | 0.25 | 0.5 | 3 | (-1, -1, 1) |
| 8 | 0.5 | 0.5 | 3 | (-1, -1, 1) |
| 9 | 0.75 | 0.5 | 3 | (-1, -1, 1) |
Clearly, the three classes are not linearly separable, so a BP network with a hidden layer (or an RBF network) is used for the classification.

Based on the BP network program given in the reference answer to the first problem, BP networks with different hidden-layer structures are built for the classification.

▲ Network model

With learning rate $\eta = 0.5$, the model above is trained with the basic BP algorithm. The resulting error curve is:

▲ Training error curve of the network

In the end, among the nine samples there is always one sample in error that cannot be eliminated.
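This residual error is no accident of training: as the data are entered in this article, samples 3 and 6 share the same coordinates (0.25, 0.75) but carry different class labels, so no deterministic classifier can fit both of them. A standalone check of this (plain NumPy; `x_data` and `labels` restate the sample table above):

```python
import numpy as np

# Sample coordinates and class labels from the table above
x_data = np.array([[0.25, 0.25], [0.75, 0.125], [0.25, 0.75],
                   [0.5, 0.125], [0.75, 0.25], [0.25, 0.75],
                   [0.25, 0.5], [0.5, 0.5], [0.75, 0.5]])
labels = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])

# Look for identical input points that carry different labels
for i in range(len(x_data)):
    for j in range(i + 1, len(x_data)):
        if np.array_equal(x_data[i], x_data[j]) and labels[i] != labels[j]:
            print("Samples %d and %d coincide at %s but have classes %d and %d"
                  % (i + 1, j + 1, x_data[i], labels[i], labels[j]))
# → Samples 3 and 6 coincide at [0.25 0.75] but have classes 1 and 2
```

Whichever class the network assigns to that point, one of the two samples is counted as an error, which matches the one persistent error seen in training.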
```python
#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# HW12BP.PY -- by Dr. ZhuoQing 2020-11-17
#
# Note:
#============================================================

from headm import *                  # Author's helper header (numpy, pyplot, printf, ...)

#------------------------------------------------------------
# Samples data construction
x_data = array([[0.25, 0.25], [0.75, 0.125], [0.25, 0.75],
                [0.5, 0.125], [0.75, 0.25], [0.25, 0.75],
                [0.25, 0.5], [0.5, 0.5], [0.75, 0.5]])

y_data = array([[1,-1,-1], [1,-1,-1], [1,-1,-1],
                [-1,1,-1], [-1,1,-1], [-1,1,-1],
                [-1,-1,1], [-1,-1,1], [-1,-1,1]]).T

#------------------------------------------------------------
def shuffledata(X, Y):
    id = list(range(X.shape[0]))
    random.shuffle(id)
    return X[id], (Y.T[id]).T

#------------------------------------------------------------
# Define and initialize the NN
def initialize_parameters(n_x, n_h, n_y):
    random.seed(2)

    W1 = random.randn(n_h, n_x) * 0.5        # dot(W1, X.T)
    W2 = random.randn(n_y, n_h) * 0.5        # dot(W2, Z1)
    b1 = zeros((n_h, 1))                     # Column vector
    b2 = zeros((n_y, 1))                     # Column vector

    parameters = {'W1':W1, 'b1':b1, 'W2':W2, 'b2':b2}
    return parameters

#------------------------------------------------------------
# Forward propagation
# X: row->sample;  Z2: col->sample
def forward_propagate(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    Z1 = dot(W1, X.T) + b1                   # X: row-->sample; Z1: col-->sample
    A1 = 1/(1+exp(-Z1))
    Z2 = dot(W2, A1) + b2                    # Z2: col-->sample
#    A2 = 1/(1+exp(-Z2))                     # A: col-->sample
    A2 = Z2                                  # Linear output

    cache = {'Z1':Z1, 'A1':A1, 'Z2':Z2, 'A2':A2}
    return Z2, cache

#------------------------------------------------------------
# Calculate the cost
# A2, Y: col->sample
def calculate_cost(A2, Y, parameters):
    err = [x1-x2 for x1,x2 in zip(A2.T, Y.T)]
    cost = [dot(e,e) for e in err]
    return mean(cost)

#------------------------------------------------------------
# Backward propagation
def backward_propagate(parameters, cache, X, Y):
    m = X.shape[0]                           # Number of the samples

    W1 = parameters['W1']
    W2 = parameters['W2']
    A1 = cache['A1']
    A2 = cache['A2']

    dZ2 = (A2 - Y)                           #* (A2 * (1-A2))  -- linear output layer
    dW2 = dot(dZ2, A1.T) / m
    db2 = sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = dot(W2.T, dZ2) * (A1 * (1-A1))
    dW1 = dot(dZ1, X) / m
    db1 = sum(dZ1, axis=1, keepdims=True) / m

    grads = {'dW1':dW1, 'db1':db1, 'dW2':dW2, 'db2':db2}
    return grads

#------------------------------------------------------------
# Update the parameters
def update_parameters(parameters, grads, learning_rate):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']

    W1 = W1 - learning_rate * dW1
    W2 = W2 - learning_rate * dW2
    b1 = b1 - learning_rate * db1
    b2 = b2 - learning_rate * db2

    parameters = {'W1':W1, 'b1':b1, 'W2':W2, 'b2':b2}
    return parameters

#------------------------------------------------------------
# Define the training
DISP_STEP = 500

def train(X, Y, num_iterations, learning_rate, print_cost=False):
#    random.seed(3)

    n_x = 2
    n_y = 3
    n_h = 5
    lr = learning_rate

    parameters = initialize_parameters(n_x, n_h, n_y)
    XX,YY = shuffledata(X, Y)

    costdim = []
    for i in range(0, num_iterations):
        A2, cache = forward_propagate(XX, parameters)
        cost = calculate_cost(A2, YY, parameters)
        grads = backward_propagate(parameters, cache, XX, YY)
        parameters = update_parameters(parameters, grads, lr)

        if print_cost and i % DISP_STEP == 0:
            printf('Cost after iteration:%i: %f'%(i, cost))
            costdim.append(cost)
            if cost < 0.1:
                break

    return parameters, costdim

#------------------------------------------------------------
parameter,costdim = train(x_data, y_data, 10000, 0.5, True)

A2, cache = forward_propagate(x_data, parameter)
#printf(A2, y_data)

A22 = array([[1 if e >= 0 else -1 for e in l] for l in A2])
res = [1 if any(x1!=x2) else 0 for x1,x2 in zip(A22.T, y_data.T)]
printf(res, sum(res))

#------------------------------------------------------------
plt.plot(arange(len(costdim))*DISP_STEP, costdim)
plt.xlabel("Step(10)")
plt.ylabel("Cost")
plt.grid(True)
plt.tight_layout()
plt.show()

#------------------------------------------------------------
#        END OF FILE : HW12BP.PY
#============================================================
```
The network model is constructed as shown in the figure below: there are two hidden layers of 5 neurons each with sigmoid transfer functions, and the output layer uses a linear transfer function.

▲ Network model

▲ Training error curve of the network

In the end, one sample is still in error.
```python
#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# HW12BP2.PY -- by Dr. ZhuoQing 2020-11-17
#
# Note:
#============================================================

from headm import *                  # Author's helper header (numpy, pyplot, printf, ...)

#------------------------------------------------------------
# Samples data construction
x_data = array([[0.25, 0.25], [0.75, 0.125], [0.25, 0.75],
                [0.5, 0.125], [0.75, 0.25], [0.25, 0.75],
                [0.25, 0.5], [0.5, 0.5], [0.75, 0.5]])

y_data = array([[1,-1,-1], [1,-1,-1], [1,-1,-1],
                [-1,1,-1], [-1,1,-1], [-1,1,-1],
                [-1,-1,1], [-1,-1,1], [-1,-1,1]]).T

#------------------------------------------------------------
def shuffledata(X, Y):
    id = list(range(X.shape[0]))
    random.shuffle(id)
    return X[id], (Y.T[id]).T

#------------------------------------------------------------
# Define and initialize the NN
def initialize_parameters(n_x, n_h, n_h1, n_y):
    random.seed(int(time.time()))

    W1 = random.randn(n_h, n_x) * 0.5        # dot(W1, X.T)
    W2 = random.randn(n_h1, n_h) * 0.5       # dot(W2, A1)
    W3 = random.randn(n_y, n_h1) * 0.5       # dot(W3, A2)
    b1 = zeros((n_h, 1))                     # Column vector
    b2 = zeros((n_h1, 1))                    # Column vector
    b3 = zeros((n_y, 1))                     # Column vector

    parameters = {'W1':W1, 'b1':b1, 'W2':W2, 'b2':b2, 'W3':W3, 'b3':b3}
    return parameters

#------------------------------------------------------------
# Forward propagation
def forward_propagate(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']

    Z1 = dot(W1, X.T) + b1                   # X: row-->sample; Z1: col-->sample
    A1 = 1/(1+exp(-Z1))
    Z2 = dot(W2, A1) + b2                    # Z2: col-->sample
    A2 = 1/(1+exp(-Z2))                      # A: col-->sample
    Z3 = dot(W3, A2) + b3                    # Z3: col-->sample
#    A3 = 1/(1+exp(-Z3))                     # A: col-->sample
    A3 = Z3                                  # Linear output

    cache = {'Z1':Z1, 'A1':A1, 'Z2':Z2, 'A2':A2, 'Z3':Z3, 'A3':A3}
    return Z3, cache

#------------------------------------------------------------
# Calculate the cost
def calculate_cost(A3, Y, parameters):
    err = [x1-x2 for x1,x2 in zip(A3.T, Y.T)]
    cost = [dot(e,e) for e in err]
    return mean(cost)

#------------------------------------------------------------
# Backward propagation
def backward_propagate(parameters, cache, X, Y):
    m = X.shape[0]                           # Number of the samples

    W1 = parameters['W1']
    W2 = parameters['W2']
    W3 = parameters['W3']
    A1 = cache['A1']
    A2 = cache['A2']
    A3 = cache['A3']

    dZ3 = (A3 - Y)                           #* (A3 * (1-A3))  -- linear output layer
    dW3 = dot(dZ3, A2.T) / m
    db3 = sum(dZ3, axis=1, keepdims=True) / m
    dZ2 = dot(W3.T, dZ3) * (A2 * (1-A2))
    dW2 = dot(dZ2, A1.T) / m
    db2 = sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = dot(W2.T, dZ2) * (A1 * (1-A1))
    dW1 = dot(dZ1, X) / m
    db1 = sum(dZ1, axis=1, keepdims=True) / m

    grads = {'dW1':dW1, 'db1':db1, 'dW2':dW2, 'db2':db2, 'dW3':dW3, 'db3':db3}
    return grads

#------------------------------------------------------------
# Update the parameters
def update_parameters(parameters, grads, learning_rate):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']

    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']
    dW3 = grads['dW3']
    db3 = grads['db3']

    W1 = W1 - learning_rate * dW1
    W2 = W2 - learning_rate * dW2
    W3 = W3 - learning_rate * dW3
    b1 = b1 - learning_rate * db1
    b2 = b2 - learning_rate * db2
    b3 = b3 - learning_rate * db3

    parameters = {'W1':W1, 'b1':b1, 'W2':W2, 'b2':b2, 'W3':W3, 'b3':b3}
    return parameters

#------------------------------------------------------------
# Define the training
DISP_STEP = 500

def train(X, Y, num_iterations, learning_rate, print_cost=False):
#    random.seed(3)

    n_x = 2
    n_y = 3
    n_h = 5
    n_h1 = 5
    lr = learning_rate

    parameters = initialize_parameters(n_x, n_h, n_h1, n_y)
    XX,YY = shuffledata(X, Y)

    costdim = []
    for i in range(0, num_iterations):
        A3, cache = forward_propagate(XX, parameters)
        cost = calculate_cost(A3, YY, parameters)
        grads = backward_propagate(parameters, cache, XX, YY)
        parameters = update_parameters(parameters, grads, lr)

        if print_cost and i % DISP_STEP == 0:
            printf('Cost after iteration:%i: %f'%(i, cost))
            costdim.append(cost)
            if cost < 0.1:
                break

    return parameters, costdim

#------------------------------------------------------------
parameter,costdim = train(x_data, y_data, 10000, 0.5, True)

A3, cache = forward_propagate(x_data, parameter)
#printf(A3, y_data)

A33 = array([[1 if e >= 0 else -1 for e in l] for l in A3])
res = [1 if any(x1!=x2) else 0 for x1,x2 in zip(A33.T, y_data.T)]
printf(res, sum(res))

#------------------------------------------------------------
plt.plot(arange(len(costdim))*DISP_STEP, costdim)
plt.xlabel("Step(10)")
plt.ylabel("Cost")
plt.grid(True)
plt.tight_layout()
plt.show()

#------------------------------------------------------------
#        END OF FILE : HW12BP2.PY
#============================================================
```
An artificial neural network is also built with nntool. In MATLAB, enter the variables:

```matlab
xx =
    0.2500    0.7500    0.2500    0.5000    0.7500    0.2500    0.2500    0.5000    0.7500
    0.2500    0.1250    0.7500    0.1250    0.2500    0.7500    0.5000    0.5000    0.5000

yy =
     1     1     1    -1    -1    -1    -1    -1    -1
    -1    -1    -1     1     1     1    -1    -1    -1
    -1    -1    -1    -1    -1    -1     1     1     1
```
▲ Structure of the artificial neural network created

The network is trained with `train(network1, xx, yy)`.

▲ Trained network and training performance plots
Use `sim(network1, xx)` to obtain the actual output of the network:

```
    1.4681    1.1752    0.7062    1.4357    0.9579    0.7062    1.5045    1.5136    0.5356
   -3.2535   -1.6658   -2.1844   -2.6814   -1.6547   -2.1844   -2.9406   -3.3396   -2.4400
    0.7569   -0.1324    0.7854    1.0982   -0.2075    0.7854    0.4232    0.2140   -0.6326
```
The mean squared error of the network output is 3.548.
```python
# Compute the MSE of the nntool output against the expected outputs
string = ('1.4681 1.1752 0.7062 1.4357 0.9579 0.7062 1.5045 1.5136 0.5356',
          '-3.2535 -1.6658 -2.1844 -2.6814 -1.6547 -2.1844 -2.9406 -3.3396 -2.4400',
          '0.7569 -0.1324 0.7854 1.0982 -0.2075 0.7854 0.4232 0.2140 -0.6326',)

strdim = [[float(s) for s in str.split() if len(s) > 0] for str in string]
#printf(strdim)

var = array(strdim)
y_data = array([[1,-1,-1],[1,-1,-1],[1,-1,-1],
                [-1,1,-1],[-1,1,-1],[-1,1,-1],
                [-1,-1,1],[-1,-1,1],[-1,-1,1]]).T
printf(var)

err = [x1-x2 for x1,x2 in zip(y_data.T, var.T)]
printf(err)
printf(mean([dot(e,e)/3 for e in err]))
```
The hidden layer uses $N = 9$ neurons, the same number as there are samples, so each neuron's center is placed at one of the samples.

Since the samples are all distributed within $(0,1) \times (0,1)$, the width (variance) of the hidden neurons is chosen as $\sigma = 0.5$.
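Concretely, each hidden neuron computes a Gaussian of the distance between the input and its center, $\exp(-\left\| x - c \right\|^2 / \sigma^2)$, which is the form used in the RBF code in this section (note that some texts put $2\sigma^2$ in the denominator). A minimal sketch:

```python
import numpy as np

def rbf(x, c, sigma=0.5):
    """Gaussian RBF activation: exp(-||x - c||^2 / sigma^2)."""
    d = np.asarray(x, dtype=float) - np.asarray(c, dtype=float)
    return np.exp(-np.dot(d, d) / sigma**2)

print(rbf([0.5, 0.5], [0.5, 0.5]))    # input at the center -> 1.0
print(rbf([0.5, 0.5], [0.25, 0.25]))  # exp(-0.125/0.25) = exp(-0.5) ≈ 0.6065
```

The activation is 1 at the center and decays smoothly with distance, so each of the 9 hidden units responds most strongly to "its own" sample.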
```python
x_data = array([[0.25, 0.25], [0.75, 0.125], [0.25, 0.75],
                [0.5, 0.125], [0.75, 0.25], [0.25, 0.75],
                [0.25, 0.5], [0.5, 0.5], [0.75, 0.5]])

y_data = array([[1,-1,-1], [1,-1,-1], [1,-1,-1],
                [-1,1,-1], [-1,1,-1], [-1,1,-1],
                [-1,-1,1], [-1,-1,1], [-1,-1,1]])

#------------------------------------------------------------
# Calculate the RBF hidden layer output
#   x: input vector
#   H: hidden node centers: row->sample
#   sigma: variance (width) of the RBF
def rbf_hide_out(x, H, sigma):
    Hx = H - x
    Hxx = [exp(-dot(e,e)/(sigma**2)) for e in Hx]
    return Hxx

#------------------------------------------------------------
Hdim = array([rbf_hide_out(x, x_data, 0.5) for x in x_data]).T

# Regularized least-squares solution for the output weights
W = dot(y_data.T, dot(linalg.inv(eye(9)*0.001 + dot(Hdim.T, Hdim)), Hdim.T))

yy = dot(W, Hdim)
yy1 = array([[1 if e > 0 else -1 for e in l] for l in yy])
printf(yy1)

err = [1 if any(x1!=x2) else 0 for x1,x2 in zip(yy1.T, y_data)]
printf(err)
```
For each sample, the hidden-layer output $\bar h_i$ can be computed; together these form the hidden-layer output matrix $H = \left[ {h_1 ,h_2 , \cdots ,h_9 } \right]$.

Passing the RBF hidden layer through the hidden-to-output weight matrix $W$ gives the network output: $W \cdot H = Y$.

From this we obtain:

$$W = Y \cdot H^{ - 1}$$
Actual computation shows that $H$ is singular, so $H^{-1}$ does not exist. A regularization term is therefore added to the formula:
$$W = Y \cdot \left( {H + \lambda \cdot I} \right)^{ - 1}$$

(The Python code in this section implements the closely related ridge form $W = Y\left( {\lambda I + H^T H} \right)^{-1} H^T$; both make the matrix being inverted nonsingular.)
In the actual computation, $\lambda = 0.001$ is chosen, from which $W$ is obtained.
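The role of the regularization term can be seen on a toy example: a rank-deficient matrix has no inverse, but adding $\lambda I$ restores full rank (a minimal NumPy sketch with a made-up matrix, not the homework's $H$):

```python
import numpy as np

# A singular 2x2 matrix: the second row is twice the first (rank 1)
H = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.matrix_rank(H))           # 1 -> np.linalg.inv(H) would raise LinAlgError

lam = 0.001
H_reg = H + lam * np.eye(2)               # regularized matrix, now full rank
print(np.linalg.matrix_rank(H_reg))       # 2
W = np.linalg.inv(H_reg)                  # inversion now succeeds
print(np.allclose(H_reg @ W, np.eye(2)))  # True
```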
Checking all nine sample inputs, the outputs obtained are:

```
[[ 1  1 -1 -1 -1 -1 -1 -1 -1]
 [-1 -1 -1  1  1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1 -1  1  1  1]]
```
Two samples (the 3rd and 6th columns) are misclassified.