Consider the maze below. An agent is placed in an arbitrary room and, by interacting with the environment over many episodes, is trained with Q-learning (a reinforcement-learning method) to produce the corresponding Q matrix.
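The update rule the script below implements is the simplest tabular form of Q-learning, with no learning rate and with actions chosen uniformly at random among the legal moves:

\[
Q(s, a) \leftarrow R(s, a) + \gamma \max_{a'} Q(s', a')
\]

where \(s'\) is the room reached by taking action \(a\) in room \(s\), and \(\gamma = 0.8\) is the discount factor.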
import numpy as np
import random

# Reward matrix: -1 = no passage, 0 = passage, 100 = passage into the goal room 11.
R = -1 * np.ones((12, 12))
R[0, 3] = R[3, 0] = 0
R[1, 2] = R[2, 1] = 0
R[2, 5] = R[5, 2] = 0
R[6, 7] = R[7, 6] = 0
R[3, 6] = R[6, 3] = 0
R[3, 4] = R[4, 3] = 0
R[1, 4] = R[4, 1] = 0
R[7, 4] = R[4, 7] = 0
R[5, 8] = R[8, 5] = 0
R[8, 9] = R[9, 8] = 0
R[9, 10] = R[10, 9] = 0
R[10, 11] = R[11, 10] = 100

gamma = 0.8            # discount factor
Q = np.zeros((12, 12))

# Train for 3000 episodes, each starting from a random room and
# ending when the agent reaches room 11.
for episode in range(3000):
    state = random.randint(0, 11)
    while True:
        # Legal actions from the current room are those with R >= 0.
        legal_actions = [a for a in range(12) if R[state, a] >= 0]
        next_state = random.choice(legal_actions)
        # Q-learning update (no learning rate, random action selection).
        Q[state, next_state] = R[state, next_state] + gamma * Q[next_state].max()
        state = next_state
        if state == 11:
            break

print('Q matrix:')
print(Q)

# Extract the greedy path from room 0 to room 11.
state = 0
path = [state]
while state != 11:
    state = int(Q[state].argmax())
    path.append(state)

print('Path:')
print(' --> '.join(str(s) for s in path))
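The raw Q values grow well above 100, so for display they are often scaled to a 0–100 range (a small sketch, not part of the original script; it assumes the `Q` array produced by the code above):

# Normalize for readability: the largest entry becomes 100.
print((Q / Q.max() * 100).astype(int))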
The trained Q matrix is shown in the figure below:
Path:
0 --> 3 --> 4 --> 1 --> 2 --> 5 --> 8 --> 9 --> 10 --> 11
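To confirm that the printed route only moves through open passages, a one-line check against R suffices (a minimal sketch; `path` and `R` are the list and reward matrix from the script above):

# Every consecutive pair of rooms along the path must be a legal move (R >= 0).
assert all(R[s, t] >= 0 for s, t in zip(path, path[1:]))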
The agent's route through the maze is shown in the figure below:
Now modify the maze by opening two new passages: one between rooms 4 and 5, and one between rooms 7 and 10.
import numpy as np
import random

# Reward matrix for the modified maze: two new passages compared
# with the first maze, (4, 5) and (7, 10).
R = -1 * np.ones((12, 12))
R[4, 5] = R[5, 4] = 0      # new passage
R[7, 10] = R[10, 7] = 0    # new passage
R[0, 3] = R[3, 0] = 0
R[1, 2] = R[2, 1] = 0
R[2, 5] = R[5, 2] = 0
R[6, 7] = R[7, 6] = 0
R[3, 6] = R[6, 3] = 0
R[3, 4] = R[4, 3] = 0
R[1, 4] = R[4, 1] = 0
R[7, 4] = R[4, 7] = 0
R[5, 8] = R[8, 5] = 0
R[8, 9] = R[9, 8] = 0
R[9, 10] = R[10, 9] = 0
R[10, 11] = R[11, 10] = 100

gamma = 0.8
Q = np.zeros((12, 12))

# Training and greedy path extraction are identical to the first script.
for episode in range(3000):
    state = random.randint(0, 11)
    while True:
        legal_actions = [a for a in range(12) if R[state, a] >= 0]
        next_state = random.choice(legal_actions)
        Q[state, next_state] = R[state, next_state] + gamma * Q[next_state].max()
        state = next_state
        if state == 11:
            break

print('Q matrix:')
print(Q)

state = 0
path = [state]
while state != 11:
    state = int(Q[state].argmax())
    path.append(state)

print('Path:')
print(' --> '.join(str(s) for s in path))
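The two scripts differ only in the maze's list of passages, so a less error-prone way to set them up is to build R from an edge list (a sketch; `build_reward` and the edge-list layout are illustrative, not from the original):

import numpy as np

def build_reward(edges, n_states=12, goal_edge=(10, 11), goal_reward=100):
    """Build a symmetric reward matrix: -1 = wall, 0 = passage, goal_reward on the goal edge."""
    R = -np.ones((n_states, n_states))
    for a, b in edges:
        R[a, b] = R[b, a] = 0
    a, b = goal_edge
    R[a, b] = R[b, a] = goal_reward
    return R

# Second maze: the first maze plus the (4, 5) and (7, 10) passages.
edges = [(0, 3), (1, 2), (2, 5), (6, 7), (3, 6), (3, 4), (1, 4), (7, 4),
         (5, 8), (8, 9), (9, 10), (4, 5), (7, 10)]
R = build_reward(edges)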
The trained Q matrix is shown in the figure below:
Path:
0 --> 3 --> 4 --> 7 --> 10 --> 11
The agent's route through the maze is shown in the figure below: