
Implementing the Q-learning Algorithm in Python: A Maze Example with Input/Output Test Data


Q-learning is a reinforcement learning algorithm for solving problems based on an action-reward mechanism. Below is a simple Python implementation of Q-learning applied to a small maze problem.
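The core of the algorithm is the Q-value update rule, which the code below applies after every step the agent takes:

Q(s, a) ← (1 − α) · Q(s, a) + α · (r + γ · max_a' Q(s', a'))

Here s is the current state, a the chosen action, r the immediate reward, s' the resulting state, α the learning rate, and γ the discount factor.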

import numpy as np

# Example maze encoded as a matrix:
# 0 = open cell, 1 = wall, 9 = goal
maze = np.array([
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 9, 1]
])

# Q-table: one row per cell, one column per action, initialized to zero
q_table = np.zeros((maze.shape[0] * maze.shape[1], 4))

# Hyperparameters
alpha = 0.1    # learning rate
gamma = 0.9    # discount factor
epsilon = 0.1  # exploration rate for the epsilon-greedy policy

# Actions: up, down, left, right
actions = ['up', 'down', 'left', 'right']

# Return the actions that neither leave the grid nor run into a wall
def get_possible_actions(state):
    possible = []
    if state[0] > 0 and maze[state[0] - 1, state[1]] != 1:
        possible.append('up')
    if state[0] < maze.shape[0] - 1 and maze[state[0] + 1, state[1]] != 1:
        possible.append('down')
    if state[1] > 0 and maze[state[0], state[1] - 1] != 1:
        possible.append('left')
    if state[1] < maze.shape[1] - 1 and maze[state[0], state[1] + 1] != 1:
        possible.append('right')
    return possible

# Epsilon-greedy action selection
def choose_action(state):
    possible_actions = get_possible_actions(state)
    if np.random.uniform(0, 1) < epsilon:
        # Explore: pick a random valid action
        return np.random.choice(possible_actions)
    # Exploit: pick the valid action with the highest Q-value
    state_idx = np.ravel_multi_index(state, maze.shape)
    q_values = q_table[state_idx][[actions.index(a) for a in possible_actions]]
    return possible_actions[np.argmax(q_values)]

# Apply the Q-learning update rule to one (state, action, reward, next state) step
def update_q_table(state, action, reward, new_state):
    state_idx = np.ravel_multi_index(state, maze.shape)
    new_state_idx = np.ravel_multi_index(new_state, maze.shape)
    action_idx = actions.index(action)
    max_future_q = np.max(q_table[new_state_idx])
    current_q = q_table[state_idx][action_idx]
    new_q = (1 - alpha) * current_q + alpha * (reward + gamma * max_future_q)
    q_table[state_idx][action_idx] = new_q

# Q-learning main loop
for episode in range(1000):
    # Start each episode at the first open cell (the top-left corner here)
    start = np.where(maze == 0)
    state = (start[0][0], start[1][0])
    done = False
    while not done:
        action = choose_action(state)
        if action == 'up':
            new_state = (state[0] - 1, state[1])
        elif action == 'down':
            new_state = (state[0] + 1, state[1])
        elif action == 'left':
            new_state = (state[0], state[1] - 1)
        else:
            new_state = (state[0], state[1] + 1)
        if maze[new_state] == 9:
            reward = 10   # reached the goal
            done = True
        else:
            reward = -1   # small penalty for every other step
        update_q_table(state, action, reward, new_state)
        state = new_state

# Print the trained Q-table
print("Trained Q-table:")
print(q_table)

This code implements a simple Q-learning algorithm for a simplified maze problem. The maze is a small matrix in which 0 marks an open cell, 1 marks a wall, and 9 marks the goal. The core of the algorithm is choosing an action from the current state and then updating the Q-table with the observed reward. In real problems, the definition of states and actions, as well as the way the environment is modeled, may differ.
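To obtain concrete input/output test data, one way is to follow the greedy policy implied by the trained Q-table and print the resulting path from the start cell to the goal. The following is a minimal sketch that reuses the maze, q_table, actions, and get_possible_actions defined above; the step limit of 20 is an arbitrary safeguard I added, and the exact Q-values (and therefore the printed path) depend on the random exploration during training.

# Greedy test run with the trained Q-table (no exploration)
state = (0, 0)                 # same start cell the training loop uses
path = [state]
for _ in range(20):            # arbitrary step limit as a safeguard
    if maze[state] == 9:       # stop once the goal cell is reached
        break
    possible = get_possible_actions(state)
    state_idx = np.ravel_multi_index(state, maze.shape)
    q_values = q_table[state_idx][[actions.index(a) for a in possible]]
    best_action = possible[np.argmax(q_values)]
    if best_action == 'up':
        state = (state[0] - 1, state[1])
    elif best_action == 'down':
        state = (state[0] + 1, state[1])
    elif best_action == 'left':
        state = (state[0], state[1] - 1)
    else:
        state = (state[0], state[1] + 1)
    path.append(state)

print("Greedy path from start to goal:", path)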
