



Reinforcement Learning is a learning method in Artificial Intelligence for a computer system or an agent. In this type of learning, the agent learns from the series of rewards or punishments it receives on completing a task. The main aim of such an agent is to maximize the rewards it collects. This makes it behave like a utility-based agent, because it chooses the best among the available options so that the user's needs are met as fully as possible.


Let us explain this further with the help of an example. Suppose an agent is designed for house cleaning. The agent can do multiple tasks, such as mopping, dusting, washing utensils, and washing clothes. In the agent's memory, every task is stored with its own reward points. Suppose we want the agent to work only for a limited amount of time, because of electricity costs or some other factor, and that not all of the tasks can be completed in that period. In that case, the agent will first complete the tasks that carry the highest reward points. In this example, the hard and laborious tasks are assigned higher reward points, so the agent completes them first. The leftover tasks are the easy ones, which the user can do without much effort if they do not want the agent to work any further. In this manner, the agent is used efficiently, and our resources are not wasted and can be used judiciously.
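The task selection in the house-cleaning example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reward points, the task durations, the 100-minute budget, and the names `Task` and `plan_tasks` are all hypothetical and do not come from the original article. It also shows only the "highest reward first" choice, not any learning.

```python
from typing import List, NamedTuple


class Task(NamedTuple):
    name: str
    reward: int   # hypothetical reward points stored in the agent's memory
    minutes: int  # hypothetical time the task takes


def plan_tasks(tasks: List[Task], time_budget: int) -> List[str]:
    """Pick tasks in descending reward order, skipping any that no longer fit."""
    chosen = []
    remaining = time_budget
    for task in sorted(tasks, key=lambda t: t.reward, reverse=True):
        if task.minutes <= remaining:
            chosen.append(task.name)
            remaining -= task.minutes
    return chosen


# Assumed values: harder tasks get more reward points, as in the example.
TASKS = [
    Task("mopping", reward=8, minutes=40),
    Task("dust cleaning", reward=3, minutes=15),
    Task("washing utensils", reward=5, minutes=25),
    Task("washing clothes", reward=9, minutes=50),
]

plan = plan_tasks(TASKS, time_budget=100)
print(plan)  # ['washing clothes', 'mopping']
```

With these assumed numbers, the two hardest tasks use up 90 of the 100 minutes, and the easier tasks are left for the user.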


However, in Reinforcement Learning, the agent has to keep track of all the actions performed in the past, their impact on the environment, the reward points earned for performing those actions, and the feedback available for them. By inferring and learning from this record of past activity, the agent improves its performance and utility in the future.
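One simple way to picture this bookkeeping is an agent that logs each action with the reward it earned and keeps a running average per action. The sketch below is a deliberate simplification: it ignores the state of the environment and never explores, so it is not a complete reinforcement learning algorithm. The class `RewardTracker` and the reward values are hypothetical.

```python
from collections import defaultdict


class RewardTracker:
    """Keeps a history of (action, reward) pairs and an average reward per action."""

    def __init__(self):
        self.history = []                 # every (action, reward) seen so far
        self.totals = defaultdict(float)  # summed reward per action
        self.counts = defaultdict(int)    # times each action was performed

    def record(self, action, reward):
        """Store the outcome of one performed action."""
        self.history.append((action, reward))
        self.totals[action] += reward
        self.counts[action] += 1

    def estimate(self, action):
        """Average reward observed for an action (0.0 if never tried)."""
        count = self.counts.get(action, 0)
        return self.totals.get(action, 0.0) / count if count else 0.0

    def best_action(self, actions):
        """Choose the action with the highest average reward so far."""
        return max(actions, key=self.estimate)


tracker = RewardTracker()
tracker.record("mopping", 8)        # assumed reward values
tracker.record("mopping", 6)
tracker.record("dust cleaning", 3)

print(tracker.estimate("mopping"))                         # 7.0
print(tracker.best_action(["mopping", "dust cleaning"]))   # mopping
```

Each new reward shifts the stored average, so the agent's later choices reflect what it has experienced rather than fixed values.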


Translated from: https://www.includehelp.com/ml-ai/reinforcement-learning-in-artificial-intelligence.aspx


