Deep reinforcement learning - how to deal with boundaries in action space

I've built a custom reinforcement learning environment and agent, similar to a labyrinth game.

In the labyrinth there are 5 possible actions: up, down, left, right, and stay. When a move is blocked, e.g. the agent can't go up, how do people design the environment and agent to simulate that?
To be specific, the agent is at the current state `s0`, and by definition taking the actions down, left, and right will move it to some other state with an immediate reward (>0 if at the exit). One possible approach is that when the agent takes the action `up`, the state stays at `s0` and the reward is a large negative number. Ideally the agent will learn this and never go `up` again at this state.
However, my agent does not seem to learn this. Instead, it still goes `up`. Another approach is to hard-code the agent and the environment so that the agent is simply unable to perform the action `up` when at `s0`.

Is that approach feasible? Will there be any issues with it? Or is there a better design to deal with boundaries and invalid actions?
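To make the two designs concrete, here is a minimal sketch (not the asker's actual code; the grid layout, wall table, and reward values are illustrative assumptions) contrasting the penalty approach with hard masking of invalid actions:

```python
# Hypothetical wall table: at cell (0, 0) the move "up" is blocked.
WALLS = {(0, 0): {"up"}}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1),
           "right": (0, 1), "stay": (0, 0)}

def step_with_penalty(state, action):
    """Approach 1: an invalid action leaves the state unchanged
    and returns a large negative reward."""
    if action in WALLS.get(state, set()):
        return state, -100.0
    dr, dc = ACTIONS[action]
    return (state[0] + dr, state[1] + dc), -1.0  # small per-step cost otherwise

def valid_actions(state):
    """Approach 2: expose an action mask, so the agent only ever
    chooses among the moves that are legal in this state."""
    return [a for a in ACTIONS if a not in WALLS.get(state, set())]
```

With the second approach the agent's policy (or its argmax over Q-values) is restricted to `valid_actions(state)` at selection time, so the invalid transition is never sampled at all.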
2 Answers
I would say this should work (but even better than guessing is trying it). Other questions would be: what state is your agent able to observe? Are you doing reward clipping?

On the other hand, if your agent did not learn to avoid running into walls, there might be another problem within your learning routine (maybe there is a bug in the reward function?).

Hard-coded action clipping might lead to the behavior you want to see, but it certainly cuts down the overall performance of your agent.

What else did you implement? If not done yet, it might be good to take experience replay into account.
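Experience replay is straightforward to add; here is a minimal sketch assuming a generic `(state, action, reward, next_state, done)` transition format (the class and parameter names are illustrative, not from the question's code):

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        # deque with maxlen evicts the oldest transitions automatically
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between
        # consecutive transitions, which tends to stabilize training.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Training updates are then drawn from `sample(batch_size)` rather than from the most recent transition only, so a wall-bump observed once keeps contributing gradient signal long after it happened.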
I have seen this problem many times, where an agent gets stuck on a single action. I have seen that happen in the following cases:

I hope this helps.