How to train Actor-Critic (A2C) reinforcement learning

I have been able to train a system using Q-learning. I now want to move it to the Actor-Critic (A2C) method. Please don't ask me why; I have to make this move.
I am currently borrowing the implementation from https://github.com/higgsfield/RL-Adventure-2/blob/master/1.actor-critic.ipynb
The thing is, I keep getting a success rate of approximately 50% (which is basically random behavior). My game has long episodes (50 steps). Should I print out the reward, the value, or something else? How should I debug this?
Here is part of the log:
simulation episode 2: Success, turn_count =20
loss = tensor(1763.7875)
simulation episode 3: Fail, turn_count= 42
loss = tensor(44.6923)
simulation episode 4: Fail, turn_count= 42
loss = tensor(173.5872)
simulation episode 5: Fail, turn_count= 42
loss = tensor(4034.0889)
simulation episode 6: Fail, turn_count= 42
loss = tensor(132.7567)
simulation episode 7: Success, turn_count =22
loss = tensor(2099.5344)
As a general trend, I have observed that for Success episodes the loss tends to be huge, whereas for Fail episodes the loss tends to be small.
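One way to check whether this trend simply reflects reward scale is to compute the discounted returns per episode and compare them with what the critic predicts. The sketch below is only an illustration in plain Python: the reward scheme (+100 on success, -10 on failure, 0 per turn), the discount factor, and the helper names `discounted_returns` and `critic_mse` are assumptions, not taken from the game or from the linked notebook. With an untrained critic that predicts values near zero, the squared error grows with the size of the returns, so a large terminal success reward alone could produce a much larger loss on Success episodes.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted return G_t for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each step adds its reward to the discounted future.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def critic_mse(returns, values):
    """Mean squared error between the returns and the critic's predictions."""
    return sum((g - v) ** 2 for g, v in zip(returns, values)) / len(returns)


# Hypothetical reward scheme (an assumption, not the real game):
# 0 per turn, +100 on success, -10 on failure.
success_rewards = [0.0] * 19 + [100.0]  # 20 turns, like episode 2 in the log
fail_rewards = [0.0] * 41 + [-10.0]     # 42 turns, like episode 3 in the log

# An untrained critic is assumed to predict roughly 0 for every state.
success_mse = critic_mse(discounted_returns(success_rewards), [0.0] * 20)
fail_mse = critic_mse(discounted_returns(fail_rewards), [0.0] * 42)

print("success critic MSE:", round(success_mse, 2))
print("fail critic MSE:", round(fail_mse, 2))
```

If the real rewards look like this, logging the per-episode mean return, mean predicted value, and the actor and critic losses separately is likely to be more informative than the single combined loss. Rescaling the rewards may also be worth trying.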
1 Answer
I think you're making a mistake. If you really want to know how to implement the Actor-Critic algorithm, you first need to master two things:
- Implementing value based RL algorithms (such as DQN).
- Implementing policy based RL algorithms (such as Policy Gradients).
You can't jump directly to actor-critic models. In fact you can, but you will understand nothing if you can't understand the actor (policy-based) and the critic (value-based) separately.
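To make the two parts concrete, here is a minimal sketch of how the actor and critic terms are commonly combined into one advantage actor-critic loss. It is written in plain Python over lists of numbers rather than tensors, and the function name `a2c_loss_terms` and the coefficients `value_coef` and `entropy_coef` are illustrative assumptions, not the exact code of any particular implementation.

```python
import math


def a2c_loss_terms(log_probs, values, returns, entropies,
                   value_coef=0.5, entropy_coef=0.001):
    """Split a typical A2C loss into its actor, critic and entropy parts.

    log_probs: log pi(a_t | s_t) of the actions actually taken
    values:    critic predictions V(s_t)
    returns:   discounted returns G_t used as targets for the critic
    entropies: entropy of the policy distribution at each step
    """
    n = len(returns)
    # Advantage: how much better the outcome was than the critic expected.
    # In an autograd framework it is usually detached in the actor term.
    advantages = [g - v for g, v in zip(returns, values)]
    # Actor (policy gradient) term: raise log-probs of better-than-expected actions.
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # Critic (value) term: regress V(s_t) toward the observed return.
    critic_loss = sum(a * a for a in advantages) / n
    # Entropy bonus: subtracted so that higher entropy lowers the loss.
    entropy = sum(entropies) / n
    total = actor_loss + value_coef * critic_loss - entropy_coef * entropy
    return {"actor": actor_loss, "critic": critic_loss,
            "entropy": entropy, "total": total}


# Toy episode: a uniform two-action policy and a critic that predicts 0.
terms = a2c_loss_terms(
    log_probs=[math.log(0.5)] * 2,
    values=[0.0, 0.0],
    returns=[1.0, 1.0],
    entropies=[math.log(2.0)] * 2,
)
print(terms)
```

Reading the actor and critic terms separately shows which half of the agent is responsible when the total loss is large.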
It's like wanting to paint the Mona Lisa before learning how to paint.
My advice: take the time to learn these two elements before implementing an AC agent.
I made a free course with TensorFlow and complete implementations here: https://simoninithomas.github.io/Deep_reinforcement_learning_Course/
But again, implement the architectures yourself; copying an architecture is useless if you don't really understand it.