AWS DeepRacer: a car that learns to drive using reinforcement learning.
DeepRacer training algorithms
- SAC (Soft Actor-Critic)
- PPO (Proximal Policy Optimization)
SAC > Data efficient but less stable. In DeepRacer it works only with a continuous action space.
PPO > Data hungry but stable. It works with both discrete and continuous action spaces.
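To make the discrete vs continuous distinction concrete, here is a minimal sketch. The names `DISCRETE_ACTIONS`, `CONTINUOUS_RANGES`, `sample_discrete` and `sample_continuous` are made up for these notes, and the angle and speed values are illustrative, not DeepRacer defaults.

```python
import random

# Discrete action space: a fixed menu of (steering angle, speed) choices.
# Values are illustrative assumptions, not DeepRacer defaults.
DISCRETE_ACTIONS = [
    {"steering_angle": -30.0, "speed": 1.0},
    {"steering_angle": 0.0, "speed": 2.0},
    {"steering_angle": 30.0, "speed": 1.0},
]

# Continuous action space: a (low, high) range for each control.
CONTINUOUS_RANGES = {
    "steering_angle": (-30.0, 30.0),
    "speed": (0.5, 2.0),
}


def sample_discrete(rng):
    """Pick one of the predefined actions."""
    return rng.choice(DISCRETE_ACTIONS)


def sample_continuous(rng):
    """Pick any value inside each control's range."""
    return {
        name: rng.uniform(low, high)
        for name, (low, high) in CONTINUOUS_RANGES.items()
    }


rng = random.Random(0)
print(sample_discrete(rng))
print(sample_continuous(rng))
```

With the discrete space the agent can only ever choose one of the three listed actions; with the continuous space it can choose any value inside the ranges.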
vocabulary to know
action space : available choices for the agent
reward : the feedback that incentivizes (encourages) the car to perform better.
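A sketch of what a reward could look like in code, assuming the DeepRacer convention of a `reward_function(params)` entry point and the input keys `all_wheels_on_track`, `track_width` and `distance_from_center`. The thresholds and reward values are illustrative guesses, not tuned numbers.

```python
def reward_function(params):
    """Reward the car for staying close to the center line."""
    # Leaving the track gets a near-zero reward.
    if not params["all_wheels_on_track"]:
        return 1e-3

    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Bands measured as fractions of the track width (assumed values).
    if distance_from_center <= 0.1 * track_width:
        return 1.0  # very close to the center line
    if distance_from_center <= 0.25 * track_width:
        return 0.5  # still fairly centered
    if distance_from_center <= 0.5 * track_width:
        return 0.1  # near the edge
    return 1e-3  # likely off track


example = {
    "all_wheels_on_track": True,
    "track_width": 1.0,
    "distance_from_center": 0.2,
}
print(reward_function(example))
```

The closer the car stays to the center, the larger the number it gets back, which is what encourages the better behavior.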
The longer the car explores, the better the result it tends to find.
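A toy illustration of the exploration point. This is plain random search, not PPO or SAC, and `lap_score` is a made-up scoring function that peaks at a steering angle of 12 degrees; it only shows that the best result found so far cannot get worse as more trials are added.

```python
import random


def lap_score(steering_angle):
    """Hypothetical score: highest (0) when the angle is 12 degrees."""
    return -(steering_angle - 12.0) ** 2


def explore(num_trials, seed=0):
    """Try random steering angles and keep the best score seen."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(num_trials):
        angle = rng.uniform(-30.0, 30.0)
        best = max(best, lap_score(angle))
    return best


# Same seed, so each longer run repeats the shorter run and then keeps going.
for n in (5, 50, 500):
    print(n, explore(n))
```

Because every longer run includes the trials of the shorter one, its best score is always at least as good. Real training is noisier than this, so "tends to" is the honest wording.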