Discovering and Exploiting Sparse Rewards in a Learned Behavior Space
{{output}}
Learning optimal policies in sparse rewards settings is difficult as the learning agent has little to no feedback on the quality of its actions. In these situations, a good strategy is to focus on exploration, hopefully leading to the discovery of a reward sig... ...