In order to discover the most rewarding actions, agents must collect information about their environment, potentially foregoing reward. The optimal solution to this "explore-exploit" dilemma is often computationally challenging, but principled algorithmic appr... ...