Jul 19, 2024 · 1 min read
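Let A be a known set of actions, and let R_a be the distribution of rewards for an action a in A. At each timestep t, the agent selects an action a_t from A and receives a reward R_t ~ R_{a_t}, that is, a sample drawn from the reward distribution of the chosen action. The agent does not know the distributions R_a in advance and can learn about them only from the rewards it observes. The goal is to maximize the cumulative reward, i.e. the sum of the rewards R_t over all timesteps.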
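The setup above can be made concrete with a small simulation. The following is a minimal sketch, not a definitive implementation: the `Bandit` class, the choice of Gaussian reward distributions, and the epsilon-greedy selection rule are all assumptions introduced here for illustration (the text above only states the problem and does not prescribe how the agent should choose actions).

```python
import random


class Bandit:
    """Hypothetical environment: action a has a Gaussian reward distribution R_a.

    The Gaussian shape is an assumption; the problem statement allows any distribution.
    """

    def __init__(self, means, stddev=1.0, seed=0):
        self.means = list(means)      # mean reward of each action (hidden from the agent)
        self.stddev = stddev
        self.rng = random.Random(seed)

    def pull(self, a):
        """Sample a reward R_t ~ R_a for the chosen action a."""
        return self.rng.gauss(self.means[a], self.stddev)


def run_epsilon_greedy(bandit, steps=1000, epsilon=0.1, seed=1):
    """Play `steps` timesteps; return (cumulative reward, pull counts per action).

    Epsilon-greedy is one simple strategy, used here only as an example.
    """
    rng = random.Random(seed)
    n_actions = len(bandit.means)
    counts = [0] * n_actions          # how often each action was selected
    estimates = [0.0] * n_actions     # running average reward per action
    total = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an action uniformly at random.
            a = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest estimated reward.
            a = max(range(n_actions), key=lambda i: estimates[i])

        r = bandit.pull(a)
        counts[a] += 1
        # Incremental update of the sample mean for action a.
        estimates[a] += (r - estimates[a]) / counts[a]
        total += r

    return total, counts


if __name__ == "__main__":
    bandit = Bandit(means=[0.1, 0.5, 0.9])
    total, counts = run_epsilon_greedy(bandit, steps=2000)
    print("cumulative reward:", total)
    print("pulls per action:", counts)
```

The agent only sees sampled rewards, never `bandit.means`, which mirrors the fact that the distributions R_a are unknown to it. The `epsilon` parameter controls how often the agent tries a random action instead of the one that currently looks best.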