Episode generation for Monte Carlo methods
Monte Carlo methods estimate the value function from complete sampled episodes, so episodes must be generated first. You'll now implement a function that generates an episode by selecting actions at random until the episode terminates. In later exercises, you will call this function to apply Monte Carlo methods to the custom environment env, which is pre-loaded for you.
The render() function is pre-loaded for you.
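Each (state, action, reward) tuple that the episode generator records is the raw material for a Monte Carlo estimate. The rewards are summed backward into returns, G_t = r_t + gamma * G_{t+1}, and the returns are averaged per state. The sketch below is illustrative only: the discount factor gamma is an assumption (the exercise does not specify one), the example episode is hand-written rather than produced by env, and first-visit averaging is just one Monte Carlo variant (every-visit averaging is another).

```python
from collections import defaultdict


def compute_returns(episode, gamma=0.9):
    """Return the discounted return G_t for every step t of an episode.

    episode is a list of (state, action, reward) tuples; gamma is an
    assumed discount factor, not one given by the exercise.
    """
    returns = [0.0] * len(episode)
    g = 0.0
    # Walk backward so each G_t reuses G_{t+1}: G_t = r_t + gamma * G_{t+1}
    for t in range(len(episode) - 1, -1, -1):
        _, _, reward = episode[t]
        g = reward + gamma * g
        returns[t] = g
    return returns


def first_visit_values(episodes, gamma=0.9):
    """Estimate V(s) by averaging the return that follows the first visit to s."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        returns = compute_returns(episode, gamma)
        seen = set()
        for (state, _, _), g in zip(episode, returns):
            if state not in seen:  # only the first occurrence per episode counts
                seen.add(state)
                totals[state] += g
                counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}


# Hypothetical episode: state 0 is visited twice before the goal (state 2)
example = [(0, 1, -1), (1, 0, -1), (0, 1, -1), (2, 1, 10)]
print(compute_returns(example, gamma=1.0))  # [7.0, 8.0, 9.0, 10.0]
print(first_visit_values([example], gamma=1.0))  # {0: 7.0, 1: 8.0, 2: 10.0}
```

Computing the returns backward keeps the pass linear in the episode length, because each return reuses the one after it.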
This exercise is part of the course
Reinforcement Learning with Gymnasium in Python
Exercise instructions
- Reset the environment using a seed of 42.
- In the episode loop, select a random action at each iteration.
- Once an iteration ends, update the episode data by adding the tuple (state, action, reward).
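Interactive exercise
The sample code below completes the exercise by filling each blank according to the instructions above. It relies on the pre-loaded env and render(), so it does not import or construct them.

def generate_episode():
    episode = []
    # Reset the environment with a fixed seed for reproducibility
    state, info = env.reset(seed=42)
    terminated = False
    while not terminated:
        # Select a random action from the environment's action space
        action = env.action_space.sample()
        next_state, reward, terminated, truncated, info = env.step(action)
        render()
        # Update episode data with the (state, action, reward) tuple
        episode.append((state, action, reward))
        state = next_state
    return episode

print(generate_episode())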
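The custom env is not shown, so the sketch below swaps in a hypothetical CorridorEnv (with a hypothetical DiscreteSpace) that mimics the Gymnasium interface: reset(seed=...) returns (observation, info), step() returns five values, and action_space.sample() draws a random action. This sketch makes three choices of its own. It passes env as a parameter, omits render(), and also stops when truncated is set. It additionally calls env.action_space.seed(seed), because in Gymnasium, reset(seed=...) seeds only the environment's generator and not the action space's separate one.

```python
import random


class DiscreteSpace:
    """Hypothetical stand-in for a Gymnasium Discrete action space."""

    def __init__(self, n):
        self.n = n
        self._rng = random.Random()  # separate RNG, seeded independently

    def seed(self, seed):
        self._rng.seed(seed)

    def sample(self):
        return self._rng.randrange(self.n)


class CorridorEnv:
    """Hypothetical 1-D corridor: start at 0 and reach position `length`.

    Action 0 moves left (never below 0) and action 1 moves right. Each step
    costs -1, reaching the goal pays +10, and episodes are truncated after
    max_steps steps.
    """

    def __init__(self, length=4, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.action_space = DiscreteSpace(2)

    def reset(self, seed=None):
        # The dynamics are deterministic, so the seed is not needed here
        self.pos = 0
        self.steps = 0
        return self.pos, {}

    def step(self, action):
        self.steps += 1
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        terminated = self.pos == self.length
        reward = 10 if terminated else -1
        truncated = self.steps >= self.max_steps and not terminated
        return self.pos, reward, terminated, truncated, {}


def generate_episode(env, seed=42):
    """Roll out one episode with uniformly random actions."""
    episode = []
    state, info = env.reset(seed=seed)
    env.action_space.seed(seed)  # reset(seed=...) does not seed action sampling
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()
        next_state, reward, terminated, truncated, info = env.step(action)
        episode.append((state, action, reward))
        state = next_state
    return episode


episode = generate_episode(CorridorEnv())
print(len(episode), episode[-1])
```

Because both the reset and the action sampling are seeded, calling generate_episode twice on fresh environments with the same seed yields identical episodes. This is a convenient way to debug a Monte Carlo estimator before switching to unseeded runs.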