
Applying double Q-learning

This exercise tasks you with applying the Double Q-learning algorithm to the same custom environment you solved with Expected SARSA, so you can compare the two approaches. By maintaining two Q-tables, Double Q-learning reduces the overestimation bias inherent in traditional Q-learning and can offer more stable learning than other temporal difference methods. You'll use it to navigate the grid environment, aiming for the highest reward while avoiding the mountains so you reach the goal as quickly as possible.
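Concretely, on each step one of the two tables is chosen at random to be updated, while the other table evaluates the greedy next action, which is what breaks the maximization bias. One common way to write the update for the first table looks like the following (α is the learning rate and γ the discount factor; neither is specified on this page, so treat their values as assumptions):

$$Q_1(s_t, a_t) \leftarrow Q_1(s_t, a_t) + \alpha \Big[ r_{t+1} + \gamma \, Q_2\big(s_{t+1}, \arg\max_{a'} Q_1(s_{t+1}, a')\big) - Q_1(s_t, a_t) \Big]$$

with the roles of the two tables swapped whenever the second table is selected.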

Figure: the custom grid environment (new_cust_env.png).

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Exercise instructions

  • Update the Q-tables using the update_q_tables() function you coded in the previous exercise (a sketch of one possible implementation is shown after this list).
  • Combine the Q-tables by summing them.
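
For reference, here is a minimal sketch of what an update_q_tables() helper along these lines could look like. The actual function comes from the previous exercise and is not shown on this page, so the signature, the hyperparameters alpha and gamma, and the reliance on a global list Q of two tables are all assumptions:

import numpy as np

def update_q_tables(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    # Double Q-learning sketch: randomly pick one table to update and
    # let the other table evaluate the greedy next action.
    i = np.random.randint(2)   # table to update
    j = 1 - i                  # table used for evaluation
    # Greedy next action according to the table being updated...
    best_next_action = np.argmax(Q[i][next_state])
    # ...valued by the other table to decouple selection from evaluation.
    target = reward + gamma * Q[j][next_state, best_next_action]
    # Move the updated table's estimate toward the target.
    Q[i][state, action] += alpha * (target - Q[i][state, action])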

Interactive exercise

Complete the sample code to successfully finish this exercise.

import numpy as np

# env, num_states, num_actions, num_episodes, update_q_tables()
# and render_policy() are assumed to be defined in earlier exercises.

# Initialize two independent Q-tables; a list comprehension avoids
# creating two references to the same array.
Q = [np.zeros((num_states, num_actions)) for _ in range(2)]
for episode in range(num_episodes):
    state, info = env.reset()
    terminated = False
    while not terminated:
        # Explore with a uniformly random behavior policy
        action = np.random.choice(num_actions)
        next_state, reward, terminated, truncated, info = env.step(action)
        # Update the Q-tables
        ____
        state = next_state
# Combine the learned Q-tables
Q = ____
# Derive the greedy policy from the combined table and render it
policy = {state: np.argmax(Q[state]) for state in range(num_states)}
render_policy(policy)
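
Once the policy has been rendered, you can also sanity-check it by rolling out a single greedy episode. This is a minimal follow-up sketch, not part of the exercise; it assumes env follows the standard Gymnasium API and that policy is the dictionary built above:

# Roll out one episode following the greedy policy
state, info = env.reset()
terminated, truncated = False, False
total_reward = 0
while not (terminated or truncated):
    action = policy[state]
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
print(f"Return of the greedy policy: {total_reward}")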