
Applying policy iteration for optimal policy

Policy iteration is a fundamental technique in RL for finding an optimal policy. It involves two main steps: policy evaluation, where you calculate the state-value function for a given policy, and policy improvement, where you update the policy based on these values. You'll apply these steps iteratively to converge to the optimal policy in the custom MyGridWorld environment.

The render_policy() function will be used to show the steps taken by an agent according to a policy.

The compute_state_value(state, policy) and compute_q_value(state, action, policy) functions have been preloaded for you.
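To make the evaluate-then-improve loop concrete, here is a minimal, self-contained sketch of policy iteration. It does not use MyGridWorld or the preloaded helpers: the four-state corridor, the step() function, the reward values, the discount factor GAMMA, and the q_value() helper are all assumptions made up for illustration, and the evaluation step uses iterative Bellman backups, which may differ from how the preloaded functions work.

```python
# Hypothetical 1-D corridor (NOT MyGridWorld): states 0..3, state 3 is terminal.
NUM_STATES = 4
NUM_ACTIONS = 2          # 0 = left, 1 = right
TERMINAL_STATE = 3
GAMMA = 0.9              # assumed discount factor


def step(state, action):
    """Deterministic transition for the toy corridor: return (next_state, reward)."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 10 if next_state == TERMINAL_STATE else -1
    return next_state, reward


def policy_evaluation(policy, tol=1e-8):
    """Policy evaluation: repeat Bellman backups until the values stop changing."""
    V = {s: 0.0 for s in range(NUM_STATES)}
    while True:
        delta = 0.0
        for s in range(NUM_STATES):
            if s == TERMINAL_STATE:
                continue  # the terminal state keeps value 0
            next_state, reward = step(s, policy[s])
            new_value = reward + GAMMA * V[next_state]
            delta = max(delta, abs(new_value - V[s]))
            V[s] = new_value
        if delta < tol:
            return V


def q_value(state, action, V):
    """Action value under the current state values (illustrative helper)."""
    next_state, reward = step(state, action)
    return reward + GAMMA * V[next_state]


def policy_improvement(V):
    """Policy improvement: pick the greedy action in every non-terminal state."""
    return {
        s: max(range(NUM_ACTIONS), key=lambda a: q_value(s, a, V))
        for s in range(NUM_STATES)
        if s != TERMINAL_STATE
    }


def policy_iteration(policy):
    """Alternate evaluation and improvement until the policy no longer changes."""
    while True:
        V = policy_evaluation(policy)
        improved = policy_improvement(V)
        if improved == policy:
            return policy, V
        policy = improved


# Start from a deliberately poor policy: always move left.
initial_policy = {s: 0 for s in range(NUM_STATES - 1)}
optimal_policy, V = policy_iteration(initial_policy)
print(optimal_policy)
print(V)
```

In this toy problem the loop stops once a greedy improvement step leaves the policy unchanged, which is the usual stopping criterion for policy iteration. The discount factor below 1 keeps the evaluation step finite even for a policy that never reaches the terminal state.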

This exercise is part of the course

Reinforcement Learning with Gymnasium in Python


Interactive exercise

Complete the sample code to successfully finish this exercise.

# Complete the policy evaluation function
def policy_evaluation(policy):
    V = {____: ____ for ____ in range(____)}
    return V