Title

Exercise: Q-Learning using DQN

Description

The aim of this exercise is to implement a Deep Q-Network (DQN) for a pre-defined reinforcement learning environment from OpenAI Gym. We will use the FrozenLake-v0 environment, the same one used in the previous session (refer to that session for details about the environment).

Instructions

  • Initialize an environment using a pre-defined environment from OpenAI Gym.
  • Get the number of possible actions in the environment.
  • Define a simple feed-forward neural network with your choice of hidden layers and nodes.
  • Define the action-sampling policy. We will be using the epsilon-greedy policy.
  • Initialize a sequential memory to store the agent's experiences, which serve as input to the DQN.
  • Define the DQN agent and compile it with the Adam optimizer.
  • Fit and test the DQN model.

Hints

gym.make(environment_name) : Accesses a pre-defined environment

env.action_space.n : Returns the number of discrete actions

env.observation_space.n : Returns the number of discrete states

Dense() : A regular densely-connected NN layer

Flatten() : Flattens the input

Adam() : Optimizer that implements the Adam algorithm

DQNAgent() : Initializes the DQN agent

SequentialMemory()

Keras-RL provides the class rl.memory.SequentialMemory, a fast and efficient data structure for storing the agent's experiences.

In [2]:
# Run this once, then comment it out so it does not run multiple times
# !pip install keras-rl2
In [6]:
# Import necessary libraries
import numpy as np
import gym

from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory
In [15]:
# Initializing an environment using a pre-defined environment from OpenAI Gym 
# The environment used here is 'FrozenLake-v0'
env = ___

# Get the number of actions within the environment
nb_actions = ___
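
For reference, a minimal sketch of how this cell could be completed using the calls listed in the Hints (an illustration, not the only valid answer):

env = gym.make('FrozenLake-v0')      # create the FrozenLake-v0 environment
nb_actions = env.action_space.n      # number of discrete actions (4 for FrozenLake)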
In [16]:
# Define a Feed-Forward Neural Network 

# Initialize a keras sequential model
model = ___

# Flatten the input to have an input shape of (1,) + 
# shape of the environment state space i.e. the observation space
model.add(Flatten(___))

# Add Dense layers with ReLU activation
# The number of hidden layers and the number of nodes in each layer are your choice

___

# Add an output layer whose number of nodes equals the number of actions
___
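
As a reference, one possible network; the two hidden layers with 16 nodes each are an arbitrary choice, since the exercise leaves the architecture up to you:

model = Sequential()
# Flatten the (1,) + observation-space-shaped input into a single vector
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
# Hidden layers (sizes are an arbitrary choice)
model.add(Dense(16, activation='relu'))
model.add(Dense(16, activation='relu'))
# One output node per action, with a linear activation for the Q-values
model.add(Dense(nb_actions, activation='linear'))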
In [0]:
# Define the policy to sample the actions
# We will be using the Epsilon-Greedy algorithm
policy = EpsGreedyQPolicy()

# To store our data, initialize SequentialMemory with limit=500000 and window_length=1
memory = SequentialMemory(___)
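
For illustration, the memory initialization with the values stated in the comment above:

memory = SequentialMemory(limit=500000, window_length=1)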

DQN Agent

In [0]:
# Initialize the DQNAgent with the neural network model, nb_actions as the number of actions in the environment,
# memory as the sequential memory defined above, nb_steps_warmup as 100,
# policy as the epsilon-greedy policy defined above, and target_model_update as 1e-2
dqn = DQNAgent(___)

# Compile the DQN with the Adam optimizer, using a learning rate of 1e-3 and 'mse' as the metric
dqn.compile(___)

# Fit the DQN by passing with environment with nb_steps as 5000
# You can optionally visualize the output, which internally calls the environment's render function
# However, this slows down training and is not recommended on EdStem
# To see the complete training details, set verbose to 2
dqn.fit(___);
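
For reference, a sketch of one way to fill in this cell with the parameter values given in the comments; the keyword arguments follow the keras-rl2 API (note that lr may need to be learning_rate in newer Keras versions):

dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               nb_steps_warmup=100, target_model_update=1e-2, policy=policy)

dqn.compile(Adam(lr=1e-3), metrics=['mse'])

dqn.fit(env, nb_steps=5000, visualize=False, verbose=2);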
In [0]:
# Test your model by passing the environment and running for 10 episodes
dqn.test(___)
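
One possible completion, with rendering disabled to keep the run fast:

dqn.test(env, nb_episodes=10, visualize=False)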