Title¶
🆓 Exercise: Q-Learning using DQN
Description¶
The aim of this exercise is to implement a Deep Q-Network (DQN) for a pre-defined reinforcement learning environment from OpenAI Gym. We will use the environment called FrozenLake-v0, the same one used in the previous session (refer to that session for information about the environment).
Instructions¶
- Initialize a pre-defined environment from OpenAI Gym.
- Get the number of possible actions in the environment.
- Define a simple feed-forward neural network with your choice of hidden layers and nodes.
- Define the action-sampling policy. We will use the Epsilon-Greedy policy.
- Initialize sequential memory to store the agent's experiences, which are the input to the DQN.
- Define the DQN agent and compile it with the Adam optimizer.
- Fit and test the DQN model.
Hints¶
gym.make(environment-name) : Accesses a pre-defined environment
env.action_space.n : Returns the number of discrete actions
env.observation_space.n : Returns the number of discrete states
Dense() : A regular densely-connected NN layer
Flatten() : Flattens the input
Adam() : Optimizer that implements the Adam algorithm
DQNAgent() : Initializes the DQN agent
SequentialMemory() : Keras-RL provides a class called rl.memory.SequentialMemory, a fast and efficient data structure in which we can store the agent's experiences.
# Run this once and then comment it to ensure it does not run multiple times
# !pip install keras-rl2
# Import necessary libraries
import numpy as np
import gym
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory
# Initializing an environment using a pre-defined environment from OpenAI Gym
# The environment used here is 'FrozenLake-v0'
env = ___
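# A possible completion of the blank above (a sketch; the environment name comes from the description):
env = gym.make('FrozenLake-v0')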
# Get the number of actions within the environment
nb_actions = ___
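# A possible completion (a sketch), using the action_space.n hint from above:
nb_actions = env.action_space.n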
# Define a Feed-Forward Neural Network
# Initialize a keras sequential model
model = ___
# Flatten the input so that the input shape is (1,) +
# the shape of the environment's state space, i.e. the observation space
model.add(Flatten(___))
# Add Dense layers with ReLU activation
# The number of hidden layers and the number of nodes in each layer are your choice
___
# Add an output layer with number of nodes as the number of actions
___
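# A possible network (a sketch; the layer sizes are arbitrary choices, not required values).
# FrozenLake has a discrete observation space, so (1,) + env.observation_space.shape is typically just (1,):
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(nb_actions, activation='linear'))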
# Define the policy to sample the actions
# We will be using the Epsilon-Greedy algorithm
policy = EpsGreedyQPolicy()
# To store the agent's experiences, initialize SequentialMemory with limit=500000 and window_length=1
memory = SequentialMemory(___)
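# A possible completion (a sketch), using the values given in the comment above:
memory = SequentialMemory(limit=500000, window_length=1)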
DQN Agent¶
# Initialize the DQNAgent with the neural network model defined above, nb_actions as the number of actions in the environment,
# memory as the sequential memory defined above, nb_steps_warmup as 100,
# policy as the epsilon-greedy policy defined above, and target_model_update as 1e-2
dqn = DQNAgent(___)
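# A possible completion (a sketch), wiring together the model, memory, and policy defined above:
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               nb_steps_warmup=100, target_model_update=1e-2, policy=policy)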
# Compile the DQN with the Adam optimizer using a learning rate of 1e-3 and 'mse' as the metric
dqn.compile(___)
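# A possible completion (a sketch); older Keras versions spell the learning-rate argument as lr=1e-3:
dqn.compile(Adam(learning_rate=1e-3), metrics=['mse'])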
# Fit the DQN by passing the environment with nb_steps as 5000
# You have the option to visualize the output, which implicitly calls the environment's render function
# However, this slows down training and is not recommended on EdStem
# To see the complete training details, set verbose as 2
dqn.fit(___);
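# A possible completion (a sketch); visualize=False keeps training fast, verbose=2 prints the full training details:
dqn.fit(env, nb_steps=5000, visualize=False, verbose=2);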
# Test your model by passing the environment and running for 10 episodes
dqn.test(___)
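# A possible completion (a sketch):
dqn.test(env, nb_episodes=10, visualize=False)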