Atari Environments

Installation

Note

I have not had any luck installing this on Windows, unfortunately. I hit some CMake issues, which I resolved, but then ran into dependency issues with libtorrent. I will add the Windows steps once I get around to figuring that mess out. Until then, Linux it is!

All you need to do is run the following:

pip install "gym[atari,accept-rom-license]"

I chose this environment because we are going to start looking at some seminal RL papers, starting with the one I like to call:

The Experience replay paper 

For now, I focus only on the Pong environment.

Important takeaways from the paper are the following:

1. All frames must be preprocessed before the agent sees them. The preprocessing steps are:
   - To handle the flickering of Atari sprites, take the pixel-wise maximum of the current and previous frame, separately for each RGB channel.
   - Convert the result to grayscale, keeping only the luma (Y) channel.
   - Downscale to (84, 84).
   - Stack the 4 most recent preprocessed frames as the agent's input, giving a tensor of shape (84, 84, 4).
2. Each action chosen by the agent is repeated for 4 frames.

3. Samples taken consecutively from (say) one episode are highly correlated, so an experience replay buffer is used: every transition is stored in the buffer, and training minibatches of (say) 32 are sampled uniformly at random from it. This largely breaks the correlation between the samples in a minibatch.

4. Two networks are maintained: an online (agent) network and a target network. The online network's weights are updated every step, while the target network is only updated, by copying the online weights, every C steps. In my implementation, C = 1000. Keeping the target fixed between syncs avoids chasing a moving target.
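The preprocessing and action-repeat takeaways can be sketched in NumPy as below. This is a minimal sketch, assuming raw RGB frames of shape (210, 160, 3) as Pong returns them, and a hypothetical `step_fn(action) -> (rgb_frame, reward, done)` in place of the real env (with `PongNoFrameskip-v4` the environment itself does not skip frames, so the repeat has to be done by you). Nearest-neighbour resizing is a simplification; many implementations use `cv2.resize` instead.

```python
from collections import deque

import numpy as np

FRAME_SKIP = 4        # each chosen action is repeated for 4 emulator frames
STACK_SIZE = 4        # the agent sees the 4 most recent preprocessed frames
OUT_SIZE = (84, 84)


def to_luma(rgb):
    # ITU-R BT.601 luma weights; rgb has shape (H, W, 3)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]


def resize_nearest(img, size):
    # Nearest-neighbour downscale of a 2-D array (a stand-in for e.g. cv2.resize)
    h, w = img.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows[:, None], cols[None, :]]


def preprocess(frame, prev_frame):
    # Pixel-wise max over two consecutive RGB frames removes sprite flicker
    merged = np.maximum(frame, prev_frame).astype(np.float32)
    small = resize_nearest(to_luma(merged), OUT_SIZE)
    return np.clip(np.rint(small), 0, 255).astype(np.uint8)


def repeat_action(step_fn, action, k=FRAME_SKIP):
    # step_fn(action) -> (rgb_frame, reward, done) is a hypothetical env interface
    total, frames, done = 0.0, [], False
    for _ in range(k):
        frame, reward, done = step_fn(action)
        total += reward
        frames.append(frame)
        if done:
            break
    prev = frames[-2] if len(frames) > 1 else frames[-1]
    return preprocess(frames[-1], prev), total, done


class FrameStack:
    # Keeps the last STACK_SIZE processed frames; state() has shape (84, 84, 4)
    def __init__(self, k=STACK_SIZE):
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return self.state()

    def push(self, frame):
        self.frames.append(frame)
        return self.state()

    def state(self):
        return np.stack(list(self.frames), axis=-1)
```

On reset, the stack is filled with copies of the first frame so the network always receives a full (84, 84, 4) input.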
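The replay-buffer takeaway can be sketched as a fixed-capacity FIFO store with uniform sampling. This is a minimal sketch: `capacity=100_000` is an assumed value, and storing full stacked states duplicates frames, which is memory-hungry but keeps the code simple.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100_000, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform sampling without replacement breaks the temporal correlation
        batch = self.rng.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (
            np.stack(states),
            np.array(actions, dtype=np.int64),
            np.array(rewards, dtype=np.float32),
            np.stack(next_states),
            np.array(dones, dtype=np.float32),
        )

    def __len__(self):
        return len(self.buffer)
```

Training typically only starts sampling once `len(buffer)` is at least the batch size (often much more, to warm the buffer up first).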
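The target-network takeaway can be sketched with a hypothetical linear Q-function (`LinearQ`) standing in for the convolutional network, so that it runs with NumPy only. `GAMMA = 0.99` is an assumption, since the text does not state a discount factor. With PyTorch, the hard copy would typically be `target.load_state_dict(online.state_dict())`.

```python
import numpy as np

C = 1000      # target sync period, as in the text
GAMMA = 0.99  # discount factor: an assumption, not stated in the text


class LinearQ:
    """Hypothetical toy Q-network, Q(s) = s @ W + b, standing in for the CNN."""

    def __init__(self, n_features, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.01, (n_features, n_actions))
        self.b = np.zeros(n_actions)

    def __call__(self, states):
        return states @ self.W + self.b  # shape (batch, n_actions)


def td_targets(target_net, rewards, next_states, dones, gamma=GAMMA):
    # y = r for terminal transitions, else r + gamma * max_a' Q_target(s', a')
    next_q = target_net(next_states).max(axis=1)
    return rewards + gamma * (1.0 - dones) * next_q


def maybe_sync_target(step, online_net, target_net, period=C):
    # Hard update: copy the online weights into the target every `period` steps;
    # between syncs the target stays frozen while the online net keeps training
    if step % period == 0:
        target_net.W = online_net.W.copy()
        target_net.b = online_net.b.copy()
```

The loss for the online network is then the difference between its Q(s, a) and these targets, while the targets themselves only move when `maybe_sync_target` fires.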

I anneal epsilon smoothly from 0.9 to 0.01.
Additionally, I made sure to create the environment with full_action_space=False, which restricts the action space to Pong's 6 actions.

import gym
env = gym.make("PongNoFrameskip-v4", full_action_space=False)
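The epsilon schedule and epsilon-greedy action choice over the 6 actions can be sketched as below. The text only says the decay is smooth, so the exponential form and `EPS_DECAY = 100_000` are assumptions chosen for illustration.

```python
import math

import numpy as np

EPS_START, EPS_END = 0.9, 0.01
EPS_DECAY = 100_000  # hypothetical decay constant (in agent steps), not from the text
N_ACTIONS = 6        # Pong's minimal action set with full_action_space=False


def epsilon(step):
    # Exponential decay from EPS_START towards EPS_END as step grows
    return EPS_END + (EPS_START - EPS_END) * math.exp(-step / EPS_DECAY)


def select_action(q_values, step, rng):
    # Epsilon-greedy: explore uniformly with probability epsilon(step), else act greedily
    if rng.random() < epsilon(step):
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```

A linear schedule clipped at 0.01 would work just as well; the exponential one simply never needs an explicit end step.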

To avoid losing progress to unforeseen crashes during training, I save the model along with the replay buffer every 5000-10000 steps.
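One way to make these periodic dumps crash-safe is to write to a temporary file and atomically swap it into place. This is a minimal sketch using `pickle`, assuming the model weights are picklable (for PyTorch you would pass `model.state_dict()`, or use `torch.save`); `SAVE_EVERY` and the file name are hypothetical.

```python
import os
import pickle
import tempfile

SAVE_EVERY = 5000  # somewhere in the 5000-10000 step range used in the text


def save_checkpoint(path, step, model_state, replay_buffer):
    # Write to a temp file in the same directory, then atomically swap it in,
    # so a crash mid-save never leaves a truncated checkpoint behind
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump({"step": step, "model": model_state, "buffer": replay_buffer}, f)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise


def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Inside the training loop (names are hypothetical):
# if step % SAVE_EVERY == 0:
#     save_checkpoint("pong_dqn.pkl", step, model_state, buffer)
```

`os.replace` is atomic on POSIX when both paths are on the same filesystem, which is why the temp file is created next to the target.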

FYI, gym provides gym.Wrapper, so you can inherit from this class and override important gym methods such as step.
I modified the reward slightly. By default, Pong gives a reward of 0 on most steps, +1 when the agent wins a point and -1 when it loses one.
I didn't like this very much, because the agent could learn to do nothing: all it receives is 0 until a point is decided. To encourage exploration, I give a small negative reward (-0.2) for every action taken, +5 when the agent wins a point and -5 when it loses one.
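The reward change can be sketched as a wrapper that overrides `step`. To keep the sketch runnable without gym installed, `ShapedRewardWrapper` is a duck-typed stand-in; in real code you would subclass `gym.Wrapper` the same way (or subclass `gym.RewardWrapper` and override its `reward` method). The sketch assumes shaping is applied to each agent step's reward, i.e. after any frame skipping; the constant names are hypothetical.

```python
STEP_PENALTY = -0.2
WIN_REWARD, LOSS_REWARD = 5.0, -5.0


def shape_reward(reward):
    # Pong's native reward: +1 point won, -1 point lost, 0 otherwise
    if reward > 0:
        return WIN_REWARD
    if reward < 0:
        return LOSS_REWARD
    return STEP_PENALTY


class ShapedRewardWrapper:
    """Duck-typed stand-in for a gym.Wrapper subclass that overrides step()."""

    def __init__(self, env):
        self.env = env

    def __getattr__(self, name):
        # Forward everything not defined here (reset, action_space, ...) to the env
        return getattr(self.env, name)

    def step(self, action):
        # Unpacking with *rest works for both the 4-tuple (older gym) and
        # 5-tuple (gym >= 0.26) return formats of step()
        obs, reward, *rest = self.env.step(action)
        return (obs, shape_reward(reward), *rest)
```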
