Resources
I’ll collect links here that I found quite useful. This is still a to-do : )
Multi-armed Bandit
The above link has a good explanation of epsilon-greedy as well.
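For a quick feel of epsilon-greedy before following the link, here is a minimal sketch. It is not taken from the linked resource; the function name `epsilon_greedy` and the list `q_values` (the current value estimate for each arm) are names I made up for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an arm index: a random arm with probability epsilon,
    otherwise the arm with the highest current estimate."""
    if rng.random() < epsilon:
        # Explore: choose any arm uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the arm with the largest estimate (first one on ties).
    return max(range(len(q_values)), key=lambda arm: q_values[arm])


# Hypothetical estimates for three arms.
estimates = [0.1, 0.9, 0.5]
greedy_choice = epsilon_greedy(estimates, epsilon=0.0)  # never explores
random_choice = epsilon_greedy(estimates, epsilon=1.0)  # always explores
```

With `epsilon=0.0` the choice is purely greedy, and with `epsilon=1.0` it is purely random; useful values usually sit somewhere in between.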
Markov Decision Processes
Dynamic Programming
It is exactly what you think: reusing earlier results to compute new results.
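A tiny sketch of the idea, using Fibonacci numbers as a stand-in problem (my choice for illustration, not something from the links). Each new entry is built from two entries that were already computed and stored.

```python
def fib(n):
    """Return the n-th Fibonacci number (fib(0) = 0, fib(1) = 1),
    computed bottom-up by reusing stored results."""
    table = [0, 1]  # results we already know
    for i in range(2, n + 1):
        # Reuse the two earlier results instead of recomputing them.
        table.append(table[i - 1] + table[i - 2])
    return table[n]


tenth = fib(10)
```

The same pattern, filling a table where each entry depends on entries computed earlier, is what the dynamic programming methods for Markov Decision Processes do with state values.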
Monte Carlo
It is a fancy way of saying “compute the average”.
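To make "compute the average" concrete, here is a minimal sketch that estimates the expected value of a fair six-sided die by averaging random rolls. The die is just an assumed toy example, and `monte_carlo_mean` and `roll_die` are hypothetical names.

```python
import random


def monte_carlo_mean(sample, n, seed=0):
    """Estimate an expected value by averaging n random samples.

    `sample` is any function that takes a random.Random instance
    and returns one numeric sample.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(n):
        total += sample(rng)
    return total / n


def roll_die(rng):
    """One roll of a fair six-sided die."""
    return rng.randint(1, 6)


estimate = monte_carlo_mean(roll_die, 100_000)
print(estimate)  # should land close to the true mean of 3.5
```

In reinforcement learning the samples are returns observed from episodes instead of die rolls, but the estimate is still just their average.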