Resources

I’ll try to collect links here that I found useful. This is still a to-do :)

Multi-armed Bandit

TowardsDataScience

The above link has a good explanation of epsilon-greedy as well.
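For a quick feel of epsilon-greedy without following the link, here is a minimal sketch. The function names, the Gaussian reward model with unit variance, and the default parameters are all my own assumptions for illustration, not taken from the linked article.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an arm: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: choose any arm uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the arm with the highest current estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Simulate a bandit whose rewards are assumed Gaussian with unit variance."""
    rng = random.Random(seed)
    q_values = [0.0] * len(true_means)  # running estimate of each arm's mean
    counts = [0] * len(true_means)      # number of pulls per arm
    for _ in range(steps):
        arm = epsilon_greedy(q_values, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental mean: move the estimate toward the new reward.
        q_values[arm] += (reward - q_values[arm]) / counts[arm]
    return q_values, counts
```

Calling `run_bandit([0.0, 1.0, 5.0])` returns the estimated means and the pull counts. With a small epsilon, most pulls should go to whichever arm currently looks best, while the occasional random pull keeps the other estimates from going stale.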

Markov Decision Processes

Dynamic Programming

It is exactly what you think: reusing earlier results to compute new ones.
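As a rough illustration of that reuse, here is a sketch of value iteration on a made-up chain world. The environment is my own assumption: states `0..n-1` in a row, moves left or right, a reward of 1 for stepping into the last (terminal) state, and a discount `gamma`. Each update reads the values already stored for neighbouring states instead of recomputing them.

```python
def value_iteration(n_states=5, gamma=0.9, tol=1e-8):
    """Value iteration on a hypothetical deterministic chain MDP."""
    values = [0.0] * n_states  # the last state is terminal, so its value stays 0
    while True:
        delta = 0.0
        for s in range(n_states - 1):
            candidates = []
            for step in (-1, 1):  # move left or right, clipped at the ends
                nxt = min(max(s + step, 0), n_states - 1)
                reward = 1.0 if nxt == n_states - 1 else 0.0
                # Reuse the stored value of the next state.
                candidates.append(reward + gamma * values[nxt])
            best = max(candidates)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:  # stop once a full sweep changes nothing noticeable
            return values
```

With the defaults, the values decay by a factor of `gamma` for every extra step away from the goal: roughly 1.0 next to the terminal state, then 0.9, 0.81, and 0.729.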

Monte Carlo

It is a fancy way of saying “compute the average”: estimate a value by averaging many sampled returns.
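A minimal sketch of that idea: sample many episodes and average their returns. The toy episode below is a made-up assumption (reward 1 per step, and the episode continues with probability `p_continue`), chosen only because its true value, `1 / (1 - gamma * p_continue)`, is easy to check by hand.

```python
import random


def sample_return(rng, gamma=0.9, p_continue=0.5):
    """Discounted return of one hypothetical episode."""
    g, discount = 0.0, 1.0
    while True:
        g += discount * 1.0  # reward of 1 at every step
        if rng.random() >= p_continue:
            return g         # episode ends
        discount *= gamma


def mc_estimate(sampler, n_episodes=10_000, seed=0):
    """Monte Carlo estimate: the plain average of sampled returns."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        total += sampler(rng)
    return total / n_episodes
```

`mc_estimate(sample_return)` should land close to `1 / (1 - 0.45)`, about 1.82, and the estimate tightens as `n_episodes` grows.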

TD Learning

TowardsDataScience

Approximation Methods

TowardsDataScience