Whilst reading “The Master Algorithm” by Pedro Dominogos I came across an interesting discussion about the issue with exploration-exploitation and machine learning, using the example of a slot machine, you don’t want to waste all of your money finding out which one pays out more, but you also don’t want to assume you are using the one that pays the most.
The book talked about a solution by John Holland which is to “flip a biased coin each time, where the coin becomes exponentially more biased as you go along.”
I found this interesting so I decided to try and make a program to simulate this and to see how well it would perform.
This was what I came up with (Written in VB):
This code first takes input for a fixed probability of winning and the number of times to attempt at *pulling* a lever.
After each pull the program will decide what lever to pull next, it decides this by working out the percentage of wins using the first lever compared to the second lever, and will store this value as a decimal, the program then picks a random number between 0 and 1, flipping the coin. The larger the percentage of wins the more likely the number picked will fall within that region, therefore increasing the chance of that lever being picked, this then repeats the allotted number of times. In order to not prevent one of the levers being pulled by the percentage falling to a very low amount the percentages are capped at 10%/90% to keep the chances of picking either lever possible.
You can download and try the program here.