Last week, Jan Schmidt invited interested researchers (and others, too) to join a small betting community for the games of the 2012/13 season of the German Bundesliga (soccer league), which starts on Friday night (further info here or under #scientipps on Twitter). There are also some prizes to win.
I will join in, but I will not place the bets myself: R will do that job for me! Predicting (sports) events with R has become quite popular over the last couple of months: Vik Paruchuri predicted the NBA (semi)finals of last season, and Martin O’Leary tried to predict the last Eurovision Song Contest final.
So I thought I should give it a try. However, getting hold of data takes some effort. There is a giant database, but it is commercial.
Additionally, soccer has never been as statistics-oriented as baseball or basketball (especially in the USA).
Data that are quite easy to get are the final league tables and the results of each game. At first, I will rely on these two types of data, which are offered by bulibox.de.
The predictions will be made with machine learning algorithms such as support vector machines (SVM). First tries on the past seasons did not look that bad: I got over 50 percent of the predictions right (which is at least more than just guessing ;-) ). Over the next few days I will tweak the algorithm a little to get even better results.
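To put "more than just guessing" into numbers: a game has three possible outcomes (home win, draw, away win), so picking one at random is right about one time in three. A rough sketch of that comparison follows, written in Python purely for illustration. The result lists are made-up toy values, not real Bundesliga data, and the helper names `accuracy` and `majority_baseline` are my own assumptions rather than part of the actual prediction code.

```python
from collections import Counter

# Hypothetical toy outcomes, NOT real Bundesliga results:
# 'H' = home win, 'D' = draw, 'A' = away win.
actual    = ['H', 'D', 'A', 'H', 'H', 'A', 'D', 'H', 'A', 'H']
predicted = ['H', 'H', 'A', 'H', 'D', 'A', 'H', 'H', 'H', 'H']


def accuracy(pred, truth):
    """Share of games whose predicted outcome equals the actual outcome."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    hits = sum(p == t for p, t in zip(pred, truth))
    return hits / len(truth)


def majority_baseline(truth):
    """Accuracy of always predicting the most frequent outcome."""
    most_common_count = Counter(truth).most_common(1)[0][1]
    return most_common_count / len(truth)


# Expected accuracy when choosing one of three outcomes uniformly at random.
uniform_guess = 1 / 3

model_acc = accuracy(predicted, actual)
baseline_acc = majority_baseline(actual)

print(f"model:             {model_acc:.2f}")
print(f"uniform guessing:  {uniform_guess:.2f}")
print(f"majority baseline: {baseline_acc:.2f}")
```

Always predicting the most frequent outcome is usually a tougher benchmark than uniform guessing, so it is probably worth checking the algorithm against both.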
Over the next few days, there will be another post about the data I am using and the algorithm. Right now, I am thinking more about my (nick)name: maybe SkyNet would be possible (but also very dangerous ;-) ), or maybe even Joshua.