Betting Markets as Inspiration for a Robust ML Framework: Part 2 – Prediction Markets

Blog / Andrew Simmons / September 13, 2017

In Part 1 of this series, we discussed how, when AI models are combined together in an ensemble, they tend to cancel out each other’s errors, thus leading to more accurate predictions. However, we also observed that, while large ensembles beat simpler models in terms of accuracy on paper, they are plagued with engineering concerns that prevent their adoption in production systems.

Could betting markets offer a solution?

At first, turning to gamblers to inform a statistical prediction sounds like a terrible idea. People who bet aren’t usually economically rational, at least from a financial perspective (or else betting companies wouldn’t be able to profit). Rather, people generally gamble for psychological reasons: for excitement, or as a distraction from the stress in their lives [1]. There are also some statistically savvy individuals who design models in an attempt to beat the market [2]. But beware: people are prone to fall victim to irrational beliefs about probability while gambling, even if they have a strong theoretical knowledge of statistics [3].

Yet somehow, despite the large number of irrational bets, betting markets use demand to produce surprisingly accurate odds estimates for a range of sports and races. This is because betting markets, like stock markets, utilise the “invisible hand” of many individual participants to form a fair estimate. Even if all the irrational gamblers show a systematic bias and overestimate the same team, intelligent gamblers will bet on the opposite team, inadvertently creating fairer odds as a result. In the stock market, this takes the form of the efficient market hypothesis, according to which it is impossible for a person to quickly increase their own wealth by taking advantage of patterns in the market, because the market corrects its own biases by creating strong incentives for rational investors to buy underpriced shares (thus pushing the price up) and to sell overpriced shares (thus pushing the price down). Some investors try to take advantage of “market inefficiencies” using technical analysis; however, even if a pattern is found, the effect is usually quite small.
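To make the self-correction idea concrete, here is a minimal sketch in Python. It assumes a simplified pool-style market in which the implied probability of an outcome is just its share of the total money staked, and it ignores the bookmaker's margin. The function names, the pool sizes and the "true" 50% probability are all hypothetical values chosen for illustration.

```python
def implied_probabilities(pools):
    """Implied probability of each outcome = its share of all money staked."""
    total = sum(pools.values())
    return {outcome: stake / total for outcome, stake in pools.items()}


def correct_market(pools, true_prob, outcome, step=1.0, max_steps=100_000):
    """Hypothetical informed bettor: keeps adding `step` units to `outcome`
    for as long as the market prices it below `true_prob`."""
    pools = dict(pools)  # work on a copy so the caller's pools are untouched
    for _ in range(max_steps):
        if implied_probabilities(pools)[outcome] >= true_prob:
            break  # no edge left, so the informed bettor stops
        pools[outcome] += step
    return pools


# Assumed scenario: a biased crowd stakes 80 units on team A and 20 on team B,
# although each team really has a 50% chance of winning.
biased = {"A": 80.0, "B": 20.0}
corrected = correct_market(biased, true_prob=0.5, outcome="B")

print(implied_probabilities(biased))
print(implied_probabilities(corrected))
```

In this toy setting the informed money flows to team B only while B is underpriced, so the implied odds drift towards the assumed true probability without anyone setting them centrally.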

The interesting thing for me here is that we don’t need to restrict who can participate in markets. People who can find a market inefficiency will (on average) slowly gain wealth, whereas those who act irrationally will gradually (or sometimes quite quickly) lose wealth.
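The wealth dynamics can be sketched with a toy simulation. The model below is an assumption made for illustration, not a description of any real exchange: the market price is the wealth-weighted average of the participants' beliefs, and after each event every participant's wealth is scaled by how much probability they assigned to what actually happened, relative to the market. The three beliefs and the 70% event frequency are hypothetical.

```python
def market_price(wealth, beliefs):
    """Wealth-weighted average belief that the event will happen."""
    return sum(w * b for w, b in zip(wealth, beliefs)) / sum(wealth)


def settle(wealth, beliefs, outcome):
    """Redistribute wealth after one event (assumed update rule).

    Each participant's wealth is multiplied by the probability they gave to
    the realised outcome, divided by the probability the market gave it.
    Total wealth is conserved: money only moves between participants.
    """
    price = market_price(wealth, beliefs)
    if outcome:
        return [w * b / price for w, b in zip(wealth, beliefs)]
    return [w * (1 - b) / (1 - price) for w, b in zip(wealth, beliefs)]


# Hypothetical participants: one well calibrated, one vague, one overconfident.
beliefs = [0.7, 0.5, 0.9]
wealth = [1.0, 1.0, 1.0]

# Deterministic outcome sequence in which the event happens 7 times out of 10.
outcomes = [i % 10 < 7 for i in range(200)]

for outcome in outcomes:
    wealth = settle(wealth, beliefs, outcome)

print(wealth)
print(market_price(wealth, beliefs))
```

Nobody is excluded from this toy market, yet the calibrated participant ends up holding almost all of the wealth, so the market price ends up very close to their 0.7 estimate.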

Prediction markets are betting markets for the purpose of predicting something we want to know. For example, even if not interested in betting, you might be interested in the odds of a bet on who will win a foreign political election in order to anticipate threats to your upcoming travel plans. Prediction markets can be set up for anything – applications include estimating project completion dates [4], predicting the weather [5], or even estimating the reproducibility of a research experiment [6].

Prediction markets have been taken to the extreme by Augur (see this Wired article for an introduction). This technology uses the Ethereum blockchain to record bets and pay out the winner. In contrast to Bitcoin, which uses a proof-of-work system to witness one-way monetary transfers that become irreversibly encoded in the blockchain, Ethereum uses a proof-of-work system to witness ‘contracts’ that specify the conditions under which someone gets paid.

App developers can incorporate the prediction of a market on Augur into their app. For example, if you are developing a weather app and want to incorporate the best aspects of each of the various weather forecast services [7], then you could set up an Augur prediction market for the weather. Long-term predictions are best left to machines, but short-term localised predictions still require input from human experts to create the best prediction possible [8]. Unlike TV presenters, who have an incentive to overstate the possibility of rain in order to avoid being blamed for ruining picnics [9], a prediction market would create a financial incentive for the market to self-correct any biases.

Comparison of human and AI meta-prediction systems

| Domain | Mechanism | Agents | Participation | Existing Solutions |
|---|---|---|---|---|
| Anything | Average | AI | In-house | Ensemble |
| Selected | Leaderboard | AI | Open | Kaggle |
| Stock | Trade | Human | Open | Stock Exchange |
| Stock | Dutch Auction | AI | Open | Numerai |
| Sports | Bet | Human | Open | Sports Betting |
| Anything | Bet | Human | Open | Prediction Market |
| Anything | Bet | Human (and AI?) | Open | Augur |

Increasingly, we are seeing that automated AI processes have a role to play in markets. Unlike traditional machine learning ensembles that are developed in-house, the invisible hand of the market acts a bit like a built-in ensembling mechanism that combines the estimates from a wide range of both humans and AI. Systems like Numerai aim to make stock market data more open for data scientists to experiment with, although Numerai takes some of the risk out by using a Dutch auction mechanism [10] to reward the best guesses rather than expecting data scientists to stake money directly on the stock market itself.

The (possible) future of data science

Augur seems to be focussed on human bets on large events like political elections, but I think that prediction markets may also have applications to small events. Something like “how likely are you to want the oven to pre-warm when you come home?” isn’t valuable or interesting enough to incentivise human bets, but could be a market for automated predictions from many machine learning algorithms. A smart home could use these predictions to better control your home, and could share your sensor data publicly (if you permit it) in order to help the market create the best predictions possible. Imagine if every aspect of life that you might be interested in estimating, whether the time to copy a file to your USB drive or the amount of time your rice will take to cook before it starts sticking to the base of the rice cooker, was made an open challenge for the world’s best prediction algorithms rather than the crappy built-in algorithms we have had to put up with for so long.

In our own workplace, we have had some issues with how to allow the data science team to scale their algorithms into production without being overburdened by engineering concerns. Setting up a prediction market would allow the data science team to spend their time creating new models, and let the power of markets take care of the rest. With the infrastructure in place, we could even start to hire more remote data scientists, who would only need to understand the prediction market interface in order to start building models that influence the predictions made by the system.

Further research needed

I’d like to stress that this post was about exploring a new idea. More research is necessary before we put these systems into production. In particular, I’d like a formal guarantee of the overall stability of prediction markets, rather than assuming that a market dominated by AI predictions would work by analogy with current markets dominated by human predictions.

For example, humans tend to be risk averse: say I were to offer you a 50/50 chance to win $300,000 in exchange for $100,000. An algorithm programmed to maximise expected value would always accept, but humans care more about the logarithm of their assets. You might accept this offer if you were a millionaire, but if $100,000 was your entire life savings, then, assuming you’re not a compulsive gambler, you probably wouldn’t accept, because losing would bring the logarithm of your assets to log($0) = -∞. This was first explained by Bernoulli in 1738: “utility resulting from any small increase in wealth will be inversely proportionate to the quantity of goods previously possessed” [11, p.25] (the integral of 1/x is the logarithmic function). This is why people buy insurance, even though on average they will pay more than they gain. The Kelly criterion builds upon this to suggest how much of their wealth investors should commit to each opportunity, which leads them to diversify their share portfolios rather than investing everything in the most profitable sector. In contrast, an AI simply maximising expected profit has no incentive to diversify. Note that Bernoulli’s expected utility function (and the equivalent Kelly criterion) isn’t a perfect model of human behaviour – there are more sophisticated models based on psychological experiments, such as Prospect Theory, that attempt to capture how humans actually behave in these kinds of scenarios.
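The difference between the two decision rules can be checked with a few lines of Python. This is a sketch under one assumed reading of the offer: you pay the $100,000 stake up front and, with probability 0.5, receive $300,000 back (a net gain of $200,000); otherwise the stake is lost. The function names are my own, and the Kelly formula used is the standard one for a single bet with two outcomes.

```python
import math


def expected_profit(p_win, net_gain, stake):
    """Expected change in wealth, which is all an expected-value maximiser sees."""
    return p_win * net_gain - (1 - p_win) * stake


def expected_log_wealth(wealth, p_win, net_gain, stake):
    """Bernoulli-style utility: the expected logarithm of wealth after the bet."""
    wealth_if_lose = wealth - stake
    if wealth_if_lose <= 0:
        return -math.inf  # log(0) = -infinity: ruin is infinitely bad
    return (p_win * math.log(wealth + net_gain)
            + (1 - p_win) * math.log(wealth_if_lose))


def log_utility_accepts(wealth, p_win, net_gain, stake):
    """Accept only if the bet raises expected log wealth above doing nothing."""
    return expected_log_wealth(wealth, p_win, net_gain, stake) > math.log(wealth)


def kelly_fraction(p_win, net_odds):
    """Kelly criterion for a simple bet: fraction of wealth to stake."""
    return p_win - (1 - p_win) / net_odds


# Assumed reading of the offer in the text (see the paragraph above).
STAKE, NET_GAIN, P_WIN = 100_000, 200_000, 0.5

ev = expected_profit(P_WIN, NET_GAIN, STAKE)
millionaire = log_utility_accepts(1_000_000, P_WIN, NET_GAIN, STAKE)
life_savings = log_utility_accepts(100_000, P_WIN, NET_GAIN, STAKE)
best_fraction = kelly_fraction(P_WIN, NET_GAIN / STAKE)

print(ev, millionaire, life_savings, best_fraction)
```

Under these assumptions the expected profit is positive, so the expected-value maximiser accepts regardless of wealth. The log-utility agent accepts as a millionaire but refuses when the stake is their entire savings, and the Kelly fraction of 0.25 says that staking $100,000 would only be ideal for someone with about $400,000.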

Furthermore, I wonder about the ethical issues that arise if prediction markets were used for safety-critical systems. For example, a market to predict traffic congestion could be used to help ambulances determine the fastest route to the hospital. An open market could potentially produce better predictions than relying on a single model, and better predictions mean more lives saved. But it would be infeasible to audit every model and data source that influences a market. The business world seems to accept that stock value predictions will occasionally fluctuate by hundreds of billions of dollars due to high-frequency trading triggered by something as minor as a fake tweet [12]. But is this acceptable when the currency is lives rather than dollars? How would the family of a patient who dies because the ambulance didn’t get to the hospital in time react to learning that the ambulance took a longer route than necessary because of a misinterpreted tweet?


  1. Rickwood, D., Blaszczynski, A., Delfabbro, P., Dowling, N. and Heading, K., 2010. The psychology of gambling. InPsych, 32(6), pp.11-21.
  2. Clarke, S., 2011. Want to win at gambling? Use your head. The Conversation.
  3. Benhsain, K. and Ladouceur, R., 2004. Knowledge in statistics and erroneous perceptions in gambling. Gambling Research: Journal of the National Association for Gambling Studies (Australia), 16(1), pp.25-31.
  4. [Blog] Google, 2005. Putting crowd wisdom to work.
  5. [Blog] Dubner, S.J., 2007. Betting the Weather. Freakonomics.
  6. Dreber, A., Pfeiffer, T., Almenberg, J., Isaksson, S., Wilson, B., Chen, Y., Nosek, B.A. and Johannesson, M., 2015. Using prediction markets to estimate the reproducibility of scientific research. Proceedings of the National Academy of Sciences, 112(50), pp.15343-15347.
  7. Chubb, T., 2014. How does the Bureau’s new mobile weather site stack up? The Conversation.
  8. Baars, J.A. and Mass, C.F., 2005. Performance of National Weather Service Forecasts Compared to Operational, Consensus, and Weighted Model Output Statistics. Weather and Forecasting, 20(6), pp.1034-1047.
  9. Silver, N., 2012. The Weatherman Is Not a Moron. The New York Times Magazine.
  10. Craib, R., Bradway, G., Dunn, X. and Krug, J., 2017. Numeraire: A Cryptographic Token for Coordinating Machine Intelligence and Preventing Overfitting.
  11. Bernoulli, D., 1954. Exposition of a New Theory on the Measurement of Risk. Econometrica, 22(1), pp.23-36. (English translation of 1738 original.)
  12. Matthews, C., 2013. How Does One Fake Tweet Cause a Stock Market Crash? Time.

Header image courtesy of Carlos Santiago and gnokii.

Thanks to Nicola Pastorello, Maria Mitrevska and Shannon Pace for proofreading and providing suggestions.