Saturday, April 2, 2011

Understanding Game Theory and Hold’em, by Bryce Paradis and Douglas Zare

I was watching this video and one of the ladies mentioned something about the GTO concept. It got me curious and here's what I found--

Game theory has become a popular, if somewhat misunderstood, topic for hold’em discussion. This article is intended to give you a fundamental understanding of what game theory optimal strategy is, how it works, and what its impact is on hold’em play. Before we begin on the article proper, however, we will start by reviewing some key definitions. These definitions are not necessarily the same as those used by all others.

Optimal Exploitive Strategy: A strategy which yields the highest possible EV against your opponent’s strategy. For example, if in a game of rock-paper-scissors your opponent’s strategy is to choose rock every single time, your optimal exploitive strategy is to pick paper every single time. The same is true if your opponent’s strategy is rock 50%, paper 25%, and scissors 25%: paper still wins more often than it loses, while rock only breaks even and scissors loses on average.

Suboptimal Strategy: A strategy which performs worse than an optimal exploitive strategy. For example, if your opponent’s strategy is to choose rock every single time, choosing paper 50% and rock 50% is still a winning strategy. The EV of the paper-and-rock strategy, however, is less than that of the paper-only strategy. Therefore the paper-and-rock strategy is suboptimal.
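The two definitions above can be checked numerically. The following is a minimal sketch, assuming a payoff of +1 for a win, 0 for a tie, and -1 for a loss (the article does not fix a stake); the helper names `payoff`, `ev`, and `best_pure_response` are made up for illustration.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """Assumed stake: +1 for a win, 0 for a tie, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def ev(my_mix, opp_mix):
    """Expected payoff per game when both players draw independently from their mixes."""
    return sum(my_mix.get(m, 0) * opp_mix.get(o, 0) * payoff(m, o)
               for m in MOVES for o in MOVES)

def best_pure_response(opp_mix):
    """The single move with the highest EV against a known opponent mix."""
    return max(MOVES, key=lambda m: ev({m: 1}, opp_mix))

rock_only = {"rock": Fraction(1)}
rock_heavy = {"rock": Fraction(1, 2), "paper": Fraction(1, 4), "scissors": Fraction(1, 4)}
paper_only = {"paper": Fraction(1)}
paper_and_rock = {"paper": Fraction(1, 2), "rock": Fraction(1, 2)}

print(best_pure_response(rock_only), ev(paper_only, rock_only))    # paper 1
print(best_pure_response(rock_heavy), ev(paper_only, rock_heavy))  # paper 1/4
print(ev(paper_and_rock, rock_only))                               # 1/2
```

Against rock-only, the paper-and-rock mix still wins (EV 1/2) but earns less than paper-only (EV 1), which is what makes it suboptimal.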

Game Theory Optimal (GTO): A strategy that yields the highest possible EV (or: “is optimal”) if your opponent always chooses the best possible counter-strategy. In a game of rock-paper-scissors the GTO strategy is to choose randomly from an equal distribution of paper, scissors, and rocks. If you play rock less often than paper, you will have less than ½ equity against an all-scissors strategy. Similarly, you must play paper at least as often as you play scissors (or you fall below ½ against all-rock), and scissors at least as often as you play rock (or you fall below ½ against all-paper). As a result, you must play paper, scissors, and rocks with equal frequency to guarantee ½ equity against all strategies. So long as your opponent always chooses the optimal counter-strategy to whatever strategy you choose, no strategy on your part can have a higher EV than this.
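The ½-equity guarantee can be verified with a short calculation. This is only a sketch: it assumes equity is scored as 1 for a win, ½ for a tie, and 0 for a loss, and it checks the opponent's three pure strategies, which is enough because a best counter to a fixed mix can always be found among them. The name `guaranteed_equity` is hypothetical.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def equity(mine, theirs):
    """Assumed scoring: 1 for a win, 1/2 for a tie, 0 for a loss."""
    if mine == theirs:
        return Fraction(1, 2)
    return Fraction(1) if BEATS[mine] == theirs else Fraction(0)

def guaranteed_equity(my_mix):
    """Worst-case equity of my_mix over the opponent's pure strategies."""
    return min(sum(p * equity(m, o) for m, p in my_mix.items()) for o in MOVES)

uniform = {m: Fraction(1, 3) for m in MOVES}
# A mix that plays rock less often than paper, as in the text.
less_rock = {"rock": Fraction(1, 5), "paper": Fraction(2, 5), "scissors": Fraction(2, 5)}

print(guaranteed_equity(uniform))    # 1/2
print(guaranteed_equity(less_rock))  # 2/5
```

The second mix drops to 2/5 because an all-scissors opponent beats its paper more often than its rock wins back.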

Exploitive Strategy: Any strategy which has a higher EV than GTO strategy against a particular opponent.

Exploitable Strategy: A strategy which has less EV than a GTO strategy against at least one exploitive strategy. All non-GTO strategies are exploitable.

When analyzing optimal exploitive strategies, we treat an opponent’s strategy as a known. For example: “my opponent always chooses rock.” In reality, our opponent’s strategy is an unknown, and we often rely on assumptions and observations to decide what to treat it as. To determine a GTO strategy, we assume that our opponent always chooses the optimally exploitive counter to whichever strategy we try, rather than playing a fixed strategy.

Hold’em is a much more complicated game than rock-paper-scissors, and until the game is solved by computers no one will ever play against an opponent who always chooses a GTO (or: “unexploitable”) strategy. This is an important point, as a GTO strategy is not necessarily the strategy with the highest possible EV. For example, if our opponent’s strategy is rock-only then the GTO strategy of choosing randomly from an equal distribution of paper, scissors, and rocks has less EV than that of the paper-only strategy.

GTO play, however, still plays an important role in hold’em strategy. Even though a GTO strategy may have less EV than an exploitive strategy, understanding what the GTO strategy is and being able to identify how your opponents’ strategies deviate from it can help you to better exploit your opponents. Further, understanding GTO strategy can also allow you to create balanced strategies which are difficult to exploit. These strategies can be used as a defense against tough opponents looking for an exploitive edge.

In hold’em, as in many simple games such as rock-paper-scissors, a GTO strategy is often identifiable by finding an indifference point. What this means is that the GTO strategy will often distribute your actions in such a way that your opponent is indifferent to choosing between two actions. As a result your strategy is unexploitable.

Although hold’em has not been solved, many half-street and full-street mini-games which model real hold’em situations have been solved. By understanding where the indifference points lie in different hold’em scenarios, you can identify your opponent’s deviations from GTO play and exploit your opponent maximally. At its most basic conceptual level hold’em is still a very simple game: rather than playing with a distribution of paper, scissors and rocks we play with a distribution of bluffs and not-bluffs. By understanding even just the simplest mini-games you can greatly improve your play.

A common example of a half-street game would be one where we hold either hands that always win or hands that always lose at showdown. We can either bet or check, and our opponent may only call or fold. If he calls, there is a showdown. This is often analogous to a river-betting scenario in real hold’em play where our opponent’s range is narrow and ours is polarized. By solving the mini-game we can see that the GTO strategy is to bluff an amount proportionate to the price we are laying our opponent on his call. For example, if we bet $1 into a $2 pot, our opponent must call $1 to win $3, so we are laying him 3:1, and the GTO strategy is to bluff 25% of the time that we bet. Our opponent will be indifferent to calling or folding. As a result, we know that if we deviate from this strategy our opponent can exploit us by either always calling if we bluff more, or always folding if we bluff less.
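The 25% figure follows from setting the caller's EV to zero. Below is a minimal sketch of that arithmetic, assuming whole-dollar pot and bet sizes and measuring the caller's EV relative to folding (which is worth 0); `gto_bluff_fraction` and `call_ev` are illustrative names, not part of any library.

```python
from fractions import Fraction

def gto_bluff_fraction(pot, bet):
    """Share of our bets that should be bluffs so that a call breaks even.

    The caller risks `bet` to win `pot + bet`, so indifference requires
    bluff_fraction * (pot + bet) == (1 - bluff_fraction) * bet.
    """
    return Fraction(bet, pot + 2 * bet)

def call_ev(pot, bet, bluff_fraction):
    """Caller's EV of calling, relative to folding (EV 0)."""
    return bluff_fraction * (pot + bet) - (1 - bluff_fraction) * bet

b = gto_bluff_fraction(pot=2, bet=1)
print(b)                                # 1/4
print(call_ev(2, 1, b))                 # 0
print(call_ev(2, 1, Fraction(1, 3)))    # 1/3
print(call_ev(2, 1, Fraction(1, 5)))    # -1/5
```

When we bluff more than a quarter of the time, calling has positive EV, so always calling exploits us; when we bluff less, calling loses money and always folding is the better counter.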

Conversely, in this scenario the pot is laying us 2:1 on our bluffs, and so we become indifferent to betting or checking with our bluffs if our opponent calls 67% of the time. This is our opponent’s GTO strategy. If our opponent deviates from this strategy we can exploit him by always bluffing if he calls less, or by never bluffing if he calls more.
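The caller's side of the same mini-game can be worked out the same way. This sketch assumes that checking a bluffing hand is worth 0 (it always loses at showdown) and that sizes are whole dollars; `gto_call_frequency` and `bluff_ev` are hypothetical helper names.

```python
from fractions import Fraction

def gto_call_frequency(pot, bet):
    """Calling frequency that makes a bluff break even.

    A bluff risks `bet` to win `pot`, so indifference requires
    (1 - call_freq) * pot == call_freq * bet.
    """
    return Fraction(pot, pot + bet)

def bluff_ev(pot, bet, call_freq):
    """EV of betting a hand that always loses at showdown, relative to checking (EV 0)."""
    return (1 - call_freq) * pot - call_freq * bet

c = gto_call_frequency(pot=2, bet=1)
print(c)                                 # 2/3
print(bluff_ev(2, 1, c))                 # 0
print(bluff_ev(2, 1, Fraction(1, 2)))    # 1/2
print(bluff_ev(2, 1, Fraction(3, 4)))    # -1/4
```

A caller who calls only half the time makes every bluff profitable, while one who calls three times in four makes every bluff a loser.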

If our opponent deviates from GTO strategy in the previous example, the optimal exploitive strategies are all-or-nothing: always bluffing or never bluffing (and, from the caller’s seat, always calling or always folding) has higher EV than any exploitive strategy which takes the same action less than 100% of the time. Weak opponents are weak not only because they choose exploitable strategies so often, but because we can also make such large deviations from indifference points without them adapting to exploit us.

Not all GTO decisions involve finding an indifference point. For example, say we are playing a variant of rock-paper-scissors where there is a fourth option to choose dynamite, which beats everything. The GTO strategy is to choose dynamite-only. Your opponent, however, may still select a dominated strategy by choosing either paper, scissors, or rock. Similar circumstances arise in hold’em, for example, when the nuts is such a large portion of our total range that we are unable to bluff often enough to make our opponent indifferent to calling or folding.
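The dynamite variant can be checked by comparing payoffs move by move. The sketch below assumes +1/0/-1 payoffs and that dynamite against dynamite is a tie (the text does not say); `weakly_dominates` is a name chosen for illustration.

```python
MOVES = ("rock", "paper", "scissors", "dynamite")
BEATS = {
    "rock": {"scissors"},
    "paper": {"rock"},
    "scissors": {"paper"},
    "dynamite": {"rock", "paper", "scissors"},  # beats everything else
}

def payoff(mine, theirs):
    """Assumed payoffs: +1 win, -1 loss, 0 tie (including dynamite vs dynamite)."""
    if theirs in BEATS[mine]:
        return 1
    if mine in BEATS[theirs]:
        return -1
    return 0

def weakly_dominates(a, b):
    """True if a never does worse than b and does strictly better at least once."""
    diffs = [payoff(a, o) - payoff(b, o) for o in MOVES]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

dominated = [m for m in MOVES if m != "dynamite" and weakly_dominates("dynamite", m)]
print(dominated)  # ['rock', 'paper', 'scissors']
```

Because every other move is dominated, there is no frequency to balance and no indifference point to find: the unexploitable strategy is simply to pick dynamite.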

What this means is that while a GTO strategy can never be exploited, and can therefore never be a losing strategy in hold’em (if there is no rake), your opponents can still make dominated strategy decisions which will cause them to lose, and you to win. Therefore, while GTO strategies in hold’em are often suboptimal, the prospect of these “invincible strategies” still holds some exciting implications for a savvy student of game theory, particularly at the highest levels of play.

A tough opponent is only tough, after all, because he or she makes far fewer suboptimal strategy decisions than a soft opponent. An extraordinarily tough opponent will have an extremely refined capacity for dynamic play. If you choose a strategy of rock-only, he or she will quickly recognize it and choose paper-only, and so on. Such players will quickly identify trends in your play, or even make pre-emptive assumptions about your play, which may allow them to exploit your non-GTO strategies with unnerving frequency and accuracy.

It is appealing to think that by selecting a GTO strategy, our opponents could only lose. However, even the strongest opponents have exploitive (and therefore potentially exploited) strategies in their play, and hold’em is, after all, a game of incomplete information. If you are playing against an extremely tough opponent who you know uses a strategy analogous to paper 33%, scissors 20%, and rock 47%, you would be foolish to attempt a strategy of paper-only. By definition of your opponent’s toughness, your opponent will quickly adapt to exploit you. By understanding where the indifference points lie, however, and by making small deviations from them, you can still play exploitively. Even the toughest, most cut-throat opponents are not clairvoyant, after all, and if you elect an exploitive strategy of paper 40%, scissors 30%, rock 30% how are they to know?
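The trade-off in this closing example can be put in numbers. This is a rough sketch under the assumption of +1/0/-1 payoffs per game: it compares what each strategy earns against the opponent's current mix with what it would lose if the opponent switched to the best pure counter. The helper `worst_case` is a hypothetical name.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """Assumed stake: +1 for a win, 0 for a tie, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def ev(my_mix, opp_mix):
    """Expected payoff per game when both players draw independently from their mixes."""
    return sum(my_mix.get(m, 0) * opp_mix.get(o, 0) * payoff(m, o)
               for m in MOVES for o in MOVES)

def worst_case(my_mix):
    """EV of my_mix if the opponent finds the best pure counter to it."""
    return min(ev(my_mix, {o: 1}) for o in MOVES)

opponent = {"paper": Fraction(33, 100), "scissors": Fraction(20, 100), "rock": Fraction(47, 100)}
paper_only = {"paper": Fraction(1)}
small_deviation = {"paper": Fraction(40, 100), "scissors": Fraction(30, 100), "rock": Fraction(30, 100)}

print(ev(paper_only, opponent), worst_case(paper_only))            # 27/100 -1
print(ev(small_deviation, opponent), worst_case(small_deviation))  # 27/1000 -1/10
```

Paper-only earns ten times as much against the current mix but loses every game once the opponent adapts, whereas the small deviation keeps a modest edge and gives up at most a tenth of a unit per game if it is discovered.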