Eco514|Game Theory Lecture 11: Subgame Perfection Marciano Siniscalchi October 21, 1999 Introduction The notion of subgame perfection is the cornerstone of the theory of extensive games. It embodies its key intuitions|and provides a vivid example of the di culties inherent in such a theory. But, above all, it has proved to be extremely pro table in a variety of applications. More- over, it has spawned a huge theoretical literature which has attempted (often successfully) to overcome the limitations and/or expand the domain of application of subgame-perfect equilibrium. Subgame perfection The basic intuition is apparent in the entry deterrence game of Figure 1. 1,1r 1 0,2 N E r2 f a -1,-1 f a N 0,2 0,2 E -1,-1 1,1 Figure 1: The Entry Deterrence game and its normal form. Player 1 is a potential entrant who may decide to Enter or Not Enter a market; following entry, the incumbent, Player 2, decides whether to Fight or Acquiesce. 1 The normal form of the Entry Deterrence game has two pure Nash equilibria, namely (N;f) and (E;a). However, the standard reasoning goes, the rst equilibrium is somewhat unreasonable: after all, if Player 1 were to Enter, Player 2 would not want to ght, ex-post. Thus, the (N;f) equilibrium is supported by a non-credible threat. For this reason, we wish to rule it out. Reinhard Selten proposed the following. First, note that any non-terminal history h in a game with observable actions de nes a subgame|the game de ned by all histories of which h is a subhistory. For instance, in the simple game of Figure 1, there is a subgame starting at (E), corresponding the histories (E), (E;a) and (E;f). The only other subgame is the whole game, which may be viewed as beginning after ;, the initial history. A subgame-perfect equilibrium of the original game is then a pro le of strategies which induces Nash equilibria in every subgame. Thus, the (N;f) equilibrium is not subgame- perfect, because f is not a Nash equilibrium (trivially, an optimal strategy) in the subgame starting at (E). We are ready for a few formal de nitions. De nition 1 Fix an extensive game with observable actions = (N;A;H;Z;P;(Ui)i2N) and a non-terminal history h2HnZ. The subgame starting at h is the extensive game with observable actions (h) = (N;A;Hjh;Zjh;Pjh;(Uijh)i2N), where: Hjh = fh0 : (h;h0) 2 Hg is the set of continuation histories, i.e. sequences of action pro les h0 such that (h;h0) is a history in H; Zjh is de ned analogously; Pjh is the restriction of the player correspondence P to continuation histories: formally, for every h02Hjh, Pjh(h0) = P((h;h0)); and each Uijh is de ned analogously. Similarly, for any strategy pro le s = (si)i2N 2S in the game , sjh = (sijh)i2N is the strategy pro le induced by s in (h) by letting sijh(h0) = si((h;h0)) for every h02Hjh and i2Pjh(h0). Note that, according to the de nition, the whole game corresponds to the subgame (;). De nition 2 Consider a game with observable actions. A strategy pro le s 2 S is a subgame-perfect equilibrium of i , for every non-terminal history h2HnZ, sjh is a Nash equilibrium of (h). Since = (;), any subgame-perfect equilibrium (or SPE for short) is necessarily also a Nash equilibrium. The example given above shows that the converse is false. Whenever one introduces a solution concept, the rst order of business is to establish the existence of a solution according to the concept under consideration. In nite games with perfect information, this is easy to achieve using the backward induction algorithm. Proposition 0.1 Every nite extensive game with perfect information has a SPE. 2 Proof: We argue by induction on the length of histories. Let L = maxf‘(h) : h2HnZg. (L) Pick any h with ‘(h) = L. Then (h) is a single-person decision problem; let bi(h) denote the optimal choice of i = P(h) at H, and de ne o(h) = (h;bi(h)), the corresponding outcome, and c(h) = bi(h), the continuation history induced by bi at h. (‘) Assume that the procedure has been carried out for all histories of length ‘+1;:::;L, and consider a history h with ‘(h) = ‘. Consider the decision problem faced by Player i = P(h) and de ned by the action set Ai(h), and by the assumption that action a2Ai(h) leads to outcome o((h;a)). Denote by bi(h) Player i’s choice at h, and de ne o(h) = o((h;bi(h)) and c(h) = (bi(h);c((h;bi(h)))). The procedure ends in nitely many steps. It should be clear that the resulting pro le b = (bi)i2N is a SPE. Note that the only reason why the argument does not generalize to games with observable actions is that we insist on choosing pure actions in each subgame. Backward induction is viewed as the natural extension of dynamic programming to in- teractive environments. From a purely formal standpoint, this view is of course correct. However, the interpretation of the two procedures is rather di erent; we return to this point below. The analogy with dynamic programming suggests that a version of the one-deviation principle might hold. This is indeed the case. A preliminary de nition: say that a game has nite horizon i it has no in nite histories. Thus, a nite game has nite horizon, but not conversely. Proposition 0.2 Consider an extensive game with observable actions and nite horizon. A strategy pro le s is a SPE of i there exists no player i 2 N, history h such that i2P(h), and action a06= si(h) with the property that the continuation strategy s0ijh de ned by s0ijh(;) = a0 and 8h02Hjhnf;g; s0ijh(h0) = si(h;h0) is a pro table deviation in (h). Proof: Clearly, if s is a SPE, s0ijh cannot be a pro table deviation from sijh in (h). Thus, assume there is no pro table one-time deviation, but that s is not a SPE. Then there must exist a player i2N, a history h and at least one continuation strategy s00ijh such that s00ijh is a pro table deviation from sijh in (h). Note that any pro table deviation s00ijh must di er from sijh at more than one history. Thus, pick a deviation s0ijh which di ers from sijh the least, i.e. such that no other devia- tion di ers from sijh at fewer histories (if there is more than one such deviations, pick one arbitrarily). This is well-de ned because the game has a nite horizon. Now let h0 be the longest (continuation) history of (h) where s0ijh di ers from sijh, i.e such that s0ijh(h0)6= sijh(h0). Again, this is well-de ned because the game has a nite horizon. 3 Finally, consider the sub-subgame (h0). By construction, s0ijh0 di ers from sijh0 only at the empty history of (h0), i.e. it constitutes a one-time deviation. Moreover, it must be a pro table deviation in (h0), because otherwise we could nd a deviation s00ijh in (h) which di ers from sijh at fewer histories than s0ijh. Contradiction. Subgame Perfection: A Critical Look The prototypical criticism of SPE is based on the Centipede game, depicted in Figure 2. 2,2r 1 1,0 d1 a1 r2 D A 0,2 r1 d2 a2 3,0 Figure 2: A Three-Legged Centipede The argument goes as follows. According to SPE, both players should go down at each history. However, consider Player 2’s predicament: he understands this, and he also un- derstands that Player 1 should understand this. Say that Player 2 is certain that Player 1 is rational; then, if in the course of the game he is called upon to move, how should he interpret the fact that Player 1 chose a1 despite the fact that (1) she is rational and (2) she understands that Player 2 will go down? If this argument sounds too informal to be right as stated, you are absolutely correct! However, it does contain at least a grain of truth. But consider also the game in Figure 3, which is perhaps even more striking. 3,3r 1 2,2 d1 a1 r2 D A 1,1 r1 d2 a2 0,0 Figure 3: Deviations from SPE? 4 Here (d1d2;D) is not a SPE, but it is a Nash equilibrium. It fails the test of perfection because, upon being reached (contrary to the equilibrium prescription!) Player 2 \should" really expect Player 1 to continue with a2, not d2, because \Player 2 is certain that Player 1 is rational." But this argument is even more ambiguous! Theorists have ercely argued over these and similar examples. I cannot do justice to either side of the debate without a full-blown model of interactive beliefs speci cally developed to account for dynamically evolving beliefs. However, for the time being, let me propose a few informal but rigorous observation. First, in a dynamic game, statements such as \Player 2 is certain that Player 1 is rational" are meaningless. One must specify when, in the course of the game, that statement is assumed to be true|when, for example, Player 2 is certain that his opponent is rational, or Player 2 is certain that Player 1 expects Player 2 to choose D. The reason is that, in a dynamic game (and especially in a game with observable actions!), players acquire new information as the play unfolds. This information may lead them to update, i.e. \re ne" their beliefs, or it may lead them to completely revise them. For example, Player 2 might be certain, at the initial history, that Player 1 chooses d1 at the initial history. However, if Player 1 chooses a1 instead, Player 2 observes this, so it would be impossible for her to continue to be certain after the history (a1) that she chooses d1. Player 2 has to revise her beliefs. Reasoning along these lines, the argument against backward induction in the Centipede game can be reformulated more convincingly. Suppose that, at the beginning of the game, both players’ beliefs are consistent with the unique SPE. Moreover, suppose that, at the beginning of the game, both players are certain that their opponent is rational (you can assume that there is common certainty of this: it is not going to matter). Could it be the case that Player 1 chooses a1 at the initial history, contrary to the SPE prediction? The answer is, surprisingly, yes! For instance, this will be the unique best response of Player 1 if she is certain at the initial node that: (1) Player 2 is certain, at the initial node, that Player 1 will play d1d2, and that Player 1 is rational (this much we had already assumed). (2) However, Player 2, upon observing a1, will revise her beliefs and become certain at (a1) that Player 1 is actually irrational and will choose a2 after (a1;A). (3) Player 2 is rational (again, this much was already part of our assumptions). Note that (2)+(3) imply that Player 2, if he has to choose, will pick A and not D. But then, of course, Player 1 will rationally choose a1 herself! The key point is that, even if players’ beliefs are concentrated on the equilibrium at the beginning of the game, it can still be the case that the way they revise their beliefs leads them to act di erently, both o and on the equilibrium path. To put it di erently, in order to justify SPE one has to make assumptions about how players revise their beliefs. Characterizing backward induction in perfect-information games 5 has become a little cottage industry, so I will spare you the references. A similar point applies to the game in Figure 3: the Nash equilibrium (d1d2;D) is sup- ported by the assumption that, upon being reached, Player 2 becomes certain that Player 1 is irrational. The reason I mention it is that, in the Centipede game, (d1d2;D) is the only Nash equilibrium, so normal-form analysis already yields the \standard prediction" without having to rely on a potentially problematic extensive-form analysis. The game in Figure 3 does not have this feature, so it might be regarded as a better example (Phil Reny should be credited for this). 6