Eco514 - Game Theory
Lecture 12: Repeated Games (1)
Marciano Siniscalchi
October 26, 1999

Introduction

[By and large, I will follow OR, Chap. 8, so I will keep these notes to a minimum.]

The theory of repeated games is a double-edged sword. On one hand, it indicates how payoff profiles that are not consistent with Nash equilibrium in a simultaneous-move game might be achieved when the latter is played repeatedly, in a manner consistent with Nash or even subgame-perfect equilibrium. On the other hand, it shows that essentially every individually rational payoff profile can be thus achieved if the game is repeated indefinitely (and a similar, but a bit more restrictive, result holds for finitely repeated games). Thus, the theory has little predictive power.

[To make matters worse, the expression "repeated-game phenomenon" is often invoked to account for the occurrence of a non-equilibrium outcome in real-world strategic interactions. But, as usual, the theory refers to a clearly specified, highly stylized situation, which may or may not approximate the actual strategic interaction.]

OR emphasize the structure of the equilibria supporting these outcomes, rather than the set of attainable outcomes themselves, as the most interesting product of the theory.

Payoff Aggregation Criteria

Definition 1 Consider a normal-form game $G = (N, (A_i, u_i)_{i \in N})$. The $T$-repeated game ($T \geq 1$) induced by $G$ is the perfect-information game $\Gamma = (N, A, H, Z, P, (\succeq_i)_{i \in N})$ where:
(i) $A = \bigcup_{i \in N} A_i$ is the set of actions available to players in $G$;
(ii) $H$ is the set of sequences of length at most $T$ of elements of $\prod_{i \in N} A_i$;
(iii) $P(h) = N$ (all players move simultaneously at every stage);
(iv) $\succeq_i$ satisfies weak separability: for any sequence $(a^t)_{t=1}^T$ and profiles $a, a' \in \prod_{i \in N} A_i$, $u_i(a) \geq u_i(a')$ implies $(a^1, \ldots, a^{t-1}, a, a^{t+1}, \ldots) \succeq_i (a^1, \ldots, a^{t-1}, a', a^{t+1}, \ldots)$.

The definition uses preference relations instead of utility functions. This is to accommodate payoff aggregation criteria which do not admit an expected-utility representation. I will only illustrate the key features of the two "special" criteria proposed in OR, and then focus on discounting. I will mostly discuss infinitely repeated games.

Discounting

In general, we assume that players share a common discount factor $\delta \in (0,1)$, and rank payoff streams according to the rule
$$(u_i^t)_{t \geq 1} \succ_i (w_i^t)_{t \geq 1} \iff \sum_{t \geq 1} \delta^{t-1}(u_i^t - w_i^t) > 0.$$

Now, we want to be able to talk about payoff profiles of the stage game $G$ as being achieved in the repeated version of $G$. Thus, for instance, if the profile $a$ is played in each repetition of the game, we want to say that the payoff profile $u(a)$ is attained. But, of course, this is not the discounted value of the stream $(u(a), u(a), \ldots)$: rather, this value is $\frac{u(a)}{1-\delta}$. Thus, one usually "measures" payoffs in a repeated game in "discounted units", i.e. in terms of the discounted value of a constant payoff stream $(1, 1, \ldots)$, which is of course $\frac{1}{1-\delta}$. The bottom line is that, given the terminal history $(a^1, \ldots, a^t, \ldots)$, the corresponding payoff profile is taken to be $(1-\delta) \sum_{t \geq 1} \delta^{t-1} u(a^t)$.
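To make the normalization concrete, here is a minimal numerical sketch in Python (my own illustration, not part of OR; the function names and the stage payoff of 3 are placeholders). It checks that the raw discounted value of a constant stream is approximately $\frac{u(a)}{1-\delta}$, while the value measured in "discounted units" recovers the stage payoff itself.

def discounted_sum(stream, delta):
    """Plain discounted value: sum_t delta^(t-1) * u_t for a finite stream."""
    return sum(delta ** t * u for t, u in enumerate(stream))

def discounted_units(stream, delta):
    """The same value measured in 'discounted units', i.e. normalized by (1 - delta)."""
    return (1 - delta) * discounted_sum(stream, delta)

delta = 0.9
constant = [3.0] * 1000   # the stage payoff u(a) = 3 played every period

# The raw discounted value is (approximately) u(a) / (1 - delta) = 30 ...
print(discounted_sum(constant, delta))    # about 30.0
# ... while in discounted units we recover the stage payoff itself.
print(discounted_units(constant, delta))  # about 3.0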
Limit-of-Means

The key feature (OR: "the problem") of discounting is that periods are weighted differently. One might think that, in certain situations, this might be unrealistic: OR propose the example of nationalist struggles. A criterion which treats periods symmetrically is the limit of means. Ideally, we would want the value of a stream $(u_i^t)_{t \geq 1}$ to Player $i$ to be $\lim_{T \to \infty} \sum_{t=1}^T \frac{u_i^t}{T}$. However, this limit may fail to exist.¹ Thus, we consider the following rule:
$$(u_i^t)_{t \geq 1} \succ_i (w_i^t)_{t \geq 1} \iff \liminf_{T \to \infty} \sum_{t=1}^T \frac{u_i^t - w_i^t}{T} > 0.$$

[Recall how $\liminf_n x_n$ is defined: let $y_n = \inf\{x_m : m \geq n\}$, and set $\liminf_n x_n = \lim_n y_n$. Thus, $\liminf_n x_n > 0$ iff, for some $\epsilon > 0$, $x_n > \epsilon$ for all but finitely many $n$'s.]

¹ Here's how to construct an example: call $V_i^T$ the average up to $T$. First, set $u_i^1 = 1$, so $V_i^1 = 1$. Now consider the sequence $(1, 0, 0)$: $V_i^3 = \frac{1}{3}$. Now extend it to $(1, 0, 0, 1, 1, 1)$: $V_i^6 = \frac{2}{3}$. Now consider $(1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0)$: $V_i^{12} = \frac{1}{3}$. You see how you can construct a sequence such that the sequence of averages flips between $\frac{1}{3}$ and $\frac{2}{3}$.

Now the stream $(0, 0, 0, \ldots, 0, 2, 2, 2, \ldots)$ will always be preferred to $(1, 1, \ldots)$ under the limit-of-means criterion, whereas, under discounting with low enough $\delta$, the reverse is true. Conversely, the limit-of-means criterion does not distinguish between $(-1, 1, 0, 0, \ldots)$ and $(0, 0, \ldots)$, whereas, for all $\delta \in (0,1)$, the latter stream is preferred to the former. Note that, for any pair of streams, there exists a discount factor $\delta$ such that the ordering of those two streams according to the limit-of-means and discounting criteria is the same (but you need to pick a different $\delta$ for every pair of streams!).

Overtaking

The limit-of-means criterion completely disregards finite histories. Thus, for instance, it does not distinguish between $(-1, 2, 0, 0, \ldots)$ and $(-1, 1, 0, 0, \ldots)$. This might seem a bit extreme. Thus, consider the following variant:
$$(u_i^t)_{t \geq 1} \succ_i (w_i^t)_{t \geq 1} \iff \liminf_{T \to \infty} \sum_{t=1}^T (u_i^t - w_i^t) > 0.$$

Then, according to the overtaking criterion, a stream is preferred to another if it eventually yields a higher cumulative payoff. In particular, $(-1, 2, 0, 0, \ldots)$ is preferred to $(-1, 1, 0, 0, \ldots)$. In some sense, the overtaking criterion is "the best of both worlds": it treats all periods symmetrically, and it does give some weight to finite histories.

Having said that, we shall forget about any criterion other than discounting!

Machines

Let me just remind you of the definition from OR. We do not really need them in this lecture.

Definition 2 Fix a normal-form game $G$. A machine for Player $i \in N$ is a tuple $M_i = (Q_i, q_i^0, f_i, \tau_i)$, where:
(i) $Q_i$ is a finite set (whose elements should be thought of as labels);
(ii) $q_i^0 \in Q_i$ is the initial state of the machine;
(iii) $f_i : Q_i \to A_i$ is the action function: it specifies what Player $i$ does at each state; and
(iv) $\tau_i : Q_i \times A \to Q_i$ is the transition function: if the action profile $a \in A$ is played and Player $i$'s machine state is $q_i \in Q_i$, then at the next stage Player $i$'s machine state will be $\tau_i(q_i, a)$.

Note that every machine defines a strategy, but not conversely. This is clear: a strategy is a sort of "hyper-machine" which has as states the set of non-terminal histories; but of course there are infinitely many such histories, so we cannot "shoehorn" strategies into our definition of machines.

Strategies implemented by machines are Markovian; informally, they are "simple" strategies.

Nash Folk Theorem for the Discounting Criterion

Definition 3 Fix a normal-form game $G$. The minmax payoff for player $i$, $v_i$, is defined by $v_i = \min_{a_{-i} \in A_{-i}} \max_{a_i \in A_i} u_i(a_i, a_{-i})$.

Definition 4 Fix a normal-form game $G$. A payoff profile $u \in \mathbb{R}^N$ is feasible if $\gamma u = \sum_{a \in A} \gamma_a u(a)$ for some set of rational coefficients $(\gamma, (\gamma_a)_{a \in A})$ with $\sum_{a \in A} \gamma_a = \gamma$. It is enforceable (resp. strictly enforceable) if $u_i \geq v_i$ (resp. $u_i > v_i$) for all $i \in N$.
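To illustrate Definition 3, here is a small Python sketch (my own; the 2x2 payoffs are hypothetical, in the spirit of a Prisoner's Dilemma) that computes the pure-action minmax payoff $v_i$ by brute force.

def minmax_payoff(u_i, A_i, A_minus_i):
    """Pure-action minmax payoff for player i:
    v_i = min over opponents' profiles a_{-i} of max over a_i of u_i(a_i, a_{-i})."""
    return min(max(u_i(a_i, a_mi) for a_i in A_i) for a_mi in A_minus_i)

# Hypothetical 2x2 stage game: actions C, D;
# u_1(C,C) = 3, u_1(C,D) = 0, u_1(D,C) = 4, u_1(D,D) = 1.
payoffs_1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}
u1 = lambda a1, a2: payoffs_1[(a1, a2)]

v1 = minmax_payoff(u1, A_i=["C", "D"], A_minus_i=["C", "D"])
print(v1)  # 1: by playing D, player 2 holds player 1 down to the (D, D) payoff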
It should be clear that no payoff profile $w$ such that $w_i < v_i$ for some player $i \in N$ can be achieved in a Nash equilibrium of the infinite repetition of $G$:

Proposition 0.1 [OR 144.1] If $s$ is a Nash equilibrium of the infinitely repeated game $\Gamma$, then the resulting payoff profile is enforceable.

The proof is in OR. In particular, in a two-player game, if the machine $M_2 = (Q_2, q_2^0, f_2, \tau_2)$ implements a strategy for Player 2, then the machine $M_1 = (Q_2, q_2^0, f_1, \tau_2)$ with $f_1(q_2) \in r_1(f_2(q_2))$ (where $r_1$ denotes Player 1's best-reply correspondence) yields at least $v_1$ to Player 1. This is OR Exercise 144.2.

The main result in this lecture is known as the Nash Folk Theorem with Discounting:

Proposition 0.2 [OR 145.2] For any feasible, strictly enforceable payoff profile $w$, and for every $\epsilon > 0$, there exist $\bar\delta < 1$ and a feasible, strictly enforceable payoff profile $w'$ such that $|w - w'| < \epsilon$ and, for $\delta > \bar\delta$, $w'$ is a Nash equilibrium payoff profile of the infinitely repeated game with discount factor $\delta$.

The intuition is easiest to grasp if $w = u(a)$ for some $a \in \prod_{i \in N} A_i$: for each player, we construct a strategy which conforms to $a$ as long as no other player has deviated, and plays an action which minmaxes the first deviator at every history following a deviation. For a high enough discount factor, the benefit from a single-stage deviation is outweighed by the fact that the deviator receives his minmax payoff forever after.

For arbitrary feasible payoffs, we must use cycles. If the payoff aggregation criterion were limit-of-means, we would be home free, because (1) the average value of a finite stream in which each $u_i(a)$ is repeated exactly $\gamma_a$ times is precisely $\sum_{a \in A} \frac{\gamma_a}{\gamma} u_i(a) = w_i$; and (2) the value of an infinite repetition of the finite stream just defined is again $w_i$. This is OR, Theorem 144.3. However, for discounting we need to do some extra bookkeeping. The discounted value of the finite stream defined in (1) is not exactly $w$, but it will be close to $w$ for high enough $\delta$: this will be our $w'$. The trick is then to measure time in terms of $\gamma$-cycles, redefining the discount factor appropriately.

Proof: Write $\prod_{i \in N} A_i = \{a(1), \ldots, a(K)\}$, where $K = |\prod_{i \in N} A_i|$. Let $\gamma$ be a sequence consisting of $\gamma_{a(1)}$ repetitions of $a(1)$, followed by $\gamma_{a(2)}$ repetitions of $a(2)$, ..., and finally by $\gamma_{a(K)}$ repetitions of $a(K)$ (with a slight abuse of notation, $\gamma = \sum_{a \in A} \gamma_a$ is also the length of this sequence). Define
$$w''_i(\delta) = \sum_{k=1}^K \sum_{\ell=1}^{\gamma_{a(k)}} \delta^{\sum_{k'=1}^{k-1} \gamma_{a(k')} + \ell - 1} u_i(a(k)),$$
i.e. the discounted value of the stream of payoffs for Player $i$ generated by the finite sequence $\gamma$.² Essentially, the $\ell$-th repetition of $u_i(a(1))$ is discounted by $\delta^{\ell - 1}$; the $\ell$-th repetition of $u_i(a(2))$ is discounted by $\delta^{\gamma_{a(1)} + \ell - 1}$, because the $u_i(a(2))$'s follow $\gamma_{a(1)}$ repetitions of $u_i(a(1))$; and so on.

² We could significantly simplify the exponent of $\delta$, but I choose to be explicit so you see what is really going on.

Now observe that the value of the payoff stream generated by a history $h = (\gamma, \gamma)$ is $w''_i(\delta) + \delta^{\gamma} w''_i(\delta)$; thus, the value of the payoff stream along the infinite sequence $z = (\gamma, \gamma, \ldots)$ is $\frac{1}{1 - \delta^{\gamma}} w''_i(\delta)$; recall that we measure values in terms of the discounted value of a constant stream $(1, 1, \ldots)$, so we need to multiply the discounted sum by $(1 - \delta)$. Denote the latter quantity by $w'_i(\delta)$. Repeat this construction for every $i \in N$ to obtain a profile $w'(\delta)$. Clearly, for every $\epsilon > 0$, there exists $\delta_1$ such that $\delta > \delta_1$ implies that $|w - w'(\delta)| < \epsilon$.

We now construct a strategy $s_i$ for each player $i \in N$. To deter deviations, fix, for each player $j \in N$, a minmaxing action profile
$$p^{-j} \in \arg\min_{a_{-j} \in A_{-j}} \max_{a_j \in A_j} u_j(a_j, a_{-j}).$$
For every $i \neq j$, denote by $p^{-j}_i$ the $i$-th component of $p^{-j}$.

Consider a player $i \in N$. For each finite history $h$ which consists of finitely many repetitions of $\gamma$, followed by an initial subsequence of $\gamma$, Player $i$ plays the action she is supposed to play next according to $\gamma$. These actions determine the equilibrium path.
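Before turning to behavior off the equilibrium path, here is a quick numerical check of the bookkeeping above (my own Python sketch; the cycle payoffs are hypothetical): it computes $w''_i(\delta)$ for a three-period cycle and verifies that $w'_i(\delta) = \frac{1-\delta}{1-\delta^{\gamma}} w''_i(\delta)$ approaches the target average as $\delta \to 1$.

def cycle_value(cycle_payoffs, delta):
    """w''_i(delta): discounted value of one pass through the finite cycle."""
    return sum(delta ** t * u for t, u in enumerate(cycle_payoffs))

def normalized_repeated_value(cycle_payoffs, delta):
    """w'_i(delta) = (1 - delta) / (1 - delta^gamma) * w''_i(delta),
    the discounted-unit value of repeating the cycle forever."""
    gamma = len(cycle_payoffs)
    return (1 - delta) / (1 - delta ** gamma) * cycle_value(cycle_payoffs, delta)

# Hypothetical target: a profile worth 3 to player i is played twice per cycle,
# and a profile worth 0 once, so w_i = (2*3 + 1*0) / 3 = 2.
cycle = [3.0, 3.0, 0.0]
target = sum(cycle) / len(cycle)   # 2.0

for delta in (0.5, 0.9, 0.99, 0.999):
    print(delta, normalized_repeated_value(cycle, delta))
# The printed values converge to the target 2.0 as delta -> 1, as claimed in the proof.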
Now consider a finite history $h$ which is off the equilibrium path. Then there exists a subhistory $h'$ of $h$ such that $h'$ is on the equilibrium path (i.e. players conform to $\gamma$ up to $h'$), but at the next stage at least one player $j \neq i$ played an action different from the one prescribed by $\gamma$ (if there are multiple deviators, choose the one with the lowest index). Then, at $h$, Player $i$ plays $p^{-j}_i$.

Finally, at any history $h$ at which Player $i$ must be punished (i.e. there exists a subhistory $h'$ of $h$ on the equilibrium path such that $i$ is the lowest-indexed deviator at the next stage), choose $s_i(h) \in r_i(p^{-i})$.³

³ Observe that, as soon as a first deviation occurs, the lowest-indexed deviator is punished at each continuation history; in particular, if some player later fails to punish the deviator, nothing happens: the prescribed strategy is still to minmax the original deviator.

To see that no player has an incentive to deviate from $s$, fix any history $h$ on the equilibrium path. If Player $i$ deviates at $h$, she obtains at most $\max_{a_i \in A_i} u_i(a_i, s_{-i}(h)) + \frac{\delta}{1-\delta} v_i$, because her opponents will minmax her, and she will best-respond to $p^{-i}$.⁴ Clearly, there exists $\delta_2$ such that $\delta > \delta_2$ implies that the deviation is not profitable at $h$. And, since $h$ is on the equilibrium path, it follows that the deviation is not profitable ex-ante. Thus, for $\delta > \bar\delta = \max(\delta_1, \delta_2)$, $w'(\delta)$ is a Nash equilibrium payoff profile, and $|w'(\delta) - w| < \epsilon$, as required.

⁴ Note the role of the equilibrium assumption: in principle, Player $i$ could hope that a lower-indexed player $j$ also deviates at $h$, but in equilibrium this does not happen. Hence, she expects to be the only deviator at $h$.

You can see that the specification of the strategies is rather awkward. Machines greatly simplify the task, as I will ask you to verify in the next problem set.
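To see how machines streamline the description, here is a rough Python sketch, in the spirit of Definition 2, of the strategy constructed in the proof (my own illustration, not the problem-set solution; all names and the encoding of $\gamma$ and $p^{-j}$ are assumptions). The states are positions in the cycle $\gamma$ plus one absorbing punishment state per opponent.

# A minimal sketch of the folk-theorem strategy as a machine (Definition 2).
# States: ("cycle", t) for t = 0, ..., gamma - 1, plus ("punish", j) for each j != i.

def make_machine(i, players, cycle, punish_action):
    """cycle: list of action profiles (dicts player -> action) spelling out gamma.
    punish_action[j]: the action player i uses to help minmax player j."""
    q0 = ("cycle", 0)

    def f(q):
        # Action function: follow the cycle, or punish the recorded deviator.
        kind, x = q
        return cycle[x][i] if kind == "cycle" else punish_action[x]

    def tau(q, a):
        # Transition function: q is the current state, a the realized profile.
        kind, x = q
        if kind == "punish":
            return q                                # punishment is absorbing (cf. footnote 3)
        deviators = [j for j in players if j != i and a[j] != cycle[x][j]]
        if deviators:
            return ("punish", min(deviators))       # punish the lowest-indexed deviator
        return ("cycle", (x + 1) % len(cycle))      # otherwise, continue the cycle

    return q0, f, tau

# Hypothetical two-player example: cycle through (C, C), (C, C), (D, C).
players = [1, 2]
cycle = [{1: "C", 2: "C"}, {1: "C", 2: "C"}, {1: "D", 2: "C"}]
q0, f, tau = make_machine(1, players, cycle, punish_action={2: "D"})
q1 = tau(q0, {1: "C", 2: "D"})   # player 2 deviates at the first stage
print(q1, f(q1))                 # ('punish', 2) D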