Eco514|Game Theory
Lecture 13: Repeated Games (2)
Marciano Siniscalchi
October 28, 1999
Introduction
[Again, by and large, I will follow OR, Chap. 8, so I will keep these notes to a minimum.]
Review of key definitions
Recall our three payoff aggregation criteria: discounting, i.e.
\[
(u_i^t)_{t \geq 1} \succ_i (w_i^t)_{t \geq 1} \iff \sum_{t \geq 1} \delta^{t-1} (u_i^t - w_i^t) > 0
\]
(also recall that the payoff profile corresponding to a stream $(u^t)$ is taken to be $(1-\delta) \sum_{t \geq 1} \delta^{t-1} u(a^t)$); limit of means:
\[
(u_i^t)_{t \geq 1} \succ_i (w_i^t)_{t \geq 1} \iff \liminf_{T \to \infty} \sum_{t=1}^{T} \frac{u_i^t - w_i^t}{T} > 0;
\]
and overtaking:
\[
(u_i^t)_{t \geq 1} \succ_i (w_i^t)_{t \geq 1} \iff \liminf_{T \to \infty} \sum_{t=1}^{T} (u_i^t - w_i^t) > 0.
\]
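[As a quick numerical aside (not in OR), here is a minimal Python sketch of the first two criteria for streams that are eventually constant; this is enough for the examples in these notes, and it makes the contrast between the criteria concrete.]

```python
# Streams are described by a finite "head" followed by a value repeated forever.

def discounted_value(head, tail, delta):
    """Normalized discounted payoff (1 - delta) * sum_{t>=1} delta^(t-1) u^t
    of the stream head[0], ..., head[-1], tail, tail, ..."""
    value = sum((1 - delta) * delta**t * u for t, u in enumerate(head))
    return value + delta**len(head) * tail   # closed form for the constant tail

def limit_of_means(head, tail):
    """For an eventually constant stream, the liminf of the time averages is
    just the tail value: the finite head is irrelevant."""
    return tail

# Example (these streams reappear in the Discounting section below):
# (1,1,2,2,...) and (0,0,2,2,...) are equivalent under limit of means,
# but the first is strictly better under discounting for every delta in (0,1).
print(limit_of_means([1, 1], 2), limit_of_means([0, 0], 2))                # 2 2
print(discounted_value([1, 1], 2, 0.9), discounted_value([0, 0], 2, 0.9))  # ~1.81 ~1.62
```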
Also recall the definition of machines:
Definition 1 Fix a normal-form game G. A machine for Player $i \in N$ is a tuple $M_i = (Q_i, q_i^0, f_i, \tau_i)$, where:
(i) $Q_i$ is a finite set (whose elements should be thought of as labels);
(ii) $q_i^0$ is the initial state of the machine;
(iii) $f_i : Q_i \to A_i$ is the action function: it specifies what Player i does at each state; and
(iv) $\tau_i : Q_i \times A \to Q_i$ is the transition function: if action $a \in A$ is played and Player i's machine state is $q_i \in Q_i$, then at the next stage Player i's machine state will be $\tau_i(q_i, a)$.
Note: OR actually allows for arbitrary state spaces. For most proofs, finite state spaces are enough. There is one exception, which I will note below.
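[For concreteness, a machine as in Definition 1 can be encoded directly as a small data structure. The following Python sketch is my own illustration, not OR's notation; the field names are of course arbitrary.]

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Set, Tuple

@dataclass
class Machine:
    """A machine (Q_i, q_i^0, f_i, tau_i) as in Definition 1."""
    states: Set[Hashable]                              # Q_i, a finite set of labels
    initial: Hashable                                  # q_i^0
    action: Callable[[Hashable], Hashable]             # f_i : Q_i -> A_i
    transition: Callable[[Hashable, Tuple], Hashable]  # tau_i : Q_i x A -> Q_i

    def run(self, profile_history):
        """Replay a finite history of action profiles and return the sequence
        of actions the machine selects along it."""
        q, actions = self.initial, []
        for profile in profile_history:
            actions.append(self.action(q))
            q = self.transition(q, profile)
        return actions
```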
Definition 2 Fix a normal-form game G. The minmax payoff for player i, $v_i$, is defined by $v_i = \min_{a_{-i} \in A_{-i}} \max_{a_i \in A_i} u_i(a_i, a_{-i})$.
We are not going to worry about implementing mixtures of action profiles in this lecture, so I just remind you of the definition of enforceability:
Definition 3 Fix a normal-form game G. A payoff profile $u \in \mathbb{R}^N$ is enforceable (resp. strictly enforceable) if $u_i \geq v_i$ (resp. $u_i > v_i$) for all $i \in N$.
Perfect Folk Theorems
The strategies in the proof of the Nash folk theorem (say, with discounting) call for indefinite punishment following a deviation from the equilibrium. This is fine in games such as the Prisoner's Dilemma, in which the minmax actions are actually an equilibrium (indeed, in PD, they are the only equilibrium!). That is, the threat of indefinite punishment is indeed credible.
      A     D
A   2,3   1,6
D   0,1   0,1

Figure 1: Indefinite Punishment is not credible
However, consider the game in Figure 1. The Row player can hold the Column player down to his minimum payoff by choosing D, but D is strictly dominated for her. Hence, while (2,3) can be enforced as a Nash equilibrium payoff profile in the infinitely repeated version of the game by assuming that players switch to (D,D) following a deviation, this would not work if we required subgame perfection, regardless of the payoff aggregation criterion.
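[A quick sanity check of these claims, as a Python sketch of my own: compute the pure-action minmax payoffs of the Figure 1 game, verify that (2,3) is strictly enforceable, and verify that D is strictly dominated for the Row player.]

```python
# Figure 1, encoded as {(row_action, col_action): (u_row, u_col)}.
G = {("A", "A"): (2, 3), ("A", "D"): (1, 6),
     ("D", "A"): (0, 1), ("D", "D"): (0, 1)}
ACTIONS = ("A", "D")

def pure_minmax(G, i):
    """v_i = min over the opponent's actions of player i's best-response payoff."""
    def u(own, opp):
        return G[(own, opp) if i == 0 else (opp, own)][i]
    return min(max(u(a, b) for a in ACTIONS) for b in ACTIONS)

v = (pure_minmax(G, 0), pure_minmax(G, 1))
print(v)                                              # (1, 1): D holds the opponent down to 1
print(all(u > vi for u, vi in zip((2, 3), v)))        # True: (2,3) is strictly enforceable
print(all(G[("A", b)][0] > G[("D", b)][0] for b in ACTIONS))  # True: D strictly dominated for Row
```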
A warm-up exercise: Perfect Folk Theorems for Limit-of-Means
Thus, we must be smarter than that. A key intuition is that, after all, a deviator need not be punished forever, but only long enough to wipe out any gains from his deviation.
Formally, fix a game $G = (N, (A_i, u_i)_{i \in N})$ and let $M = \max_{i \in N, a \in A} u_i(a)$. Fix an outcome $a^*$ of G; clearly, a one-time deviation by Player i cannot yield more than $M - u_i(a^*)$. Thus, if Player i deviates, we need only punish him so as to wipe out this payoff differential; clearly, how long this will be depends on the payoff aggregation criterion.
Intuitively, limit-of-means should make things very simple, because, following a deviation, punishers face only a finite number of periods in which they must forgo their equilibrium payoff stream in order to punish Player i. Under limit-of-means aggregation, finite periods do not matter, so punishers are indifferent between punishing and not punishing.
There is only one subtlety: of course, the same argument applies to Player i: no one-time deviation can be profitable for her! So, what exactly are we trying to deter?
The answer is that, although no finite deviation from the infinite repetition of $a^*$ is profitable for Player i, the following strategy could be. Suppose that, in the putative equilibrium we wish to construct, a deviation is followed by L rounds of punishment (you can think of L = 1 if you wish). Thus, if Player i deviates once, she gets an extra payoff of (at most) $M - u_i(a^*)$, but then loses $u_i(a^*) - v_i$ utils in each of the subsequent L periods.
Now suppose L is small, so that $M - u_i(a^*) > L[u_i(a^*) - v_i]$. For example, in the game of Figure 1, suppose that i is the Column player, that $a^* = (A,A)$, and that L = 1. Then a deviation yields 3 utils, whereas one round of punishment costs 2 utils. Then, Player i can adopt the following strategy: deviate, then play a best response to $p_{-i}$ (in Figure 1, play D) for L periods; then, as soon as play is supposed to return to $a^*$, deviate and best-respond to the minmax profile, and so on. This is a profitable deviation. [Observe that it is also a neat
example of how the one-deviation property fails under the limit-of-means criterion!]
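[To see the numbers behind this profitable deviation, here is a tiny sketch of mine for the Column player in Figure 1 with L = 1: she alternates between grabbing 6 and being punished down to 1, and the long-run average beats the 3 she would get by complying.]

```python
def long_run_average(cycle):
    """Time average of a periodic stream; for periodic streams this coincides
    with the liminf used by the limit-of-means criterion."""
    return sum(cycle) / len(cycle)

print(long_run_average([3]))     # 3.0 -- comply with a* = (A,A) forever
print(long_run_average([6, 1]))  # 3.5 -- deviate, take one round of punishment, repeat
```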
Thus, we must choose L large enough so that
\[
\forall i \in N, \quad M - u_i(a^*) < L\,[u_i(a^*) - v_i].
\]
In Figure 1, it is enough to choose L = 2.
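[Here is a small sketch of mine that finds the smallest adequate L for a given game. Instead of the uniform bound $M - u_i(a^*)$ it uses each player's exact one-shot deviation gain $\max_{a_i} u_i(a_i, a^*_{-i}) - u_i(a^*)$, which is what the Figure 1 computation above really relies on; for Figure 1 it returns L = 2.]

```python
import math

# Figure 1 again: {(row_action, col_action): (u_row, u_col)}, with a* = (A, A)
# and minmax payoffs v = (1, 1); strict enforceability (u_i(a*) > v_i) is assumed.
G = {("A", "A"): (2, 3), ("A", "D"): (1, 6),
     ("D", "A"): (0, 1), ("D", "D"): (0, 1)}
ACTIONS, a_star, v = ("A", "D"), ("A", "A"), (1, 1)

def min_punishment_length(G, a_star, v):
    L = 1
    for i in (0, 1):
        # Player i's best one-shot deviation payoff against a*_{-i}.
        best_dev = max(G[(a, a_star[1]) if i == 0 else (a_star[0], a)][i]
                       for a in ACTIONS)
        gain = best_dev - G[a_star][i]       # at most M - u_i(a*)
        loss = G[a_star][i] - v[i]           # forgone in each punishment period, > 0
        if gain > 0:
            # smallest integer L with gain < L * loss
            L = max(L, math.floor(gain / loss) + 1)
    return L

print(min_punishment_length(G, a_star, v))   # 2
```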
To complete the argument, we must specify what happens if more than one player deviates, or if somebody deviates from the punishment stage. As in the proof of the Nash Folk Theorem, multiple deviations lead to the lowest-index player being punished (they will not occur in equilibrium anyway, so players cannot hope to get away with deviating because somebody else will deviate, too).
Finally, if one or more punishers fail to punish, we simply disregard these further (unprofitable) deviations: again, in equilibrium they are not going to occur, so players cannot count on them to improve their predicament after a first deviation.
This proves:
Proposition 0.1 [OR, Proposition 146.2] Fix a game G. Any feasible, strictly enforceable payoff profile of G is a subgame-perfect equilibrium payoff profile of the limit-of-means infinitely repeated version of G.
Machines
You will undoubtedly notice that describing these strategies verbally is awkward; doing so formally (as we have tried to do in the proof of the Nash Folk theorem) is even worse.
Machines can help. For instance, here is a set of machines that players can use to implement the strategies in the proof of Proposition 0.1: Player i uses machine $M_i = (Q, q^0, f_i, \tau)$ (where $Q$, $q^0$ and $\tau$ are common to all players) defined as follows.
First, $Q = \{N\} \cup \{P(j,t) : j \in N,\ t = 0, \dots, L\}$. N is the normal state, in which $a^*$ is played. $P(j,t)$ is the state in which j is punished, and t more rounds of punishment are required.
Second, $q^0 = N$. Third, $f_i(N) = a^*_i$, $f_i(P(j,t)) = p_{-j,i}$ if $j \neq i$, and $f_j(P(j,t)) = r_j(p_{-j})$. This should be obvious.
Finally, $\tau(\cdot\,,\cdot)$ is such that we remain in N if nobody deviates, we switch from N to $P(j,L)$ if j is the lowest-index deviator, we always move from $P(j,t)$ to $P(j,t-1)$ if $t \neq 0$, and we always move from $P(j,0)$ back to N.
Easy!
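[Spelled out in code it really is easy. The sketch below is my own encoding, specialized to Figure 1 with $a^* = (A,A)$ and L = 2; player 0 is Row, player 1 is Column, and ties in $r_j(p_{-j})$ are broken as in the Discounting discussion below (the punished Column player keeps playing D).]

```python
L = 2
a_star = ("A", "A")                 # played in the normal state N
minmax_action = {0: "D", 1: "D"}    # p_{-j}: what the punisher plays against j
best_reply = {0: "A", 1: "D"}       # r_j(p_{-j}): what j plays while being punished

# Q = {"N"} together with the punishment states ("P", j, t).
Q = {"N"} | {("P", j, t) for j in (0, 1) for t in range(L + 1)}
q0 = "N"

def f(i, q):
    """Action function of player i's machine."""
    if q == "N":
        return a_star[i]
    _, j, _ = q
    return best_reply[j] if i == j else minmax_action[j]

def tau(q, profile):
    """Common transition function: start punishing the lowest-index deviator,
    count the punishment rounds down, then return to normal play."""
    if q == "N":
        deviators = [i for i in (0, 1) if profile[i] != a_star[i]]
        return ("P", deviators[0], L) if deviators else "N"
    _, j, t = q
    return ("P", j, t - 1) if t != 0 else "N"

# Example: Column deviates from (A, A); the common state becomes ("P", 1, 2),
# counts down to ("P", 1, 0), and then returns to "N".
print(tau("N", ("A", "D")))         # ('P', 1, 2)
```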
Discounting
The strategy profile thus constructed is not a subgame-perfect equilibrium of the game in Figure 1 if payoffs are aggregated via discounting, for any discount factor. Suppose the Column player deviates: then the Row player is supposed to choose D for 2 periods, and hence receive a flow payoff of 0 units. She will then receive 2 forever after. However, if she deviates to A, nothing happens: the Column player continues to choose D, so again the Row player can choose A. Obviously, she prefers $(1,1,2,2,\dots)$ to $(0,0,2,2,\dots)$!
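Explicitly, using the normalization recalled at the start of these notes,
\[
(1-\delta)\sum_{t \geq 1} \delta^{t-1} u^t =
\begin{cases}
(1-\delta)(1+\delta) + 2\delta^2 = 1 + \delta^2 & \text{for } (1,1,2,2,\dots), \\
2\delta^2 & \text{for } (0,0,2,2,\dots),
\end{cases}
\]
so the Row player gains $1 - \delta^2 > 0$ by not punishing, for every $\delta \in (0,1)$.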
Thus, the key intuition is that we must somehow ensure that punishers are willing to punish. In principle, one could think of punishing punishers, but this may fail to work with discounting: essentially, second deviations might require longer punishment periods (because the burden of carrying out the first punishment lasts for L periods, and not just one), third deviations might require even longer punishments, and so on. This is certainly the case in the game of Figure 1.
The alternative is to reward punishers. This leads to OR’s Proposition 151.1.
I am only going to offer a few comments on the proof. The "conciliation" states C(j) serve to reward punishers, in a way. However, this is subtle: we never go back to the Nash state C(0). What happens is, if j deviates and i punishes him, then after punishment we move to C(j), which i prefers to C(i). Otherwise, we go to C(i) and stay there until somebody else deviates.¹ In my opinion, this is also a punishment of sorts, but of course you are welcome to differ.
Also: the remark about the first condition on $\delta$ being sufficient has to do with the fact that, after L periods of punishment, we move to C(j); since $u_i(a(j)) < u_i(a(0))$ by assumption, this is actually a further punishment (which, er, actually reinforces my interpretation, but never mind that). The point is that this punishment may not be enough, or may come too late, so we make sure that Player j is hit really hard for L periods before switching to the conciliation phase.

¹ Again, let me repeat this because I first got it wrong in class (but then you guys spotted me!): whenever we are in state C(j), we remain there if nobody deviates, and move to the state $P(k,L)$ in which Player k's punishment begins if k deviates from a(j). Thus, what supports a(j) as a continuation equilibrium after a deviation by Player j is the threat of further punishment; a(j) need not be a Nash equilibrium per se.
Finally, the second condition on $\delta$ has the following interpretation: by punishing Player j, Player i potentially loses $M - u_i(p_{-j}, r_j(p_{-j}))$ for the L periods following Player j's deviation (be it a deviation from a(0) or from whatever j was supposed to play). On the other hand, after L rounds of punishment, the game will switch to the C(j) conciliation stage. Now, Player i prefers being in the C(j) state to being in the C(i) state, by assumption, so if the discount factor is large enough, she will not deviate.
[The only subtle point is that she may actually deviate at any $t \in \{1, \dots, L\}$ (where time is measured starting from the first stage after Player j's deviation). If she deviates at t, she will actually be held down to $v_i$ starting from $t+1$ and until $t + L \geq L + 1$; the condition in the text is thus stronger than it needs to be (it assumes a payoff of M from $t+1$ to L, and a punishment payoff of $u_i(a(i))$ from $L+1$ to $t+L$).]