Bayesian Learning in Social Networks¹
Douglas Gale (Corresponding Author)
Department of Economics, New York University
269 Mercer St., 7th Floor, New York, NY, 10003-6687.
E-mail: douglas.gale@nyu.edu
Url: http://www.econ.nyu.edu/user/galed/
Phone: (212) 998-8944
Fax: (212) 995-3932
and
Shachar Kariv
Department of Economics, New York University
269 Mercer St., 7th Floor, New York, NY, 10003-6687.
E-mail: sk510@nyu.edu
Url: http://home.nyu.edu/~sk510
Version: March 13, 2003.
We extend the standard model of social learning in two ways. First, we
introduce a social network and assume that agents can only observe the actions
of agents to whom they are connected by this network. Secondly, we allow agents
to choose a different action at each date. If the network satisfies a connectedness
assumption, the initial diversity resulting from diverse private information is
eventually replaced by uniformity of actions, though not necessarily of beliefs,
in finite time with probability one. We look at particular networks to illustrate
the impact of network architecture on speed of convergence and the optimality
of absorbing states. Convergence is remarkably rapid, so that asymptotic results
are a good approximation even in the medium run.
Journal of Economic Literature Classification Numbers: D82, D83
Key Words: Networks, Social learning, Herd behavior, Informational
cascades.
Running Title: Bayesian Learning in Social Networks.
¹ One of us discussed this problem with Bob Rosenthal several years ago, when we
were both at Boston University. At that time, we found the problem of learning in
networks fascinating but made no progress and were eventually diverted into working on
boundedly rational learning, which led to our paper on imitation and experimentation.
We thank seminar participants at NYU, DELTA, INSEAD, Cergy, Cornell and Iowa
for their comments. The financial support of the National Science Foundation through
Grant No. SES-0095109 is gratefully acknowledged.
1. INTRODUCTION
The canonical model of social learning comprises a set of agents I, a finite set of actions A, a set of states of nature Ω, and a common payoff function U(a, ω), where a is the action chosen and ω is the state of nature. Each agent i receives a private signal σ_i(ω), a function of the state of nature ω, and uses this private information to identify a payoff-maximizing action. This setup provides an example of a pure information externality. Each agent's payoff depends on his own action and on the state of nature. It does not depend directly on the actions of other agents. However, each agent's action reveals something about his private signal, so an agent can generally improve his decision by observing what others do before choosing his own action. In social settings, where agents can observe one another's actions, it is rational for them to learn from one another.
This kind of social learning was first studied by Banerjee (1992) and Bikhchandani, Hirshleifer and Welch (1992). Their work was extended by Smith and Sørensen (2000). These models of social learning assume a simple sequential structure, in which the order of play is fixed and exogenous. They also assume that the actions of all agents are public information. Thus, at date 1, agent 1 chooses an action a_1 based on his private information; at date 2, agent 2 observes the action chosen by agent 1 and chooses an action a_2 based on his private information and the information revealed by agent 1's action; at date 3, agent 3 observes the actions chosen by agents 1 and 2 and chooses an action a_3; and so on. In what follows we refer to this structure as the sequential social-learning model (SSLM).
One goal of the social learning literature is to explain the striking uniformity of social behavior that occurs in fashion, fads, "mob psychology", and so forth. In the context of the SSLM, this uniformity takes the form of herd behavior.² Smith and Sørensen (2000) have shown that, in the SSLM, herd behavior arises in finite time with probability one. Once the proportion of agents choosing a particular action is large enough, the public information in favor of this action outweighs the private information of any single agent. So each subsequent agent "ignores" his own signal and "follows the herd".

This is an important result and it helps us understand the basis for uniformity of social behavior.³ At the same time, the SSLM has several special features that deserve further examination: (i) each agent makes a single, irreversible decision; (ii) the timing of the agent's decision (his position in the decision-making queue) is fixed and exogenous; (iii) agents observe the actions of all their predecessors; and (iv) the number of signals, like the number of agents, is infinite, so once a cascade begins the amount of information lost is large. These features simplify the analysis of the SSLM, but they are quite restrictive.

² A herd occurs if, after some finite date t, every agent chooses the same action. An informational cascade occurs if, after some finite date t, every agent finds it optimal to choose the same action regardless of the value of his private signal. An informational cascade implies herd behavior, but a herd can arise without a cascade.

³ The most interesting property of the models of Bikhchandani, Hirshleifer and Welch (1992) and Banerjee (1992) is that informational cascades arise very rapidly, before much information has been revealed. For example, in these models, if the first two agents make the same choice, all subsequent agents will ignore their own information and imitate the first two. The behavior of a potential infinity of agents is determined by the behavior of the first two. This is both informationally inefficient and Pareto inefficient.
In this paper, we study the uniformity of behavior in a framework that allows for a richer pattern of social learning. We depart from the SSLM in two ways. First, we drop the assumption that actions are public information and assume that each agent can observe the actions of some, but not necessarily all, of the other agents (his neighbors). Second, we allow agents to make decisions simultaneously, rather than sequentially, and to revise their decisions rather than making a single, irreversible decision. We refer to this structure as the social network model (SNM). For empirical examples that illustrate the important role of networks in social learning, see Bikhchandani, Hirshleifer and Welch (1998).
On the face of it, uniform behavior seems less likely in the SNM, where agents have very different information sets, than in the SSLM. However, uniformity turns out to be a robust feature of connected social networks.⁴ The following results are established for any connected network:
Uniformity of behavior: Initially, diversity of private information leads to diversity of actions. Over time, as agents learn by observing the actions of their neighbors, some convergence of beliefs is inevitable. A central question is whether agents can rationally choose different actions forever. Disconnected agents can clearly 'disagree' forever. Also, there may be cases where agents are indifferent between two actions and disagreement of actions is immaterial. However, apart from cases of disconnectedness and indifference, all agents must eventually choose the same action. Thus, learning occurs through diversity but is eventually replaced by uniformity.

Optimality: We are interested in whether the common action chosen asymptotically is optimal, in the sense that the same action would be chosen if all the signals were public information. In special cases, we can show that asymptotically the optimal action is chosen but, in general, there is no reason why this should be the case.
Although the process of learning in networks can be very complicated, the SNM has several features that make the asymptotic analysis tractable. The first is the welfare-improvement principle. Agents have perfect recall, so expected utility is non-decreasing over time. This implies that equilibrium payoffs form a submartingale. We use the martingale convergence theorem to establish that an agent's (random) payoff converges almost surely to a constant.

⁴ A network is a directed graph in which the nodes correspond to representative agents. Agent i can observe the actions of agent j if i is connected to agent j. A network is connected if, for any two agents i and j, there is a sequence i_1, ..., i_K such that i_1 = i, i_K = j, and i_k is connected to i_{k+1} for k = 1, ..., K − 1.
Another useful property of the model is the imitation principle. If agent i can observe the actions of agent j, then one strategy available to him is to imitate whatever j does. Since i and j have different information sets, their conditional payoffs under this strategy may be different. However, on average, i must do as well as j.

The imitation principle, together with the connectedness of the network, is used to show that, asymptotically, i and j must get the same average (unconditional) payoffs. It turns out that this is only possible if agents choose the same actions. More precisely, agents choose different actions only if they are indifferent.
While the convergence properties of the model are quite general, other properties have only been established for particular networks:

Convergence in finite time: In special cases, we can rule out the possibility of indifference between actions with probability one. In that case, all agents choose the same action in finite time with probability one.

Speed of convergence: In two- and three-person networks, we can show that convergence to a uniform action is extremely rapid, typically occurring within five or six periods with probability close to 1. What happens in those first few periods is important for the determination of the asymptotic state.

Network architecture: Systematic differences can be identified in the behavior of different networks. For example, in three-person complete networks (where each agent observes all the others), learning stops almost immediately and the probability of an incorrect action in the long run is high. In three-person incomplete networks, learning continues for a longer time and the probability of choosing an incorrect action in the long run is lower.
The rest of the paper is organized as follows. In Section 2 we define the model and the equilibrium concept more precisely. In Section 3 we use the case of two-person networks to illustrate the working of the general model and some special features of complete networks. In Section 4 we derive the asymptotic properties of the general model. In Section 5 we study a selection of three-person graphs. Here we see the impact of lack of common knowledge on the dynamics of social learning and the efficiency of aggregation. We also compare the dynamic and asymptotic properties of different networks. The results are discussed in Section 6. Proofs are gathered in Section 7.
2. THE MODEL
The social learning literature ignores the complications of strategic behavior in order to focus on pure Bayesian learning. Non-strategic behavior is simpler to analyze and it also seems more appropriate for a model of social behavior. However, special assumptions are needed to rationalize non-strategic behavior. In the SSLM, for example, an agent is assumed to make a once-in-a-lifetime decision. Because his payoff is independent of other agents' actions, it is rational for him to behave myopically and ignore the effect of his action on the agents who follow him. In the SNM, an agent's payoff is independent of other agents' actions but, unlike the SSLM, agents make repeated decisions. In order to eliminate strategic behavior, we assume that the economy comprises a large number of individually insignificant agents and that agents only observe the distribution of actions at each date. Since a single agent cannot affect the distribution of actions, he cannot influence the future play of the game. This allows us to ignore "strategic" considerations and focus on the pure Bayesian-learning aspect of the model.
The agents
Formally, we assume there is a finite set of locations indexed by i = 1, ..., n. At each location there is a non-atomic continuum of identical agents. In the sequel, the continuum of agents at location i is replaced by a single representative agent i who maximizes his short-run payoff in each period.

Uncertainty is represented by a probability measure space (Ω, F, P), where Ω is a compact metric space, F is a σ-field, and P a probability measure. Time is represented by a countable set of dates indexed by t = 1, 2, ....

Let A ⊂ R be a finite set of actions and let U : A × Ω → R be the common payoff function, where U(a, ·) is a bounded, measurable function for every action a. Each (representative) agent i receives a private signal σ_i(ω) at date 1, where σ_i : Ω → R is a random variable.
The network
A social network is represented by a family of sets {N_i : i = 1, ..., n}, where

N_i ⊂ {1, ..., i − 1, i + 1, ..., n}.

For each agent i, N_i denotes the set of agents j ≠ i who can be observed by agent i. We can think of N_i as representing i's "neighborhood". The sets {N_i : i = 1, ..., n} define a directed graph with nodes N = {1, ..., n} and edges E = ∪_{i=1}^n {(i, j) : j ∈ N_i}. The social network determines the information flow in the economy. Agent i can observe the action of agent j if and only if j ∈ N_i. Agents have perfect recall, so their information set at each date includes the actions they have observed at every previous date.
For any nodes i and j, a path from i to j is a finite sequence i_1, ..., i_K such that i_1 = i, i_K = j, and i_{k+1} ∈ N_{i_k} for k = 1, ..., K − 1. A node i is connected to j if there is a path from i to j. The network {N_i} is connected if every pair of nodes i and j is connected. Connectedness is essential for uniformity of behavior, but not for other results.
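As an aside, connectedness in this strong sense can be checked mechanically. The following sketch (ours, not part of the paper) represents a network by its neighborhoods N_i and verifies that every node is connected to every other node by a directed path; the Star network of Section 5 appears as an example input.

def reachable(N, start):
    """Set of nodes reachable from `start` along edges i -> j, j in N[i]."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j in N[i]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def is_connected(N):
    """True iff every pair of nodes is mutually connected."""
    nodes = set(N)
    return all(reachable(N, i) == nodes for i in nodes)

# Example: the three-person Star of Section 5 (A observes B and C,
# while B and C observe only A) is connected in this sense.
star = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
print(is_connected(star))       # True
one_way = {"A": {"B"}, "B": set()}
print(is_connected(one_way))    # False: there is no path from B back to A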
Equilibrium
At the beginning of each date t, agents choose actions simultaneously. Then each agent i observes the actions a_jt chosen by the agents j ∈ N_i and updates his beliefs accordingly. Agent i's information set at date t consists of his signal σ_i(ω) and the history of actions {a_js : j ∈ N_i, s ≤ t − 1}. Agent i chooses the action a_it to maximize the expectation of his short-run payoff U(a_it, ω) conditional on the information available.
An agent's behavior can be described more formally as follows. Agent i's choice of action at date t is described by a random variable X_it(ω) and his information at date t is described by a σ-field F_it. Since the agent's choice can only depend on the information available to him, X_it must be measurable with respect to F_it. Since F_it represents the agent's information at date t, it must be the σ-field generated by the random variables σ_i and {X_js : j ∈ N_i, s ≤ t − 1}. Note that there is no need to condition explicitly on agent i's past actions because they are functions of the past actions of agents j ∈ N_i and the signal σ_i(ω). Finally, since X_it is optimal, there cannot be any other F_it-measurable choice function that yields a higher expected utility. These are the essential elements of our definition of equilibrium, as stated below.
Definition 1. A weak perfect Bayesian equilibrium consists of a sequence of random variables {X_it} and σ-fields {F_it} such that for each i = 1, ..., n and t = 1, 2, ...,

(i) X_it : Ω → A is F_it-measurable,
(ii) F_it = σ(σ_i, {X_js : j ∈ N_i, s = 1, ..., t − 1}), and
(iii) E[U(x(ω), ω)] ≤ E[U(X_it(ω), ω)] for any F_it-measurable function x : Ω → A.
Note that our definition of equilibrium does not require optimality "off the equilibrium path". This entails no essential loss of generality as long as it is assumed that the actions of a single agent, who is of measure zero, are not observed by other players. Then a deviation by a single agent has no effect on the subsequent decisions of other agents.
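To summarize the timing of the SNM, the following schematic sketch (ours) shows the simultaneous-move structure: at each date, every agent maps his signal and the observed histories of his neighbors into an action, and only then are the new actions revealed. The decision rule here is a deliberately naive placeholder, not the Bayesian best response of Definition 1.

# Schematic of the SNM timing (our sketch). `choose` is a placeholder,
# NOT the equilibrium Bayesian rule, which depends on the full model.

def choose(signal, observed_history):
    # naive placeholder: follow the sign of one's own signal
    return 1 if signal >= 0 else 0

def play(N, signals, T):
    """N[i] = set of agents i observes; actions are chosen simultaneously."""
    history = {i: [] for i in N}             # history[j] = j's past actions
    for t in range(T):
        actions = {i: choose(signals[i],
                             {j: tuple(history[j]) for j in N[i]})
                   for i in N}               # simultaneous choices at date t
        for i in N:
            history[i].append(actions[i])    # then actions become observable
    return history

print(play({"A": {"B"}, "B": {"A"}}, {"A": -0.3, "B": 0.7}, 3))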
3. LEARNING WITH TWO (REPRESENTATIVE) AGENTS AND
TWO ACTIONS
To fix ideas and illustrate the workings of the basic model, we first consider the special case of two representative agents, A and B, and two actions, 0 and 1. There are three graphs, besides the empty graph N_A = N_B = ∅:

(i) N_A = {B}, N_B = {A};
(ii) N_A = {B}, N_B = ∅;
(iii) N_A = ∅, N_B = {A}.
Cases (ii) and (iii) are uninteresting because there is no possibility of mutual learning. For example, in case (ii), agent B observes a private signal and chooses the optimal action at date 1. Since he observes no further information, he chooses the same action at every subsequent date. Agent A observes a private signal and chooses the optimal action at date 1. At date 2, he observes agent B's action at date 1, updates his beliefs and chooses the new optimal action at date 2. After that, A receives no additional information, so agent A chooses the same action at every subsequent date. Agent A has learned something from agent B, but that is as far as it goes. In case (i), on the other hand, the two agents learn from each other and learning can continue for an unbounded number of periods. We focus on the network defined in (i) in what follows.
For simplicity, we consider a special information and payoff structure. We assume that Ω = Ω_A × Ω_B, where Ω_i is an interval [a, b] and the generic element is ω = (ω_A, ω_B). The signals are assumed to satisfy

σ_i(ω) = ω_i  for all ω ∈ Ω and i = A, B,

where the random variables ω_A and ω_B are independently and continuously distributed, that is, P = P_A × P_B and P_i has no atoms. There are two actions a = 0, 1, and the payoff function is assumed to satisfy

u(a, ω) = 0     if a = 0,
          U(ω)  if a = 1,

where U(ω_A, ω_B) is assumed to be a continuous and increasing function. To avoid trivialities, we assume that neither action is weakly dominated.
These assumptions are sufficient for the optimal strategy to have the form of a cutoff rule. To see this, note that for any history that occurs with positive probability, agent i's beliefs at date t take the form of an event {ω_i} × B_jt, where the true value of ω_j is known to belong to B_jt. Then the payoff to action 1 is

φ_i(ω_i, B_jt) = E[U(ω_A, ω_B) | {ω_i} × B_jt].

Clearly, φ_i(ω_i, B_jt) is increasing in ω_i, because the distribution of ω_j is independent of ω_i, so there exists a cutoff ω*_i(B_jt) such that

ω_i > ω*_i(B_jt) ⟹ φ_i(ω_i, B_jt) > 0,
ω_i < ω*_i(B_jt) ⟹ φ_i(ω_i, B_jt) < 0.

We assume that when an agent is indifferent between two actions, he chooses action 1. The analysis is essentially the same for any other tie-breaking rule.
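As a concrete illustration (ours, anticipating the example of Section 3.1), suppose U(ω) = ω_A + ω_B and agent i believes that ω_j is uniformly distributed on B_jt = [c, d]. Then

φ_i(ω_i, B_jt) = ω_i + (c + d)/2,

so the cutoff is simply ω*_i(B_jt) = −(c + d)/2: agent i chooses action 1 exactly when his own signal outweighs the conditional mean of the other agent's signal.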
The fact that agent i's strategy takes the form of a cutoff rule implies that the set B_it is an interval. This can be proved by induction as follows. At date 1, agent j has a cutoff ω*_j1 and X_j1(ω) = 1 if and only if ω_j ≥ ω*_j1.
Then at date 2 agent i will know that the true value of ω_j belongs to B_j2(ω), where

B_j2(ω) = [ω*_j1, b]   if X_j1(ω) = 1,
          [a, ω*_j1)   if X_j1(ω) = 0.

Now suppose that at some date t, the information set B_jt(ω) ⊂ [a, b] is an interval and agent j's cutoff is ω*_jt(B_it(ω)). Then at date t + 1, agent i knows that ω_j belongs to B_{jt+1}(ω), where

B_{jt+1}(ω) = B_jt(ω) ∩ [ω*_jt(B_it(ω)), b]   if X_jt(ω) = 1,
              B_jt(ω) ∩ [a, ω*_jt(B_it(ω)))   if X_jt(ω) = 0.

Clearly, B_{jt+1}(ω) is also an interval. Hence, by induction, B_it(ω) is an interval for all t and the common knowledge at date t is B_t(ω) = B_At(ω) × B_Bt(ω). By construction, ω ∈ B_{t+1}(ω) ⊂ B_t(ω) for every t. Then B_t(ω) ↓ B(ω) = ∩_{t=1}^∞ B_t(ω) and {B(ω) : ω ∈ Ω} defines a partition of Ω. Note that ω ∈ B(ω), so B(ω) ≠ ∅.
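To make the recursion concrete, here is a minimal sketch (ours, not the authors' code) of the interval update, taking agent j's cutoff as given:

# Interval-belief update B_{jt+1} from the text: after observing j's action,
# agent i intersects his current interval for omega_j with the half-line
# implied by j's cutoff (j plays 1 iff omega_j >= cutoff).

def update_interval(B_j, cutoff, action):
    lo, hi = B_j
    if action == 1:
        return (max(lo, cutoff), hi)
    else:
        return (lo, min(hi, cutoff))

# Example: signals uniform on [-1, 1]; the date-1 cutoff is 0, so seeing
# j choose 1 shrinks the interval [-1, 1] to [0, 1].
print(update_interval((-1.0, 1.0), 0.0, 1))   # (0.0, 1.0)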
In the limit, when all learning has ceased, agent A knows that ω_B belongs to a set B_B(ω) and agent B knows that ω_A belongs to B_A(ω). Furthermore, because the actions chosen at each date are common knowledge, the sets B_A(ω) and B_B(ω) are common knowledge.
An interesting question is whether, given their information in the limit, the two agents must agree which action is best. In the two-person case, we can show directly that both agents must eventually agree, in the sense that they choose different actions only if they are indifferent. The proof is by contradiction. Suppose, contrary to what we want to prove, that for some B = B_A × B_B and every ω such that B(ω) = B,

E[U(ω_A, ω_B) | {ω_A} × B_B] > 0

and

E[U(ω_A, ω_B) | B_A × {ω_B}] < 0.

Then the same actions must be optimal for every element of the information set (otherwise, more information would be revealed). Letting α = inf B_A(ω) and β = sup B_B(ω), continuity then implies

E[U(α, ω_B) | {α} × B_B] ≥ 0

and

E[U(ω_A, β) | B_A × {β}] ≤ 0.

Since U is increasing, the first inequality gives U(α, β) ≥ 0 and the second gives U(α, β) ≤ 0, so U(α, β) = 0. If B_B is not a singleton, then U(α, ω_B) < U(α, β) = 0 for almost every ω_B ∈ B_B, so

E[U(α, ω_B) | {α} × B_B] < 0,

a contradiction. Similarly, if B_A is not a singleton,

E[U(ω_A, β) | B_A × {β}] > 0,

a contradiction. Thus, B is a singleton and U(ω) = 0 if ω ∈ B.

The set {ω : U(ω) = 0} has probability zero, so the probability of disagreeing forever is 0. In other words, both agents will choose the same action in finite time and, once they have chosen the same action, they have reached an absorbing state and will continue to choose the same action in every subsequent period.
3.1. An example
To illustrate the short-run dynamics of the model, we can further specialize the example by assuming that, for each agent i, the signal σ_i(ω) = ω_i is uniformly distributed on the interval [−1, 1] and the payoff to action 1 is U(ω) = ω_A + ω_B.
At date 1, each agent chooses 1 if his signal is positive and 0 if it is negative. If both choose the same action at date 1, they will continue to choose the same action at each subsequent date. Seeing the other agent choose the same action will only reinforce each agent's belief that he has made the correct choice. No further information is revealed at subsequent dates and so we have reached an absorbing state, in which each agent knows his own signal and that the other's signal has the same sign, but nothing more. So interesting dynamics occur only in the case where agents choose different actions at date 1. The exact nature of the dynamics depends on the relative strength of the two signals, measured here by their absolute values. Without loss of generality, we assume that A has a negative signal, B a positive signal, and B's signal is relatively the stronger, i.e., |ω_A| < |ω_B|.
Case 1: ω_A > −1/2 and ω_B > 1/2. At date 1, agent A will choose action 0 and agent B will choose action 1. At date 2, having observed that agent B chose 1, agent A will switch to action 1, while agent B will continue to choose 1. Thereafter, both agents choose 1 forever; we have reached an absorbing state and no further learning occurs.
Case 2: −3/4 < ω_A < −1/2 and ω_B > 3/4. As before, A chooses 0 and B chooses 1 at date 1. At date 2, A observes that B chose 1 and infers that B's signal has expected value 1/2. Since ω_A < −1/2, it is optimal for A to choose 0 again. Since B has an even stronger signal, he will continue to choose 1. At date 3, A observes that B chose 1 again, thus revealing that ω_B > 1/2, so the expected value of B's signal is 3/4. Since ω_A > −3/4, it is optimal for A to switch to 1, and we have reached an absorbing state.
Case 3: −(1 − 2^{−(t−1)}) < ω_A < −(1 − 2^{−(t−2)}) and ω_B > 1 − 2^{−(t−1)}. By reasoning analogous to Cases 1 and 2 (the relevant cutoffs at successive dates are 0, 1/2, 3/4, 7/8, ...), A chooses 0 and B chooses 1 until date t, when A switches to 1.
The other interesting case to consider is when the signals are similar in strength. For example, suppose that ω_A = −ω_B = −x*, where x* is the limit of the sequence {x_t}_{t=1}^∞ defined by putting x_1 = 1/2, x_2 = 1/4, and

x_t = (x_{t−1} + x_{t−2})/2

for t = 3, 4, .... Notice that if t is even then x_t < x* < x_{t−1}.
As usual, A chooses 0 and B chooses 1 at date 1. At date 2, A observes B's choice in the previous period, realizes that the expected value of B's signal is 1/2 > x* = −ω_A, and switches to 1. By the symmetric argument, B switches to 0. At date 3, A observes B's switch to 0 and realizes that 0 < ω_B < 1/2, that is, the expected value of ω_B is 1/4, which is less than x* = −ω_A. So it is optimal for A to switch back to 0. By a symmetric argument, B switches back to 1 at date 3.
At any even date t, A will choose 1 and B will choose 0, and at any odd date t, A will choose 0 and B will choose 1. B's choice at an even date t reveals that x_{t−2} < ω_B < x_{t−1}, and his choice at an odd date reveals that x_{t−1} < ω_B < x_{t−2}. By construction, at any odd date t, −ω_A = x* < x_t = (x_{t−1} + x_{t−2})/2, so it is optimal for A to choose 1 at t + 1. Likewise, at any even date t, −ω_A = x* > x_t = (x_{t−1} + x_{t−2})/2, so it is optimal for A to choose 0 at t + 1.
In fact, we can find a signal ω to rationalize any sequence of actions with the property that, for some T, X_At ≠ X_Bt for t < T and X_At = X_Bt = a for t ≥ T. However, the sequences corresponding to T = ∞ occur with probability 0 and the sequences with T < ∞ occur with positive probability.
This example can also be used to illustrate the speed of convergence to uniformity of actions. In the first period, the probability that agents choose the same action is 1/2. In the second period, it is 3/4. In the third period, it is 7/8, and so on. This is a very special example, but simulations of other examples confirm these results.
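These agreement probabilities can be checked by direct simulation. The sketch below is our own illustration (not the authors' code): it implements the cutoff dynamics of this example, where each agent's cutoff equals minus the conditional mean of the other's signal (for uniform signals, the midpoint of the current belief interval), and estimates the probability that the two agents choose the same action at each date.

import random

def simulate(T=6, trials=200_000, seed=0):
    """Two-agent example: signals uniform on [-1, 1], U = w_A + w_B.
    Agent i plays 1 iff w_i >= -midpoint(belief interval about the other);
    belief intervals shrink according to the observed cutoffs."""
    rng = random.Random(seed)
    agree = [0] * T
    for _ in range(trials):
        w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        B = [[-1.0, 1.0], [-1.0, 1.0]]   # B[i] = common belief about w_i
        for t in range(T):
            # cutoff for each agent is minus the mean of his belief
            # about the other agent's signal
            cut = [-(B[1][0] + B[1][1]) / 2, -(B[0][0] + B[0][1]) / 2]
            act = [1 if w[i] >= cut[i] else 0 for i in (0, 1)]
            if act[0] == act[1]:
                agree[t] += 1
            # both actions are publicly observed, so beliefs update publicly
            for i in (0, 1):
                if act[i] == 1:
                    B[i][0] = max(B[i][0], cut[i])
                else:
                    B[i][1] = min(B[i][1], cut[i])
    return [a / trials for a in agree]

print(simulate())   # approx [0.5, 0.75, 0.875, 0.9375, ...]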
Finally, we note that in this simple example, where the signals of the two players are symmetrically distributed, the asymptotic outcome must be Pareto-efficient. This follows from the fact that the agent with the stronger signal, as measured by its absolute value, will ultimately determine the action chosen. However, a simple adjustment to this example shows the possibility of an inefficient outcome. Suppose that A has a signal uniformly distributed on [0, 1] and B has a signal uniformly distributed on [−1/2, 1]. Then both A and B will choose action 1 at the first date and there will be no learning. However, if ω_A is close to 0 and ω_B is close to −1/2, then action 0 is clearly preferred conditional on the information available to the two agents.
4. ASYMPTOTIC PROPERTIES
Now we return to the general model described in Section 2 and study its asymptotic properties.
Although the process of learning in a social network is complicated, it has a number of features that make the characterization of the asymptotic outcomes tractable. The first is the Welfare-Improvement Principle: since an agent's information is non-decreasing over time, his payoff must also be non-decreasing. This allows us to apply the Martingale Convergence Theorem to show that equilibrium payoffs converge with probability one as time goes to infinity. Smith and Sørensen (2000) also use the Martingale Convergence Theorem to show that Bayesian learning eventually leads to convergence of actions and beliefs.
The second useful feature is the Imitation Principle: since an agent can always imitate his neighbor, he must do at least as well as his neighbor on average (with a one-period lag). Similar ideas have been used by Banerjee and Fudenberg (1995) and Bala and Goyal (1998). We must be careful in exploiting this property, since it does not imply that an agent will do as well as his neighbor with probability one. Nonetheless, it turns out to be a useful property.

The Imitation Principle is particularly useful in a connected network. If i is connected to j, then we can use the Imitation Principle recursively to argue that i does as well as j. If j is connected to i, the same argument implies that i and j receive the same payoff on average. This fact is then used to show that i and j must choose the same action unless they are both indifferent. In other words, they essentially agree on the best action to take. In cases where indifference occurs with probability zero, we have uniformity in the limit.
Without the connectedness assumption, there is no reason to expect equal payoffs or uniformity. A trivial example would be the two-person network N_A = {B}, N_B = ∅, where A observes B but B does not observe A. Clearly, A must do at least as well as B but may do better, and there is no reason why A should always choose the same action as B. On the other hand, the complete network N_A = {B}, N_B = {A} is connected and, as we saw in the previous section, uniformity of actions arises with probability one in finite time.
4.1. Convergence
The first step in our analysis is to establish convergence of beliefs and payoffs. From the definition of equilibrium, we note that F_it ⊂ F_{it+1} ⊂ F for every i and t. In other words, an agent's information is non-decreasing over time. Then his equilibrium payoff must be non-decreasing over time and, since it is bounded, must converge. This property is established in the next theorem.
Theorem 1. Let {X_it, F_it : i = 1, ..., n, t = 1, 2, ...} be an equilibrium. For each i, define V*_it : Ω → R by

V*_it = E[U(X_it, ·) | F_it].

Then {V*_it} is a submartingale with respect to {F_it} and there exists a random variable V*_i∞ such that V*_it converges to V*_i∞ almost surely.
4.2. The Imitation Principle
The next step is to establish the Imitation Principle, which states that asymptotically an agent must do at least as well as his neighbors. This follows directly from the fact that one strategy available to agent i is to imitate the actions of agent j ∈ N_i.

Corollary 1. Let {X_it, F_it} be the equilibrium in Theorem 1 and let V*_it be the equilibrium payoffs. Then for any j ∈ N_i and any t, V*_it ≥ E[V*_{jt−1} | F_it]. Furthermore, in the limit,

V*_i∞ ≥ E[V*_j∞ | F_i∞],

where F_i∞ is the σ-field generated by ∪{F_i1, F_i2, ...}.
4.3. Connectedness
In order to make use of the foregoing results to establish uniformity of actions, we need to make use of connectedness. We begin by studying the behavior of adjacent agents i and j ∈ N_i and then extend the results to the entire network.

It is easy to see that if i is connected to j, then E[V*_i∞] ≥ E[V*_j∞] by induction. In particular, if j ∈ N_i and j is connected to i, then V*_i∞ ≥ E[V*_j∞ | F_i∞] and E[V*_j∞] ≥ E[V*_i∞], which implies that V*_i∞ = E[V*_j∞ | F_i∞].
Corollary 2. Let {X_it, F_it} be the equilibrium in Theorem 1 and let V*_it be the equilibrium payoffs. If j ∈ N_i and j is connected to i, then V*_i∞ = E[V*_j∞ | F_i∞].
Our next result concerns the possibility that agents choose different actions in the long run. The fact that agents get the same payoff in the long run suggests that they must choose the same actions unless they are indifferent. This result requires a certain amount of care because their information sets are different, but the intuition is essentially correct, as the next theorem shows.

Let {X_it, F_it : i = 1, ..., n, t = 1, 2, ...} be an equilibrium and define V^a_it : Ω → R by

V^a_it = E[U(a, ·) | F_it]

for any agent i, date t, and action a. Then {V^a_it} is a martingale with respect to {F_it} and V^a_it converges to a random variable V^a_i∞ almost surely.
Theorem 2. Let i and j be two agents such that j ∈ N_i and j is connected to i. Let E_ab denote the measurable set on which i chooses a infinitely often and j chooses b infinitely often. Then V^a_i∞(ω) = V^b_i∞(ω) for almost every ω in E_ab. That is, i is indifferent between a and b for almost every ω in E_ab.
Intuitively, if i always believes that he is getting the same payoff as agent j, then i cannot believe that he is choosing a better action than j. In this sense, they cannot disagree forever.

Clearly, since the network is connected, every agent asymptotically has the same payoff and, indifference apart, all agents must choose the same actions.

The concept of connectedness used here is strong in the sense that there must be a path running both ways between any pair of nodes. If the network is not connected in this sense, one can still apply Theorem 2 to connected components of the graph, that is, maximal subsets of nodes such that every pair of nodes in the subset is connected.
5. SHORT-RUN DYNAMICS
To illustrate the short-run dynamics of the model, we adapt the example introduced in Section 3.1 by assuming there are three agents, A, B, and C, and two actions, 0 and 1. As before, the payoff from action 0 is identically 0 and the payoff from action 1 is U(ω) = ω_A + ω_B + ω_C, where, for each agent i, the signal σ_i(ω) = ω_i is uniformly distributed on the interval [−1, 1].
The three-person case, unlike the two-person case, has several non-trivial social networks, each of which gives rise to its own distinctive learning patterns. We refer to a network in which every agent directly observes every other agent as complete. Otherwise, the network is incomplete. The network studied in Section 3 is complete, whereas most of the networks for the three-person case are incomplete. Several social networks are illustrated in Figure 1.

In a complete network, the entire history of past actions is common knowledge at each date. In an incomplete social network, past actions are typically not common knowledge at each date. This lack of common knowledge plays an important role in the learning process. It forces agents to make more or less complicated inferences about what other agents have seen, as well as about the inferences those agents have drawn, and changes the nature of the decision rules adopted by the agents.
FIG. 1
5.1. The complete network
As a benchmark, consider the (unique) complete network, in which each agent can observe the other two:

N_A = {B, C}, N_B = {A, C}, N_C = {A, B}.

If agents choose the same action at the first date, learning effectively ends there. For example, suppose that ω_i > 0 for i = A, B, C. Agent i's expected payoff from action 1 is ω_i > 0, since E[ω_j] = 0 for j ≠ i. So each agent will choose action 1 at the first date. At the second date, seeing that the others have chosen the same action at date 1, agent i will infer that ω_j > 0 and hence that E[ω_j | ω_j > 0] = 1/2 for j ≠ i. This will increase agent i's payoff from action 1 from ω_i to ω_i + 1 and reinforce agent i's preference for action 1. So each agent will continue to choose action 1 at date 2. At date 3 there is no change in actions or beliefs, so we have reached an absorbing state. Given the assumed values of the signals, the outcome is efficient.
A more interesting case is one in which there is diversity of actions at date 1. For example, suppose that ω_A > 0, ω_B > 0, and ω_C < 0. At date 1, agents A and B will choose action 1 and agent C will choose action 0. At date 2 it becomes common knowledge that ω_A > 0, ω_B > 0, and ω_C < 0. The payoff from action 1 conditional on agent A's information is

ω_A + E[ω_B | ω_B > 0] + E[ω_C | ω_C < 0] = ω_A + 1/2 − 1/2 = ω_A.

Similarly, the payoff from action 1 conditional on agent B's information is ω_B. So both A and B will continue to choose action 1. Conditional on agent C's information, however, the payoff from action 1 is

E[ω_A | ω_A > 0] + E[ω_B | ω_B > 0] + ω_C = 1/2 + 1/2 + ω_C > 0,

since ω_C > −1. So C will switch to action 1 at date 2. At date 3, no further information is revealed, so actions and beliefs remain unchanged and once again we have reached an absorbing state. Agent C ignores his own information and joins the "herd" consisting of agents A and B. Clearly, the outcome will be inefficient if ω_A and ω_B are small relative to the absolute value of ω_C.
In either of the cases considered, the learning process comes to a halt by the end of the second period at the latest.
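In this example, the long-run action always coincides with the action chosen by the majority of signal signs at date 1: unanimity is absorbing, and in the two-against-one case the minority agent joins the majority at date 2. The probability that the herd settles on the wrong action (the one that would not be chosen if all three signals were public) can therefore be estimated directly. The following Monte Carlo sketch is our own illustration, not code from the paper; the estimate comes out at roughly 0.125.

import random

def wrong_herd_probability(trials=200_000, seed=2):
    """Complete three-person network, signals uniform on [-1, 1]:
    the long-run action matches the majority of signal signs, while the
    optimal action depends on the sign of the sum. Estimate how often
    the two disagree."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        w = [rng.uniform(-1, 1) for _ in range(3)]
        majority = 1 if sum(1 for x in w if x > 0) >= 2 else 0
        optimal = 1 if sum(w) > 0 else 0
        wrong += (majority != optimal)
    return wrong / trials

print(wrong_herd_probability())   # approx 0.125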
5.2. Incomplete networks
The first incomplete network we examine is the Star:

N_A = {B, C}, N_B = {A}, N_C = {A}.

Thus, at each date, A is informed about the entire history of actions that have already been taken, whereas B and C have imperfect information and thus have to form expectations about the actions of the unobserved third agent.
As with the complete network, if all the agents choose the same action at date 1, this is an absorbing state. So consider again the more interesting case where there is diversity at date 1, for example, ω_A > 0, ω_B > 0, and ω_C < 0. The analysis of the decisions of the agents at date 1 is unchanged, but now at date 2 agent C only observes that his action at date 1 does not match A's action. Conditional on C's information, the expected value of A's signal is 1/2 and the expected value of B's signal is zero. Thus, at date 2, it is optimal for C to switch to 1 if ω_C ≥ −1/2 and to continue to choose 0 otherwise.
At date 3, agent C can draw some conclusions about the actions that B could have taken by observing the actions of agent A at dates 1 and 2. If A chooses 1 at both dates, then it is revealed to C that B's signal is positive; otherwise agent A would have switched to action 0 at date 2. Thus, a simple calculation shows that, having observed that A continues to choose 1, it is optimal for C to switch to action 1 for any realization of his private signal.
Even so, we have not necessarily reached an absorbing state, as agent A might himself switch at date 3. To see this, note that at date 3, A's expected value of B's signal is 1/2 and A's expected value of C's signal is −3/4. Thus, it is optimal for A to choose 1 again if ω_A ≥ 1/4 and to switch to 0 otherwise. In case ω_A < 1/4, it is common knowledge at date 4 that 0 ≤ ω_A < 1/4, 0 ≤ ω_B ≤ 1, and −1 ≤ ω_C < −1/2. Table 1 summarizes the play of the game and shows that it can continue for quite a few periods.
t    (a_At, a_Bt, a_Ct)    E_A[ω_B], E_A[ω_C]    E_B[ω_A], E_B[ω_C]    E_C[ω_A], E_C[ω_B]
1    (1, 1, 0)             0, 0                  0, 0                  0, 0
2    (1, 1, 0)             1/2, −1/2             1/2, 0                1/2, 0
3    (0, 1, 1)             1/2, −3/4             1/2, 0                1/2, 1/2
4    (0, 1, 0)             1/2, −3/4             1/8, −3/4             1/8, 1/2

Table 1
As Table 1 illustrates, the dynamics of actions and learning are quite different in complete and incomplete social networks. First, in an incomplete network, learning does not end after two periods, and more information may be revealed as a result. Secondly, for the diversity of actions to persist, the agent at the center of the Star must have a signal that is relatively weak (as measured by its absolute value) compared to the agents at the periphery. When his signal is relatively weak, the central agent changes his action frequently, thus transmitting information between the peripheral agents.
Alternating actions can also arise in the Star network when agent A has the negative signal and B and C have positive signals. At date 2, agent A observes that both B and C chose action 1 at date 1, so it is optimal for him to ignore his own information and to switch to action 1. However, at the same time, either agent B or agent C (or both) would switch from action 1 to action 0 if their signals are weak (less than 1/2). Table 2 illustrates that alternating actions may continue beyond period 2 and that, if the signals of agents B and C are relatively weak, say ω_B < 1/4 and ω_C < 1/4, and A's signal is relatively strong, ω_A < −1/2, there is an absorbing state in which all agents choose action 0.
t    (a_At, a_Bt, a_Ct)    E_A[ω_B], E_A[ω_C]    E_B[ω_A], E_B[ω_C]    E_C[ω_A], E_C[ω_B]
1    (0, 1, 1)             0, 0                  0, 0                  0, 0
2    (1, 0, 0)             1/2, 1/2              −1/2, 0               −1/2, 0
3    (0, 1, 1)             1/4, 1/4              −1/2, 1/2             −1/2, 1/2
4    (0, 0, 0)             1/8, 1/8              −3/4, 1/2             −3/4, 1/2

Table 2
A second example of an incomplete social network is provided by the Circle, in which each agent observes one other agent:

N_A = {C}, N_B = {A}, N_C = {B}.

In this case, generically no subset of the history of actions is shared as public information, and thus each agent makes different inferences about what others have learned. Thus, this network best illustrates how lack of common knowledge plays a crucial role.
Proceeding with the above example where ω_A > 0, ω_B > 0, and ω_C < 0, suppose that (t − 1)/t > ω_A > (t − 2)/t and ω_C < −(t − 1)/t. As in the two-person case where agents have strong signals with opposite signs, so too in this situation, despite the lack of common knowledge, agents can agree to disagree. In fact, over time agents A and C learn that at least one of the other agents must have contrary private information which is stronger than they thought, and thus they adjust their expectations towards −1 and 1 respectively. This continues until date t, when agent A realizes that his own signal is weaker and switches to action 0.

Then, at date t + 1, having observed that agent A switched to action 0, agent B infers that agent A's signal has expected value (2t − 3)/2t. On the other hand, agent B cannot tell whether agent C has also switched, but given his beliefs about C's information and strategy, he infers that agent C's signal has expected value −(t − 1)/t. Thus, at date t + 1, it is optimal for agent B to switch to 0 if ω_B < 1/2t and to continue to choose 1 otherwise.
6. DISCUSSION
There is a large literature on the economics of networks. The most closely related paper is by Bala and Goyal (1998), henceforth BG. In the BG model, at each date, an agent chooses one of several available actions with unknown payoff distributions. The agent observes his payoff from the action and uses this information to update his beliefs about the payoff distribution. Agents are organized in a network and can observe the actions and payoffs of their neighbors, that is, the agents with whom they are directly linked. This is a model of social experimentation, in the sense that it generalizes the problem of a single agent experimenting with a multi-armed bandit to a social setting, rather than a model of social learning: agents learn by observing the outcome (payoff) of an experiment (choice of action) rather than by inferring another agent's private information from his action. A model of social experimentation is quite different from a model of social learning. In a model of social experimentation, there is an informational externality but there is no informational asymmetry.
There is private information in the BG model, but agents are assumed to ignore it. For example, suppose that agent A observes agent B's action and payoff but not agent C's, whereas agent B observes agent C's action and payoff. Then agent B has private information, not available to agent A, that is clearly payoff-relevant for agent A as well as agent B. However, A is assumed to ignore this linkage. A learns from B's experiments (actions), but does not ask what information might have led B to choose those experiments (actions).

BG show that, in a connected network, in the long run everyone adopts the same action, and the action chosen can be sub-optimal.
Our model differs from BG in two ways. First, we examine the decisions of fully rational agents, who infer the information of unobserved agents from the actions they observe. Although the resulting beliefs are very complicated, this approach captures the idea that agents try to extract information about unobserved agents, especially in small groups. Second, in our model agents observe only the actions of other agents, whereas in BG agents observe payoffs as well as actions. Obviously, there is a lot more information available in BG.
The techniques used in this paper can be applied to other models. For example, there is no difficulty applying them to random graphs, as long as connectedness is satisfied with probability one. They could also be applied to dynamic graphs, where the set of neighbors observed changes over time.
Many questions about social learning in networks remain open. In special cases, we have established that uniformity arises in finite time with probability one. We conjecture that this result is true for all connected social networks, but we have yet to provide a general proof. This result would follow as a corollary of Theorem 2 if we could prove that the probability of indifference in the limit is zero, as we did in the two-person case. The impossibility of indifference is harder to establish for networks with more agents, but we believe it must be true under some regularity conditions.

A second conjecture is that the result about asymptotic uniformity has a converse: if all agents choose the same action, they have reached an absorbing state and will continue to choose that action forever. This is true in the special cases we have looked at, but we believe it must be true in general.
Speeds of convergence can be established analytically in simple cases. For more complex cases, we have been forced to use numerical methods. The computational difficulty of solving the model is massive even in the case of three persons. However, the results are sufficiently dramatic that they suggest the same might be true for more general cases. This is an important subject for future research.
7. PROOFS
7.1. Proof of Theorem 1
By definition (Billingsley, 1986, p. 480), the sequence {(V*_it, F_it) : t = 1, 2, ...} is a submartingale if the following four conditions hold:

(i) F_it ⊂ F_{it+1};
(ii) V*_it is measurable with respect to F_it;
(iii) E[|V*_it|] < ∞;
(iv) with probability 1, E[V*_{it+1} | F_it] ≥ V*_it.
The first condition follows directly from the definition of equilibrium. The second holds because V*_it = E[U(X_it(·), ·) | F_it] is, by construction of the conditional expectation, F_it-measurable. The third condition follows because A ⊂ R is finite and U(a, ·) is bounded for each a. To establish the fourth condition, note that since F_it ⊂ F_{it+1}, X_it is F_{it+1}-measurable and the equilibrium conditions imply that

E[U(X_it, ·) | F_{it+1}] ≤ E[U(X_{it+1}, ·) | F_{it+1}] = V*_{it+1}.

Then

V*_it = E[U(X_it, ·) | F_it]
      = E[E[U(X_it, ·) | F_{it+1}] | F_it]
      ≤ E[V*_{it+1} | F_it],

and {(V*_it, F_it) : t = 1, 2, ...} is a submartingale.

From the martingale convergence theorem, there exists a random variable V*_i∞ such that V*_it → V*_i∞ almost surely.
7.2. Proof of Corollary 1

We note that for any j ∈ N_i, X_{jt−1} is F_it-measurable, so the equilibrium conditions imply that

V*_it ≥ E[U(X_{jt−1}, ·) | F_it]
      = E[E[U(X_{jt−1}, ·) | F_{jt−1}] | F_it]
      = E[V*_{jt−1} | F_it].

From this inequality it follows that V*_i∞ ≥ E[V*_j∞ | F_i∞], where F_i∞ is the σ-field generated by ∪{F_i1, F_i2, ...}.
7.3. Proof of Theorem 2

Let i and j be two agents such that j ∈ N_i and j is connected to i, and let a and b be fixed but arbitrary actions. Define E = {ω : X_it = a i.o., X_jt = b i.o.}. Then E is a measurable set; in fact, E ∈ F_i∞. Let χ_E : Ω → {0, 1} denote the indicator function of the set E, that is,

χ_E(ω) = 1  if ω ∈ E,
         0  if ω ∉ E.

Since V^a_it(ω) = V*_it(ω) i.o. for every ω ∈ E and V^a_it → V^a_i∞ almost surely, we have V^a_i∞ = V*_i∞ for almost every ω ∈ E. Similarly, V^b_j∞ = V*_j∞ for almost every ω ∈ E. From Theorem 1, with probability one,

V*_it χ_E → V*_i∞ χ_E  and  V*_jt χ_E → V*_j∞ χ_E.
From Corollary 2, with probability one,

E[V*_i∞ χ_E | F_i∞] = E[V*_j∞ χ_E | F_i∞].

For any action a in A, let

V^a_it = E[U(a, ·) | F_it].

Clearly, {V^a_it} is a martingale and V^a_it → V^a_i∞ almost surely. Let

E^a_i = {ω : V^a_i∞(ω) ≥ V^b_i∞(ω) for all b ≠ a}.

Then E^a_i ∩ E^b_j belongs to F_i∞ ∩ F_j∞. Suppose that P[E^a_i ∩ E^b_j] > 0 for some a ≠ b. We conclude that

E[V*_i∞ | E^a_i ∩ E^b_j] = E[U(a, ·) | E^a_i ∩ E^b_j]
                         ≥ E[U(b, ·) | E^a_i ∩ E^b_j]
                         = E[V*_j∞ | E^a_i ∩ E^b_j].

Since V*_i∞ = E[V*_j∞ | F_i∞], we have V^a_i∞(ω) = V^b_i∞(ω) for almost every ω ∈ E^a_i ∩ E^b_j. Thus, agents i and j can disagree (choose different optimal actions) in the limit only if i is indifferent.
in the limit only if i is indi?erent.
REFERENCES
Bala, V. and Goyal, S. (1998). "Learning from Neighbors," Rev. Econ. Stud. 65, 595-621.

Banerjee, A. (1992). "A Simple Model of Herd Behavior," Quart. J. Econ. 107, 797-817.

Banerjee, A. and Fudenberg, D. (1995). "Word-of-Mouth Learning," MIT, mimeo.

Bikhchandani, S., Hirshleifer, D. and Welch, I. (1992). "A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades," J. Polit. Economy 100, 992-1026.

Bikhchandani, S., Hirshleifer, D. and Welch, I. (1998). "Learning from the Behavior of Others: Conformity, Fads, and Informational Cascades," J. Econ. Perspect. 12, 151-170.

Billingsley, P. (1986). Probability and Measure, 2nd edition. John Wiley and Sons, New York.

Smith, L. and Sørensen, P. (2000). "Pathological Outcomes of Observational Learning," Econometrica 68, 371-398.