Bayesian Learning in Social Networks 1 Douglas Gale (Corresponding Author) Department of Economics, New York University 269 Mercer St., 7th Floor, New York, NY, 10003-6687. E-mail: douglas.gale@nyu.edu Url: http://www.econ.nyu.edu/user/galed/ Phone: (212) 998-8944 Fax: (212) 995-3932 and Shachar Kariv Department of Economics, New York University 269 Mercer St., 7th Floor, New York, NY, 10003-6687. E-mail: sk510@nyu.edu Url: http://home.nyu.edu/~sk510 Version: March 13, 2003. We extend the standard model of social learning in two ways. First, we introduce a social network and assume that agents can only observe the actions of agents to whom they are connected by this network. Secondly, we allow agents to choose a di?erent action at each date. If the network satisfies a connectedness assumption, the initial diversity resulting from diverse private information is eventually replaced by uniformity of actions, though not necessarily of beliefs, in finite time with probability one. We look at particular networks to illustrate the impact of network architecture on speed of convergence and the optimality of absorbing states. Convergence is remarkably rapid, so that asymptotic results are a good approximation even in the medium run. Journal of Economic Literature Classification Numbers: D82, D83 Key Words: Networks, Social learning, Herd behavior, Informational cascades. Running Title: Bayesian Learning in Social Networks. 1 One of us discussed this problem with Bob Rosenthal several years ago, when we were both at Boston University. At that time, we found the problem of learning in networks fascinating but made no progress and were eventually diverted into working on boundedly rational learning, which led to our paper on imitation and experimentation. We thank seminar participants at NYU, DELTA, INSEAD, Cergy, Cornell and Iowa for their comments. The financial support of the National Science Foundation through Grant No. SES-0095109 is gratefully acknowledged. 1 1. INTRODUCTION The canonical model of social learning comprises a set of agents I,a finite set of actions A,asetofstatesofnature?,andacommonpayo? function U(a,ω),wherea is the action chosen and ω is the state of nature. Each agent i receives a private signal σ i (ω), a function of the state of nature ω, and uses this private information to identify a payo?-maximizing action. Thissetupprovidesanexampleofapure information externality.Each agent’s payo? depends on his own action and on the state of nature. It does not depend directly on the actions of other agents. However, each agent’s action reveals something about his private signal, so an agent can generally improve his decision by observingwhatothersdobeforechoosing his own action. In social settings, where agents can observe one another’s actions, it is rational for them to learn from one another. This kind of social learning was first studied by Banerjee (1992) and Bikhchandani, Hirshleifer and Welch (1992). Their work was extended by Smith and S?rensen (2000). These models of social learning assume a sim- ple sequential structure, in which the order of play is fixed and exogenous. They also assume that the actions of all agents are public information. Thus, at date 1, agent 1 chooses an action a 1 , based on his private in- formation; at date 2, agent 2 observes the action chosen by agent 1 and chooses an action a 2 based on his private information and the information revealed by agent 1’s action; at date 3, agent 3 observes the actions chosen by agents 1 and 2 and chooses an action a 3 ...; and so on. In what follows we refer to this structure as the sequential social-learning model (SSLM). One goal of the social learning literature is to explain the striking uni- formity of social behavior that occurs in fashion, fads, “mob psychology”, and so forth. In the context of the SSLM, this uniformity takes the form of herd behavior. 2 Smith and S?rensen (2000) have shown that, in the SSLM, herd behavior arises in finite time with probability one. Once the proportion of agents choosing a particular action is large enough, the pub- lic information in favor of this action outweighs the private information of any single agent. So each subsequent agent “ignores” his own signal and “follows the herd”. This is an important result and it helps us understand the basis for uniformity of social behavior. 3 Atthesametime,theSSLMhasseveral 2 A herd occurs if, after some finite date t, every agent chooses the same action. An informational cascade occurs if, after some finite date t,everyagentfinds it optimal to choose the same action regardless of the value of his private signal. An informational cascade implies herd behavior, but a herd can arise without a cascade. 3 The most interesting property of the models of Bikhchandani, Hirshleifer and Welch (1992) and Banerjee (1992) is that informational cascades arise very rapidly, before much information has been revealed. For example, in these models if the first two agents make the same choice, all subsequent agents will ignore their information and imitate the first two. The behavior of a potential infinity of agents is determined by the behavior of the first two. This is both informationally ine?cient and Pareto ine?cient. 2 special features that deserve further examination: (i) each agent makes a single, irreversible decision; (ii) the timing of the agent’s decision (his position in the decision-making queue) is fixed and exogenous; (iii) agents observe the actions of all their predecessors; and (iv) the number of signals, like the number of agents, is infinite, so once a cascade begins the amount of information lost is large. These features simplify the analysis of the SSLM, but they are quite restrictive. In this paper, we study the uniformity of behavior in a framework that allows for a richer pattern of social learning. We depart from the SSLM in two ways. First, we drop the assumption that actions are public infor- mation and assume that agents can observe the actions of some, but not necessarily all, of their neighbors. Second, we allow agents to make deci- sions simultaneously, rather than sequentially, and to revise their decisions rather than making a single, irreversible decision. We refer to this structure as the social network model (SNM). For empirical examples that illustrate the important role of networks in social learning, see Bikhchandani, Hirsh- leifer and Welch (1998). Onthefaceofit,uniformbehaviorseemslesslikelyintheSNM,where agents have very di?erent information sets, than in the SSLM. However, uniformity turns out to be a robust feature of connected social networks. 4 The following results are established for any connected network: Uniformity of behavior: Initially, diversity of private information leads to diversity of actions. Over time, as agents learn by observing the actions of their neighbors, some convergence of beliefs is inevitable. A central question is whether agents can rationally choose di?erent actions forever. Disconnected agents can clearly ‘disagree’ forever. Also, there may be cases where agents are indi?erent between two actions and disagreement of actions is immaterial. However, apart from cases of disconnectedness and indi?erence, all agents must eventually choose the same action. Thus, learning occurs through diversity but is eventually replaced by uniformity. Optimality: We are interested in whether thecommon action chosen asymp- totically is optimal, in the sense that the same action would be chosen if all the signals were public information. In special cases, we can show that asymptotically the optimal action is chosen but, in general, there is no reason why this should be the case. Although the process of learning in networks can be very complicated, the SNM has several features that make the asymptotic analysis tractable. The first is the welfare-improvement principle. Agents have perfect recall, so expected utility is non-decreasing over time. This implies that equilibrium 4 A network is a directed graph in which the nodes correspond to representative agents. Agent i can observe the actions of agent j if i is connected to agent j.Anetworkis connected if, for any two agents i and j, there is a sequence i 1 ,...,i K such that i 1 = i, i K = j and i k is connected to i k+1 for k =1,...,K ?1. 3 payo?s form a submartingale. We use the martingale convergence theorem to establish that an agents’ (random) payo? converges almost surely to a constant. Another useful property of the model is the imitation principle. If agent i can observe the actions of agent j, then one strategy available to him is to imitate whatever j does. Since i and j have di?erent information sets, their conditional payo?s under this strategy may be di?erent. However, on average, i must do as well as j. The imitation principle, together with the connectedness of the network, is used to show that, asymptotically, i and j must get the same average (unconditional) payo?s. It turns out that this is only possible if agents choose the same actions. More precisely, agents choose di?erent actions only if they are indi?erent. While the convergence properties of the model are quite general, other properties have only been established for particular networks: Convergence in finite time: In special cases, we can rule out the possibility of indi?erence between actions with probability one. In that case, all agents choosethesameactioninfinite time with probability one. Speed of convergence: In two- and three-person networks, we can show that convergence to a uniform action is extremely rapid, typically occurring within five or six periods with probability close to 1. What happens in those first few periods is important for the determination of the asymptotic state. Network architecture:Systematicdi?erences can be identified in the behav- ior of di?erent networks. For example, in three-person complete networks (where each agent observes all the others), learning stops almost immedi- ately and the probability of an incorrect action in the long run is high. In three-person incomplete networks, learning continues for a longer time and the probability of choosing an incorrect action in the long run is lower. The rest of the paper is organized as follows. In Section 2 we define the model and the equilibrium concept more precisely. In Section 3 we use the case of two-person networks to illustrate the working of the general model and some special features of complete networks. In Section 4 we derive the asymptotic properties of the general model. In Section 5 we study a selection of three-person graphs. Here we see the impact of lack of common knowledge on the dynamics of social learning and the e?ciency of aggregation. We also compare the dynamic and asymptotic properties of di?erent networks. The results are discussed in Section 6. Proofs are gatheredinSection7. 2. THE MODEL The social learning literature ignores the complications of strategic be- havior in order to focus on pure Bayesian learning. Non-strategic behavior 4 is simpler to analyze and it also seems more appropriate for a model of social behavior. However, special assumptions are needed to rationalized non-strategic behavior. In the SSLM, for example, an agent is assumed to make a once-in-a-lifetime decision. Because his payo? is independent of other agents’ actions, it is rational for him to behave myopically and ig- nore the a?ect of his action on the agents who follow him. In the SNM, an agent’s payo?is independent of other agents’ actions but, unlike the SSLM, agents make repeated decisions. In order to eliminate strategic behavior, we assume that the economy comprises a large number of individually in- significant agents and that agents only observe the distribution of actions at each date. Since a single agent cannot a?ect the distribution of actions, he cannot influence the future play of the game. This allows us to ignore “strategic” considerations and focus on the pure Bayesian-learning aspect of the model. The agents Formally, we assume there is a finite set of locations indexed by i =1,...,n. At each location, there is a non-atomic continuum of identical agents. In the sequel, the continuum of agents at location i is replaced by a single representative agent i who maximizes his short-run payo? in each period. Uncertainty is represented by a probability measure space (?,F,P), where ? is a compact metric space, F is a σ-field, and P a probability measure. Time is represented by a countable set of dates indexed by t = 1,2,.... Let A ? R be a finite set of actions and let U : A×? → R be the common payo? function, where U(a,·) is a bounded, measurable function for every action a. Each (representative) agent i receives a private signal σ i (ω) at date 1,whereσ i : ?→ R is a random variable. The network A social network is represented by a family of sets {N i : i =1,...,n},where N i ? {1,...,i?1,i+1,...,n}. For each agent i, N i denotes the set of agents j 6= i who can be observed by agent i.WecanthinkofN i as representing i’s “neighborhood”. The sets {N i :1=1,...,n} define a directed graph with nodes N = {1,...,n} and edges E = ∪ n i=1 {(i,j):j ∈ N i }. The social network determines the information flow in the economy. Agent i can observe the action of agent j if and only if j ∈ N i . Agents have perfect recall so their information set at each date includes the actions they have observed at every previous date. For any nodes i and j,apath from i to j is a finite sequence i 1 ,...,i K such that i 1 = i, i K = j and i k+1 ∈ N i k for k =1,...,K ? 1.Anodei is connected to j if there is a path from i to j. The network {N i } is connected if every pair of nodes i and j is connected. Connectedness is essential for uniformity of behavior, but not for other results. 5 Equilibrium At the beginning of each date t, agents choose actions simultaneously. Then each agent i observes the actions a jt chosen by the agents j ∈ N i and updates his beliefs accordingly. Agent i’s information set at date t consists of his signal σ i (ω) and the history of actions {a js : j ∈ N i ,s≤ t ? 1}. Agent i chooses the action a it to maximize the expectation of his short-run payo? U(a it ,ω) conditional on the information available. An agent’s behavior can be described more formally as follows. Agent i’s choice of action at date t is described by a random variable X it (ω) and his information at date t is described by a σ-field F it .Sincethe agent’s choice can only depend on the information available to him, X it must be measurable with respect to F it .SinceF it represents the agent’s information at date t,itmustbetheσ-field generated by the random variables σ i and {X js : j ∈ N i ,s≤ t ? 1}. Note that there is no need to condition explicitly on agent i’s past actions because they are functions of the past actions of agents j ∈ N i and the signal σ i (ω). Finally, since X it is optimal, there cannot be any other F it -measurable choice function that yields a higher expected utility. These are the essential elements of our definition of equilibrium, as stated below. Definition 1. A weak perfect Bayesian equilibrium consists of a se- quence of random variables {X it } and σ-fields {F it } such that for each i =1,...,n and t =1,2,..., (i) X it : ?→ A is F it -measurable, (ii) F it = F ? σ i ,{X js : j ∈ N i } t?1 s=1 ¢ ,and (iii) E[U(x(ω),ω)] ≤ E[U(X it (ω),ω)], for any F it -measurable function x : ?→ A. Note that our definition of equilibrium does not require optimality “o? the equilibrium path”. This entails no essential loss of generality as long as it is assumed that the actions of a single agent, who is of measure zero, are not observed by other players. Then a deviation by a single agent has no e?ect on the subsequent decisions of other agents. 3. LEARNING WITH TWO (REPRESENTATIVE) AGENTS AND TWO ACTIONS To fix ideas and illustrate the workings of the basic model, we first consider the special case of two representative agents, A and B,andtwo actions, 0 and 1. There are three graphs, besides the empty graph N A = N B = ?, (i) N A = {B},N B = {A}; (ii) N A = {B},N B = ?; (iii) N A = ?,N B = {A}. 6 Cases (ii) and (iii) are uninteresting because there is no possibility of mu- tual learning. For example, in case (ii), agent B observes a private signal and chooses the optimal action at date 1. Since he observes no further in- formation, he chooses the same action at every subsequent date. Agent A observes a private signal and chooses the optimal action at date 1.Atdate 2, he observes agent B’s action at date 1, updates his beliefs and chooses the new optimal action at date 2.Afterthat,A receives no additional information, so agent A chooses the same action at every subsequent date. Agent A has learned something from agent B, but that is as far as it goes. In case (i), on the other hand, the two agents learn from each other and learning can continue for an unbounded number of periods. We focus on the network defined in (i) in what follows. For simplicity, we consider a special information and payo? structure. We assume that? = ? A ×? B ,where? i is an interval [a,b] and the generic element is ω =(ω A ,ω B ). The signals are assumed to satisfy σ i (ω)=ω i ,?ω ∈?,i= A,B, where the random variables ω A and ω B are independently and continuously distributed, that is, P = P A ×P B and P i has no atoms. There are two actions a =0,1 and the payo? function is assumed to satisfy u(a,ω)= ? 0 if a =0 U(ω) if a =1, where the function U(ω A ,ω B ) is assumed to be a continuous and increasing function. To avoid trivialities we assume that neither action is weakly dominated. These assumptions are su?cient for the optimal strategy to have the form of a cuto? rule. To see this, note that for any history that occurs with positive probability, agent i’s beliefs at date t take the form of an event {ω i }×B jt , where the true value of ω j is known to belong to B jt .Then the payo? to action 1 is ? i (ω i ,B jt )=E[U(ω A ,ω B )|{ω i }×B jt }. Clearly, ? i (ω i ,B jt ) is increasing in ω i , because the distribution of ω j is independent of ω i ,sothereexistsacuto? ω ? i (B jt ) such that ω i >ω ? i (B jt )=? ? i (ω i ,B jt ) > 0, ω i <ω ? i (B jt )=? ? i (ω i ,B jt ) < 0. We assume that when an agent is indi?erent between two actions, he chooses action 1. The analysis is essentially the same for any other the tie-breaking rule. The fact that agent i’s strategy takes the form of a cuto? rule implies that the set B it is an interval. This can be proved by induction as follows. At date 1, agent j has a cuto? ω ? j1 and X j1 (ω)=1if and only if ω j ≥ ω ? j1 . 7 Then at date 2 agent i will know that the true value of ω j belongs to B j (ω), where B j2 (ω)= ? [ω ? j1 ,b] if X j1 (ω)=1, [a,ω ? j1 ) if X j1 (ω)=0. Now suppose that at some date t,theinformationsetB jt (ω) ? [a,b] is an interval and agent j’s cuto? is ω ? jt (B it (ω)).Thenatdatet +1, agent i knows that ω j belongs to B jt+1 (ω),where B jt+1 (ω)= ? B jt (ω)∩[ω ? jt (B it (ω)),b] if X jt (ω)=1, B jt (ω)∩[a,ω ? jt (B it (ω))) if X jt (ω)=0. Clearly, B jt+1 (ω) is also an interval. Hence, by induction, B it (ω) is an interval for all t and the common knowledge at date t is B t (ω)=B At (ω)× B Bt (ω).Byconstruction,ω ∈ B t+1 (ω) ? B t (ω) for every t.ThenB t (ω) & B(ω)=∩ ∞ t=1 B t (ω) and {B(ω):ω ∈?} defines a partition of ?.Notethat ω ∈ B(ω) so B(ω) 6= ?. In the limit, when all learning has ceased, agent A knows that ω B be- longstoasetB B (ω) and agent B knows that ω A belongs to B A (ω).Fur- thermore, because the actions chosen at each date are common knowledge, the sets B A (ω) and B B (ω) are common knowledge. An interesting question is whether, given their information in the limit, the two agents must agree which action is best. In the two-person case, we can show directly that both agents must eventually agree, in the sense that they choose di?erent actions only if they are indi?erent. The proof is by contradiction. Suppose, contrary to what we want to prove, that for some B and every ω such that B(ω)=B, E[U(ω A ,ω B )|{ω A }×B B ] > 0 and E[U(ω A ,ω B )|B A ×{ω B }] < 0. Then the same actions must be optimal for every element in the information set (otherwise, more information would be revealed) and this implies E[U(ω A ,ω B )|{ω A }×B B ] ≥ 0 and E[U(ω A ,ω B )|B A ×{ω B }] ≤ 0, where ω A =infB A (ω) and ω B =supB B (ω).Then U(ω A ,ω B ) ≥ 0 and U(ω A ,ω B ) ≤ 0, 8 or U(ω A ,ω B )=0.IfB B is not a singleton, E[U(ω A ,ω B )|{ω A }×B B ] < 0 a contradiction. Similarly, if B A is not a singleton, E[U(ω A ,ω B )|B A ×{ω B }] > 0, a contradiction. Thus, B is a singleton and U(ω)=0if ω ∈ B. The set {ω : U(ω)=0} has probability zero, so the probability of disagreeing forever is 0. In other words, both agents will choose the same action in finite time and once they have chosen the same action, they have reached an absorbing state and will continue to choose the same action in every subsequent period. 3.1. An example To illustrate the short-run dynamics of the model, we can further spe- cialize the example by assuming that, for each agent i, the signal σ i (ω)=ω i is uniformly distributed on the interval [?1,1] and the payo? to action 1 is U(ω)=ω A + ω B . At date 1, each agent chooses 1 if his signal is positive and zero if it is negative. If both choose the same action at date 1, they will continue to choose the same action at each subsequent date. Seeing the other agent choose the same action will only reinforce each agent’s belief that he has made the correct choice. No further information is revealed at subsequent dates and so we have reached an absorbing state, in which each agent knows his own signal and that the other’s signal has the same sign, but nothing more. So interesting dynamics occur only in the case where agents choose di?erent actions at date 1. The exact nature of the dynamics depends on the relative strength of the two signals, measured here by their absolute values. Without loss of generality, we assume that A has a negative signal, B a positive signal, and B’s signal is relatively the stronger, i.e., |ω A | < |ω B |. Case 1: ω A > ?1/2 and ω B > 1/2.Inthefirst round at date 1, agent A will choose action 0 and agent B will choose action 1. At the second date, having observed that agent B chose 1, agent A will switch to action 1, while agent B will continue to choose 1. Thereafter, there is an absorbing state in which both agents choose 1 foreverandnofurtherlearningoccurs. Case 2: 3/4 <ω A < ?1/2 and ω B > 3/4. As before, A chooses 0 and B chooses 1 at date 1.Atdate2, A observes that B chose 1 and infers that his signal has expected value 1/2.Sinceω A < ?1/2,itisoptimalfor A to choose 0 again. Since B has an even stronger signal, he will continue to choose 1.Atdate3, A observes that B chose 1,thusrevealingthat ω B > 1/2 sotheexpectedvalueofB’s signal is 3/4 and since ω A > ?3/4 it is optimal for him to switch to 1, which then becomes an absorbing state. 9 Case 3: ?(t?1)/t > ω A > ?(t?2)/t and ω B > (t?1)/t. By analogous reasoning, A chooses 0 and B chooses 1 until date t when A switches to 1. The other interesting case to consider is when the signals are similar in strength. For example, suppose that ω A = ?ω B = x ? where x ? is the limit of the sequent {x t } ∞ t=1 defined by putting x 1 = 1 2 ,x 2 = 1 4 , and x t = 1 2 (x t?1 + x t?2 ) for t =3,4,....Noticethatift is even then x t <x ? <x t?1 . As usual A chooses 0 and B chooses 1,atdate1.Atdate2, A observes B’s choice in the previous period, realizes that the expected value of B’s signal is 1/2 > ?x ? and switches to 1. By the symmetric argument, B switches to 0.Atdate3, A observes B’s switch to 0 and realizes that 1/4 <ω B < 1/2, that is, the expected value of ω B is greater than x ? = ?ω A .SoitisoptimalforA to choose 0 again. By a symmetric argument, B switches back to 1 at date 3. At any even date t, A will choose 1 and B will choose 0 and at any odd date t, A will choose 0 and B will choose 1. B’s choice at an even date t reveals that x t?2 <ω B <x t?1 and his choice at an odd date reveals x t?1 <ω B <x t?2 . By construction, at any odd date t, ?ω A = x ? <x t = 1 2 (x t?1 + x t?2 ),soitisoptimalforA to choose 1 at t +1. Likewise, at any even date t, ?ω A = x ? >x t = 1 2 (x t?1 + x t?2 ),soitisoptimalforA to choose 0 at t +1. In fact, we can find a signal ω to rationalize any sequence of actions with the properties that for some T, x At 6= x Bt for t<Tand x At = x Bt = a for t ≥ T. However, the sequences corresponding to T = ∞ occur with probability 0 and the sequences with T<∞ occur with positive probability. This example can also be used to illustrate the speed of convergence to uniformity of actions. In the first period, the probability that agents choosethesameactionis1/2. In the second period, it is 3/4.Inthethird period, it is 7/8, and so on. This is a very special example, but simulations of other examples confirm these results. Finally, we note that in this simple example, where the signals of the two players are symmetrically distributed, the asymptotic outcome must be Pareto-e?cient. This follows from the fact that the agent with the stronger signal, as measured by its absolute value, will ultimately determine the action chosen. However, a simple adjustment to this example shows the possibility of an ine?cient outcome. Suppose that A has a signal uniformly distributed on [0,1] and B has a signal uniformly distributed on £ ? 1 2 ,1 ¤ . Then both A and B will choose action 1 at the firstdateandtherewillbe no learning. However, if ω A is close to 0 and ω B is close to ? 1 2 then action 0 is clearly preferred conditional on the information available to the two agents. 10 4. ASYMPTOTIC PROPERTIES Now we return to the general model described in Section 2 and study the asymptotic properties. Although the process of learning in a social network is complicated, it has a number of features that make the characterization of the asymptotic outcomes tractable. The first is the Welfare-Improvement Principle: since an agent’s information is non-decreasing over time, his payo? must also be non-decreasing. This allows us to apply the Martingale Convergence Theorem to show that equilibrium payo?s converge with probability one as time goes to infinity. Smith and S?rensen (2000) also use the Martingale Convergence Principle to show that Bayesian learning eventually leads of to convergence of actions and beliefs. The second useful feature is the Imitation Principle: since an agent can always imitate his neighbor, he must do at least as well as his neighbor on average (with a one-period lag). Similar ideas have been used by Banerjee and Fudenberg (1995) and Bala and Goyal (1998). We must be careful in exploiting this property, since it does not imply that an agent will do as well as his neighbor with probability one. Nonetheless, it turns out to be a useful property. The Imitation Principle is particularly useful in a connected network. If i is connected to j then we can use the Imitation Principle recursively to argue that i does as well as j.Ifj is connected to i the same argument implies that i and j receive the same payo? on average. This fact is then used to show that i and j must choose the same action unless they are both indi?erent. In other words, they essentially agree on the best action to take. In cases where indi?erence occurs with probability zero, we have uniformity in the limit. Without this connectedness assumption, there is no reason to expect equal payo?s or uniformity. A trivial example, would be the two-person network N A = {B},N B = ?,whereA observes B but B does not observe A. Clearly, A must do at least as well as B but may do better and there is no reason why A should always choose the same action as B.Ontheother hand, the complete network N A = {B},N B = {A} is connected and as we saw in the previous section, uniformity of actions arises with probability one in finite time. 4.1. Convergence The first step in our analysis is to establish convergence of beliefs and payo?s. From the definition of equilibrium, we note that F it ? F it+1 ? F for every i and t. In other words, an agent’s information is non-decreasing over time. Then his equilibrium payo? must be non-decreasing over time and, since it is bounded, must converge. This property is established in the next theorem. 11 Theorem 1. Let {X it ,F it : i =1,...,n,t=1,2,...} be an equilibrium. For each i,define V ? it : ?→ R by V ? it = E[U(X it ,·)|F it ]. Then {V ? it } is a submartingale with respect to {F it } and there exists a random variable V ? i∞ such that V ? it converges to V ? i∞ almost surely. 4.2. The Imitation Principle ThenextstepistoestablishtheImitation Principle, which states that asymptotically an agent must do at least as well as his neighbors. This follows directly from the fact that one strategy available to agent i is to imitate the actions of agent j ∈ N i . Corollary 1. Let {X it ,F it } be the equilibrium in Theorem 1 and let V ? it be be the equilibrium payo?s. Then for any j ∈ N i and any t, V ? it ≥ E[V ? jt?1 |F it ]. Furthermore, in the limit, V ? i∞ ≥ E[V ? j∞ |F i∞ ], where F i∞ is the σ-field generated by ∪{F i1 ,F i2 ,...}. 4.3. Connectedness In order to make use of the foregoing results to establish uniformity of actions, we need to make use of connectedness. We begin by studying the behavior of adjacent agents i and j ∈ N i andthenextendtheresultsto theentirenetwork. It is easy to see that if i is connected to j,thenE [V ? i∞ ] ≥ E £ V ? j∞ ¤ by induction. In particular, if j ∈ N i and j is connected to i then V ? i∞ ≥ E[V ? j∞ |F it ] and E[V ? j∞ ] ≥ E[V ? i∞ ], which implies that V ? i∞ = E[V ? j∞ |F i∞ ]. Corollary 2. Let {X it ,F it } be the equilibrium in Theorem 1 and let V ? it be the equilibrium payo?s. If j ∈ N i and j is connected to i then V ? i∞ = E[V ? j∞ |F i∞ ]. Our next result concerns the possibility for agents to choose di?erent actions in the long run. The fact that agents get the same payo? in the long run suggests that they must choose the same actions unless they are indi?erent. This result requires a certain amount of care because their information sets are di?erent, but the intuition is essentially correct as the next theorem shows. Let {X it ,F it : i =1,...,n,t =1,2,...} be an equilibrium and define V a it : ?→ R by V a it = E[U(a,·)|F it ] for any agent i,datet, and action a.Then{V a it } is a martingale with respect to {F it } and V a it converges to a random variable V a i∞ almost surely. 12 Theorem 2. Let i and j be two agents such that j ∈ N i and j is connected to i.LetE ab denotethemeasurablesetonwhichi chooses a infinitely often and j chooses b infinitely often. Then V a i∞ (ω)=V b i∞ (ω) for almost every ω in E ab .Thatis,i is indi?erent between a and b for almost every ω in E ab . Intuitively, if i alwaysbelievesthatheisgettingthesamepayo? as agent j then i cannot believe that he is choosing a better action than j.In this sense, they cannot disagree forever. Clearly, since the network is connected, every agent asymptotically has the same payo? and, indi?erence apart, all agents must choose the same actions. The concept of connectedness used here is strong in the sense that there must be a path running both ways between any pair of nodes. If the network is not connected in this sense, one can still apply Theorem 2 to connected components of the graph, that is, maximal subsets of nodes such that every pair of nodes in the subset is connected. 5. SHORT-RUN DYNAMICS To illustrate the short-run dynamics of the model, we adapt the example introduced in Section 3.1 by assuming there are three agents, A, B,andC, and two actions, 0 and 1.Asbefore,thepayo?from action 0 is identically 0 and the payo? from action 1 is U(ω)=ω A +ω B +ω C , where, for each agent i, the signal σ i (ω)=ω i is uniformly distributed on the interval [?1,1]. The three-person case, unlike the two-person case, has several non- trivial social networks, each of which gives rise to its own distinctive learn- ing patterns. We refer to a network in which every agent directly observes every other agent as complete. Otherwise, the network is incomplete.The network studied in Section 3 is complete whereas most of the networks for the three-person case are incomplete. Several social networks are illustrated in Figure 1. In a complete network, the entire history of past actions is common knowledge at each date. In an incomplete social network, past actions are typically not common knowledge at each date. This lack of common knowledge plays an important role in the learning process. It forces agents to make more or less complicated inferences about what other agents have seen, as well as about the inferences those agents have drawn, and changes the nature of the decision rules adopted by the agents. 13 FIG. 1 5.1. The complete network As a benchmark, consider the (unique) complete network, in which each agent can observe the other two: N A = {B,C},N B = {A,C},N C = {A,B}. If agents choose the same action at the first date, learning e?ectively ends there. For example, suppose thatω i > 0 for i = A,B,C.Agenti’s expected payo? from action 1 is ω i > 0,sinceE[ω j ]=0for j 6= i. So each agent will choose action 1 at the first date. At the second date, seeing that the others have chosen the same action at date 1, agent i will infer that ω j > 0 and hence that E[ω j |ω j > 0] = 1/2 for j 6= i. This will increase agent i’s payo? from action 1 from ω i to ω i +1and reinforce agent i’s preference for action 1. So each agent will continue to choose action 1 at date 2.Atdate 3 there is no change in actions or beliefs, so we have reached an absorbing state. Given the assumed values of the signals, the outcome is e?cient. A more interesting case is one in which there is diversity of actions at date 1. For example, suppose that ω A > 0,ω B > 0,andω C < 0.Atdate 1, agents A and B will choose action 1 and agent C will choose action 0.At date 2 it becomes common knowledge that ω A > 0,ω B > 0,andω C < 0. 14 The payo? from action 1 conditional on agent A’s information is ω A + E[ω B |ω B > 0] + E[ω C |ω C < 0] = ω A + 1 2 ? 1 2 = ω A . Similarly, the payo? from action 1 conditional on agent B’s information is ω B .SobothA and B will continue to choose action 1. Conditional on agent C’s information, however, the payo? from action 1 is E[ω A |ω A > 0] + E[ω B |ω B > 0] + ω C = 1 2 + 1 2 + ω C > 0 since ω C > ?1.SoC will switch to action 1 at date 2.Atdate3,nofur- ther information is revealed, so actions and beliefs remain unchanged and once again we have reached an absorbing state. Agent C ignores his own information and joins the “herd” consisting of agents A and B. Clearly, the outcome will be ine?cient if ω A and ω B are small relative to the absolute value of ω C . In either of the cases considered, the learning process comes to a halt bytheendofthesecondperiodatthelatest. 5.2. Incomplete networks The first incomplete network we examine is the Star: N A = {B,C},N B = {A},N C = {A}. Thus, at each date, A is informed about the entire history of actions that have already been taken, whereas B and C have imperfect information and thus have to form expectations about the actions of the unobserved third agent. As with the complete network, if all the agents choose the same action at date 1, this is an absorbing state. So consider again the more interesting case where there is diversity at date 1, for example, ω A > 0, ω B > 0,and ω C < 0. The analysis of the decisions of the agents at date 1 is unchanged but now at date 2 agent C only observes that his action at date 1 does not match A’s action. Conditional on C’s information, the expected value of A’s signal is 1/2 and the expected value of B’s signal is zero. Thus, at date 2,itisoptimalforC to switch to 1 if ω C ≥?1/2 and to continue to choose 0 otherwise. By the third round at date 3, agent C can draw some conclusions about the actions that B could have taken by observing the actions of agent A at dates 1 and 2.IfA chooses 1 at both dates then it is revealed to C that B’s signal is positive; otherwise agent A wouldhaveswitchedtoaction0 at date 2. Thus, a simple calculation shows that, having observed that A continues to choose 1,itisoptimalforC to switch to action 1 for any realization of his private signal. 15 Even so, we have not necessarily reached an absorbing state, as agent A might himself switch at date 3. Toseethis,notethatatdate3, A’s expected value of B’s signal is 1/2 and A’s expected value of C’s signal is ?3/4. Thus, it is optimal for A to choose 1 again if ω A ≥ 1/4 and to switch to 0 otherwise. In case ω A < 1/4, it is common knowledge at date 4 that 0 ≤ ω A < 1/4, 0 ≤ ω B ≤ 1 and ?1 ≤ ω C < ?1/2. Table 1 summarizes the play of the game and shows that it can continue for quite a few periods. t (a A ,a B ,a C ) E A [ω B ],E A [ω C ] E B [ω A ],E B [ω C ] E C [ω A ],E C [ω B ] 1(1,1,0) 0,00,00,0 2(1,1,0) 1/2,?1/21/2,01/2,0 3(0,1,1) 1/2,?3/41/2, /2,1/2 4(0,1,0) 1/2,?3/ /8,?3/41/8,1/2 Table 1 As Table 1 illustrates, the dynamics of actions and learning are quite dif- ferent in complete and incomplete social networks. First, in an incomplete network, learning does not end after two periods and more information may be revealed as a result. Secondly, for the diversity of actions to persist, the agent at the center of the Star must have a signal that is relatively weak (as measured by its absolute value) compared to the agents at the periphery. When his signal is relatively weak, the central agent changes his action frequently, thus transmitting information between the peripheral agents. Alternating actions can also arise in the Star network when agent A has the negative signal and B and C have positive signals. At date 2, agent A observes that both B and C chose action 1 at date 1,soitisoptimalforhim to ignore his own information and to switch to action 1. However, at the same time, either agent B or agent C (or both) would switch from action 1 to action 0 if their signals are weak (less than 1/2). Table 2 illustrates that alternating actions may continue beyond period 2 and that if the signals of agents B and C are relatively weak, say ω B < 1/4 and ω C < 1/4,andA’s signal is relatively strong, ω A < ?1/2, there is an absorbing state in which all choose action 0. t (a A ,a B ,a C ) E A [ω B ],E A [ω C ] E B [ω A ],E B [ω C ] E C [ω A ],E C [ω B ] 1(0,1,1) 0,00,00,0 2(1,0,0) 1/2,1/2 ?1/2,0 ?1/2,0 3(0,1,1) 1/4,1/4 ?1/2,1/2 ?1/2,1/2 4(0,0,0) 1/8,1/8 ?3/4,1/2 ?3/4,1/2 16 Table 2 A second example of an incomplete social network is provided by the Circle, in which each agent observes one other agent: N A = {C},N B = {A},N C = {B}. In this case, generically no subset of the history of actions is shared as public information, and thus each agent makes di?erent inferences about what others have learned. Thus, this network best illustrates how lack of common knowledge plays a crucial role. Proceeding with the above example where ω A > 0, ω B > 0 and ω C < 0, suppose that (t?2)/t > ω A > (t?1)/t and ω C < ?(t?1)/t.Asinthetwo- person case where agents have strong signals with opposite signs, so too in this situation despite the lack of common knowledge agents can agree to disagree. In fact, over time agent A and agent C learnthatatleastoneof the other agents must have contrary private information which is stronger than they thought and thus they adjust their expectations towards ?1 and 1 respectively. This continues until date t when agent A realizes that his own signal is weaker and will switch to action 0. Then, at date t+1, having observed that agent A switched to action 1, agent B infers that agent A’s signal has expected value (2t?3)/2t.Onthe other hand, agent B cannot tell whether agent C has also switched, but given his beliefs about C’s information and strategy, he infers that agent C’s signal has expected value (t?1)/t.Thus,atdatet+1,itisoptimalfor agent B to switch to 0 if ω B < 1/2t and to continue to choose 1 otherwise. 6. DISCUSSION There is a large literature on the economics of networks. The most closely related paper is by Bala and Goyal (1998), henceforth BG. In the BG model, at each date, an agent chooses one of several available actions with unknown payo?distributions. The agent observes his payo?from the action and uses this information to update his beliefs about the payo?distribution. Agents are organized in a network and can observe the actions and payo?s of their neighbors, that is, the agents with whom they are directly linked. This is a model of social experimentation, in the sense that it generalizes the problem of a single agent experimenting with a multi-armed bandit to a social setting, rather than social learning: agents learn by observing the outcome (payo?) of an experiment (choice of action) rather than by inferring another agent’s private information from his action. A model of social experimentation is quite di?erent from a model of social learning. In a model of social experimentation, there is an informational externality but there is no informational asymmetry. There is private information in the BG model, but agents are assumed to ignore it. For example, suppose that agent A observes agent B’s action 17 and payo? but not agent C’s, whereas agent B observes agent C’s action and payo?. Then agent B has private information, not available to agent A, that is clearly payo?-relevant for agent A as well as agent B. However, A is assumed to ignore this linkage. A learns from B’s experiments (ac- tions), but does not ask what information might have led B to choose those experiments (actions). BG show that, in a connected network, in the long run, everyone adopts the same action and the action chosen can be sub-optimal. Our model di?ers from BG in two ways. First, we examine the decisions of fully rational agents, who infer the information of unobserved agents from the actions they observe. Although the beliefs of agents are very complicated, it captures the idea that agents try to extract information about unobserved one, especially in small groups. Second, in our model agents observe only the actions of other agents, whereas in BG agents observe payo?s as well as actions. Obviously, there is a lot more information available in BG. The techniques used in this paper can be applied to other models. For example, there is no di?culty applying them to random graphs, as long as connectedness is satisfied with probability one. They could also be applied to dynamic graphs where the set of neighbors observed changes over time. Many questions about social learning in networks remain open. In spe- cial cases, we have established that uniformity arises in finite time with probability one. We conjecture that this result is true for all connected so- cial networks, but we have yet to provide a general proof. This result would follow as a corollary of Theorem 2 if we could prove that the probability of indi?erence in the limit is zero, as we did in the two-person case. The impossibility of indi?erence is harder to establish for networks with more agents but we believe it must be true under some regularity conditions. A second conjecture is that the result about asymptotic uniformity has a converse: if all agents choose the same action, they have reached an absorbing state and will continue to choose that action forever. This is true in the special cases we have looked at but we believe it must be true in general. Speeds of convergence can be established analytically in simple cases. For more complex cases, we have been forced to use numerical methods. The computational di?culty of solving the model is massive even in the case of three persons. However, the results are su?ciently dramatic that they suggest the same might be true for more general cases. This is an important subject for future research. 18 7. PROOFS 7.1. ProofofTheorem1 By definition (Billingsley, 1986, p. 480), the sequence {(V ? it ,F it ):t = 1,2,...} is a submartingale if the following four conditions hold: (i) F it ? F it+1 ; (ii) V ? it is measurable F it ; (iii) E[|V ? it |] < ∞ ; (iv) with probability 1, E[V ? it+1 |F it ] ≥ V ? it . The first conditions follows directly from the definition of equilibrium. The second holds because U(a,·) is F it -measurable, X it is F it -measurable, and V ? it = E[U(X it (·),·)|F it ]. The third condition follows because A ?R is finite and U(a,·) is bounded for each a. To establish the fourth condition, note that since F it ? F it+1 , X it is F it+1 -measurable and the equilibrium conditions imply that E[U(X it ,·)|F it+1 ] ≤ E[U(X it+1 ,·)|F it+1 ] = V ? it+1 . Then V ? it = E[U(X it ,·)|F it ] = E[E[U(X it ,·)|F it+1 ]|F it ] ≤ E[V ? it+1 |F it ] and {(V ? it ,F it ):t =1,2,...} is a sub-martingale. From the martingale convergence theorem, there exists a random vari- able V ? i∞ such that V ? it → V ? i∞ almost surely. 7.2. Proof of Corollary 1 We note that for any j ∈ N i ,X jt?1 is F it -measurable so the equilibrium conditions imply that V ? it ≥ E[U(X jt?1 ,·)|F it ] = E[E[U(X jt?1 ,·)|F jt?1 ]|F it ] = E[V ? jt?1 |F it ]. From this inequality it follows that V ? i∞ ≥ E[V ? j∞ |F i∞ ],whereF i∞ is the σ-field generated by ∪{F i1 ,F i2 ,...}. 19 7.3. ProofofTheorem2 Let i and j be two agents such that j ∈ N i and j is connected to i and let a and b be fixed but arbitrary actions. Define E = {ω : X it = a i.o, X jt = b i.o.}.ThenE is a measurable set, in fact, E ∈ F i∞ .Let χ E : ?→ {0,1} denote the indicator function for the set E,thatis, χ E (ω)= ? 1,ω∈ E, 0,ω/∈ E. Since V a it (ω)=V ? it (ω) i.o. for every ω ∈ E and V a it → V a i∞ almost surely, we have V a i∞ = V ? i∞ for almost every ω ∈ E. Similarly, V b j∞ = V ? j∞ for almost every ω ∈ E. From Theorem 1, with probability one, V ? it χ E → V ? i∞ χ E and V ? jt χ E → V ? j∞ χ E . From Corollary 2, with probability one, E[V ? i∞ χ E |F i∞ ]=E[V ? j∞ χ E |F i∞ ]. For any action a in A let V a it = E[U(a,·)|F it ]. Clearly, {V a it } is a martingale and V a it → V a i∞ almost surely. Let E a i = {ω : V a i∞ (ω) ≥ V b i∞ (ω),b6= a}. Then E a i ∩E b j belongs to F i∞ ∩F j∞ . Suppose that P[E a i ∩E b j ] > 0 for some a 6= b.Weconclude E[V ? i∞ |E a i ∩E b j ]=E[U(a,·)|E a i ∩E b j ] ≥ E[U(b,·)|E a i ∩E b j ] = E[V ? j∞ |E a i ∩E b j ]. Since V ? i∞ = E[V ? j∞ |F i∞ ] we have V a i∞ (ω)=V b i∞ (ω) for almost every ω ∈ E a i ∩E b j . Thus, agents iandj can disagree (choose di?erent optimal actions) in the limit only if i is indi?erent. REFERENCES Bala, V. and Goyal, S. (1998). “Learning from Neighbors,” Rev. Econ. Stud. 65, 595-621. Banerjee, A. (1992). “A Simple Model of Herd Behavior,” Quart. J. Econ. 107, 797-817. 20 Banerjee, A. and Fudenberg, D. (1995). “Word-of-Mouth Learning,” MIT, mimeo. Bikhchandani, S., Hirshleifer, D. and Welch, I. (1992). “A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascade,” J. Polit. Economy 100, 992-1026. Billingsley, P. (1986). Probability and Measure, 2nd edition.JohnWiley and Sons, New York. Bikhchandani, S., Hirshleifer, D. and Welch, I. (1998). “Learning from the Behavior of Others: Conformity, Fads, and Informational Cascades,” J. Econ. Perspect. 12, 151-170. Smith, L. and S?rensen, P. (2000). “Pathological Outcomes of Observa- tional Learning,” Econometrica 68, 371-398. 21