1
1
Lecture 13
Poisson Regression
Using Stata
2
This lecture covers
?
T
he Univariate
Poisson Distribution
?
T
he Poisson Regression Model
?
T
he Negative Binomial Distribution
?
N
egative Binomial Regression
?
C
omparing the Poisson and Negative
Binomial Regression Models
2
3
Introduction
?
T
his lecture deals with the modeling of
dependent variables that are event count or count data. An event count refers to the number of times an event occurs, and is the realization of a nonnegative integer-valued random variable. Variables that count the number of times that something has happened are common in the social scien
ce
s.
4
?
S
ome examples of count variables are the
number of times in a year that persons visit the doctor, the number of car accidents that occur each day in a city, the number of love affairs that occur to a university student during the four years, the number of industrial injuries in the workplace in a day, the number of cigarettes a person smokes in a day, etc.
3
5
?
I
n demography, popular count variables
are the number of children born to a woman, the number of pregnancies a woman has, the number of sexual intercou
rse
s
a p
e
rson
has in a week, the
number of sexual partners in a year, the number of abortions a woman has had in her lifetime, the number of residential migrations a person makes in a lifetime, the number of jobs a migrant worker has done since s/he arrived, etc.
6
?
F
requently, count variables are treated as
though they are continuous and unbounded. OLS models are then used to estimate the effects of
X
variable
s on thei
r
occurrence. OLS is appropriate if the dependent variable, the count, is independently and identically distributed. However, the use of OLS for count outcomes can result in inefficient, inconsistent and biased estimates if one or more of the OLS assumptions are not met.
4
7
?
T
hus, models other than OLS models
have been used to handle count data. This lecture will cover two: (1) the Poisson regression model, and (2) the negative binomial regression model. The software used in this lecture is
Sta
t
a
. (Poisson
regre
s
sion is also available in SPSS under
general log-linear model) Before the discu
ssi
on of th
e Poisso
n regre
s
sions,
let’s first take a look at the univariatePoisson distribution.
8
The Univariate
Poisson Distribution
?
T
he univariate
Poisson distribution
provides the benchmark for Poisson regression. Let
Y
equal a random variable
that represents the number of times that an event has occurred during an interval of time.
Y
will have a
Poisson
distribution
with a parameter
μ
greater
than 0:
5
9
?
w
here th
e param
e
ter
μ
represents the
expected number of counts that have occurred; for the distribution this will also be the mean. Thus,
10
Some properties of a
Poisson distribution
?A
s
μ
increases, the mass of the
distribution shifts to the right; we’ll see this below in the sample graphs of the univariate
Poisson distribution.
6
11
?
T
he va
ri
ance of
Y
equals
μ
. The equality
of the mean and the variance is known as equidis
p
ersion
. Actually in practice,
many co
unt varia
b
les have a varia
n
ce
greater than the mean, which is called overdispersion
. Sometimes, therefore,
the Poisson regression model is not entirely appropriate, often leading the analyst to the negative binomial regression model (to be discussed later).
12
?A
s
μ
increases, the pr
obability of 0’s
decreases. In a Poisson distribution, for
μ
= 0.8, the probability of an 0 is 0.45;
for
μ
= 1.5, the probability of an 0 is 0.22;
for
μ
= 2.9, the probability of an 0 is 0.05;
for
μ
= 10.5, the probability of an 0 is
0.00002.
?A
s
μ
in
crea
se
s,
the Poisso
n distribution
approximates a normal distribution.
7
13
?
H
ere are four examples of univariate
Poisson distributions, varying on their values o
f
μ
.
?
T
he first Poisson distribution has
μ
=0.8.
The second,
μ
=1.5. The third,
μ
=2.9.
The fourth,
μ
=10.5.
?
T
he Stata
c
ommands to produce the
graph are as below:
14
8
1516
?
A
critical assumption of the Poisson
distribution is that when an event occurs, it does not affect the probability of the event occurring in the future. If the “count”
is
children born to mothers, the assumption of independence implies that when a woman has a baby born to her, it does not affect the probability of another baby being born to her. In demography, however, future fertility is not independent from past fertility, and particularly in China, the next birth (or abortion) is heavily dependent upon the previous ones in the context of the strict family planning policy.
9
17
The Poisson Regression Model
?
I
n the Poisson regression model, the
number of events (the dependent variable) is a nonnegative intege
r; it has a Poisson
distribution with a conditional mean that depends on the characteristics (the
X
variables) of the individuals according to the following structural model:
18
The Poisson regression model is a nonlinear model, predicting for each individual the number of times,
μ
, that the event has
occurred. The
X
variables are related to
μ
nonlinearly.
10
19
The Poisson Regression Model
Predic
t
ing the Number of Children
Ever Born to Chinese Women
?
W
e are going to modeling the number of
children ever born (CEB) to Chinese women from the 1997 survey. Before doing the Poisson regression, we would be wondering if the count data are Poisson distributed.
20
?
T
hus we conducted
an analysis of the
count dependent variable to compare the observed distribution of the count data with a univariate
Poisson distribution with
the same mean as the count data.
?
T
he dependent variable is a count variable,
namely, the number of children ever born to a woman. The variable is called “CEB”. Here are descriptive data on this variable for the sample women:
11
21
sum ceb, detail
22
?
T
he “CEB”
v
ariable is a count variable,
ranging from 0 to 9, with a mean of 1.855, a standard deviation of 1.125, and a variance of 1.266. The mean and the variance are not the same, as they are in a univariate
Poisson
distribution, but they are close. Unlike the case with many count variables, there is no overdispersion
in the “CEB”
v
ariable. We will
use Stata’s
“
prcounts”
c
ommand to graph the
distribution of “CEB”
in a graph along with a
univariate
Poisson distribution that has a
mean of 1.855. We can then see how closely the data are Poisson distributed.
12
23
?
F
irst, we estimate a Poisson regression
without any independent variables, so to be able to fit a univariate
P
oisson
distribution with a mean equal to that of our count dependent variable, CEB, namely 1.855.
?
w
hen the Poisson regression model has
no independent variables, the estimated model is reduced to:
24
poisson
c
eb, nolog
13
25
?
O
bserve that the Poisson intercept in this
model, which has no independent variables, is .6178104 . We exponentiatethis value, that is,
e
.6178104
= 1.854862
,
which is indeed the mean of the “CEB”variable.
?
N
ow we use the “prcounts”
c
ommand to
graph the observed distribution of the CEB variable with a univariate
P
oisson
distribution with a mean of 1.855.
26
Stata
c
ommand:
prcounts
c
ebprob, plot max(10)
label var
c
ebprobobeq
"Observed CEB Distribution"
label var
c
ebprobpreq
"
Univariate
P
oisson, mu
=
1.855"label var
c
ebprobval
"Number of Children Ever Born"
graph twoway
connected cebprobobeq
cebprobpreq
cebprobval, ytitle("Proportion
or Probability")
ylabel(0(.1).4) xlabel(0(1) 9) title("CEB
D
istribution
and Poisson Distribution with mu
= 1.855")
14
27
0 .1 .2 .3 .4
Proportion or Probability
0
1
2
3
4
5
6
7
8
9
Number of Children Ever Born
Observed CEB Distribution
Univariate Poisson, mu = 1.855
CEB Distribution and Poisson Distribution with mu = 1.855
28
?
T
he graph shows that the children ever born
variable is pretty much Poisson distributed. The univariate
Poisson distribution over-
predicts the observed CEB distribution at the count of zero, and under-predicts counts of 1 and 2; the remaining counts are pretty close. The observed CEB distribution, compared to the univariate
P
oisson distribution with a
mean,
μ
, of 1.855, has substantially fewer
0’s, and more cases in the earlier counts. Even though the two distributions are not perfect, we may conclude
that we are correct
in estimating the CEB dependent variable with a Poisson model.
15
29
?
O
ne reason for the failure of the pure Poisson
distribution to perfectly fit the observed CEB data is that the rate of childbearing, i.e., the number of babies produced,
μ
, differs across
the women. The univariate
P
oisson distribution
with a mean of 1.855 does not take into account the heterogeneity of the sample women in their values of
μ
. So we need to extend the
univariate
Poisson distribution to the Poisson
regression model, in which we assume that the observed CEB count for woman
i
is drawn from
a Poisson distribution with mean
μ
, where
μ
i
,
is estimated from observed characteristics, that is, from X variables of the women.
30
Poisson regression is used to model CEB, with se
ven X var
i
ables, namely:
?
w
oman’s age at
menarche
,
in ye
ars;
?
a
ge at first marriage (
afm
),
in years;
?
w
oman’s exposure to the risk of childbearing
(
fec
u
nd
), which is calculated in years for
each woman, is the difference between her age at menarche and either, her age at sterilization, her age at menopause, or her age when the survey was conducted, whichever is less;
16
31
?
n
umber of years of education completed
(
eduyrs
);
?
w
hether the woman lives in an
urban
ar
ea,
coded 1 if yes, 0 if no;
?
w
hether the woman is a
Han
Chinese,
coded 1 if yes, 0 if no;
?
w
hether the woman’s first pregnancy
occurred after 1979; this variable is called policy
, and is coded 1 if yes, 0 if no.
32
Here are summary descriptive data for the dependent variable and the seven independent variables:sum ceb
m
enarche afm
f
ecund eduyrs
urban
han
policy
17
33
?
W
e hypothesize that the older the woman
at first menstruation, the greater the number of CEB to her; the greater the number of years of education, the less the CEB; if the first pregnancy occurred after 1980, CEB will be lower; urban women will have fewer CEB than rural women.
?
W
e will now estimate a Poisson regression
model, predicting CEB with the above seven
X
variables. The Stata
c
ommand is
poiss
on
followed by the dependent
variable and then the seven
X
variables.
34
poisson
ceb
m
enar
che afm
f
ecund eduyrs
urban han
policy
18
35
?
W
e may ask how well does this Poisson
regression model of children ever born improve our ability to predict the probabilities of a woman having each number (i.e., each count) of children.
?
W
e use the “prcounts”
command to
calculate the predicted probabilities for each count of CEB. We will call these predicted probabilities “prceb”.
36
Stata
c
ommand:
poisson
ceb
m
enar
che afm
f
ecund eduyrs
u
rban han
p
olicy
prcount
s
prceb, plot max(10)
label var
p
rcebpr
eq
"Prediction from PRM"
graph twoway
connected cebprobobeq
cebprobpr
eq
prcebpreq
c
ebprobval, ytitle("Proportion
or Probability")
ylabel(
0
(.1).4) xlabel
(0(1) 9) title("Di
s
tributions
of CEB,
Uni
v
ariate
Poisson")
sub("and
Poisson Regression Model")
19
37
0 .1 .2 .3 .4
Proportion or Probability
0
1
2
3
4
5
6
7
8
9
Number of Children Ever Born
Observed CEB Distribution
Univariate Poisson, mu = 1.855
Prediction from PRM
and Poisson Regression Model
Distributions of CEB, Univariate Poisson
38
?
T
he predicted probabilities generated by
the Poisson regression model (PRM) are only slightly worse at predicting count 0 than the predictions generated by the univariate
Poisson distribution; but for the
most pa
rt the P
R
M only result
s in a
modest improvement in the predictions. Both the PRM predictions, and the predictions generated by the univariatePoisson model, are still somewhat off the actual distribution of CEB.
20
39
Poisson Goodness of Fit
?
T
here is a formal “goodness of fit”
test we
may calculate, that co
mpares the observed
empirical distribution with the distribution predicted by the Poisson regression model. The null hypothesis (
H
0
) is that there is no
difference between the observed data and the modeled data, indicating that the model fits the data. So we are looking for a small value of chi-square, wi
th a probability > 0.05.
If we have a small chi-square, this would mean we have a model that fits the data.
40
?
T
he Stata
c
ommand is
poisgof
?
T
he goodness of fit test is good news for our
model. It tells us that the model fits the data very well; specifically, the goodness of fit
Chi2
test indicates that giv
en the Poisson regression
model we cannot reject the null hypothesis that our observed data are Poisson distributed.
?
T
he Stata
p
rintout for the Poisson regression
equation also shows values of
Pseudo R2
and
the (likelihood ratio)
LR Chi2
, which indicate
that we have a statistically significant model.
21
41
Interpretation of Poisson
Regression Coefficients
?
T
he interpretation of the actual Poisson
regression coefficients are a little difficult. This owes to the fact that the predicted number of children ever born for each woman, μ
i
, is nonlinearly related to the
X
variable
s.
Thus it is more straightforward and easier to interpret the Poisson b coefficients in a multiplicative way. This is analogous to our discussion of odds ratios and percent change in odds ratios.
42
?
If we exponentiate
t
he Poisson b
coefficient, we get what Stata
r
efers to as
incide
nce rate ratios
(
IR
R
); these are
very similar to odds ratios. Stata
w
ill
provide these if you ask for them (ask for irr
) after the comma following the
poiss
on
equation command.
?
A
n easier way is to use the
listcoef
command and you get the Poisson b coefficients (1st column of data on slide page 43), the irr
c
oefficients (4th column of
data on slide page 44,
e^
b
), and other
things.
22
43
poisson
c
eb
menarche afm
f
ecund eduyrs
urban han
policy, irr
44
listcoef, help
23
45
The IRR coefficients are interpreted just like odds ratios:
?
T
he IRR for urban is 0.76497 =
e
-0.26792
.
Being a urban women compared to being a rural woman multiplies the expected number of children by a factor of 0.765, holding the other X variables constant.
?
W
e can interpret the effect of the woman’s
age at menarche as follows: for every additional year of the woman’s age at menarche, her CEB is multiplied by a factor of 1.063, holding the other
X
variables constant.
46
?
A
n even better way to interpret the effects
of the
X
variables is in terms of the percent
change in the IRR coefficients; this is calculated just like we did in calculating the percent change in odds ratios.
?
T
he Stata
c
ommand
listcoef, pe
rcent
will
produce these calculations (in the 4th column of data on slide page 47).
24
47
listcoef, percent help
48
?
T
hus for every additional year in age at
menarche, a woman’s mean production of children increases by 6.3%, holding all other variables constant. For every additional year in education, a woman’s mean production of children decreases by 2.4%, holding all other variables constant. Being a urban woman decreased the expected production of children by -23.5%, holding all other variables constant.
25
49
?
A
lso, we can interpret the Poisson effects
in a
sta
ndardized
way, by calculating IRR
coefficients standardized on the
X
variab
le.
This is done by exponentiating
the
Poisson b coefficient after it has been multiplied by its standard deviation. These values are reported in the Stata
listcoef
table (5th column of data on slide page 44). We may also interpre
t these standardized
IRR coefficients in terms of percent change. These values are reported in the Stata
listcoe
f, percent
table (5th column
of data on slide page 47).
50
?
W
e may use these standardized
coefficients to gauge the relative impacts on fertility of all seven of the
X
var
i
ables.
?
T
hus, from the above table, we see that
the woman’s exposure to the risk of childbearing has the greatest impact, followed by the age at first marriage variable, followed by the age at menarche variable, followed by the urban variable, followed by the education variable, followed by the han
v
ariable, followed by
the policy variable.
26
51
Negative Binomial Regression
?
P
oisson regression model rarely fits in
practice since in most applications the variance of the count data is greater than the mean. Often, if there is overdispersion, the Poisson estimates will be consistent, but will be inefficient. The standard errors will be biased downwards, resulting in spuriously large z-values. Thus if there is overdispersion
in data
an
alyzed, t
h
e z-te
st
s
will tend to over estima
te the significance of
the
X
va
riables.
52
?
S
tatisticians thus recommend that when
there is overdispersion
in the count data
that the dependent variable be estimated with
Negative Binomial regre
s
sion
; this
alternate approach is used when the count event that is being estimated has extra-Poisson variation
,
that is, overdi
sp
ersion.
27
53
The Negati
ve Binomial (NB)
Distri
bution
?
T
he expected value of
Y
for the Negative
Binomial (NB) distribution is the same as for the Poisson distribution. But the variance for the NB distribution is larger. The larg
er va
ria
n
ce of NB thus increa
se
s
the relative frequency of low and high counts. A comparison
of the Poisson and
NB distri
butions shows t
h
at the NB
distribution corrects a number of sources of poor fit that are often found when the Poisson distribution is used.
54
?
One
, the variance of the NB distribution
exceeds the variance of the Poisson distribution for a given mean;
?
Two
, th
e increa
sed vari
ance of t
h
e NB
regression model results in substantially larger probabilities for small counts; and
?
Finally
, in the NB distribution there are
slightly larger probabilities for larger count
s.
28
55
?
I
n the negative binomial regression model,
variation in
μ
is due both to variation in
(the independent variables) among the individuals (in the sample population) and to unobserved heterogeneity introduced by ε
. The term
ε
i
s
a ran
d
o
m erro
r that is
assumed to be uncorrelated with the independent variables.
ε
may be thought
of either as the combined effects of unobserved variables that have been omitted from the model or as another sour
ce o
f
pure ra
ndomne
ss.
56
?
T
he negative binomial regression model
thus adds to the Poisson regression model the erro
r term
ε
according to the
following stru
ctu
r
al equa
t
ion:
?
If there is no overdispersion, the NB r
e
gr
ess
i
on model r
e
duces
to the P
o
is
s
o
n
r
e
gr
ess
i
on model.
29
57
Estimating a Negative Binomial
Regression Model of Chinese Fertility
?
S
tata
estimates NB regression models,
with the command
nbre
g
, followed by the
dependent and independent variables. We will estimate an NB regression model, using the same 1997 survey children ever born data set, as used above.
58
?
W
e know that there is no overdispersion
i
n
these CEB data; there is actually some underdispersion. Since there is no overdispersion
in the data, the NB
regression model should not do us any good, and the results should be the same as
in the P
o
is
s
o
n r
egr
ess
i
on model. Let’s
see if this happens.
nbreg
ceb
m
enarche afm
f
ecund eduyrs
urban han
policy
30
5960
?
S
tata
first fits (estimates) the Poisson
regression model; see the first panel of output --
these are the initial and final log
likelihoods from the Poisson model estimated earlier.
?
S
tata
then estimates the NB constant-only
model, with a final log likelihood of -6046.6406. This is the beginning value of the log likelihood;
X
variables are then
introduced, with the objective of reducing the value of this log likelihood.
?
T
he final log likelihood of the fitted model
is shown above as -5560.9586.
31
61
?
It is exactly the same value (and based on the same calculations) as in the Poisson regression model.
?
A
lso, compare the values of the NB
regression coefficients with the coefficients produced in the Poisson regression model. They too are nearly identical.
62
?
N
ote near the bottom of the NB regression
table values for
ln(alpha)
and
alpha
. The
alpha value represents the amount of dispersion in the data. The Stataparameterization is in terms of ln(alpha), but the value of alpha is given right below. If alpha is zero, the NB regression model reduces to the Poisson regression model. There is no evidence at all of any overdispersion
in the fertility (CEB) data;
alpha = 8.62e-12, that is, .00000000000862. Alpha is for all practical purposes, zero.
32
63
?
W
e see at the very base of the NR
regression table, a likelihood ratio
t
est of
the
H
0
that alpha = 0. This is calculated by
multiplying 2 times the difference between the final log likelihoods of the NB model and the Poisson model:
?
T
his value of 0 means that the observed
data are Poisson distributed. So, there is no need to estimate a negative binomial regression model; it reduced to the Poisson regression model.
64
Estimating Negative Binomial
Regression Models of Scientific
Productivity of Graduate Students
?
W
e’ll now use Stata
t
o estimate an NB
regression model of scientific productivity of graduate students, using data from the examples in “Regression Models for Categorical Dependent Variables Using Stata”
by Scott Long and Jeremy Freese.
This research on scientific productivity examined the factors (
X
variables) affecting
the number of papers published during graduate school (in the last 3 years) by a sample of 915 biochemists.
33
65
?
T
he dependent variable is number of articles
(
art
) published by the 915 graduate students.
Five
X
variables are intro
duced as predictors
of the number of articles published by the scientists while they were in graduate school.
?
T
he
X
variables are: (1) Female (fem), scored
1 if female, 0 if male; (2) Marital status (mar), sco
red 1
if marri
ed, 0 if not; (3) Number o
f
children aged 5 or younger (kid5), ranging from 0 to 3; (4) Prestige of the scientist’s PhD department (phd), an interval variable ranging from 0.76 to 4.62; (5) Articles published by the scientist’s mento
r
(ment
)
in last 3 year
s,
ranging from 0 to 77.
66
Here are summary descriptive data for this count dependent variable and these five
X
variables:
sum art fem mar kid5 phd
m
ent
34
67
The count dependent variable (art) has a mean of 1.693, and a variance of 3.7097. There is a lot of overdispersion
in these
data.Here is the Poisson regression model results:
68
poisson
a
rt fem mar kid5 phd
ment
35
69
Since we know there is a lot of overdispersion
in the dependent
variable, a Poisson regression model is likely not appropriate.
We will thus
estimate an NB regression model, using the same scientific productivity data set, as just used above.
nbreg
art fem mar kid5 phd
ment
70
36
71
?
N
ow, note near the bottom of the NB
regression table values for
ln(alpha)
and
alpha
. If alpha is zero, the NB regression
model reduces to the Poisson regression model. This was the case in the CEB example. But it is not the case here.
?
H
ere there is strong ev
idence that there is
overdispersion
in the data; alpha = .4416,
with a standard error of .0530, with a corresponding z value of 8.3, which is significant at .01. Alpha is significantly different from zero.
72
We see at the very base of the NR regression table, a likelihood ratio test of the H
0
that alpha = 0. This is calculated by
multiplying 2 times the difference between and final log likelihoods of the NB model and the Poisson model:This is a highly significant value, meaning that the observed data are not Poisson distributed.
37
73
Interpretation of the Negative
Binomial Regression Coefficients
?
T
hey are interpreted in exactly the same
way as in the Poisson model. Strata’s listcoef
comman
d
also works afte
r
estimating the NB model.
74
listcoef, help
38
75
?
T
he IRR coefficients (
e^b)
a
r
e interprete
d
just like odds ratios. The IRR for fem is 0.8054. Being a female scientist decreases (multiplie
s) the expected
number of articles by a factor of 0.8, holding other variables constant. Being married increases (mul
tiplies) the number
of articles by a factor of 1.16, holding other X
variables constant. We can interpret the
effect of the mentor’s productivity as follows: for every additional article of the mentor, a scientist’s average productivity increases by a factor of 1.03, holding the other
X
variables constant.
76
listcoef, percent help
39
77
?
T
hus for every additional article published
by the scientist’s mentor, a scientist’s mean productivity increases by 3%, holding all other variables constant. Being a female scientist decreases the expected productivity by 20%, holding all other variables constant. For every addition kid, a scienti
st’s mea
n
produ
ctivity d
e
crea
se
s
by 16%, holding all other variables con
s
tant
.
78
Comparing
the Poisson and Negati
ve
Binomial Regression Models
?
N
ow let’s plot the predicted probabilities of
scientific publication counts, as produced by the NB regression model, with those produced by the Poisson regression model, along with the observed proportions in the actual data.
40
79
0 .1 .2 .3 .4
Pr o p o r t i o n o r Pr o b a b ilit y
0
1
2
3
4
5
6
7
8
9
10
N
u
m
ber
of
A
r
t
i
c
l
es
O
b
se
rve
d
P
o
i
sso
n
NB
N
u
m
b
e
r
of
A
r
t
i
cl
es
D
i
st
r
i
bu
t
i
on
80
?
T
his chart shows us graphically that the
NB model fits the observed publication count data quite well, and does so in a much better way than the Poisson model.
?
W
e will also compare the scientific
productivity resu
lts of the
Poisson
regression with the negative binomial regression (Table on the next slide). For the most part, the Poisson and the NB regre
s
sion coeffi
cients a
r
e ver
y
similar;
they are not at all far apart.
41
81
Poisson Model
Negative Binomial Model
82
?
T
he sta
n
dard errors are uniformly lower i
n
the Poisson model than in the NB model, resulting in higher z-tests in the Poisson model than in the NB model. The differences in the Poisson and NB models regarding the standard errors (lower
standard errors in the
Poisson model) are precisely the expected consequence of overdispersion.
?
I
ndeed the marriage variable (
mar
) is shown
to be significant in the Poisson model, but not in the NB model.
?
It would be better off using the Negative Binomial model to predict the count of scientific productivity.
42
83
?
T
he “rule of thumb”
is to first estimate the
simpler model, the Poisson regression model, and then the NB model. If there is significant overdispersion, you are safer to report the results of the Negative Binomial model. If there is not, use the Poisson model.