Case-Control Studies
This version is made for bilingual teaching. The case-control study is an
essential research design in epidemiology. It involves identifying patients
who have the outcome of interest (cases) and control patients who do not have
that outcome, then looking back to see whether they had the exposure of
interest. The exposure could be an environmental factor, a behavioural
factor, or exposure to a drug or other therapeutic intervention.
Select Study Design to Match the Research Goals

Objective                                   Design
Description of disease or spectrum          Case series or report;
                                            cross-sectional study
Determine operating characteristics         Cross-sectional study
of a new diagnostic test
Describe prognosis                          Cohort study
Determine cause-effect                      Cohort study;
                                            case-control study
Compare new interventions                   Randomized clinical trial
Summarize literature                        Meta-analysis
Case-Control Studies
• Introduction
• Matching
• Investigation Example
• Design of Case-Control Studies
• Data collection and analysis
• Bias
• Strengths and Weaknesses
• Several important features
EPIDEMIOLOGY AND HEALTH STATISTICS 1957 卫
Introduction
• Historical Perspective
• Definition
• Types of Design
Introduction
Historical Perspective
• Unique contribution of epidemiology to the repertoire of clinical
  research designs
• First case-control study performed in late 1950s
• Doll and Hill's study of lung cancer and smoking behavior among physicians
• Jerome Cornfield's classic description of “Retrospective Studies”
• New statistical tools were developed to analyze the study design:
  logistic regression
Introduction
Definition
A case-control study is a design in which individuals with an event or
condition of interest (cases) are identified and then compared, with regard
to one or more exposures, with individuals without the event or condition of
interest (controls). Case-control investigations are typically designed to
assess the association between the occurrence of a disease and an exposure
suspected of causing (or preventing) that disease.
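Introduction
[Figure: design of a case-control study. The direction of inquiry runs
backward from disease status to exposure. In this figure, cases comprise
a exposed and c unexposed subjects; controls comprise b exposed and
d unexposed subjects. Proportion exposed: a/(a+c) among cases and
b/(b+d) among controls.]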
Introduction
Types
Family of epidemiological study designs
• Traditional case-control design
• Case-control studies within cohorts
  - Nested case-control study design
  - Case-cohort study design
• Case-parent study design
• Case-only study design
Matching
• Summary
• Types
• Problems with Matching
Matching
Summary
• Matching is defined as the process of selecting controls so that they
  resemble the cases with regard to certain characteristics
• The goal of matching is to create similar distributions between cases and
  controls with regard to certain characteristics
• Matching can be used to
  - Adjust for potential confounding factors
  - Increase the precision of the estimate
Matching
Types
• Individual-level matching
  - For each case in the study, one or more controls are selected with
    identical (or similar) characteristics to the case
• Frequency, or group, matching
  - Select controls so that the proportion with a certain characteristic is
    identical to the proportion of cases with that characteristic
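Frequency matching as described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `frequency_match`, the `age_group` field, and all records are hypothetical, and the sketch assumes each stratum has enough candidate controls.

```python
import random
from collections import Counter, defaultdict


def frequency_match(cases, candidates, key, ratio=1, seed=0):
    """Draw `ratio` controls per case within each stratum defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    cases_per_stratum = Counter(key(person) for person in cases)

    # Group the candidate controls by the same matching characteristic.
    pool = defaultdict(list)
    for person in candidates:
        pool[key(person)].append(person)

    controls = []
    for stratum, n_cases in cases_per_stratum.items():
        wanted = n_cases * ratio
        if len(pool[stratum]) < wanted:
            raise ValueError(f"too few candidate controls in stratum {stratum!r}")
        controls.extend(rng.sample(pool[stratum], wanted))
    return controls


# Hypothetical data: 3 cases aged 40-49 and 2 cases aged 50-59.
cases = [{"id": i, "age_group": g}
         for i, g in enumerate(["40-49"] * 3 + ["50-59"] * 2)]
candidates = [{"id": 100 + i, "age_group": g}
              for i, g in enumerate(["40-49"] * 10 + ["50-59"] * 10 + ["60-69"] * 10)]

controls = frequency_match(cases, candidates, key=lambda p: p["age_group"], ratio=2)
print(Counter(p["age_group"] for p in controls))
```

With two controls per case, the control group contains 6 people aged 40-49 and 4 aged 50-59, so the age distribution of the controls mirrors that of the cases.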
Matching
Problems with Matching
• Difficult and expensive
• Cannot evaluate the effect of the variables controlled by matching
• May limit the ability to control for other variables
• Overmatching
  - Controls resemble cases in terms of known and unknown characteristics,
    some of which may be associated with the disease
Investigation Example
• The association between occurrence of eosinophilia-myalgia syndrome (EMS)
  and ingestion of L-tryptophan
• Background
• Conduct
• Results
Investigation Example
Background
EMS, which occurs predominantly in women and is relatively rare, was first
recognized in October 1989, when astute physicians determined that three
people with unexplained myalgias and eosinophilia had consumed L-tryptophan.
Prompt response by health departments quickly led to case-control studies,
the results of which suggested that ingestion of L-tryptophan was the cause
of EMS.
Investigation Example
Conduct
• The Centers for Disease Control and Prevention (CDC) conducted a series of
  case-control studies in 1989 and 1990.
• One of the studies was conducted in Minnesota. Researchers selected 63 case
  subjects with EMS in the metropolitan area of Minneapolis-St. Paul.
• Researchers randomly selected 5188 control subjects in the same area.
• Researchers interviewed subjects and asked about potential risk factors and
  about their use of L-tryptophan.
Investigation Example
Results
• L-tryptophan was taken significantly more frequently by cases than by
  controls: 61 of 63 case subjects (97%), but only 101 of 5188 control
  subjects (2%).
• L-tryptophan-containing products were taken off the market in November
  1989. In 1990, after the recall of L-tryptophan, the number of reported
  cases fell to near zero.
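The counts reported above are enough to work out a crude odds ratio. The slides do not state one, so the sketch below is only an unadjusted illustration computed from those counts; the function and variable names are chosen here for the example.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Crude OR: odds of exposure among cases over odds of exposure among controls."""
    case_odds = exposed_cases / unexposed_cases
    control_odds = exposed_controls / unexposed_controls
    return case_odds / control_odds


# Counts taken from the Minnesota study as reported on the slide.
cases_total, cases_exposed = 63, 61
controls_total, controls_exposed = 5188, 101

ems_or = odds_ratio(
    cases_exposed,
    cases_total - cases_exposed,        # 2 cases did not take L-tryptophan
    controls_exposed,
    controls_total - controls_exposed,  # 5087 controls did not take it
)

print(f"Exposed among cases:    {cases_exposed / cases_total:.1%}")
print(f"Exposed among controls: {controls_exposed / controls_total:.1%}")
print(f"Crude odds ratio:       {ems_or:.0f}")  # about 1536
```

An odds ratio of this size, (61 × 5087) / (2 × 101), is far beyond what is usually seen in observational studies, which is consistent with the strong conclusion drawn from the investigation.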
Design
• Selection of Cases
  Develop a case definition, then identify new cases within a specified
  time period.
• Selection of Controls
  The sample of controls should have the same prevalence of exposure as the
  source population of unaffected persons.
• Determination of Exposure
Design
Selection of Cases
• Sources of cases
• Types of cases
• Important points
Design
Selection of Cases
Sources of Cases
• Hospital or clinic
  - Because apparent risk factors may merely reflect referral patterns to
    specific hospitals, multiple hospitals or clinics are often chosen
  - More severely ill patients tend to be referred to hospitals, especially
    tertiary care centers
• Population-based or community
  - New cases reported to health departments, registries, hospital record
    departments, etc.
• Cases cannot be selected based on a known or unknown association with the
  exposure of interest
Design
Selection of Cases
Types of Cases
• Newly diagnosed, or incident, cases
• Previously existing, or prevalent, cases
• Incident cases are preferred over prevalent cases in most settings
  - If prevalent cases are chosen, the risk factors identified may be related
    more to survival with the disease than to disease occurrence.
  - Survivorship bias can also affect incident cases, but it is minimized
Design
Selection of Cases
Important Points
• Specify the definition of a case
• The criteria should minimize the likelihood that
  - an affected person (true case) is missed (i.e., the criteria must be
    sensitive)
  - a nonaffected person is falsely classified as a case (i.e., the criteria
    must be specific)
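The two requirements on a case definition correspond to its sensitivity and specificity. The sketch below shows how they would be computed if the definition were checked against a reference standard; the validation counts are hypothetical and are not taken from the slides.

```python
def sensitivity(true_positives, false_negatives):
    """Share of truly affected people that the case definition captures."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of unaffected people that the case definition correctly excludes."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical validation of a case definition against a reference standard.
met_definition_and_affected = 90         # true positives
missed_although_affected = 10            # false negatives
excluded_and_unaffected = 950            # true negatives
met_definition_but_unaffected = 50       # false positives

print(f"Sensitivity: {sensitivity(met_definition_and_affected, missed_although_affected):.2f}")
print(f"Specificity: {specificity(excluded_and_unaffected, met_definition_but_unaffected):.2f}")
```

A definition that misses few true cases has high sensitivity; one that rarely labels unaffected people as cases has high specificity.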
Design
Selection of Controls
• Sources of controls
• Multiple controls
• Important points
Design
Selection of Controls
Sources of Controls (1)
• Hospital control group
  - Hospitalized patients; best if chosen from the same hospital as the
    cases, in order to control for the unknown reference population
  - Select from all patients admitted to the hospital
  - Select from patients with specific diagnoses
Design
Selection of Controls
Sources of Controls (2)
• Community control group
  - A probability sample is best, but often not practical
  - Select from school rosters, insurance companies, etc.
  - Neighbors of cases
  - Random digit dialing
  - Best friend
Design
Selection of Controls
Multiple Controls
• Controls of the same type
  - May improve the precision of the measure of association
  - Precision is rarely improved with more than 5 controls per case
• Controls of different types
  - Hospital controls and community controls for each case
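One way to see why gains level off beyond about 5 controls per case is a rough variance argument. The sketch assumes Woolf's approximation, var(ln OR) ≈ 1/a + 1/b + 1/c + 1/d, and a similar exposure prevalence in cases and controls; neither assumption comes from the slides. Under them, with k controls per case the variance is proportional to 1 + 1/k, so precision relative to an unlimited number of controls is k/(k+1).

```python
def relative_precision(controls_per_case):
    """Precision of ln(OR) with k controls per case, relative to unlimited controls.

    Assumes var(ln OR) is proportional to (1 + 1/k); see the lead-in for the
    assumptions behind this approximation.
    """
    k = controls_per_case
    return k / (k + 1)


previous = 0.0
for k in range(1, 9):
    current = relative_precision(k)
    print(f"{k} control(s) per case: {current:.3f}  (gain {current - previous:+.3f})")
    previous = current
```

Under this approximation, one control per case already reaches half of the attainable precision, and moving from 5 to 6 controls adds less than 3 percentage points.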
Design
Selection of Controls
Important Points
• Controls cannot be selected based on a known or unknown association with
  the exposure(s) or risk factors of interest
Design
Determination of Exposure
• Exposure
• Important points
Design
Determination of Exposure
Exposure
• Exposure is determined in a "retrospective" manner; that is, one must look
  back in time to assess exposure status before a person became a case.
• Each individual's prior exposure to the risk factor of interest
• Other exposures
Design
Determination of Exposure
Important Points (1)
• Cases and controls must be assessed for exposure in the same way
  - Interviews should be standardized, monitored, and conducted by trained
    interviewers.
Design
Determination of Exposure
Important Points (2)
• Exposure must be measured in a blinded manner
  - Data collectors must be unaware of whether a subject is a case or a
    control
  - Data collectors should be unaware of the study hypothesis
Data collection and analysis
• Collection of Data
• Analysis of Data
  - OR
  - Unmatched analysis
  - Matched analysis
• Analytic Strategy
Data collection and analysis
Collection of Data
• Interviews and questionnaires
• Information concerning risk factors may also be obtained from medical,
  occupational, or other records.
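Data collection and analysis
Analysis of Data
[Figure: cases and controls drawn from the population at risk. In this
figure, cases are split into exposed (a) and unexposed (c); controls are
split into exposed (b) and unexposed (d).]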
Data collection and analysis
Odds Ratio
• The power of the study design lies in the symmetry of the OR.
• The OR is the odds of exposure given disease divided by the odds of
  exposure given no disease.
• Remember that the ratio of the odds of exposure among cases to the odds of
  exposure among controls is the same as the ratio of the odds of disease
  among the exposed to the odds of disease among the unexposed.
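The symmetry can be checked with exact arithmetic. In this sketch the cell labels follow the unmatched 2×2 table (cases: a exposed, b unexposed; controls: c exposed, d unexposed); the counts are hypothetical and the function names are chosen for the example.

```python
from fractions import Fraction


def exposure_odds_ratio(a, b, c, d):
    """Odds of exposure among cases (a/b) over odds of exposure among controls (c/d)."""
    return Fraction(a, b) / Fraction(c, d)


def disease_odds_ratio(a, b, c, d):
    """Odds of disease among the exposed (a/c) over odds among the unexposed (b/d)."""
    return Fraction(a, c) / Fraction(b, d)


# Hypothetical counts: 30 exposed and 70 unexposed cases,
# 10 exposed and 90 unexposed controls.
a, b, c, d = 30, 70, 10, 90

print(exposure_odds_ratio(a, b, c, d))  # 27/7
print(disease_odds_ratio(a, b, c, d))   # 27/7
print(Fraction(a * d, b * c))           # 27/7, the cross-product ad/bc
```

Both ratios reduce to the same cross-product ad/bc, which is why a study that samples on disease status can still estimate the odds ratio relating exposure to disease.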
             Exposed    Unexposed    Total
Cases        a          b            a+b
Controls     c          d            c+d
Total        a+c        b+d          a+b+c+d
Data collection and analysis
Unmatched analysis

\[ \chi^2 = \frac{(ad - bc)^2 \, n}{(a+b)(c+d)(a+c)(b+d)}, \qquad n = a+b+c+d \]

\[ OR = \frac{\text{odds of case exposure}}{\text{odds of control exposure}} = \frac{a/b}{c/d} = \frac{ad}{bc} \]

\[ \text{OR 95\% CI} = OR^{\,(1 \pm 1.96/\sqrt{\chi^2})} \]
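The unmatched formulas can be put together in a short Python sketch. The cell labels follow the unmatched 2×2 table (cases: a exposed, b unexposed; controls: c exposed, d unexposed). The function name and the counts are hypothetical, and the sketch assumes no cell that would make bc or the chi-square equal to zero.

```python
import math


def unmatched_analysis(a, b, c, d, z=1.96):
    """Return (OR, chi-square, (lower, upper)) for an unmatched 2x2 table."""
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

    # Test-based limits: OR ** (1 +/- z / sqrt(chi2)).
    limits = (
        odds_ratio ** (1 - z / math.sqrt(chi2)),
        odds_ratio ** (1 + z / math.sqrt(chi2)),
    )
    # When OR < 1 the two expressions swap order, so sort them.
    return odds_ratio, chi2, (min(limits), max(limits))


# Hypothetical counts: 30 exposed and 70 unexposed cases,
# 10 exposed and 90 unexposed controls.
or_, chi2, (lower, upper) = unmatched_analysis(30, 70, 10, 90)
print(f"OR = {or_:.2f}, chi-square = {chi2:.1f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

For these counts the chi-square statistic is 12.5 and the interval excludes 1, which agrees with the significance test because the limits are derived from the same statistic.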
Data collection and analysis
Matched analysis

                  Control exposed    Control unexposed    Total
Case exposed      a                  b                    a+b
Case unexposed    c                  d                    c+d
Total             a+c                b+d                  a+b+c+d

Each cell counts matched case-control pairs. Pairs that share the same
exposure status (cells a and d) do not contribute to the estimate of risk.
Data collection and analysis
Matched analysis

\[ \chi^2 = \frac{(b - c)^2}{b + c} \]

\[ OR = \frac{b}{c} \]

\[ \text{OR 95\% CI} = OR^{\,(1 \pm 1.96/\sqrt{\chi^2})} \]
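For matched pairs, only the discordant cells enter the calculation. In the sketch below, b is the number of pairs in which the case was exposed and the control was not, and c the number in which only the control was exposed, following the matched table. The function name and counts are hypothetical; the sketch assumes c is not zero and b differs from c.

```python
import math


def matched_pair_analysis(b, c, z=1.96):
    """Return (OR, chi-square, (lower, upper)) from the discordant pair counts."""
    odds_ratio = b / c
    chi2 = (b - c) ** 2 / (b + c)  # uncorrected statistic on the discordant pairs

    # Test-based limits: OR ** (1 +/- z / sqrt(chi2)).
    limits = (
        odds_ratio ** (1 - z / math.sqrt(chi2)),
        odds_ratio ** (1 + z / math.sqrt(chi2)),
    )
    return odds_ratio, chi2, (min(limits), max(limits))


# Hypothetical study: 25 pairs with only the case exposed,
# 10 pairs with only the control exposed.
or_m, chi2_m, (lower_m, upper_m) = matched_pair_analysis(b=25, c=10)
print(f"OR = {or_m:.2f}, chi-square = {chi2_m:.2f}, 95% CI = ({lower_m:.2f}, {upper_m:.2f})")
```

The concordant pairs never appear in the function signature, which mirrors the statement that they do not contribute to the estimate.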
Data collection and analysis
Analytic Strategy
• Assess relationship/association between
  - Exposure and independent variables
  - Case/control status and independent variables
• Calculate the crude, or unadjusted, OR for the exposure-case association
• Matched analysis is required for matched studies
Data collection and analysis
Analytic Strategy
• Stratified analysis
  - Calculate stratum-specific ORs for the exposure-case relationship
  - Determine the presence of confounding and interaction
• Logistic regression analysis
  - Regression technique used to adjust for confounding and to assess
    interaction
  - Special logistic model (conditional logistic regression) applied in
    matched studies
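A stratified analysis can be illustrated by comparing the crude OR with the stratum-specific ORs. The sketch below also pools the strata with the Mantel-Haenszel estimator, a common choice that the slides do not name, so treat it as an assumption. Each stratum is a tuple (a, b, c, d) laid out as in the unmatched table; the data are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for one 2x2 table."""
    return (a * d) / (b * c)


def crude_table(strata):
    """Collapse the strata into a single 2x2 table by summing each cell."""
    return tuple(sum(cells) for cells in zip(*strata))


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata of a third variable (for example, two age groups).
strata = [
    (40, 10, 20, 5),   # stratum 1: exposure common, disease common
    (5, 20, 20, 80),   # stratum 2: exposure rare, disease rare
]

for i, table in enumerate(strata, start=1):
    print(f"Stratum {i} OR: {odds_ratio(*table):.2f}")
print(f"Crude OR:            {odds_ratio(*crude_table(strata)):.2f}")
print(f"Mantel-Haenszel OR:  {mantel_haenszel_or(strata):.2f}")
```

Here both stratum-specific ORs equal 1 while the crude OR is about 3.19, so the crude association is produced entirely by the stratifying variable. Stratum-specific ORs that differ markedly from each other would instead point to interaction.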
Bias
• Introduction
• Selection bias
• Information bias
• Confounding
Bias
Introduction
• Case-control studies are subject to bias and confounding, both of which
  distort the results of the study
• Bias is defined as the deviation of results, or inferences, from the truth,
  or processes leading to such deviation
• About 75 different types of bias have now been identified in published
  case-control studies
Bias
Selection Bias
• Features
• Types
Bias
Selection Bias: Features (1)
• Selection bias reflects systematic errors that arise from the way in which
  subjects are selected.
• Selection bias may be present if the prior exposure of the cases studied
  differs from that of all cases arising from the source population, or if
  the prior exposure of the controls differs from that of persons in the
  source population without the disease of interest.
Bias
Selection Bias: Features (2)
• Preferential diagnosis of exposed cases may lead to selection bias.
• Low participation may lead to selection bias.
• Errors in sampling controls from the source population can also create
  selection bias.
Bias
Selection Bias: Types
• Admission rate bias
• Prevalence-incidence bias
• Detection signal bias
• Time effect bias
Bias
Information Bias
• A distortion in measuring exposure or outcome data that results in
  different quality (i.e., accuracy or reliability) or frequency of
  information between comparison groups.
• Recall bias
• Confounding bias
Bias
Confounding Bias
• Confounding is a distortion of results that occurs when the apparent
  effects of the exposure of interest are attributable entirely or in part
  to the effects of an extraneous variable.
• Criteria for confounding
  - Factor is associated with exposure
  - Factor is associated with disease in the absence of exposure
  - Factor is not in the causal path between exposure and outcome
Strengths and Weaknesses
Strengths
• Rare disease
• Long latency between exposure and disease
• Explore multiple hypotheses
• Inexpensive
Strengths and Weaknesses
Weaknesses
• Prone to bias
• Temporal relationships cannot be established
• Inefficient for rare exposures, unless the exposure often leads to disease
Several important features
• The study provides an efficient means to study rare diseases. Case-control
  studies tend to be more feasible than other studies.
• Case-control studies allow researchers to investigate several risk factors.
• A single case-control investigation does not “prove” causality, but it can
  provide suggestive evidence of a causal relationship that warrants
  intervention by public health officials to reduce exposure to the
  implicated risk factor.