Lecture 4
From the last lecture, we followed gene segregation in a cross of a true breeding shibire
fly with a wild type fly.
Shibire x wild type
↓
F1: all not paralyzed
↓
F2: 3 not paralyzed : 1 paralyzed
This is the segregation pattern expected for a single gene. But in an actual experiment
how do we know that the phenotypic ratio is really 3 : 1?
There is no logical way to prove that we have a 3 : 1 ratio. Nevertheless, we can think of
an alternative hypothesis and then show that the alternative hypothesis does not fit the
data. Usually, we then adopt the simplest hypothesis that still fits the data.
A possible alternative hypothesis is that recessive mutations in two different genes are
needed to get a paralyzed fly.
In this case a true breeding paralyzed fly would have genotype: a/a, b/b
Whereas wild type would have genotype: A/A, B/B
F1: A/a B/b not paralyzed
F2: p(a/a and b/b) = (1/4)² = 1/16 (paralyzed)
p(a/a and B/–) = 1/4 x 3/4 = 3/16
p(b/b and A/–) = 3/16
p(A/– and B/–) = the rest = 9/16
This is the classic ratio for two-gene segregation, 9 : 3 : 3 : 1.
For our hypothesis we should see a phenotypic ratio of 15 not paralyzed : 1 paralyzed.
Therefore, to distinguish one-gene segregation from two-gene segregation we need a
statistical test to distinguish 3 : 1 from 15 : 1. Intuitively, we know that in order to get
statistical significance, we need to look at a sufficient number of individuals.
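The two-gene expectation can be checked by brute-force enumeration. The sketch below is illustrative only; the function name and class labels are made up here. It assumes the two genes are unlinked, so each F1 parent produces the four gamete types AB, Ab, aB and ab with equal probability.

```python
from fractions import Fraction
from itertools import product

def f2_phenotype_classes():
    """Count F2 genotype classes from an A/a B/b x A/a B/b cross.

    Assumes unlinked genes, so all four gametes are equally likely.
    """
    gametes = [a + b for a, b in product("Aa", "Bb")]  # AB, Ab, aB, ab
    counts = {"A/- B/-": 0, "a/a B/-": 0, "A/- b/b": 0, "a/a b/b": 0}
    for egg, sperm in product(gametes, repeat=2):      # 16 equally likely pairs
        dominant_a = "A" in (egg[0], sperm[0])
        dominant_b = "B" in (egg[1], sperm[1])
        if dominant_a and dominant_b:
            key = "A/- B/-"
        elif dominant_b:
            key = "a/a B/-"
        elif dominant_a:
            key = "A/- b/b"
        else:
            key = "a/a b/b"
        counts[key] += 1
    return counts

counts = f2_phenotype_classes()
total = sum(counts.values())
# Under the two-gene hypothesis only the double homozygote is paralyzed.
p_paralyzed = Fraction(counts["a/a b/b"], total)
print(counts)                       # class counts out of 16
print(p_paralyzed, 1 - p_paralyzed)
```

The four class counts reproduce 9 : 3 : 3 : 1, and pooling the three non-paralyzed classes gives 15 : 1.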
For a chi-square test you start with a specific hypothesis that gives a precise
expectation. The test is then applied to the actual experimental results and gives the
probability of obtaining a deviation from expectation at least as large as the one
observed if the hypothesis is true. The test is useful for ruling
out hypotheses that would be very unlikely to give the actual results.
Say we look at 16 flies in the F2 and observe 14 not paralyzed and 2 paralyzed flies.
Under the hypothesis of two genes we expect 15 not paralyzed flies and 1 paralyzed fly.
We calculate the value χ² using the formula below, where O is the number of individuals
observed in each class and E is the number of individuals expected for each class.
χ² = Σ (O – E)² / E (summed over all classes)
χ² = 1²/15 + 1²/1 = 0.067 + 1 = 1.067
degrees of freedom (df) = number of classes – 1
From the table using 1 df, 0.05 < p < 0.5
The convention we use is that p ≤ 0.05 constitutes a deviation from expectation that is
significant enough to reject the hypothesis. Therefore, on the basis of this sample of 16
flies we can’t rule out the hypothesis that two genes are required.
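The arithmetic for the 16-fly sample can be reproduced with a few lines of code. This is a minimal sketch with hypothetical helper names. Instead of a printed table it uses the identity that, for 1 degree of freedom, the upper-tail probability of χ² equals erfc(√(χ²/2)). That shortcut holds only for two phenotype classes.

```python
import math

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all phenotype classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_1df(chi2):
    """Upper-tail probability for a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

observed = [14, 2]   # not paralyzed, paralyzed
expected = [15, 1]   # two-gene hypothesis, 15 : 1 among 16 flies

chi2 = chi_square(observed, expected)
p = p_value_1df(chi2)
print(f"chi2 = {chi2:.3f}, p = {p:.2f}")
print("reject" if p <= 0.05 else "cannot reject")
```

The computed p falls between 0.05 and 0.5, in agreement with the table lookup.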
Say we look at 64 F2 flies and find that 12 are paralyzed. For the hypothesis of two
genes the expectation is that 4 would be paralyzed. The χ² for this data:
χ² = 8²/60 + 8²/4 = 1.07 + 16 = 17.1
From the table p < 0.005 so we reject the two-gene hypothesis.
Let’s use this data to test the hypothesis of one-gene segregation, which would be
expected to give 16 paralyzed flies from 64 F2 flies:
χ² = 4²/48 + 4²/16 = 0.33 + 1 = 1.33
From the table using 1 df, 0.05 < p < 0.5. Thus the data still fits the hypothesis of
one-gene segregation.
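Both hypotheses can be tested against the same 64-fly sample in one pass. The sketch below uses hypothetical helper names. It computes the 1 df p-value from erfc(√(χ²/2)) rather than reading it from a table.

```python
import math

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all phenotype classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_1df(chi2):
    """Upper-tail probability for a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def expected_counts(total, ratio):
    """Split `total` individuals according to a phenotypic ratio such as (3, 1)."""
    return [total * part / sum(ratio) for part in ratio]

observed = [52, 12]  # not paralyzed, paralyzed among 64 F2 flies

results = {}
for name, ratio in [("one gene", (3, 1)), ("two genes", (15, 1))]:
    expected = expected_counts(sum(observed), ratio)
    chi2 = chi_square(observed, expected)
    p = p_value_1df(chi2)
    results[name] = (chi2, p)
    verdict = "reject" if p <= 0.05 else "cannot reject"
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.3g} -> {verdict}")
```

With this sample the 15 : 1 expectation is rejected while the 3 : 1 expectation is not, matching the conclusions in the text.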
So far, the hypothesis that one gene is responsible for the paralyzed trait is the simplest
explanation that fits the data.
The way to distinguish most easily between a heterozygote and a homozygote expressing
a dominant trait is to cross to a homozygous recessive test strain.
Test cross: cross to homozygous recessive:
A/A x a/a gives all A/a, i.e. all offspring will express the dominant trait.
A/a x a/a gives 1/2 A/a and 1/2 a/a, i.e. one half of the offspring will express
the dominant trait.
Mendelian inheritance in humans
For humans we can’t do test crosses, of course, but by following inheritance of a trait for
several generations the modes of inheritance can usually be identified by applying basic
principles of Mendel. The following are guidelines for identifying different modes of
inheritance in pedigrees.
Autosomal dominant
i) Affected individuals must have at least one affected parent
Exceptions to this rule will occur if a new mutation arises in one of the parents (in
real life a more likely explanation is extramarital paternity). Another possibility is
incomplete penetrance, where other genetic or environmental factors prevent the
trait from being expressed in one of the parents.
Autosomal recessive
i) When both parents are carriers, on average 1/4 of the children will be affected.
ii) When both parents are affected, then all of the children will be affected.
iii) If the trait is very rare then consanguinity is likely. That is, it is likely that
parents of affected children are themselves related (e.g. cousins).
X-linked inheritance
♀ X^c X^+ (carrier) x ♂ X^+ Y
↓
♀ X^c X^+ (carrier), ♀ X^+ X^+, ♂ X^c Y (color blind), ♂ X^+ Y
i) When parents are a carrier ♀ and an unaffected ♂, then on average, 1/2 of the
daughters will be carriers and 1/2 of the sons will be affected.
If the trait is rare then the vast majority of affected individuals will be male,
which is the hallmark of X-linked traits.
ii) Affected sons inherit the allele from their mother.
• Maternal uncles are often affected.
• Since the allele is inherited only from the mother, inbreeding doesn’t increase the
probability of an affected ♂.
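Rule i) follows from listing the four equally likely egg and sperm combinations. The sketch below is illustrative; the allele labels "Xc" and "X+" and the function names are assumptions made here, and the trait is treated as X-linked recessive.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def x_linked_cross(mother=("Xc", "X+"), father=("X+", "Y")):
    """Count offspring classes for an X-linked recessive trait."""
    outcomes = Counter()
    for egg, sperm in product(mother, father):
        n_mutant = [egg, sperm].count("Xc")
        if sperm == "Y":
            # Sons have a single X, so one mutant copy is enough to be affected.
            outcomes[("son", "affected" if n_mutant == 1 else "unaffected")] += 1
        else:
            status = {0: "non-carrier", 1: "carrier", 2: "affected"}[n_mutant]
            outcomes[("daughter", status)] += 1
    return outcomes

def fraction_within_sex(outcomes, sex, status):
    """Fraction of children of one sex that fall in the given class."""
    total = sum(n for (s, _), n in outcomes.items() if s == sex)
    return Fraction(outcomes[(sex, status)], total)

outcomes = x_linked_cross()
print(fraction_within_sex(outcomes, "daughter", "carrier"))
print(fraction_within_sex(outcomes, "son", "affected"))
```

Both fractions come out to 1/2, and no daughter is affected because every daughter receives X+ from the unaffected father.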
Conditional probabilities
Consider the following pedigree of a recessive trait.
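[Pedigree figure: the legend distinguishes the symbols for female and male; "?" marks the child whose probability of being affected is calculated.]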
p(affected child) = p(mother carrier and father carrier and affected child)
= 2/3 x 2/3 x 1/4 = 1/9
However, if they have a child that is affected we must reassess the probability that
their next child will be affected.
p(both parents carriers) = 1. So, p(next child affected) = 1/4
This example shows how probability calculations are based on information. The
probability changes not because the parents have changed but because our information
about them has.
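The calculation can be sketched with exact fractions. The 2/3 figure is interpreted here under an assumption, since it depends on the pedigree: each parent is taken to be an unaffected child of two carriers. Among the unaffected offspring of such a cross, two out of three are carriers. Function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def p_carrier_given_unaffected():
    """p(carrier | unaffected) for a child of two carriers (A/a x A/a).

    Assumption: this is where the 2/3 in the calculation comes from.
    """
    offspring = [m + f for m, f in product("Aa", repeat=2)]  # AA, Aa, aA, aa
    unaffected = [g for g in offspring if g != "aa"]
    carriers = [g for g in unaffected if "a" in g]
    return Fraction(len(carriers), len(unaffected))

p_carrier = p_carrier_given_unaffected()
p_affected_if_both_carriers = Fraction(1, 4)

# Before any child is born: both parents must be carriers AND pass on a.
p_first_child = p_carrier * p_carrier * p_affected_if_both_carriers

# After an affected child is born, both parents are known to be carriers.
p_next_child = Fraction(1) * Fraction(1) * p_affected_if_both_carriers

print(p_carrier, p_first_child, p_next_child)
```

Only the carrier probabilities change between the two calculations. The 1/4 term is the same in both.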
HANDBOOK for PROBABILITY CALCULATIONS
Many problems in diploid genetics rely on basic concepts of probability. This is because each individual
inherits at random only one of two possible copies of a gene from each parent. Thus, breeding experiments
or inheritance in human pedigrees have probabilistic rather than absolute outcomes. Everyone
has an intuitive sense of probability but what we need is a precise definition that will allow probabilities
to be manipulated quantitatively.
Probabilities are usually defined in terms of possible outcomes of a trial. A trial could be the toss of a
coin, the roll of a die, or two parents having a child. If we define a specific event a, p(a) or the probability
of a, can be defined as follows: after a very large number of trials, p(a) is simply the fraction of trials
that give outcome a. In principle, we could determine p(a) by actually performing a large number of
trials and directly measuring the fraction of trials that produce event a. This is sometimes called the
“Monte Carlo method,” named after a famous European casino, and works well for computer simulations
of complicated phenomena. However, in many cases there is a much simpler way to calculate probabilities.
To directly calculate classical probabilities one must know enough about a process to break down
the possible outcomes of a trial into some number of equally probable events. In these cases the probability
of event a is:
p(a) = n_a / N
where n_a is the number of outcomes that satisfy the criteria for a and N is the total number of equally
probable outcomes. Note that since N includes all possible outcomes, n_a ≤ N and 0 ≤ p(a) ≤ 1.
Example: A couple has two children. What is the probability that they are both girls? Assuming that the
chances of having a boy or a girl are equal, there are 4 equally probable ways of having two children
(boy, boy; girl, boy; boy, girl; girl, girl) and the probability of two girls is 1/4 or 0.25.
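The two ways of obtaining p(a) can be compared on the two-girls example. This is a minimal sketch with hypothetical function names. The number of trials and the random seed are arbitrary choices, and boys and girls are assumed equally likely, as in the example.

```python
import random
from fractions import Fraction
from itertools import product

def classical_p_two_girls():
    """p(a) = n_a / N by listing all equally probable outcomes."""
    outcomes = list(product(["boy", "girl"], repeat=2))      # N = 4
    favorable = [o for o in outcomes if o == ("girl", "girl")]
    return Fraction(len(favorable), len(outcomes))

def monte_carlo_p_two_girls(trials=100_000, seed=0):
    """Estimate p(a) as the fraction of simulated families with two girls."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        children = [rng.choice(("boy", "girl")) for _ in range(2)]
        if children == ["girl", "girl"]:
            hits += 1
    return hits / trials

exact = classical_p_two_girls()
estimate = monte_carlo_p_two_girls()
print(exact, estimate)
```

The simulated fraction approaches the exact value as the number of trials grows, but only the classical count gives it exactly.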
For classical probability problems you will always be able to arrive at the correct answer by writing out
all of the possible outcomes of a trial and counting the fraction of outcomes that satisfy the criteria for a
given event. Often, enumerating all of the outcomes for a trial is time-consuming and error-prone. It is
usually faster and easier to break a problem down into simple parts and then to combine the probabilities
for the individual parts. The following are useful ways that probabilities can be combined to speed
probability calculations.
PRODUCT RULE
p(a and b) = p(a) x p(b) if a and b are independent.
Two events are considered independent if they do not influence one another. The criterion of independence
is very important: application of the product rule to events that are not independent will give
an incorrect answer.
Examples: To find the probability that a couple with three children have three boys, we first note that the
sex of one child has no influence on the sex of another, so the sexes of the three children are independent events. For
each child, p(boy) = 1/2 and by the product rule p(3 boys) = 1/2 x 1/2 x 1/2 = 1/8.
In a cross where both parents are heterozygous for recessive mutations in two unlinked genes, A and B, what
is the probability that one of their progeny will express both recessive traits?
First, for a recessive trait to be expressed the progeny must inherit the recessive allele from both the
mother and the father. Since the probability of inheriting a given allele from a heterozygote is 1/2,
p(mutant from mother and mutant from father) = 1/2 x 1/2 = 1/4. Second, since unlinked genes are
inherited independently, we can use the product rule again to calculate p(recessives at gene A and
recessives at gene B) = 1/4 x 1/4 = 1/16.
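The product-rule results can be cross-checked against a full enumeration of outcomes. The sketch below is illustrative only, with variable names chosen here. It assumes equal chances of a boy or a girl and independent segregation of the two genes.

```python
from fractions import Fraction
from itertools import product

# Product rule: three independent births.
p_boy = Fraction(1, 2)
p_three_boys = p_boy * p_boy * p_boy

# Check by writing out all 8 equally probable families.
families = list(product("BG", repeat=3))
p_three_boys_enumerated = Fraction(
    sum(family == ("B", "B", "B") for family in families), len(families)
)

# Product rule applied twice for two unlinked genes.
p_recessive_one_gene = Fraction(1, 2) * Fraction(1, 2)   # mutant allele from each parent
p_recessive_both_genes = p_recessive_one_gene * p_recessive_one_gene

print(p_three_boys, p_three_boys_enumerated, p_recessive_both_genes)
```

The enumeration and the product rule agree, but the product rule avoids listing every outcome.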
SUM RULE
The probability that either a or b will occur can be written as p(a or b). If two events a and b cannot
both occur they are mutually exclusive and the number of outcomes that satisfy a or b is n_a + n_b. It should
be apparent from our definition of probability that:
p(a or b) = (n_a + n_b) / N = n_a/N + n_b/N = p(a) + p(b)
A useful special case of the sum rule arises when we consider p(not a). By definition the events a and not a
are mutually exclusive and they encompass all possible outcomes. Thus:
p(a or not a) = 1 = p(a) + p(not a) and p(not a) = 1 – p(a)
Examples: Find the probability that a family with three children has at least one girl. We begin by
noting that instead of trying to count all possible families with at least one girl it is easier to realize that
p(at least one girl) is the same as p(not all boys). Since p(all boys) = 1/8, p(not all boys) = 1 – 1/8 = 7/8
= p(at least one girl).
In a cross where both parents are heterozygous for recessive mutations in two unlinked genes, what is
the probability that one of their progeny will express at least one of the dominant traits? p(at least one
dominant) = 1 – p(both recessive), and from above, p(both recessive) = 1/16. Therefore p(at least one
dominant) = 1 – 1/16 = 15/16.
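The "not a" shortcut can be verified by counting. This sketch uses variable names chosen here and assumes equally likely boys and girls. It compares the complement rule with a direct count for the three-child family, then applies the same rule to the two-gene cross.

```python
from fractions import Fraction
from itertools import product

# Direct count: families of three with at least one girl.
families = list(product("BG", repeat=3))                      # 8 outcomes
with_girl = [family for family in families if "G" in family]
p_at_least_one_girl_counted = Fraction(len(with_girl), len(families))

# Complement rule: p(not a) = 1 - p(a), with a = "all boys".
p_all_boys = Fraction(1, 2) ** 3
p_at_least_one_girl = 1 - p_all_boys

# Same rule for the two-gene cross, with p(both recessive) = 1/16.
p_both_recessive = Fraction(1, 16)
p_at_least_one_dominant = 1 - p_both_recessive

print(p_at_least_one_girl_counted, p_at_least_one_girl, p_at_least_one_dominant)
```

The direct count needs seven of the eight families listed, whereas the complement needs only the single all-boys case.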
In cases where two events a and b are independent but not mutually exclusive, we can still calculate
p(a or b). In this case we note that the two events a and (b and not a) are mutually exclusive and
encompass all outcomes that satisfy a or b or both. For these mutually exclusive events we can apply
the sum rule. Thus,
p(a or b) = p(a or [b and not a]) = p(a) + p(b and not a)
Since b and not a are independent:
p(a) + p(b and not a) = p(a) + p(b) x p(not a) = p(a) + p(b) x [1 – p(a)] =
p(a) + p(b) – [p(a) x p(b)]
Note that in the case where a and b are mutually exclusive, p(a) x p(b) = 0 giving the same formula as
for the sum rule.
Example: We can use this formula as another way to solve the last example, which is a case in which the
two events are independent but not mutually exclusive. p(at least one dominant) = p(dominant at gene A
or dominant at gene B) = p(dominant at gene A) + p(dominant at gene B) – [p(dominant at gene A) x
p(dominant at gene B)] = 3/4 + 3/4 – [3/4 x 3/4] = 6/4 – 9/16 = 15/16.
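The general formula can be written as a small function and checked against the complement-rule answer. This is a sketch with a hypothetical function name. It applies only when a and b are independent.

```python
from fractions import Fraction

def p_or_independent(p_a, p_b):
    """p(a or b) = p(a) + p(b) - p(a) x p(b), valid for independent a and b."""
    return p_a + p_b - p_a * p_b

p_dominant = Fraction(3, 4)   # dominant phenotype at one gene, A/a x A/a cross

via_formula = p_or_independent(p_dominant, p_dominant)
via_complement = 1 - (1 - p_dominant) ** 2   # 1 - p(both recessive)

print(via_formula, via_complement)
```

Both routes give 15/16, which shows that the formula and the "1 minus" shortcut are two views of the same calculation.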