Chapter 4 The Chemical Composition of
Proteins
Homework II (cont’d): Ch. 6 problems 6, 7, 8
(optionally 10) and Ch. 7 problems 3, 6, 7, 9, 10
(optionally 12)
A useful site: www.cbi.pku.edu.cn
Use a BLAST search against the Swiss-Prot database.
PDB is available on this site. RasMol is available
for Win98 if you don’t want to download
Chemscape.
1. Proteins are macromolecules and vary in
size
1.1 There is no simple relationship between size and
function. Small proteins can be less than 10 kD (insulin ~6
kD); big ones can be more than 1000 kD.
1.2 Proteins can be monomeric or oligomeric.
Monomeric proteins contain only one covalent structure
(either a single polypeptide chain or several chains
connected by covalent bonds). Oligomeric proteins
contain more than one covalent structure, held together
by noncovalent interactions. Each covalent structure in
an oligomeric protein is called a subunit (hence
multisubunit proteins).
2. Proteins have characteristic amino acid
compositions
2.1 Proteins can be hydrolyzed in bases or acids to free
α-amino acids. They are usually hydrolyzed in 6 M
HCl at 110 °C for 24 hours.
2.2 The resulting characteristic proportion of different
amino acids, namely the amino acid composition, was
used to distinguish different proteins before the days
of protein sequencing.
Proteins are currently distinguished by
their specific amino acid sequences (by 3-D structures
in the future?)
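The idea of an amino acid composition in 2.2 can be sketched in a few lines. This is a minimal illustration that counts residues in a known one-letter sequence; the function name and the example peptide are hypothetical. A real acid hydrolysate would differ somewhat, since acid hydrolysis destroys Trp and converts Asn and Gln to Asp and Glu.

```python
from collections import Counter


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the mole fraction of each residue in a one-letter sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {residue: count / total for residue, count in sorted(counts.items())}


# Hypothetical peptide, for illustration only.
composition = amino_acid_composition("GIVEQCCASV")
for residue, fraction in composition.items():
    print(f"{residue}: {fraction:.0%}")
```

Two proteins with the same composition can still have different sequences, which is why composition alone is a weaker identifier than sequence.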
3. Some proteins contain chemical groups other
than amino acids
3.1 Many proteins contain only amino acids, e.g.,
insulin, ribonuclease A, chymotrypsin.
3.2 Some proteins (conjugated proteins) contain
other components.
3.2.1 Cytochrome c and myoglobin
contain heme groups. Immunoglobulin G
contains carbohydrate groups.
3.2.2 The non-amino acid parts are usually
called prosthetic groups, and the protein part alone
is called the apoprotein
(holoenzyme = apoenzyme + cofactors, including
prosthetic groups).
3.2.3 Prosthetic groups usually play important
roles in protein function.
3.2.4 Conjugated proteins are usually
classified according to the nature of their prosthetic
groups.
Lipoproteins: lipids
Glycoproteins: carbohydrate groups
Metalloproteins: different metals (ions)
4. The amino acid sequence of short
polypeptide chains can be determined by
chemical methods.
4.1 The amino acid sequence of a peptide chain is
the identity and linking order of its amino acid
residues. No other property so clearly
distinguishes one peptide from another.
4.2 Sanger worked out the first amino acid
sequence of a peptide (bovine insulin) in 1953.
He accomplished this by using 1-fluoro-2,4-dinitrobenzene
to react with the N-terminal residues of cleaved short
peptides. 100 g of insulin was consumed over
ten years to determine the sequence. The
peptide chains were cut into 150 fragments of
different lengths. He was awarded the Nobel
Prize in Chemistry in 1958 for this breakthrough.
4.3 The amino acid sequence of a short peptide can be
efficiently determined by Edman degradation.
4.3.1 The uncharged terminal amino group is
reacted with phenylisothiocyanate to form a
phenylthiocarbamyl peptide.
4.3.2 The N-terminal amino acid residue is
liberated as a cyclic phenylthiohydantoin (PTH)
derivative under mildly acidic conditions, leaving the
rest of the peptide chain intact.
4.3.3 The PTH derivative (and thus the amino acid
residue) can be identified by chromatographic
methods.
4.3.4 The newly exposed N-terminal amino acid
residue can be identified by repeating the above
procedure.
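The cyclic nature of steps 4.3.1 to 4.3.4 can be sketched as a loop. This is only a model of the bookkeeping, not of the chemistry: each cycle removes and reports the current N-terminal residue and leaves the rest of the chain for the next cycle. The function name and the `max_cycles` parameter are assumptions made for illustration.

```python
def edman_cycles(peptide: str, max_cycles: int = 50):
    """Yield (cycle number, released residue, remaining peptide) per cycle."""
    remaining = peptide
    for cycle in range(1, max_cycles + 1):
        if not remaining:
            break  # nothing left to degrade
        # The N-terminal residue leaves as its PTH derivative;
        # the rest of the chain stays intact for the next cycle.
        released, remaining = remaining[0], remaining[1:]
        yield cycle, released, remaining


# Hypothetical pentapeptide, for illustration only.
reads = [residue for _, residue, _ in edman_cycles("GIVEQ")]
print("".join(reads))
```

Capping the number of cycles mirrors the practical limit on read length: real yields per cycle are below 100%, so the signal degrades as the cycles accumulate.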
4.4 The N-terminal amino acid sequence of a polypeptide
chain can be easily obtained by using a fully automated
sequenator.
4.4.1 The machine is based on the Edman
degradation method.
4.4.2 The peptide is covalently linked to glass
beads through its carboxyl terminus. One cycle of the
Edman degradation is carried out in less than 2 hours.
4.4.3 Usually 50 (10-20) residues from the
N-terminus can be routinely determined by the sequenator.
4.4.4 Less than a microgram (or picomoles?) of the
peptide is needed for such sequence determination.
5. Large proteins are cleaved into short peptides and
then sequenced (the “divide and conquer” strategy!)
5.1 Disulfide bonds, if present, need to be
broken first.
5.1.1 The PTH-cysteines would not be released if
connected by disulfide bonds.
5.1.2 The disulfide bonds can be reduced by
dithiothreitol (DTT) or β-mercaptoethanol,
and then alkylated with
iodoacetate to prevent re-formation of the disulfide
bonds. Addition of iodoacetate in SDS-PAGE can
prevent cross-linking by disulfide bonds between
subunits.
5.2 Polypeptide chains are cleaved into short fragments
by chemical or enzymatic methods and then sequenced
by the Edman method.
5.2.1 Cyanogen bromide (CNBr) cleaves
polypeptides on the carboxyl side of methionine residues.
5.2.2 A set of proteases cleave peptide chains
adjacent to specific amino acid residues: trypsin,
specifically on the carboxyl side of Arg (arginine) and
Lys (lysine); chymotrypsin, on the carboxyl side of
Phe (phenylalanine), Tyr (tyrosine),
and Trp (tryptophan). (terminals?)
5.2.3 The fragments thus produced need to be
separated (purified) by chromatographic
or electrophoretic methods before they can
be sequenced.
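The cleavage rules in 5.2.1 and 5.2.2 can be sketched as a simple in silico digestion. This is a simplified model that cuts after every listed residue and ignores known exceptions (for example, reduced cleavage before proline); the dictionary, function name, and test sequence are hypothetical.

```python
# Residues after which each reagent cleaves (carboxyl side), as listed above.
CLEAVAGE_RULES = {
    "trypsin": "KR",
    "chymotrypsin": "FYW",
    "CNBr": "M",
}


def digest(sequence: str, reagent: str) -> list[str]:
    """Split a one-letter sequence after every residue the reagent recognizes."""
    sites = CLEAVAGE_RULES[reagent]
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in sites:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal fragment
    return fragments


# Hypothetical sequence, for illustration only.
for reagent in CLEAVAGE_RULES:
    print(reagent, digest("AFGKMLRWSEK", reagent))
```

Different reagents cut the same chain at different places, which is what makes overlapping fragment sets possible.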
5.3 The polypeptide has to be cleaved by at least
two sets of reagents to establish the order of the short
peptides in the polypeptide.
5.3.1 A second set of short peptides
overlapping the first set is needed to put
the first set in the correct order.
5.3.2 If the second set fails to provide
appropriate overlapping sequences, a third or
even further cleavage is needed.
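The overlap logic of 5.3 can be sketched with a brute-force search. The sketch below tries every ordering of the first fragment set and keeps those in which every fragment of the second set appears as a contiguous stretch. This is an illustrative approach that is only practical for a handful of fragments, not the procedure used in practice; the function name and fragment sets are hypothetical.

```python
from itertools import permutations


def order_fragments(first_set: list[str], second_set: list[str]) -> list[str]:
    """Return every ordering of first_set that is consistent with second_set."""
    total_length = sum(len(fragment) for fragment in second_set)
    candidates = set()
    for ordering in permutations(first_set):
        candidate = "".join(ordering)
        # Both digests come from the same chain, so the lengths must agree,
        # and each overlapping peptide must occur in the assembled chain.
        if len(candidate) == total_length and all(
            fragment in candidate for fragment in second_set
        ):
            candidates.add(candidate)
    return sorted(candidates)


# Hypothetical fragments from two different cleavage reagents, in arbitrary order.
tryptic = ["MLR", "WSEK", "AFGK"]
chymotryptic = ["SEK", "AF", "GKMLRW"]
print(order_fragments(tryptic, chymotryptic))
```

If more than one ordering survives, the second set did not provide enough overlap, which corresponds to the case in 5.3.2 where a further cleavage is needed.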
5.4 The positions of disulfide bonds, if any,
need to be located.
5.4.1 This can be accomplished by
comparing patterns of peptide fragments on
electrophoresis gels with and without breaking
the disulfide bonds.
6. The amino acid sequences of many proteins are
currently deduced from their gene or cDNA
(complementary DNA) sequences
6.1 The amino acid sequence of a protein is
encoded by its corresponding gene. Every
three bases constitute a codon of the genetic code, which is
translated into a specific amino acid in the
polypeptide chain.
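The codon-by-codon reading described in 6.1 can be sketched as follows. The table is the standard genetic code written compactly (codons enumerated in T, C, A, G order); the function name and the example DNA are hypothetical, and the sketch assumes the coding strand is given already in frame.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TTT, TTC, TTA, TTG, TCT, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding sequence, for illustration only.
print(translate("ATGGGCATTGTGTAA"))
```

Because several codons map to the same amino acid, the protein sequence can be deduced from the DNA, but the DNA cannot be deduced uniquely from the protein.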
6.2 Genes encoding specific proteins are routinely
isolated (cloned) in molecular biology laboratories.
6.2.1 Sequencing DNA is much easier (faster and
more accurate) than sequencing a polypeptide. Genome
projects and databases.
6.2.2 Amino acid sequences of proteins are mostly
deduced from their DNA sequences nowadays!
6.2.3 The partial amino acid sequence of a
protein can be used to isolate its gene.
6.2.4 Disulfide bonds cannot be deduced from
DNA sequences and have to be determined directly.
6.2.5 Proteins can be studied much more efficiently
when their genes are available! New techniques for
large-scale processing are being developed (mass
spectrometry).
A mass spectrometer measures the ratio of the mass
to the electric charge of a particle.
ESI (electrospray ionization): a method of ionizing or
charging the macromolecule.
The mass of a particle can be calculated from the
peaks of the spectrum.
Tandem MS (MS/MS) can give sequence information.
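How a mass can be calculated from the peaks can be sketched for two adjacent ESI peaks. The sketch assumes positive-ion mode in which each charge comes from one added proton, so a molecule of mass M carrying z charges appears at m/z = (M + z·m_p)/z. The function names and the 5808 Da example mass are assumptions chosen for illustration.

```python
PROTON_MASS = 1.007276  # Da


def mz(mass: float, charge: int) -> float:
    """m/z of a molecule of the given mass carrying `charge` added protons."""
    return (mass + charge * PROTON_MASS) / charge


def mass_from_adjacent_peaks(peak_high: float, peak_low: float) -> tuple[int, float]:
    """Recover (charge of peak_high, neutral mass) from two neighbouring peaks.

    peak_high carries z charges and peak_low carries z + 1, so
    z = (peak_low - m_p) / (peak_high - peak_low) and M = z * (peak_high - m_p).
    """
    charge = round((peak_low - PROTON_MASS) / (peak_high - peak_low))
    return charge, charge * (peak_high - PROTON_MASS)


# Simulate two adjacent charge states of a hypothetical 5808 Da protein.
peak_4, peak_5 = mz(5808.0, 4), mz(5808.0, 5)
charge, mass = mass_from_adjacent_peaks(peak_4, peak_5)
print(charge, round(mass, 3))
```

Rounding the charge to an integer is what makes the estimate robust to small measurement errors in the peak positions.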
7. The function of a protein depends on its amino
acid sequence
7.1 Each separate type of protein has a unique
amino acid sequence.
7.1.1 Each of the ~3000 different proteins in
an E. coli cell, and each of the ~100,000 in a human being,
has a different amino acid sequence.
7.1.2 Proteins of different functions always
have different amino acid sequences. Genomic
annotation is the assignment of functions to genes based on
sequence information.
7.2 Many human genetic diseases have been traced to
the deficiency of a single enzyme or protein. The
deficiency of many such enzymes is found to be
caused by a change in a single amino acid residue,
indicating that protein functions are determined by their
amino acid sequences.
7.3 Proteins that have similar functions but come from
different species are found to be very similar in amino
acid sequence (cytochrome c, myoglobin, etc.).
Sequence homology serves as a basis for phylogenetic trees.
7.4 Many proteins have variations in their amino acid
sequences within the same population (individual
polymorphism) or between different species.
7.4.1 Most such variations in amino acid
sequence, especially within the same population, do
not affect protein function.
7.4.2 This phenomenon of amino acid variation within
the same population is called polymorphism and is widely
used where identification of individuals is
needed (in court, by examining DNA molecules).
7.4.3 Proteins of the same (or similar) function
vary in their amino acid sequences in proportion to
their evolutionary distance (the more
different the sequences, the more distant the relationship).
[Figure: Conservation and variation of cytochrome c sequences. 27 invariant
residues (yellow), conservative substitutions (blue), nonconservative or
variable residues (unshaded).]
8. The amino acid sequences of proteins provide
important biochemical information (application
of bioinformatic tools).
8.1 A newly determined amino acid sequence (of
a well-studied or unknown protein) is usually
compared with a large bank of stored
sequences.
8.1.1 Thousands of sequences have been
determined and stored in computerized databases
(e.g., Swiss-Prot, PIR, TrEMBL).
8.1.2 Sequence similarity usually reveals
functional relatedness.
8.2 Homologous proteins share a common ancestor
in evolution, and hence usually have the same function.
8.2.1 Homology describes the percentage of
similarity (or identity) between two proteins (or
DNA molecules), usually referring to sequence
rather than structure.
8.2.2 Amino acid residues at a given position
in two proteins (counting from the
N-terminus as 1) can be identical, conservative
(similar), or variable.
8.2.3 The amino acid sequences of two or
more homologous proteins can be compared by
sequence alignment.
8.2.4 Evolution can be studied quantitatively
at the molecular level: phylogenetic trees are made
by using the number of residues that differ.
8.2.5 Highly conserved residues usually play
important roles in protein structure and/or function
(needs to be further explored, e.g., how do they
determine the 3D structure?).
8.2.6 Homologous proteins share the same
three-dimensional structure (sequence-structure-function
paradigm, unless the structure is flexible
and non-unique).
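The residue-difference counts behind 8.2.1 to 8.2.4 can be sketched for sequences that are already aligned to equal length. The sequences and species labels below are made up for illustration, gaps are not handled, and building an actual tree from the resulting distances is left out.

```python
from itertools import combinations


def count_differences(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions carrying identical residues."""
    return 100.0 * (len(a) - count_differences(a, b)) / len(a)


# Hypothetical pre-aligned sequences, for illustration only.
aligned = {
    "species_A": "GDVEKGKKIF",
    "species_B": "GDVEKGKKIF",
    "species_C": "GDIEKGKKIF",
    "species_D": "GDAAKGAKIF",
}

# Pairwise difference counts: the raw input for a distance-based tree.
distances = {
    (p, q): count_differences(aligned[p], aligned[q])
    for p, q in combinations(aligned, 2)
}
for pair, differences in distances.items():
    print(pair, differences)
```

Pairs with fewer differences would be placed closer together in a tree built from this table.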
8.3 Certain amino acid sequences often serve as signals
(and are thus called signal peptides) that determine the
cellular location, chemical modification, and half-life
of proteins.
8.3.1 Different signal peptides direct proteins to
various locations in the cell.
8.3.2 Such signal motifs are being identified.
8.4 There are well-conserved short sequences in
proteins with specific functions (binding,
modification, regulation). Most of them are expected to
have a loop (relatively flexible) structure. Matching
short sequences in an unknown protein against known functional
sequences may shed light on its
function. Databases of sequences with known
functions exist (PROSITE in ExPASy), e.g., the
ATP-binding loop.
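Searching for such a short functional motif can be sketched by converting a PROSITE-style pattern into a regular expression. The converter below handles only the simplest pattern syntax (residue, `x`, `[...]`, `{...}`, and repeat counts). The pattern `[AG]-x(4)-G-K-[ST]` is given here as the commonly cited ATP/GTP-binding loop (P-loop) motif and should be checked against the current PROSITE entry; the function names and test sequence are illustrative assumptions.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a regular expression."""
    regex = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = match.groups()
        if core == "x":  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        regex.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(regex)


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched text) for each occurrence."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in compiled.finditer(sequence)]


P_LOOP = "[AG]-x(4)-G-K-[ST]"  # assumed P-loop pattern, see note above
hits = find_motif("MTEYKLVVVGAGGVGKSALT", P_LOOP)  # illustrative sequence
print(prosite_to_regex(P_LOOP), hits)
```

A hit only suggests a possible function; short patterns also match by chance, so a match is a hypothesis to test rather than a proof.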
8.5 The amino acid sequence of a protein
encodes its three-dimensional structure and,
in turn, its activity (function). This is the
sequence-structure-function paradigm.
The “folding” codes are waiting to be
deciphered!