ASICs...THE COURSE (1 WEEK)
CMOS LOGIC
• CMOS transistor (or device)
• A transistor has three terminals: gate, source, drain (and a fourth that we ignore for a moment)
• An MOS transistor looks like a switch (conducting/on, nonconducting/off, not open or closed)
Key concepts: The use of transistors as switches • The difference between a flip-flop and a
latch • Setup time and hold time • Pipelines and latency • The difference between datapath,
standard-cell, and gate-array logic cells • Strong and weak logic levels • Pushing bubbles •
Ratio of logic • Resistance per square of layers and their relative values in CMOS • Design
rules and λ
[Figure: CMOS transistors viewed as switches • a CMOS inverter. (a) An n-channel transistor (gate, source, drain): on when the gate is '1', off when the gate is '0'. (b) A p-channel transistor: on when the gate is '0', off when the gate is '1'. (c) A CMOS inverter with input A and output F, connected between VDD and GND (or VSS).]
[Figure: CMOS logic • a two-input NAND gate • a two-input NOR gate • Good '1's • Good '0's. (a) F = NAND(A, B) and (b) F = NOR(A, B). For each of the four input combinations of A and B, the figure marks which p-channel and n-channel transistors are on or off and the resulting value of F.]
2.1 CMOS Transistors
• Channel charge = Q (imagine taking a picture and counting the electrons)
• tf is time of flight or transit time
• μn is the electron mobility (μp is the hole mobility)
• E is the electric field (units V m⁻¹)
An n-channel transistor • channel • source • drain • depletion region • gate • bulk
current (amperes) = charge (coulombs) per unit time (second)
The drain-to-source current: $I_{DSn} = Q / t_f$
The (vector) velocity of the electrons: $\mathbf{v} = -\mu_n \mathbf{E}$
$t_f = \dfrac{L}{v_x} = \dfrac{L^2}{\mu_n V_{DS}}$
[Figure: cross-section of an n-channel transistor showing the gate, the n-type source and drain, the p-type bulk, the mobile channel charge (electrons), and the fixed depletion charge in the depletion region. Labels: W, L, Tox, VGS, VDS, Ex, GND (or VSS).]
• The linear region (triode region) extends until VDS = VGS – Vtn
• VDS = VGS – Vtn = VDS(sat) (saturation voltage)
• VDS > VGS – Vtn (the saturation region, or pentode region, of operation)
• saturation current, IDSn(sat)
$Q = C(V_{GC} - V_{tn}) = C[(V_{GS} - V_{tn}) - 0.5V_{DS}] = WLC_{ox}[(V_{GS} - V_{tn}) - 0.5V_{DS}]$
$I_{DSn} = Q/t_f = (W/L)\,\mu_n C_{ox}[(V_{GS} - V_{tn}) - 0.5V_{DS}]V_{DS} = (W/L)\,k'_n[(V_{GS} - V_{tn}) - 0.5V_{DS}]V_{DS}$
$k'_n = \mu_n C_{ox}$ is the process transconductance parameter (or intrinsic transconductance)
$\beta_n = k'_n (W/L)$ is the transistor gain factor (or just gain factor)
$I_{DSn(sat)} = (\beta_n/2)(V_{GS} - V_{tn})^2$ ; $V_{GS} > V_{tn}$
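As a rough numerical check of the long-channel equations above, the following Python sketch evaluates IDSn in the linear and saturation regions. The function name `ids_n` and its defaults are illustrative assumptions: k'n = 200 μA V⁻² and Vtn = 0.65 V are taken from KP and VTO of the CMOSN model in Section 2.1.3, and W/L = 6/0.6 = 10. The model ignores velocity saturation, so it overestimates the current of a short-channel device.

```python
def ids_n(vgs, vds, kp=2e-4, w_over_l=10.0, vtn=0.65):
    """Long-channel n-channel drain current in amperes (a sketch, not a SPICE model)."""
    beta = kp * w_over_l            # gain factor: beta_n = k'_n * (W/L)
    v_ov = vgs - vtn                # VGS - Vtn, which is also VDS(sat)
    if v_ov <= 0:
        return 0.0                  # transistor is off (subthreshold current ignored)
    if vds < v_ov:
        # linear (triode) region
        return beta * (v_ov - 0.5 * vds) * vds
    # saturation region
    return 0.5 * beta * v_ov ** 2


if __name__ == "__main__":
    for vds in (0.5, 1.0, 2.0, 3.0):
        print(f"VGS=3.0 V, VDS={vds} V: IDS = {ids_n(3.0, vds) * 1e3:.3f} mA")
```

The two branches give the same current at VDS = VGS – Vtn, which is a quick way to confirm that the linear and saturation expressions join up.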
2.1.1 P-Channel Transistors
• Vtp is negative
• VDS and VGS are normally negative (and –3 V < –2 V)
$I_{DSp} = -k'_p (W/L)[(V_{GS} - V_{tp}) - 0.5V_{DS}]V_{DS}$ ; $V_{DS} > V_{GS} - V_{tp}$
$I_{DSp(sat)} = -(\beta_p/2)(V_{GS} - V_{tp})^2$ ; $V_{DS} < V_{GS} - V_{tp}$
[Figure: MOS n-channel transistor characteristics, panels (a) to (c), for n-channel devices with W/L = 6/0.6 and W/L = 60/6. Axes: IDS /mA versus VDS /V for VGS = 0.0 to 3.0 V in 0.5 V steps; IDS /mA versus VDS /V and VGS /V; IDS(sat) /mA versus VGS /V at VDS = 3.0 V. The last panel is annotated with IDS(sat) ∝ (VGS – Vtn)² and IDS(sat) ∝ VGS – Vtn.]
2.1.2 Velocity Saturation
• vmaxn = 10⁵ m s⁻¹
• velocity saturation
• tf = Leff/vmaxn
• mobility degradation
$I_{DSn(sat)} = W v_{maxn} C_{ox} (V_{GS} - V_{tn})$ ; $V_{DS} > V_{DS(sat)}$ (velocity saturated)
2.1.3 SPICE Models
• KP (in μA V⁻²) = k'n (k'p)
• VTO and TOX = Vtn (Vtp) and Tox
• UO (in cm² V⁻¹ s⁻¹) = μn (and μp)
SPICE parameters
.MODEL CMOSN NMOS LEVEL=3 PHI=0.7 TOX=10E-09 XJ=0.2U TPG=1 VTO=0.65
+ DELTA=0.7
+ LD=5E-08 KP=2E-04 UO=550 THETA=0.27 RSH=2 GAMMA=0.6 NSUB=1.4E+17
+ NFS=6E+11
+ VMAX=2E+05 ETA=3.7E-02 KAPPA=2.9E-02 CGDO=3.0E-10 CGSO=3.0E-10
+ CGBO=4.0E-10
+ CJ=5.6E-04 MJ=0.56 CJSW=5E-11 MJSW=0.52 PB=1
.MODEL CMOSP PMOS LEVEL=3 PHI=0.7 TOX=10E-09 XJ=0.2U TPG=-1 VTO=-0.92
+ DELTA=0.29
+ LD=3.5E-08 KP=4.9E-05 UO=135 THETA=0.18 RSH=2 GAMMA=0.47
+ NSUB=8.5E+16 NFS=6.5E+11
+ VMAX=2.5E+05 ETA=2.45E-02 KAPPA=7.96 CGDO=2.4E-10 CGSO=2.4E-10
+ CGBO=3.8E-10
+ CJ=9.3E-04 MJ=0.47 CJSW=2.9E-10 MJSW=0.505 PB=1
2.1.4 Logic Levels
CMOS logic levels
• VSS is a strong '0' • VDD is a strong '1'
• degraded logic levels: VDD – Vtn is a weak '1' ; VSS – Vtp (Vtp is negative) is a weak '0'
[Figure: CMOS logic levels, panels (a) to (d). An n-channel transistor passes a strong '0' but only a weak '1' (the output rises to VDD – Vtn, where the channel charge disappears). A p-channel transistor passes a strong '1' but only a weak '0' (the output falls to –Vtp above VSS). Each panel shows the gate, source, and drain voltages and the capacitor voltage VC versus time t.]
2.2 The CMOS Process
The CMOS manufacturing process
Key words: boule • wafer • boat • silicon dioxide • resist • mask • chemical etch • isotropic •
plasma etch • anisotropic • ion implantation • implant energy and dose • polysilicon • chemical
vapor deposition (CVD) • sputtering • photolithography • submicron and deep-submicron
process • n-well process • p-well process • twin-tub (or twin-well) • triple-well • substrate
contacts (well contacts or tub ties) • active (CAA) • gate oxide • field • field implant or
channel-stop implant • field oxide (FOX) • bloat • dopant • self-aligned process • positive resist •
negative resist • drain engineering • LDD process • lightly doped drain • LDD diffusion or LDD
implant • stipple-pattern
[Figure: fabrication steps numbered 1 to 12, including grow crystal, saw, wafer, grow oxide (furnace, 1 hour), resist spin, mask, etch (resist and oxide), and As+ implant.]
Mask/layer name | Derivation from drawn layers | Alternative names for mask/layer | Mask label
n-well | =nwell | bulk, substrate, tub, n-tub, moat | CWN
p-well | =pwell | bulk, substrate, tub, p-tub, moat | CWP
active | =pdiff+ndiff | thin oxide, thinox, island, gate oxide | CAA
polysilicon | =poly | poly, gate | CPG
n-diffusion implant | =grow(ndiff) | ndiff, n-select, nplus, n+ | CSN
p-diffusion implant | =grow(pdiff) | pdiff, p-select, pplus, p+ | CSP
contact | =contact | contact cut, poly contact, diffusion contact | CCP and CCA
metal1 | =m1 | first-level metal | CMF
metal2 | =m2 | second-level metal | CMS
via2 | =via2 | metal2/metal3 via, m2/m3 via | CVS
metal3 | =m3 | third-level metal | CMT
glass | =glass | passivation, overglass, pad | COG
[Figure: The mask layers of a standard cell. (a) nwell (b) pwell (c) ndiff (d) pdiff (e) poly (f) contact (g) m1 (h) via (i) m2 (j) cell (k) phantom]
Active mask
CAA (mask) = ndiff (drawn) ∨ pdiff (drawn)
Implant select masks
CSN (mask) = grow (ndiff (drawn)) and
CSP (mask) = grow (pdiff (drawn))
Source and drain diffusion (on the silicon)
n-diffusion (silicon) = (CAA (mask) ∧ CSN (mask)) ∧ (¬CPG (mask)) and
p-diffusion (silicon) = (CAA (mask) ∧ CSP (mask)) ∧ (¬CPG (mask))
Source and drain diffusion (on the silicon) in terms of drawn layers
n-diffusion (silicon) = (ndiff (drawn)) ∧ (¬poly (drawn)) and
p-diffusion (silicon) = (pdiff (drawn)) ∧ (¬poly (drawn))
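The mask equations above are Boolean operations on layout geometry, so they can be tried out with Python sets. The sketch below is only an illustration: layers are modeled as sets of unit grid cells, and `rect`, `grow`, the rectangle coordinates, and the one-cell bloat are all hypothetical choices, not values from any real process.

```python
def rect(x0, y0, x1, y1):
    """Set of unit grid cells covering x0 <= x < x1, y0 <= y < y1."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def grow(cells, by=1):
    """Bloat a layer by `by` cells in every direction (assumed meaning of grow)."""
    return {(x + dx, y + dy)
            for (x, y) in cells
            for dx in range(-by, by + 1)
            for dy in range(-by, by + 1)}


# Drawn layers (hypothetical geometry): two diffusion regions crossed by one poly line
ndiff = rect(0, 0, 6, 4)
pdiff = rect(0, 8, 6, 12)
poly = rect(2, -2, 4, 14)

# Masks derived from the drawn layers
caa = ndiff | pdiff      # CAA = ndiff OR pdiff
csn = grow(ndiff)        # CSN = grow(ndiff)
csp = grow(pdiff)        # CSP = grow(pdiff)
cpg = poly               # CPG = poly

# Diffusion on the silicon: (CAA AND select) AND NOT CPG
n_diffusion = (caa & csn) - cpg
p_diffusion = (caa & csp) - cpg

print(len(n_diffusion), len(p_diffusion))
```

For this geometry the mask form and the drawn-layer form agree, because the grown select regions do not reach the other diffusion; the poly line splits each diffusion into a source and a drain.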
[Figure: Drawn layers and stipple patterns: nwell, pwell, ndiff, pdiff, poly, contact, m1, via1, m2, via2, m3, glass (some layers may be drawn solid).]
[Figure: The transistor layers, panels (a) and (b). Drawn layers: pdiff, nwell, poly. On the silicon: p-diffusion, polysilicon, field oxide, n-well (or substrate), gate oxide, field implant, source/drain diffusion, LDD diffusion. Scale: 2λ.]
2.2.1 Sheet Resistance
The interconnect layers
Sheet resistance (1 μm)
Layer | Sheet resistance | Units
n-well | 1.15 ± 0.25 | kΩ/square
poly | 3.5 ± 2.0 | Ω/square
n-diffusion | 75 ± 20 | Ω/square
p-diffusion | 140 ± 40 | Ω/square
m1/2 | 70 ± 6 | mΩ/square
m3 | 30 ± 3 | mΩ/square

Sheet resistance (0.35 μm)
Layer | Sheet resistance | Units
n-well | 1 ± 0.4 | kΩ/square
poly | 10 ± 4.0 | Ω/square
n-diffusion | 3.5 ± 2.0 | Ω/square
p-diffusion | 2.5 ± 1.5 | Ω/square
m1/2/3 | 60 ± 6 | mΩ/square
metal4 | 30 ± 3 | mΩ/square
Key words: diffusion • Ω/square (ohms per square) • sheet resistance • silicide •
self-aligned silicide (salicide) • LI, white metal, local interconnect, metal0, or m0 • m1 or metal1
• diffusion contacts • polysilicon contacts • barrier metal • contact plugs (via plugs) •
chemical–mechanical polishing (CMP) • intermetal oxide (IMO) • interlevel dielectric
(ILD) • metal vias, cuts, or vias • stacked vias and stacked contacts • two-level metal
(2LM) • 3LM (m3 or metal3) • via1 • via2 • metal pitch • electromigration • contact
resistance and via resistance
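Sheet resistance turns into wire resistance by counting squares: a wire of length L and width W contains L/W squares. A minimal Python sketch, assuming the typical 1 μm values from the table (3.5 Ω/square for poly, 70 mΩ/square for m1) and a hypothetical 100 μm by 1 μm wire:

```python
def wire_resistance(sheet_ohms_per_square, length_um, width_um):
    """Resistance in ohms: sheet resistance times the number of squares (L/W)."""
    return sheet_ohms_per_square * length_um / width_um


if __name__ == "__main__":
    # The same hypothetical 100 um x 1 um wire drawn in two different layers
    r_poly = wire_resistance(3.5, 100.0, 1.0)    # poly, 3.5 ohm/square
    r_m1 = wire_resistance(0.070, 100.0, 1.0)    # m1, 70 milliohm/square
    print(f"poly: {r_poly:.1f} ohm, m1: {r_m1:.1f} ohm")
```

The ratio between the two results is just the ratio of the sheet resistances, which is why the relative values of the layers matter more than the absolute numbers.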
[Figure: the interconnect layers, panels (a) and (b): contact, m1, via1, m2, via2, and m3, with stacks contact + m1 + via1 + m2 and m2 + via2 + m3. Labels: W plug (4000 Å), AlCu (3000 Å), Pt barrier (200 Å), TiW. Scale: 2λ.]
2.3 CMOS Design Rules
Scalable CMOS design rules
[Figure: Scalable CMOS design rules. Panels: 1. well, 2. active, 3. poly, 4. select, 5. poly contact, 6. active contact, 7. metal1, 8. via1, 9. metal2, 14. via2, 15. metal3, 10. overglass (microns). Each dimension is labeled "value (rule number)".]
Rule values as labeled in the figure:
1. well: 1.1 = 10, 1.2 = 9, 1.3 = 0 or 6, 1.4 = 0
2. active: 2.1 = 3, 2.2 = 3, 2.3 = 5, 2.4 = 3, 2.5 = 0 or 4
3. poly: 3.1 = 2, 3.2 = 2, 3.3 = 2, 3.4 = 3, 3.5 = 1
4. select: 4.1 = 3, 4.2 = 2, 4.3 = 1
5. poly contact: 5.1a = 2 × 2, 5.2a = 1.5, 5.3a = 2
6. active contact: 6.1a = 2 × 2, 6.2a = 1.5, 6.3a = 2, 6.4a = 2
7. metal1: 7.1 = 3, 7.2a = 3, 7.2b = 2, 7.3 = 1, 7.4 = 1
8. via1: 8.1 = 2 × 2, 8.2 = 3, 8.3 = 1, 8.4 = 2, 8.5 = 2
9. metal2: 9.1 = 3, 9.2a = 4, 9.2b = 3, 9.3 = 1
14. via2: 14.1 = 2 × 2, 14.2 = 3, 14.3 = 1, 14.4 = 2
15. metal3: 15.1 = 6, 15.2 = 4, 15.3 = 2
10. overglass (microns): 10.1 = 100 × 100, 10.3 = 6, 10.4 = 30, 10.5 = 15
2.4 Combinational Logic Cells
2.4.1 Pushing Bubbles
2.4.2 Drive Strength
We ratio a cell to adjust its drive strength and make βn = βp to create equal rise and fall
times.
[Figure: Naming of complex CMOS combinational logic cells. (a) AOI221 (AND-OR-INVERT) with inputs A, B, C, D, E and output Z. (b) OAI321 (OR-AND-INVERT) with inputs A, B, C, D, E, F and output Z.]
The AOI family of cells with three index numbers or fewer
Cell type¹ | Cells | Number of unique cells
Xa1 | X21, X31 | 2
Xa11 | X211, X311 | 2
Xab | X22, X33, X32 | 3
Xab1 | X221, X331, X321 | 3
Xabc | X222, X333, X332, X322 | 4
Total | | 14
¹ Xabc: X = {AOI, AO, OAI, OA}; a, b, c = {2, 3}; {} means "choose one."
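The count of 14 unique cells in the AOI family table can be reproduced by enumeration. The sketch below is one way to do it in Python; the function name `cell_indices` is hypothetical, and the only rule assumed is the one in the table footnote (each of a, b, c is 2 or 3, and the order of the indices does not create a new cell).

```python
from itertools import combinations_with_replacement


def cell_indices():
    """Index strings for the cell types Xa1, Xa11, Xab, Xab1, and Xabc."""
    names = []
    for a in (3, 2):
        names.append(f"{a}1")       # Xa1
        names.append(f"{a}11")      # Xa11
    for count in (2, 3):
        # unordered choices of `count` indices from {3, 2}, largest first
        for combo in combinations_with_replacement((3, 2), count):
            digits = "".join(str(d) for d in combo)
            names.append(digits)                # Xab or Xabc
            if count == 2:
                names.append(digits + "1")      # Xab1
    return names


if __name__ == "__main__":
    indices = cell_indices()
    print(len(indices), ["AOI" + s for s in sorted(indices)])
```

Prefixing each index string with AOI, AO, OAI, or OA gives the cell names for the corresponding family.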
2.4.3 Transmission Gates
[Figure: Constructing a CMOS logic cell, an AOI221 • pushing bubbles • de Morgan's theorem • network duals. Steps 1 to 4: push bubbles to the inputs; build the n-channel and p-channel networks with OR = parallel and AND = series; adjust sizes (for example, three 6/1 transistors in series: 6/(1+1+1) = 2/1). Inputs A, B, C, D, E; output Z.]
[Figure: CMOS transmission gate (TG, TX gate, pass gate, coupler), panels (a) to (c): inputs A, S, and S', output Z; the gate passes a strong '1' and a strong '0'; charge sharing between CBIG and CSMALL (VBIG → VF, VSMALL → VF).]
Charge sharing: suppose CBIG = 0.2 pF and CSMALL = 0.02 pF, VBIG = 0 V and VSMALL = 5 V;
then
$V_F = \dfrac{(0.2 \times 10^{-12})(0) + (0.02 \times 10^{-12})(5)}{(0.2 \times 10^{-12}) + (0.02 \times 10^{-12})} = 0.45\ \text{V}$
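The charge-sharing result in Section 2.4.3 follows from conservation of charge when the two capacitors are connected. A minimal Python sketch (the function name `charge_share` is an assumption made for illustration) reproduces the 0.45 V figure:

```python
def charge_share(c_big, v_big, c_small, v_small):
    """Final voltage after two charged capacitors are connected together.

    Total charge C_big*V_big + C_small*V_small is conserved and ends up
    on the combined capacitance C_big + C_small.
    """
    return (c_big * v_big + c_small * v_small) / (c_big + c_small)


if __name__ == "__main__":
    vf = charge_share(0.2e-12, 0.0, 0.02e-12, 5.0)
    print(f"VF = {vf:.2f} V")   # VF = 0.45 V
```

The small capacitor started at 5 V and ends near 0.45 V, so a stored '1' on the small node is lost.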
2.5 Sequential Logic Cells
Two choices for sequential logic: multiphase clocks or synchronous design. We choose
the latter.
2.5.1 Latch
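CMOS latch • enable • transparent • static • sequential logic cell • storage • initial value
[Figure: panels (a) to (c): a latch built from inverters I1 to I5 with input D, output Q, clock CLK, and internal clocks CLKN and CLKP; the storage loop; waveforms of D, CLK, and Q versus time t showing when the latch is transparent; the latch symbol (1D, C1).]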
2.5.2 Flip-Flop
CMOS flip-flop
• master latch • slave latch
• active clock edge • negative-edge–triggered flip-flop
• setup time (tSU) • hold time (tH) • clock-to-Q propagation delay (tPD)
• decision window
[Figure: panels (a) to (d): a flip-flop built from a master latch (I1 to I3, node M) and a slave latch (I6 to I9, outputs Q and QN), with clock CLK and internal clocks CLKN and CLKP (I4, I5); CLK = 1: load master, slave stores; CLK = 0: load slave, master stores; waveforms of D, CLK, M, and Q versus time t showing tSU, tH, tPD (measured at 50%), and the decision window; the flip-flop symbol (1D, C1).]
2.6 Datapath Logic Cells
• parity function ('1' for an odd number of '1's)
• majority function ('1' if the majority of the inputs are '1')
full adder (FA): SUM = A ⊕ B ⊕ CIN = SUM(A, B, CIN) = PARITY(A, B, CIN),
COUT = A · B + A · CIN + B · CIN = MAJ(A, B, CIN).
S[i] = SUM(A[i], B[i], CIN)
COUT = MAJ(A[i], B[i], CIN)
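The full-adder equations can be checked exhaustively. The Python sketch below uses hypothetical helper names (`parity`, `majority`, `full_adder`) and confirms that SUM and COUT together encode A + B + CIN for all eight input combinations.

```python
def parity(*bits):
    """1 if an odd number of the inputs are 1."""
    return sum(bits) % 2


def majority(*bits):
    """1 if more than half of the inputs are 1."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def full_adder(a, b, cin):
    """Return (SUM, COUT) with SUM = PARITY(A, B, CIN), COUT = MAJ(A, B, CIN)."""
    return parity(a, b, cin), majority(a, b, cin)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                s, cout = full_adder(a, b, cin)
                # the two output bits are the binary value of a + b + cin
                assert 2 * cout + s == a + b + cin
                print(a, b, cin, "->", s, cout)
```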
A datapath adder
• Ripple-carry adder (RCA)
• Data signals • control signals • datapath • datapath cell or datapath element
• Datapath advantages: predictable and equal delay for each bit • built-in interconnect
• Disadvantages of a datapath: overhead • harder design • software is more complex
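[Figure: panels (a) to (d): a full-adder cell (ADD) with inputs A, B, CIN and outputs S (SUM), COUT; a 4-bit ripple-carry adder with inputs A[3:0] and B[3:0], outputs S[3:0] and COUT[3], and CIN[0] tied to VSS; the datapath layout, with data and control signals routed on m1 and m2 and the carry COUT[2] passed between adjacent bits.]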
2.6.1 Datapath Elements
Binary arithmetic: operations for each binary number representation
Operation | Unsigned | Signed magnitude | Ones' complement | Two's complement
representation | no change | if positive then MSB=0 else MSB=1 | if negative then flip bits | if negative then {flip bits; add 1}
3 = | 0011 | 0011 | 0011 | 0011
–3 = | NA | 1011 | 1100 | 1101
zero = | 0000 | 0000 or 1000 | 1111 or 0000 | 0000
max. positive = | 1111 = 15 | 0111 = 7 | 0111 = 7 | 0111 = 7
max. negative = | 0000 = 0 | 1111 = –7 | 1000 = –7 | 1000 = –8
addition = S = A+B = addend + augend; SG(A) = sign of A | S = A+B | if SG(A)=SG(B) then S=A+B else {if B<A then S=A–B else S=B–A} | S = A+B+COUT[MSB]; COUT is carry out | S = A+B
addition result: OV = overflow, OR = out of range | OR = COUT[MSB]; COUT is carry out | if SG(A)=SG(B) then OV=COUT[MSB] else OV=0 (impossible) | OV = XOR(COUT[MSB], COUT[MSB–1]) | OV = XOR(COUT[MSB], COUT[MSB–1])
SG(S) = sign of S; S = A+B | NA | if SG(A)=SG(B) then SG(S)=SG(A) else {if B<A then SG(S)=SG(A) else SG(S)=SG(B)} | NA | NA
subtraction = D = A–B = minuend – subtrahend | D = A–B | SG(B)=NOT(SG(B)); D = A+B | Z = –B (negate); D = A+Z | Z = –B (negate); D = A+Z
Binary arithmetic (continued)
Operation | Unsigned | Signed magnitude | Ones' complement | Two's complement
subtraction result: OV = overflow, OR = out of range | OR = BOUT[MSB]; BOUT is borrow out | as in addition | as in addition | as in addition
negation: Z = –A (negate) | NA | Z = A; SG(Z) = NOT(SG(A)) | Z = NOT(A) | Z = NOT(A)+1

2.6.2 Adders
Generate, G[i] and propagate, P[i]
method 1 | method 2
G[i] = A[i] · B[i] | G[i] = A[i] · B[i]
P[i] = A[i] ⊕ B[i] | P[i] = A[i] + B[i]
C[i] = G[i] + P[i] · C[i–1] | C[i] = G[i] + P[i] · C[i–1]
S[i] = P[i] ⊕ C[i–1] | S[i] = A[i] ⊕ B[i] ⊕ C[i–1]
Carry signal:
either C[i] = A[i] · B[i] + P[i] · C[i–1]
or C[i] = (A[i] + B[i]) · (P[i]' + C[i–1]), where P[i]' = NOT(P[i])
Carry chain using two-input NAND gates, one per cell:
even stages | odd stages
C1[i]' = P[i] · C3[i–1] · C4[i–1] | C3[i]' = P[i] · C1[i–1] · C2[i–1]
C2[i] = A[i] + B[i] | C4[i]' = A[i] · B[i]
C[i] = C1[i] · C2[i] | C[i] = C3[i]' + C4[i]'
Carry-save adder (CSA) cell CSA(A1[i], A2[i], A3[i], CIN, S1[i], S2[i], COUT) has three outputs:
S1[i] = CIN ,
S2[i] = A1[i] ⊕ A2[i] ⊕ A3[i] = PARITY(A1[i], A2[i], A3[i])
COUT = A1[i] · A2[i] + [(A1[i] + A2[i]) · A3[i]] = MAJ(A1[i], A2[i], A3[i])
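The two ways of defining propagate (method 1 with XOR, method 2 with OR) give the same carries and sums. The Python sketch below is a bit-serial model written for illustration only; `add_gp` and its arguments are hypothetical names, and C[–1] is taken as the carry in.

```python
def add_gp(a, b, n_bits, method=1, c_in=0):
    """Add two n-bit numbers using generate/propagate; return (sum, carry_out)."""
    carry = c_in                      # C[i-1] for the current bit
    total = 0
    for i in range(n_bits):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        g = a_i & b_i                 # G[i] = A[i] . B[i]
        if method == 1:
            p = a_i ^ b_i             # P[i] = A[i] xor B[i]
            s = p ^ carry             # S[i] = P[i] xor C[i-1]
        else:
            p = a_i | b_i             # P[i] = A[i] + B[i]
            s = a_i ^ b_i ^ carry     # S[i] = A[i] xor B[i] xor C[i-1]
        carry = g | (p & carry)       # C[i] = G[i] + P[i] . C[i-1]
        total |= s << i
    return total, carry


if __name__ == "__main__":
    print(add_gp(11, 6, 4, method=1), add_gp(11, 6, 4, method=2))
```

Method 2 cannot reuse P[i] to form the sum, which is why its sum equation needs A[i] ⊕ B[i] explicitly.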
Carry-propagate adder (CPA)
carry-bypass adders (CBA):
C[7] = (G[7] + P[7] · C[6]) · BYPASS' + C[3] · BYPASS
carry-skip adder:
CSKIP[i] = (G[i] + P[i] · C[i–1]) · SKIP' + C[i–2] · SKIP
[Figure: The carry-save adder (CSA) • pipeline • latency • bit slice, panels (a) to (g): the CSA cell (inputs A1, A2, A3, CIN; outputs S1, S2, COUT); adding four n-bit numbers A1[MSB:0] to A4[MSB:0] to give S[MSB:0] using CSA1, CSA2, and a ripple-carry adder (RCA); the same adder with pipeline registers clocked by CLK; bit slices from LSB to MSB, with COUT[MSB], COUT[MSB–1], and OV.]
Carry-select adder duplicates two small adders for the cases CIN='0' and CIN='1' and then
uses a MUX to select the case that we need.
Carry-lookahead adder (CLA, for example the Brent–Kung adder), taking C[–1] = 0:
C[1] = G[1] + P[1] · C[0]
= G[1] + P[1] · (G[0] + P[0] · C[–1])
= G[1] + P[1] · G[0] ,
C[2] = G[2] + P[2] · G[1] + P[2] · P[1] · G[0] ,
C[3] = G[3] + P[3] · G[2] + P[3] · P[2] · G[1] + P[3] · P[2] · P[1] · G[0]
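The expanded lookahead equations can be compared against a rippled carry for every pair of 4-bit inputs. This Python sketch assumes C[–1] = 0 and P[i] = A[i] ⊕ B[i]; the function names are hypothetical.

```python
def lookahead_carries(a, b):
    """C[0]..C[3] from the expanded carry-lookahead equations (C[-1] = 0)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # G[i] = A[i] . B[i]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # P[i] = A[i] xor B[i]
    c0 = g[0]
    c1 = g[1] | p[1] & g[0]
    c2 = g[2] | p[2] & g[1] | p[2] & p[1] & g[0]
    c3 = g[3] | p[3] & g[2] | p[3] & p[2] & g[1] | p[3] & p[2] & p[1] & g[0]
    return [c0, c1, c2, c3]


def ripple_carries(a, b):
    """C[0]..C[3] computed one bit at a time: C[i] = G[i] + P[i] . C[i-1]."""
    carries, carry = [], 0
    for i in range(4):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        carry = (a_i & b_i) | ((a_i ^ b_i) & carry)
        carries.append(carry)
    return carries


if __name__ == "__main__":
    mismatches = [(a, b) for a in range(16) for b in range(16)
                  if lookahead_carries(a, b) != ripple_carries(a, b)]
    print("mismatches:", mismatches)
```

Every term of C[3] uses only G and P signals, so all four carries are available after a fixed number of gate delays instead of rippling through each bit.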
[Figure: The Brent–Kung carry-lookahead adder, panels (a) to (g). Carry-lookahead generator (CLG) cells take G[i]/P[i] pairs in and send C[i] out, arranged in levels L1 to L4 for a 4-bit adder (bits 0 to 3) and in levels L1 to L3 for an 8-bit adder (bits 0 to 7); each wire is a bundle of a generate and a propagate signal. The figure shows C[2] = G[2] + P[2]G[1] + P[2]P[1]G[0] and C[3] = G[3] + P[3]G[2] + P[3]P[2]G[1] + P[3]P[2]P[1]G[0]. Panel (g) gives the three steps for each bit (inputs A[i], B[i]; signals G[i], P[i], C[i], Sum[i]): create generate and propagate signals, create carry signals, create sum signals.]
[Figure: The conditional-sum adder, panels (a) to (c): the H cell (inputs A[i], B[i]; outputs A[i] ⊕ B[i], (A[i] ⊕ B[i])', A[i] · B[i], A[i] + B[i]), which produces the sum and carry out for carry in = 0 and for carry in = 1; the Q cell (Qi_j), which selects between Si_j_0 or Ci_j_0 and Si_j_1 or Ci_j_1 using Ci_j_k; a 2-bit example (stages 0 to 2, bits 0 and 1) with inputs A[0], B[0], A[1], B[1], C[0] and outputs S[0], S[1], C[2].]
Ci_j_k = carry in to the ith bit assuming the carry in to the jth bit is k (k = 0 or 1)
Si_j_k = sum at the ith bit assuming the carry in to the jth bit is k (k = 0 or 1)
2.6.3 A Simple Example
An 8-bit conditional-sum adder
module m8bitCSum (C0, a, b, s, C8); // Verilog conditional-sum adder for an FPGA
// C0 is the 1-bit carry in, C8 is the carry out
// Ci_j_k (Si_j_k) = carry in to (sum at) bit i assuming the carry in to bit j is k
input C0; input [7:0] a, b; output [7:0] s; output C8;
wire A7,A6,A5,A4,A3,A2,A1,A0,B7,B6,B5,B4,B3,B2,B1,B0,S7,S6,S5,S4,S3,S2,S1,S0;
wire C2, C4_2_0, C4_2_1, S5_4_0, S5_4_1, C6, C6_4_0, C6_4_1;
assign {A7,A6,A5,A4,A3,A2,A1,A0} = a;
assign {B7,B6,B5,B4,B3,B2,B1,B0} = b;
assign s = { S7,S6,S5,S4,S3,S2,S1,S0 };
// start of level 1: & = AND, ^ = XOR, | = OR, ! = NOT
assign S0 = A0^B0^C0 ;
assign S1 = A1^B1^(A0&B0|(A0|B0)&C0) ;
assign C2 = A1&B1|(A1|B1)&(A0&B0|(A0|B0)&C0) ;
assign C4_2_0 = A3&B3|(A3|B3)&(A2&B2) ;
assign C4_2_1 = A3&B3|(A3|B3)&(A2|B2) ;
assign S5_4_0 = A5^B5^(A4&B4) ;
assign S5_4_1 = A5^B5^(A4|B4) ;
assign C6_4_0 = A5&B5|(A5|B5)&(A4&B4) ;
assign C6_4_1 = A5&B5|(A5|B5)&(A4|B4) ;
// start of level 2
assign S2 = A2^B2^C2 ;
assign S3 = A3^B3^(A2&B2|(A2|B2)&C2) ;
assign S4 = A4^B4^(C4_2_0|C4_2_1&C2) ;
assign S5 = S5_4_0& !(C4_2_0|C4_2_1&C2)|S5_4_1&(C4_2_0|C4_2_1&C2) ;
assign C6 = C6_4_0|C6_4_1&(C4_2_0|C4_2_1&C2) ;
// start of level 3
assign S6 = A6^B6^C6 ;
assign S7 = A7^B7^(A6&B6|(A6|B6)&C6) ;
assign C8 = A7&B7|(A7|B7)&(A6&B6|(A6|B6)&C6) ;
endmodule
2.6.4 Multipliers
• Mental arithmetic: 15 (multiplicand) × 19 (multiplier) = 15 × (20 – 1) = 15 × 2 1̄, where 1̄ denotes the digit –1
• Suppose we want to multiply by B = 00010111 (decimal 16 + 4 + 2 + 1 = 23)
• Use the canonical signed-digit vector (CSD vector) D = 0 0 1 0 1̄ 0 0 1̄ (decimal 32 – 8 – 1 = 23)
• B has a weight of 4, but D has a weight of 3, which saves hardware
[Figure: Datapath adders, panels (a) and (b): normalized delay (2-input NAND = 1) and area (in kλ²) versus number of bits (8, 16, 32, 64) for ripple-carry, carry-select, and carry-save adders.]
To recode (or encode) any binary number, B, as a CSD vector, D: $D_i = B_i + C_i - 2C_{i+1}$,
where $C_{i+1}$ is the carry from the sum $B_{i+1} + B_i + C_i$ (we start with $C_0 = 0$).
If B = 011 (B2 = 0, B1 = 1, B0 = 1; decimal 3), then, writing 1̄ for the digit –1:
D0 = B0 + C0 – 2C1 = 1 + 0 – 2 = 1̄,
D1 = B1 + C1 – 2C2 = 1 + 1 – 2 = 0,
D2 = B2 + C2 – 2C3 = 0 + 1 – 0 = 1,
so that D = 1 0 1̄ (decimal 4 – 1 = 3).
We can use a radix other than 2, for example Booth encoding (radix-4):
B = 101001 (decimal 9 – 32 = –23) → E = 1̄ 2̄ 1 (decimal –16 – 8 + 1 = –23)
B = 01011 (eleven) → E = 1 1̄ 1̄ (16 – 4 – 1)
B = 101 → E = 1̄ 1
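The CSD recoding rule D[i] = B[i] + C[i] – 2C[i+1] can be sketched in a few lines of Python. The function names are hypothetical, the input is assumed to be an unsigned number, and one extra digit is produced because the final carry can create a new most significant digit.

```python
def csd_recode(b, width):
    """Recode an unsigned `width`-bit number as CSD digits, least significant first.

    Each digit is -1, 0, or +1, computed as D[i] = B[i] + C[i] - 2*C[i+1].
    """
    bits = [(b >> i) & 1 for i in range(width)] + [0, 0]   # zero-extend B
    digits = []
    carry = 0                                              # C[0] = 0
    for i in range(width + 1):
        # C[i+1] is the carry out of the sum B[i+1] + B[i] + C[i]
        next_carry = 1 if bits[i + 1] + bits[i] + carry >= 2 else 0
        digits.append(bits[i] + carry - 2 * next_carry)
        carry = next_carry
    return digits


def csd_value(digits):
    """Integer value of a digit list given least significant digit first."""
    return sum(d * 2 ** i for i, d in enumerate(digits))


if __name__ == "__main__":
    d = csd_recode(23, 8)
    print(d[::-1], csd_value(d))      # most significant digit first, then value
```

For B = 23 the nonzero digits are +32, –8, and –1, so the weight drops from 4 to 3 as in the multiplier example.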
Tree-based multiplication – at each stage we have the following three choices:
(1) sum three outputs using a full adder
(2) sum two outputs using a half adder
(3) pass the outputs to the next stage
[Figure: Tree-based multiplication, panels (a) to (c): a full adder (inputs A, B, CIN; outputs Sum, COUT); a carry-save chain and a Wallace tree that each form product bit P5 from the summands S50, S41, S32, S23, S14, and S05 using adders 5.1 to 5.5; a dot diagram (columns 0 to 6) in which each dot represents an output of one stage and an input to the next, with full adders, half adders, and a redundant carry marked.]
A Wallace-tree multiplier works forward from the multiplier inputs
• Full adder is a 3:2 compressor or (3, 2) counter
• Half adder is a (2, 2) counter
[Figure: A 6-bit Wallace-tree multiplier: summands S00 to S55 are reduced by adders numbered 1 to 30 to give product bits P0 to P11.]
The Dadda multiplier works backward from the final product
• Each stage has a maximum of 2, 3, 4, 6, 9, 13, 19, ... outputs (each successive stage is
3/2 times larger, rounded down to an integer)
The number of stages and thus delay (in units of an FA delay, excluding the CPA) for an n-bit
tree-based multiplier using (3, 2) counters is
$\log_{1.5} n = \log_{10} n / \log_{10} 1.5 = \log_{10} n / 0.176$
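The Dadda stage heights and the stage-count estimate are easy to tabulate. In this Python sketch the function names are hypothetical; `dadda_heights` applies the "3/2 times larger, rounded down" rule and `stage_estimate` evaluates log1.5 n.

```python
import math


def dadda_heights(limit):
    """Maximum outputs per stage: 2, 3, 4, 6, 9, ... up to the first value >= limit."""
    heights = [2]
    while heights[-1] < limit:
        heights.append(heights[-1] * 3 // 2)   # 3/2 times larger, rounded down
    return heights


def stage_estimate(n_bits):
    """Approximate number of (3, 2) counter stages: log base 1.5 of n."""
    return math.log10(n_bits) / math.log10(1.5)


if __name__ == "__main__":
    print(dadda_heights(19))
    for n in (6, 8, 16, 32):
        print(n, "bits:", round(stage_estimate(n), 2), "stages (estimate)")
```

The estimate is not an integer; it only shows that the delay grows with the logarithm of the word length rather than linearly.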
[Figure: A 6-bit Dadda multiplier: summands S00 to S55 are reduced by adders numbered 1 to 30 to give product bits P0 to P11.]
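[Figure: Ferrari–Stefanelli architecture "nests" multipliers, panels (a) to (c): a multiplier array with inputs A0 to A3 and B0 to B3 built from two-input AND gates and (3, 2) counters (for example S32 from A3 and B2); the same array with a 2-bit submultiplier (inputs A0, A1, B0, B1; outputs Z'0, Z'1, Z'2, Z'4 as labeled) nested inside it.]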
2.6.5 Other Arithmetic Systems
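• 101 (decimal) is 1100101 (in binary and CSD vector) or 1 1̄ 1 0 0 1 1 1̄ in redundant binary, where 1̄ denotes the digit –1 (128 – 64 + 32 + 4 + 2 – 1 = 101)
• 188 (decimal) is 10111100 (in binary), 1 1̄ 1 0 0 0 1̄ 0 0, 1 0 1̄ 0 0 1̄ 1 0 0, or 1 0 1̄ 0 0 0 1̄ 0 0 (CSD
vector)
• 1 0 1̄ is represented as 010010 (using sign magnitude, two bits per digit), which is rather wasteful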
Residue number system
• 11 (decimal) is represented as [1, 2] residue (5, 3)
• 11R5 = 11 mod 5 = 1 and 11R3 = 11 mod 3 = 2
• The size of this system is 3 × 5 = 15
• We can now add, subtract, or multiply without using any carry
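Redundant binary addition example (1̄ denotes the digit –1; the intermediate carry row is written one place to the left of the intermediate sum)
 | binary | decimal | redundant binary | CSD vector
addend | 1010111 | 87 | 1 0 1̄ 0 1̄ 0 0 1̄ | 1 0 1̄ 0 1̄ 0 0 1̄
augend | + 1100101 | + 101 | + 1 1̄ 1 0 0 1 1 1̄ | + 0 1 1 0 0 1 0 1
intermediate sum | | | 0 1 0 0 1̄ 1̄ 1 0 | 1̄ 1̄ 0 0 1̄ 1 0 0
intermediate carry | | | 1 1̄ 0 0 0 1 0 1̄ | 1 1 0 0 0 0 0 0
sum | = 10111100 | = 188 | 1 1̄ 1 0 0 0 1̄ 0 0 | 1 0 1̄ 0 0 1̄ 1 0 0

Redundant binary addition • redundant binary encoding avoids carry propagation
A[i] B[i] | A[i–1] B[i–1] | Intermediate sum | Intermediate carry
1̄ 1̄ | x x | 0 | 1̄
1̄ 0 or 0 1̄ | A[i–1]=0/1 and B[i–1]=0/1 | 1̄ | 0
1̄ 0 or 0 1̄ | A[i–1]=1̄ or B[i–1]=1̄ | 1 | 1̄
1 1̄ | x x | 0 | 0
1̄ 1 | x x | 0 | 0
0 0 | x x | 0 | 0
0 1 or 1 0 | A[i–1]=0/1 and B[i–1]=0/1 | 1̄ | 1
0 1 or 1 0 | A[i–1]=1̄ or B[i–1]=1̄ | 1 | 0
1 1 | x x | 0 | 1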
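The redundant binary addition rules can be exercised with a short Python sketch. This is an illustration under stated assumptions, not a hardware description: digits are the integers –1, 0, +1 stored least significant first, the function names are hypothetical, and the operands are the signed-digit forms of 87 and 101 (128 – 32 – 8 – 1 and 128 – 64 + 32 + 4 + 2 – 1). The intermediate sum at each position depends only on that digit pair and the pair one place to the right, so no carry ripples further than one position.

```python
def rb_add(a, b):
    """Add two equal-length signed-digit numbers (digits -1, 0, 1; LSB first)."""
    n = len(a)
    sums = [0] * (n + 1)
    carries = [0] * (n + 1)           # carries[i] is the carry INTO position i
    for i in range(n):
        total = a[i] + b[i]
        prev_negative = i > 0 and (a[i - 1] < 0 or b[i - 1] < 0)
        if total == 2:
            s, c = 0, 1
        elif total == -2:
            s, c = 0, -1
        elif total == 1:
            s, c = (1, 0) if prev_negative else (-1, 1)
        elif total == -1:
            s, c = (1, -1) if prev_negative else (-1, 0)
        else:
            s, c = 0, 0
        sums[i] = s
        carries[i + 1] = c
    # final digit = intermediate sum + intermediate carry, with no further carry
    return [sums[i] + carries[i] for i in range(n + 1)]


def rb_value(digits):
    """Integer value of a signed-digit list given least significant digit first."""
    return sum(d * 2 ** i for i, d in enumerate(digits))


# Operands written most significant digit first, then reversed to LSB first
ADDEND = [1, 0, -1, 0, -1, 0, 0, -1][::-1]    # 87
AUGEND = [1, -1, 1, 0, 0, 1, 1, -1][::-1]     # 101

if __name__ == "__main__":
    result = rb_add(ADDEND, AUGEND)
    print(result[::-1], rb_value(result))
```

The choice of intermediate sum (–1 or +1) is made so that adding the incoming carry can never push a digit outside the range –1 to +1.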
Examples in the (5, 3) residue number system:
4 + 7 = 11: [4, 1] + [2, 1] = [1, 2]
12 – 4 = 8: [2, 0] – [4, 1] = [3, 2]
3 × 4 = 12: [3, 0] × [4, 1] = [2, 0]

The 5, 3 residue number system
n | residue 5 | residue 3 | n | residue 5 | residue 3 | n | residue 5 | residue 3
0 | 0 | 0 | 5 | 0 | 2 | 10 | 0 | 1
1 | 1 | 1 | 6 | 1 | 0 | 11 | 1 | 2
2 | 2 | 2 | 7 | 2 | 1 | 12 | 2 | 0
3 | 3 | 0 | 8 | 3 | 2 | 13 | 3 | 1
4 | 4 | 1 | 9 | 4 | 0 | 14 | 4 | 2
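The residue examples can be reproduced with a few lines of Python. The sketch assumes the moduli (5, 3) from the table; `to_rns` and `rns_op` are hypothetical helper names. Each residue digit is processed independently, which is why no carry passes between digits.

```python
import operator

MODULI = (5, 3)


def to_rns(n):
    """Residues of n for each modulus, for example 11 -> [1, 2]."""
    return [n % m for m in MODULI]


def rns_op(x, y, op):
    """Apply op (add, sub, or mul) digit by digit, reducing by each modulus."""
    return [op(xi, yi) % m for xi, yi, m in zip(x, y, MODULI)]


if __name__ == "__main__":
    print(rns_op(to_rns(4), to_rns(7), operator.add))    # residues of 11
    print(rns_op(to_rns(12), to_rns(4), operator.sub))   # residues of 8
    print(rns_op(to_rns(3), to_rns(4), operator.mul))    # residues of 12
```

Results are only unique for values 0 to 14, the size of this system; anything larger wraps around modulo 15.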
2.6.6 Other Datapath Operators
Symbols for datapath elements
Full subtracter:
DIFF = A ⊕ NOT(B) ⊕ NOT(BIN)
= SUM(A, NOT(B), NOT(BIN))
NOT(BOUT) = A · NOT(B) + A · NOT(BIN) + NOT(B) · NOT(BIN)
= MAJ(A, NOT(B), NOT(BIN))
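The full-subtracter equations describe a full adder whose B and borrow inputs are inverted and whose carry out is the inverted borrow out. The Python sketch below checks this reading exhaustively; the function name `full_subtracter` is hypothetical, and the borrow signals are assumed to be active high (BIN = 1 means a borrow is requested from this bit).

```python
def full_subtracter(a, b, b_in):
    """Return (DIFF, BOUT) for A - B - BIN, built from the full-adder functions."""
    not_b = 1 - b
    not_b_in = 1 - b_in
    diff = a ^ not_b ^ not_b_in                                        # SUM(A, NOT(B), NOT(BIN))
    not_b_out = (a & not_b) | (a & not_b_in) | (not_b & not_b_in)      # majority function
    return diff, 1 - not_b_out


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for b_in in (0, 1):
                diff, b_out = full_subtracter(a, b, b_in)
                # a borrow out is worth -2 at this bit position
                assert a - b - b_in == diff - 2 * b_out
                print(a, b, b_in, "->", diff, b_out)
```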
Keywords: adder/subtracter • barrel shifter • normalizer • denormalizer • leading-one detector
• priority encoder • exponent correcter • accumulator • multiplier–accumulator (MAC) •
incrementer • decrementer • incrementer/decrementer • all-zeros detector • all-ones detector
• register file • first-in first-out register (FIFO) • last-in first-out register (LIFO)
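[Figure: datapath element symbols, panels (a) to (h), with bus inputs A[MSB:0], B[MSB:0], D[MSB:0] and outputs Z[MSB:0], S[MSB:0], Q[MSB:0]. The symbols include a register (D, Q, CLK, PRE), a multiplexer (select S, inputs 0 and 1), an adder (Σ), an adder/subtracter (+/-), an incrementer/decrementer (+/-1), and detectors marked =1 and Z=0.]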
2.7 I/O Cells
Keywords: Tri-State® (a registered trademark of National Semiconductor) • drivers •
contention • bus keeper or bus-hold cell (TI calls this Bus-Friendly logic) • slew rate •
power-supply bounce • simultaneously switching outputs (SSOs) • quiet-I/O • bidirectional I/O
• open-drain • level shifter • electrostatic discharge, or ESD • electrical overstress (EOS)
• ESD implant • human-body model (HBM) • machine model (MM) • charged-device model
(CDM, also called device charge–discharge) • latch-up • undershoot • overshoot • guard
rings
[Figure: A three-state bidirectional output buffer. Labels: I/O pad, VDD, OE (output enable), DATAout, DATAin, to core logic, from core logic; transistors M1 and M2; cells ND1, NR1, I1, and I2.]
2.8 Cell Compilers
Keywords: silicon compilers • RAM compiler • multiplier compiler • single-port RAM • dual-port
RAMs • multiport RAMs • asynchronous • synchronous • model compiler • netlist compiler •
correct by construction
2.9 Summary
• The use of transistors as switches
• The difference between a flip-flop and a latch
• The meaning of setup time and hold time
• Pipelines and latency
• The difference between datapath, standard-cell, and gate-array logic cells
• Strong and weak logic levels
• Pushing bubbles
• Ratio of logic
• Resistance per square of layers and their relative values in CMOS
• Design rules and λ
2.10 Problems
Suggested homework: 2.1, 2.2, 2.38, 2.39 (from ASICs... the book)