ASICs... THE COURSE (1 WEEK)

CMOS LOGIC
• CMOS transistor (or device)
• A transistor has three terminals: gate, source, and drain (and a fourth, the bulk, that we ignore for the moment)
• An MOS transistor looks like a switch (conducting/on or nonconducting/off, not open or closed)

Key concepts: the use of transistors as switches • the difference between a flip-flop and a latch • setup time and hold time • pipelines and latency • the difference between datapath, standard-cell, and gate-array logic cells • strong and weak logic levels • pushing bubbles • ratio of logic • resistance per square of layers and their relative values in CMOS • design rules and λ

CMOS transistors viewed as switches
Figure: (a) An n-channel transistor is on when its gate is '1' and off when its gate is '0'. (b) A p-channel transistor is on when its gate is '0' and off when its gate is '1'. (c) A CMOS inverter between VDD and GND (or VSS): A = '1' gives F = '0', and A = '0' gives F = '1'.

CMOS logic
• a two-input NAND gate • a two-input NOR gate • good '1's • good '0's
Figure: (a) F = NAND(A, B), with the p-channel transistors in parallel and the n-channel transistors in series; F = '0' only when A = B = '1'. (b) F = NOR(A, B), with the p-channel transistors in series and the n-channel transistors in parallel; F = '1' only when A = B = '0'.

2.1 CMOS Transistors
• Channel charge = Q (imagine taking a picture and counting the electrons)
• $t_f$ is the time of flight, or transit time
• $\mu_n$ is the electron mobility ($\mu_p$ is the hole mobility)
• $E$ is the electric field (units V m$^{-1}$)
An n-channel transistor: channel • source • drain • depletion region • gate • bulk

Current (amperes) = charge (coulombs) per unit time (seconds), so the drain-to-source current is
$I_{DSn} = Q/t_f$
The (vector) velocity of the electrons is $\mathbf{v} = -\mu_n\mathbf{E}$. With a field of magnitude $V_{DS}/L$ along the channel, the transit time is
$t_f = \dfrac{L}{v_x} = \dfrac{L^2}{\mu_n V_{DS}}$
Figure: An n-channel transistor: n-type source and drain in a p-type bulk, channel length L, width W, gate-oxide thickness Tox, mobile channel charge, fixed depletion charge, and the depletion region.

• The linear region (triode region) extends until $V_{DS} = V_{GS} - V_{tn}$
• $V_{DS} = V_{GS} - V_{tn} = V_{DS(sat)}$ is the saturation voltage
• $V_{DS} > V_{GS} - V_{tn}$ is the saturation region (or pentode region) of operation
• the current in the saturation region is the saturation current, $I_{DSn(sat)}$

In the linear region the gate capacitance is $C = WLC_{ox}$ and the average gate-to-channel voltage is $V_{GC} = V_{GS} - 0.5V_{DS}$, so
$Q = C(V_{GC} - V_{tn}) = C[(V_{GS} - V_{tn}) - 0.5V_{DS}] = WLC_{ox}[(V_{GS} - V_{tn}) - 0.5V_{DS}]$
$I_{DSn} = Q/t_f = (W/L)\mu_nC_{ox}[(V_{GS} - V_{tn}) - 0.5V_{DS}]V_{DS} = (W/L)k'_n[(V_{GS} - V_{tn}) - 0.5V_{DS}]V_{DS}$
• $k'_n = \mu_nC_{ox}$ is the process transconductance parameter (or intrinsic transconductance)
• $\beta_n = k'_n(W/L)$ is the transistor gain factor (or just gain factor)
Substituting $V_{DS} = V_{DS(sat)} = V_{GS} - V_{tn}$:
$I_{DSn(sat)} = (\beta_n/2)(V_{GS} - V_{tn})^2, \quad V_{GS} > V_{tn}$

2.1.1 P-Channel Transistors
• $V_{tp}$ is negative
• $V_{DS}$ and $V_{GS}$ are normally negative (and –3 V < –2 V)
$I_{DSp} = -k'_p(W/L)[(V_{GS} - V_{tp}) - 0.5V_{DS}]V_{DS}, \quad V_{DS} > V_{GS} - V_{tp}$
$I_{DSp(sat)} = -(\beta_p/2)(V_{GS} - V_{tp})^2, \quad V_{DS} < V_{GS} - V_{tp}$
Figure: MOS n-channel transistor characteristics. (a), (b) $I_{DS}$ (mA) versus $V_{DS}$ (V) for W/L = 6/0.6 and W/L = 60/6, with $V_{GS}$ from 0.0 V to 3.0 V in 0.5 V steps. (c) $I_{DS(sat)}$ versus $V_{GS}$ at $V_{DS}$ = 3.0 V: the long-channel device (W/L = 60/6) follows $I_{DS(sat)} \propto (V_{GS} - V_{tn})^2$, while the short-channel device (W/L = 6/0.6) is closer to $I_{DS(sat)} \propto V_{GS} - V_{tn}$.

2.1.2 Velocity Saturation
• $v_{maxn} = 10^5$ m s$^{-1}$
• velocity saturation
• $t_f = L_{eff}/v_{maxn}$
• mobility degradation
When the electrons travel at $v_{maxn}$, the saturation current is linear in $V_{GS}$:
$I_{DSn(sat)} = Wv_{maxn}C_{ox}(V_{GS} - V_{tn}), \quad V_{DS} > V_{DS(sat)}$ (velocity saturated)

2.1.3 SPICE Models
• KP (in A V$^{-2}$) = $k'_n$ ($k'_p$)
• VTO and TOX = $V_{tn}$ ($V_{tp}$) and $T_{ox}$
• UO (in cm$^2$ V$^{-1}$ s$^{-1}$) = $\mu_n$ (and $\mu_p$)
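The long-channel equations above and the velocity-saturated limit of 2.1.2 can be evaluated numerically. The sketch below is a plain square-law model, not the SPICE LEVEL=3 model. It assumes KP = 2E-04 A V⁻² and VTO = 0.65 V (the values in the CMOSN card that follows), $C_{ox}$ for TOX = 10 nm, and $v_{maxn} = 10^5$ m s⁻¹; the function names `ids_n` and `ids_n_vsat` are hypothetical.

```python
# Square-law n-channel MOSFET model (2.1) and velocity-saturated current (2.1.2).
# Parameter values are assumptions for illustration; this is NOT SPICE LEVEL=3.

KP = 2e-4                        # k'_n in A/V^2 (assumed, from KP=2E-04 in CMOSN)
VTN = 0.65                       # V_tn in V (assumed, from VTO=0.65 in CMOSN)
COX = 3.9 * 8.854e-12 / 10e-9    # C_ox in F/m^2 for TOX = 10 nm (about 3.45e-3)
VMAXN = 1e5                      # v_maxn in m/s


def ids_n(vgs: float, vds: float, w_over_l: float) -> float:
    """Drain-to-source current in A from the long-channel equations."""
    beta = KP * w_over_l               # beta_n = k'_n (W/L)
    vov = vgs - VTN                    # V_GS - V_tn
    if vov <= 0:
        return 0.0                     # no channel charge in this simple model
    if vds < vov:                      # linear (triode) region
        return beta * (vov - 0.5 * vds) * vds
    return 0.5 * beta * vov ** 2       # saturation: (beta_n/2)(V_GS - V_tn)^2


def ids_n_vsat(vgs: float, w: float) -> float:
    """Velocity-saturated current W v_maxn C_ox (V_GS - V_tn) in A; w in metres."""
    return w * VMAXN * COX * max(vgs - VTN, 0.0)


if __name__ == "__main__":
    for w, l in [(6e-6, 0.6e-6), (60e-6, 6e-6)]:
        square_law = ids_n(3.0, 3.0, w / l)
        vsat = ids_n_vsat(3.0, w)
        print(f"W/L = {w * 1e6:g}/{l * 1e6:g}: square law {square_law * 1e3:.2f} mA, "
              f"velocity-saturated {vsat * 1e3:.2f} mA")
```

Both devices have W/L = 10, so the square-law current is the same for each, while the velocity-saturated limit scales with W alone. As a rough rule of thumb (an assumption of this sketch, not a SPICE result), the smaller of the two estimates indicates which effect dominates: for the short device it is the velocity-saturated value, which is consistent with its more linear $I_{DS(sat)}$ curve in the figure.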
SPICE parameters
.MODEL CMOSN NMOS LEVEL=3 PHI=0.7 TOX=10E-09 XJ=0.2U TPG=1 VTO=0.65 DELTA=0.7
+ LD=5E-08 KP=2E-04 UO=550 THETA=0.27 RSH=2 GAMMA=0.6 NSUB=1.4E+17 NFS=6E+11
+ VMAX=2E+05 ETA=3.7E-02 KAPPA=2.9E-02 CGDO=3.0E-10 CGSO=3.0E-10 CGBO=4.0E-10
+ CJ=5.6E-04 MJ=0.56 CJSW=5E-11 MJSW=0.52 PB=1
.MODEL CMOSP PMOS LEVEL=3 PHI=0.7 TOX=10E-09 XJ=0.2U TPG=-1 VTO=-0.92 DELTA=0.29
+ LD=3.5E-08 KP=4.9E-05 UO=135 THETA=0.18 RSH=2 GAMMA=0.47 NSUB=8.5E+16 NFS=6.5E+11
+ VMAX=2.5E+05 ETA=2.45E-02 KAPPA=7.96 CGDO=2.4E-10 CGSO=2.4E-10 CGBO=3.8E-10
+ CJ=9.3E-04 MJ=0.47 CJSW=2.9E-10 MJSW=0.505 PB=1

2.1.4 Logic Levels
CMOS logic levels
• VSS is a strong '0'
• VDD is a strong '1'
• degraded logic levels: VDD – Vtn is a weak '1'; VSS – Vtp (Vtp is negative) is a weak '0'
Figure: An n-channel transistor passes a strong '0', but charging a capacitor toward '1' stops at VDD – Vtn (when VGS = Vtn there is no channel charge), a weak '1'. A p-channel transistor passes a strong '1', but discharging a capacitor toward '0' stops at VSS – Vtp (when VGS = Vtp), a weak '0'.

2.2 The CMOS Process
The CMOS manufacturing process
Key words: boule • wafer • boat • silicon dioxide • resist • mask • chemical etch • isotropic • plasma etch • anisotropic • ion implantation • implant energy and dose • polysilicon • chemical vapor deposition (CVD) • sputtering • photolithography • submicron and deep-submicron process • n-well process • p-well process • twin-tub (or twin-well) • triple-well • substrate contacts (well contacts or tub ties) • active (CAA) • gate oxide • field • field implant or channel-stop implant • field oxide (FOX) • bloat • dopant • self-aligned process • positive resist • negative resist • drain engineering • LDD process • lightly doped drain • LDD diffusion or LDD implant • stipple pattern
Figure: The CMOS manufacturing process, steps 1 to 12 (labels include grow crystal, saw, wafer, grow oxide, furnace, resist spin, mask, etch resist, oxide, and an As+ implant).

Mask/layer name | Derivation from drawn layers | Alternative names for mask/layer | Mask label
n-well | =nwell | bulk, substrate, tub, n-tub, moat | CWN
p-well | =pwell | bulk, substrate, tub, p-tub, moat | CWP
active | =pdiff+ndiff | thin oxide, thinox, island, gate oxide | CAA
polysilicon | =poly | poly, gate | CPG
n-diffusion implant | =grow(ndiff) | ndiff, n-select, nplus, n+ | CSN
p-diffusion implant | =grow(pdiff) | pdiff, p-select, pplus, p+ | CSP
contact | =contact | contact cut, poly contact, diffusion contact | CCP and CCA
metal1 | =m1 | first-level metal | CMF
metal2 | =m2 | second-level metal | CMS
via2 | =via2 | metal2/metal3 via, m2/m3 via | CVS
metal3 | =m3 | third-level metal | CMT
glass | =glass | passivation, overglass, pad | COG

Figure: The mask layers of a standard cell: (a) nwell, (b) pwell, (c) ndiff, (d) pdiff, (e) poly, (f) contact, (g) m1, (h) via, (i) m2, (j) cell, (k) phantom.

Active mask
CAA (mask) = ndiff (drawn) ∨ pdiff (drawn)
Implant select masks
CSN (mask) = grow(ndiff (drawn)) and CSP (mask) = grow(pdiff (drawn))
Source and drain diffusion (on the silicon)
n-diffusion (silicon) = (CAA (mask) ∧ CSN (mask)) ∧ (¬CPG (mask))
p-diffusion (silicon) = (CAA (mask) ∧ CSP (mask)) ∧ (¬CPG (mask))
Source and drain diffusion (on the silicon) in terms of drawn layers
n-diffusion (silicon) = (ndiff (drawn)) ∧ (¬poly (drawn))
p-diffusion (silicon) = (pdiff (drawn)) ∧ (¬poly (drawn))
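The mask equations above can be checked on a toy layout. This minimal sketch treats every layer as a set of unit grid squares, so ∨ becomes set union, ∧ becomes intersection, and ∧ ¬ becomes set difference; `grow` is modelled as a one-square dilation. The geometry and the helper names `rect` and `grow` are hypothetical, and the drawn-layer form only matches the mask form when the grown select of one type does not reach diffusion of the other type, which this toy layout satisfies by construction.

```python
# Derive masks from drawn layers with Boolean set operations (2.2).
# The layout is a hypothetical toy example on a unit grid.

from typing import Set, Tuple

Cell = Tuple[int, int]   # one grid square (x, y)


def rect(x0: int, y0: int, x1: int, y1: int) -> Set[Cell]:
    """All grid squares with x0 <= x < x1 and y0 <= y < y1."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def grow(layer: Set[Cell], amount: int = 1) -> Set[Cell]:
    """Bloat a layer by `amount` squares in every direction (square dilation)."""
    return {(x + dx, y + dy)
            for (x, y) in layer
            for dx in range(-amount, amount + 1)
            for dy in range(-amount, amount + 1)}


# Drawn layers: one n-device and one p-device sharing a vertical poly gate.
ndiff = rect(0, 0, 10, 4)
pdiff = rect(0, 12, 10, 16)
poly = rect(4, -2, 6, 18)

# Masks
CAA = ndiff | pdiff      # active = ndiff OR pdiff
CSN = grow(ndiff)        # n-select = grow(ndiff)
CSP = grow(pdiff)        # p-select = grow(pdiff)
CPG = poly               # polysilicon mask

# Diffusion formed in the silicon: active AND select AND NOT poly
n_diffusion = (CAA & CSN) - CPG
p_diffusion = (CAA & CSP) - CPG

# The same result in terms of drawn layers (holds for this layout)
assert n_diffusion == ndiff - poly
assert p_diffusion == pdiff - poly
print(len(n_diffusion), len(p_diffusion))   # prints: 32 32 (source + drain squares)
```

Each diffusion rectangle is 40 squares and the gate covers 8 of them, leaving 32 squares of source and drain per device.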
Drawn layers and stipple patterns
Figure: Drawn layers and their stipple patterns: pdiff, ndiff, poly, nwell, pwell, contact, via1, via2, m1, m2, m3, and glass (some drawn solid).
Figure: The transistor layers: (a), (b) a transistor showing p-diffusion, polysilicon, the n-well (or substrate), field oxide, gate oxide, field implant, source/drain diffusion, and LDD diffusion, on a 2λ grid.

2.2.1 Sheet Resistance
Layer | Sheet resistance (1 μm process) | Layer | Sheet resistance (0.35 μm process)
n-well | 1.15 ± 0.25 kΩ/square | n-well | 1 ± 0.4 kΩ/square
poly | 3.5 ± 2.0 Ω/square | poly | 10 ± 4.0 Ω/square
n-diffusion | 75 ± 20 Ω/square | n-diffusion | 3.5 ± 2.0 Ω/square
p-diffusion | 140 ± 40 Ω/square | p-diffusion | 2.5 ± 1.5 Ω/square
m1/2 | 70 ± 6 mΩ/square | m1/2/3 | 60 ± 6 mΩ/square
m3 | 30 ± 3 mΩ/square | metal4 | 30 ± 3 mΩ/square
Key words: diffusion • Ω/square (ohms per square) • sheet resistance • silicide • self-aligned silicide (salicide) • LI, white metal, local interconnect, metal0, or m0 • m1 or metal1 • diffusion contacts • polysilicon contacts • barrier metal • contact plugs (via plugs) • chemical–mechanical polishing (CMP) • intermetal oxide (IMO) • interlevel dielectric (ILD) • metal vias, cuts, or vias • stacked vias and stacked contacts • two-level metal (2LM) • 3LM (m3 or metal3) • via1 • via2 • metal pitch • electromigration • contact resistance and via resistance
Figure: The interconnect layers: (a) contact, via1, and via2 stacked between m1, m2, and m3; (b) a cross section showing a W plug (4000 Å), AlCu (3000 Å), a Pt barrier (200 Å), and TiW.

2.3 CMOS Design Rules
Scalable CMOS design rules, values in λ (rule number in parentheses); the overglass rules are in microns:
1. well: 10 (1.1), 9 (1.2), 0 or 6 (1.3), 0 (1.4)
2. active: 3 (2.1), 3 (2.2), 5 (2.3), 3 (2.4), 0 or 4 (2.5)
3. poly: 2 (3.1), 2 (3.2), 2 (3.3), 3 (3.4), 1 (3.5)
4. select: 3 (4.1), 2 (4.2), 1 (4.3)
5. polycontact: 2 × 2 (5.1a), 1.5 (5.2a), 2 (5.3a)
6. activecontact: 2 × 2 (6.1a), 1.5 (6.2a), 2 (6.3a), 2 (6.4a)
7. metal1: 3 (7.1), 3 (7.2a), 2 (7.2b), 1 (7.3), 1 (7.4)
8. via1: 2 × 2 (8.1), 3 (8.2), 1 (8.3), 2 (8.4), 2 (8.5)
9. metal2: 3 (9.1), 4 (9.2a), 3 (9.2b), 1 (9.3)
10. overglass (microns): 100 × 100 (10.1), 6 (10.3), 30 (10.4), 15 (10.5)
14. via2: 2 × 2 (14.1), 3 (14.2), 1 (14.3), 2 (14.4)
15. metal3: 6 (15.1), 4 (15.2), 2 (15.3)

2.4 Combinational Logic Cells
2.4.1 Pushing Bubbles
2.4.2 Drive Strength
We ratio a cell to adjust its drive strength and make βn = βp to create equal rise and fall times.

Naming of complex CMOS combinational logic cells
The AOI family of cells with three index numbers or less:
Cell type¹ | Cells | Number of unique cells
Xa1 | X21, X31 | 2
Xa11 | X211, X311 | 2
Xab | X22, X33, X32 | 3
Xab1 | X221, X331, X321 | 3
Xabc | X222, X333, X332, X322 | 4
Total | | 14
¹ Xabc: X = {AOI, AO, OAI, OA}; a, b, c = {2, 3}; {} means "choose one."
Figure: (a) An AOI221 cell (AND, OR, INVERT); (b) an OAI321 cell (OR, AND, INVERT).

Constructing a CMOS logic cell, an AOI221: pushing bubbles • de Morgan's theorem • network duals
Figure: Building Z = AOI221(A, B, C, D, E) = NOT(A · B + C · D + E): (a) push bubbles to the inputs; (b) build the networks using AND = series and OR = parallel (network duals); (c) adjust sizes, for example three 6/1 transistors in series act like one 6/(1 + 1 + 1) = 2/1 transistor.

2.4.3 Transmission Gates
CMOS transmission gate (TG, TX gate, pass gate, coupler)
Figure: (a) A TG is an n-channel and a p-channel transistor in parallel, controlled by S and S'. (b) With S = '1' the TG connects Z to A and passes both a strong '1' and a strong '0'; with S = '0' it is off. (c) Charge sharing: when the switch closes, the node on CBIG (at VBIG) and the node on CSMALL (at VSMALL) both move to a final voltage VF.
Charge sharing: charge is conserved when the switch closes. Suppose CBIG = 0.2 pF and CSMALL = 0.02 pF, VBIG = 0 V and VSMALL = 5 V; then
$V_F = \dfrac{C_{BIG}V_{BIG} + C_{SMALL}V_{SMALL}}{C_{BIG} + C_{SMALL}} = \dfrac{(0.2\times10^{-12})(0) + (0.02\times10^{-12})(5)}{(0.2\times10^{-12}) + (0.02\times10^{-12})} \approx 0.45\ \mathrm{V}$
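The charge-sharing result follows directly from conservation of charge. A minimal sketch, assuming an ideal switch with no leakage and no other parasitic capacitance (the function name `charge_share` is hypothetical):

```python
# Charge sharing between two capacitors joined by an ideal switch (2.4.3).
# Total charge C_BIG*V_BIG + C_SMALL*V_SMALL is conserved, so both nodes end at V_F.


def charge_share(c_big: float, v_big: float, c_small: float, v_small: float) -> float:
    """Return the final common voltage V_F in volts (capacitances in farads)."""
    total_charge = c_big * v_big + c_small * v_small
    return total_charge / (c_big + c_small)


if __name__ == "__main__":
    print(f"{charge_share(0.2e-12, 0.0, 0.02e-12, 5.0):.3f} V")   # 0.455 V
    print(f"{charge_share(0.2e-12, 5.0, 0.02e-12, 0.0):.3f} V")   # 4.545 V
```

Swapping the starting voltages shows the asymmetry: the big node barely moves, while the small node is pulled almost all the way to the big node's voltage.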
2.5 Sequential Logic Cells
Two choices for sequential logic: multiphase clocks or synchronous design. We choose the latter.

2.5.1 Latch
CMOS latch • enable • transparent • static • sequential logic cell • storage • initial value
Figure: (a)–(c) A static CMOS latch built from inverters I1 to I5 and transmission gates clocked by CLKN and CLKP, with a storage loop; waveforms of D, CLK, and Q show that the latch is transparent while it is enabled (symbol 1D, C1).

2.5.2 Flip-Flop
CMOS flip-flop • master latch • slave latch • active clock edge • negative-edge–triggered flip-flop • setup time ($t_{SU}$) • hold time ($t_H$) • clock-to-Q propagation delay ($t_{PD}$) • decision window
Figure: (a) A flip-flop built from a master latch (M) and a slave latch (S), inverters I1 to I9, and transmission gates clocked by CLKN and CLKP. (b) CLK = '1': load master, slave stores. (c) CLK = '0': load slave, master stores. (d) Waveforms: $t_{SU}$ and $t_H$ are measured from the 50% point of the active clock edge and bound the decision window; $t_{PD}$ is the clock-to-Q delay.

2.6 Datapath Logic Cells
• parity function ('1' for an odd number of '1's)
• majority function ('1' if the majority of the inputs are '1')
Full adder (FA):
SUM = A ⊕ B ⊕ CIN = SUM(A, B, CIN) = PARITY(A, B, CIN)
COUT = A · B + A · CIN + B · CIN = MAJ(A, B, CIN)
A datapath adder:
S[i] = SUM(A[i], B[i], CIN)
COUT = MAJ(A[i], B[i], CIN)
Ripple-carry adder (RCA) • data signals • control signals • datapath • datapath cell or datapath element
• Datapath advantages: predictable and equal delay for each bit • built-in interconnect
• Disadvantages of a datapath: overhead • harder design • software is more complex
Figure: A datapath adder: (a) a 4-bit ripple-carry adder from A[3:0], B[3:0], and CIN[0] to S[3:0] and COUT[3]; (b) its symbol (ADD); (c), (d) datapath layout with data and control signals running in m1 and m2.

2.6.1 Datapath Elements
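The full-adder equations of 2.6 are just PARITY and MAJ, and a ripple-carry adder is one full-adder cell per bit with each COUT feeding the next CIN. A minimal bit-level sketch (bit vectors are LSB first; all helper names are hypothetical):

```python
# Full adder as PARITY and MAJ (2.6), chained into a ripple-carry adder (RCA).

from typing import List, Tuple


def parity(a: int, b: int, c: int) -> int:
    return a ^ b ^ c                       # '1' for an odd number of '1's


def maj(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)     # '1' if at least two inputs are '1'


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    return parity(a, b, cin), maj(a, b, cin)   # (SUM, COUT)


def ripple_carry_add(a: List[int], b: List[int], cin: int = 0) -> Tuple[List[int], int]:
    """Add two equal-length LSB-first bit vectors; the carry ripples from bit 0 upward."""
    s = []
    carry = cin
    for ai, bi in zip(a, b):
        si, carry = full_adder(ai, bi, carry)  # COUT of bit i is CIN of bit i+1
        s.append(si)
    return s, carry


def to_bits(x: int, n: int) -> List[int]:
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: List[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    s, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(s), cout)   # prints: 1 1  (11 + 6 = 17 = S[3:0] 0001 with COUT[3] = 1)
```

The loop makes the RCA's weakness visible: the final carry depends on every earlier cell, which is what the faster adders of 2.6.2 avoid.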
Binary arithmetic
Operation | Unsigned | Signed magnitude | Ones' complement | Two's complement
binary number representation | no change | if positive then MSB = 0 else MSB = 1 | if negative then flip bits | if negative then {flip bits; add 1}
3 = | 0011 | 0011 | 0011 | 0011
–3 = | NA | 1011 | 1100 | 1101
zero = | 0000 | 0000 or 1000 | 1111 or 0000 | 0000
max. positive = | 1111 = 15 | 0111 = 7 | 0111 = 7 | 0111 = 7
max. negative = | 0000 = 0 | 1111 = –7 | 1000 = –7 | 1000 = –8
addition: S = A + B = addend + augend | S = A + B | SG(A) = sign of A; if SG(A) = SG(B) then S = A + B else {if B < A then S = A – B else S = B – A} | S = A + B + COUT[MSB], where COUT is carry out | S = A + B
addition result: OV = overflow, OR = out of range | OR = COUT[MSB], where COUT is carry out | if SG(A) = SG(B) then OV = COUT[MSB] else OV = 0 (impossible) | OV = XOR(COUT[MSB], COUT[MSB–1]) | OV = XOR(COUT[MSB], COUT[MSB–1])
SG(S) = sign of S, S = A + B | NA | if SG(A) = SG(B) then SG(S) = SG(A) else {if B < A then SG(S) = SG(A) else SG(S) = SG(B)} | NA | NA
subtraction: D = A – B = minuend – subtrahend | D = A – B | SG(B) = NOT(SG(B)); D = A + B | Z = –B (negate); D = A + Z | Z = –B (negate); D = A + Z
subtraction result: OV = overflow, OR = out of range | OR = BOUT[MSB], where BOUT is borrow out | as in addition | as in addition | as in addition
negation: Z = –A (negate) | NA | Z = A; SG(Z) = NOT(SG(A)) | Z = NOT(A) | Z = NOT(A) + 1

2.6.2 Adders
Generate, G[i], and propagate, P[i]:
method 1: G[i] = A[i] · B[i]; P[i] = A[i] ⊕ B[i]; C[i] = G[i] + P[i] · C[i–1]; S[i] = P[i] ⊕ C[i–1]
method 2: G[i] = A[i] · B[i]; P[i] = A[i] + B[i]; C[i] = G[i] + P[i] · C[i–1]; S[i] = A[i] ⊕ B[i] ⊕ C[i–1]
Carry signal:
either C[i] = A[i] · B[i] + P[i] · C[i–1] or C[i] = (A[i] + B[i]) · (P[i]' + C[i–1]), where P[i]' = NOT(P[i])
Carry chain using two-input NAND gates, one per cell:
even stages: C1[i]' = P[i] · C3[i–1] · C4[i–1]; C2[i] = A[i] + B[i]; C[i] = C1[i] · C2[i]
odd stages: C3[i]' = P[i] · C1[i–1] · C2[i–1]; C4[i]' = A[i] · B[i]; C[i] = C3[i]' + C4[i]'
Carry-save adder (CSA) cell CSA(A1[i], A2[i], A3[i], CIN, S1[i], S2[i], COUT) has three outputs:
S1[i] = CIN
S2[i] = A1[i] ⊕ A2[i] ⊕ A3[i] = PARITY(A1[i], A2[i], A3[i])
COUT = A1[i] · A2[i] + [(A1[i] + A2[i]) · A3[i]] = MAJ(A1[i], A2[i], A3[i])

Carry-propagate adder (CPA)
Carry-bypass adder (CBA):
C[7] = (G[7] + P[7] · C[6]) · BYPASS' + C[3] · BYPASS
Carry-skip adder:
CSKIP[i] = (G[i] + P[i] · C[i–1]) · SKIP' + C[i–2] · SKIP
The carry-save adder (CSA): pipeline • latency • bit slice
Figure: The carry-save adder: (a) a CSA cell; (b)–(g) adding four numbers A1[MSB:0] to A4[MSB:0] with two CSAs (CSA1, CSA2) and a final ripple-carry adder (RCA), with and without pipeline registers, and arranged as bit slices from LSB to MSB; the overflow OV comes from COUT[MSB] and COUT[MSB–1].
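A carry-save adder reduces three operands to two without propagating any carry, and a carry-propagate adder (CPA) then forms the final sum. The sketch below uses one row of the CSA cells defined above and a ripple CPA written with the method 1 generate and propagate signals. Operands are assumed to be non-negative integers of width n, and the function names `csa`, `cpa`, and `add3` are hypothetical.

```python
# Carry-save addition (2.6.2): one row of CSA cells, then a CPA built from
# generate/propagate signals (method 1).

from typing import Tuple


def csa(a1: int, a2: int, a3: int, n: int) -> Tuple[int, int]:
    """Return the (S2, COUT) words: S2[i] = PARITY and COUT[i] = MAJ of bit i."""
    s2 = cout = 0
    for i in range(n):
        x, y, z = (a1 >> i) & 1, (a2 >> i) & 1, (a3 >> i) & 1
        s2 |= (x ^ y ^ z) << i
        cout |= ((x & y) | ((x | y) & z)) << i
    return s2, cout


def cpa(a: int, b: int, n: int, cin: int = 0) -> Tuple[int, int]:
    """Ripple CPA with G[i] = A[i]B[i], P[i] = A[i] xor B[i]; returns (S, final carry)."""
    s, c = 0, cin                      # c holds C[i-1], the carry into bit i
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai ^ bi
        s |= (p ^ c) << i              # S[i] = P[i] xor C[i-1]
        c = g | (p & c)                # C[i] = G[i] + P[i]·C[i-1]
    return s, c


def add3(a1: int, a2: int, a3: int, n: int) -> int:
    s2, cout = csa(a1, a2, a3, n)
    # COUT from bit i has weight 2^(i+1), so it enters the next bit position
    # (in the cell notation, S1[i+1] = CIN = COUT[i]).
    s, c = cpa(s2, cout << 1, n + 2)   # n + 2 bits hold any sum of three n-bit numbers
    return s | (c << (n + 2))


if __name__ == "__main__":
    print(add3(5, 9, 14, 4))   # prints: 28
```

Only the final CPA has a carry chain; the CSA row has a constant delay regardless of n, which is why CSAs are used to sum many operands, as in the figure.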
Carry-lookahead adder (CLA, for example the Brent–Kung adder), taking C[–1] = 0 (no carry in):
C[1] = G[1] + P[1] · C[0] = G[1] + P[1] · (G[0] + P[0] · C[–1]) = G[1] + P[1] · G[0]
C[2] = G[2] + P[2] · G[1] + P[2] · P[1] · G[0]
C[3] = G[3] + P[3] · G[2] + P[3] · P[2] · G[1] + P[3] · P[2] · P[1] · G[0]
The carry-select adder duplicates two small adders for the cases CIN = '0' and CIN = '1' and then uses a MUX to select the case that we need.
Figure: The Brent–Kung carry-lookahead adder: (a) create generate and propagate signals G[i], P[i] from A[i], B[i]; (b)–(f) create carry signals with carry-lookahead generator (CLG) cells in levels L1 to L4, where each wire is a bundle of G[i+1] + P[i+1] · G[i] and P[i] · P[i+1]; (g) the three steps for one bit: create generate and propagate signals, create carry signals, create sum signals.

The conditional-sum adder
Ci_j_k = carry in to the ith bit assuming the carry in to the jth bit is k (k = 0 or 1)
Si_j_k = sum at the ith bit assuming the carry in to the jth bit is k (k = 0 or 1)
Figure: (a) A 2-bit conditional-sum adder in stages 0 to 2, from A[1:0], B[1:0], and C[0] to S[1:0] and C[2]; (b) the H cell forms A[i] ⊕ B[i], (A[i] ⊕ B[i])', A[i] · B[i], and A[i] + B[i]; (c) the Q cell uses Ci_j_k to select between the sum or carry out for carry in = 0 (Si_j_0 or Ci_j_0) and for carry in = 1 (Si_j_1 or Ci_j_1).

2.6.3 A Simple Example
An 8-bit conditional-sum adder. Level 1 forms S0, S1, and C2 directly and precomputes the carry out of bits 2 to 3 and the sum and carry of bits 4 to 5 for both values of their carry in; levels 2 and 3 select the right values as the real carries arrive:

module m8bitCSum (C0, a, b, s, C8); // Verilog conditional-sum adder for an FPGA
  input C0;                // carry in
  input [7:0] a, b;
  output [7:0] s;
  output C8;               // carry out
  wire A7,A6,A5,A4,A3,A2,A1,A0,B7,B6,B5,B4,B3,B2,B1,B0,S7,S6,S5,S4,S3,S2,S1,S0;
  // Ci_j_k: carry into bit i assuming the carry into bit j is k (Si_j_k likewise for sums)
  wire C2, C4_2_0, C4_2_1, S5_4_0, S5_4_1, C6, C6_4_0, C6_4_1;
  assign {A7,A6,A5,A4,A3,A2,A1,A0} = a; assign {B7,B6,B5,B4,B3,B2,B1,B0} = b;
  assign s = {S7,S6,S5,S4,S3,S2,S1,S0};
  assign S0 = A0^B0^C0 ; // start of level 1: & = AND, ^ = XOR, | = OR, ! = NOT
  assign S1 = A1^B1^(A0&B0|(A0|B0)&C0) ;
  assign C2 = A1&B1|(A1|B1)&(A0&B0|(A0|B0)&C0) ;
  assign C4_2_0 = A3&B3|(A3|B3)&(A2&B2) ; assign C4_2_1 = A3&B3|(A3|B3)&(A2|B2) ;
  assign S5_4_0 = A5^B5^(A4&B4) ; assign S5_4_1 = A5^B5^(A4|B4) ;
  assign C6_4_0 = A5&B5|(A5|B5)&(A4&B4) ; assign C6_4_1 = A5&B5|(A5|B5)&(A4|B4) ;
  assign S2 = A2^B2^C2 ; // start of level 2
  assign S3 = A3^B3^(A2&B2|(A2|B2)&C2) ;
  assign S4 = A4^B4^(C4_2_0|C4_2_1&C2) ;
  assign S5 = S5_4_0& !(C4_2_0|C4_2_1&C2)|S5_4_1&(C4_2_0|C4_2_1&C2) ;
  assign C6 = C6_4_0|C6_4_1&(C4_2_0|C4_2_1&C2) ;
  assign S6 = A6^B6^C6 ; // start of level 3
  assign S7 = A7^B7^(A6&B6|(A6|B6)&C6) ;
  assign C8 = A7&B7|(A7|B7)&(A6&B6|(A6|B6)&C6) ;
endmodule

Figure: Datapath adders: normalized delay (2-input NAND = 1) and area (kλ²) versus width (8, 16, 32, and 64 bits) for ripple-carry, carry-select, and carry-save adders.

2.6.4 Multipliers
• Mental arithmetic: 15 (multiplicand) × 19 (multiplier) = 15 × (20 – 1) = 15 × 21̄ = 300 – 15 = 285, where 1̄ denotes the digit –1
• Suppose we want to multiply by B = 00010111 (decimal 16 + 4 + 2 + 1 = 23)
• Use the canonical signed-digit vector (CSD vector) D = 00101̄001̄ (decimal 32 – 8 – 1 = 23)
• B has a weight (number of nonzero digits) of 4, but D has a weight of 3, so multiplying by D needs one fewer add or subtract and saves hardware

To recode (or encode) any binary number, B, as a CSD vector, D:
$D_i = B_i + C_i - 2C_{i+1}$, where $C_{i+1}$ is the carry from the sum of $B_{i+1} + B_i + C_i$ (we start with $C_0 = 0$).
If B = 011 ($B_2 = 0$, $B_1 = 1$, $B_0 = 1$; decimal 3), then:
$D_0 = B_0 + C_0 - 2C_1 = 1 + 0 - 2 = -1$,
$D_1 = B_1 + C_1 - 2C_2 = 1 + 1 - 2 = 0$,
$D_2 = B_2 + C_2 - 2C_3 = 0 + 1 - 0 = 1$,
so that D = 101̄ (decimal 4 – 1 = 3).
We can use a radix other than 2, for example Booth encoding (radix-4), where B is in two's complement and each digit of E lies in {–2, –1, 0, 1, 2}:
B = 101001 (decimal 9 – 32 = –23) → E = 1̄2̄1 (decimal –16 – 8 + 1 = –23)
B = 01011 (eleven) → E = 11̄1̄ (16 – 4 – 1)
B = 101 → E = 1̄1 (–4 + 1 = –3)

Tree-based multiplication: at each stage we have the following three choices: (1) sum three outputs using a full adder, (2) sum two outputs using a half adder, or (3) pass the outputs to the next stage.
Figure: (a) A full adder (FA) with inputs A, B, CIN and outputs Sum, COUT; (b) a Wallace tree and (c) a carry-save chain summing the partial products for product bit P5; each dot represents an output of one stage and an input to the next.

A Wallace-tree multiplier works forward from the multiplier inputs:
• a full adder is a 3:2 compressor or (3, 2) counter
• a half adder is a (2, 2) counter
Figure: A 6-bit Wallace-tree multiplier reducing the partial products S00 to S55 to the product P0 to P11.

The Dadda multiplier works backward from the final product:
• each stage has a maximum of 2, 3, 4, 6, 9, 13, 19, ... outputs (each successive stage is 3/2 times larger, rounded down to an integer)
• the number of stages, and thus the delay (in units of an FA delay, excluding the CPA), for an n-bit tree-based multiplier using (3, 2) counters is
$\log_{1.5} n = \log_{10} n/\log_{10} 1.5 = \log_{10} n/0.176$
Figure: A 6-bit Dadda multiplier (partial products S00 to S55, product P0 to P11).

Ferrari–Stefanelli architecture "nests" multipliers
Figure: (a)–(c) 2-bit submultipliers built from two-input AND gates and (3, 2) counters are nested to form a larger multiplier (inputs A0 to A3, B0 to B3).

2.6.5 Other Arithmetic Systems
• 101 (decimal) is 1100101 (in binary and CSD vector) or 11̄100111̄ (in redundant binary)
• 188 (decimal) is 10111100 (in binary) or, in redundant binary, 11̄10001̄00, 101̄001̄100, or 101̄0001̄00 (the CSD vector)
• 101 is represented as 010010 (using sign magnitude), which is rather wasteful
Residue number system
• 11 (decimal) is represented as [1, 2] residue (5, 3)
• 11R5 = 11 mod 5 = 1 and 11R3 = 11 mod 3 = 2
• the size of this system is 3 × 5 = 15
• we can now add, subtract, or multiply without using any carry

Redundant binary addition: redundant binary encoding avoids carry propagation
Figure: Adding 87 (binary 1010111) and 101 (binary 1100101) to give 188 (binary 10111100) in binary, in redundant binary, and with a CSD vector, showing the intermediate sum and the intermediate carry.
A[i] B[i] | A[i–1] B[i–1] | intermediate sum | intermediate carry
1 1 | x x | 0 | 1
1 0 or 0 1 | A[i–1] = 0/1 and B[i–1] = 0/1 | 1̄ | 1
1 0 or 0 1 | A[i–1] = 1̄ or B[i–1] = 1̄ | 1 | 0
1 1̄ or 1̄ 1 | x x | 0 | 0
0 0 | x x | 0 | 0
0 1̄ or 1̄ 0 | A[i–1] = 0/1 and B[i–1] = 0/1 | 1̄ | 0
0 1̄ or 1̄ 0 | A[i–1] = 1̄ or B[i–1] = 1̄ | 1 | 1̄
1̄ 1̄ | x x | 0 | 1̄
(x = don't care; 1̄ denotes the digit –1)
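The CSD recoding rule and the radix-4 Booth examples of 2.6.4, and the redundant binary addition table above, can be checked with a short sketch. Digit vectors are LSB first with digits in {–1, 0, 1} (Booth digits in {–2, ..., 2}); the function names are hypothetical, and the addition simply applies the intermediate sum and carry rules of the table rather than modelling any particular hardware cell.

```python
# Signed-digit arithmetic (2.6.4 and 2.6.5). Digit lists are LSB first.
# Function names are illustrative, not from the text.

from typing import List


def value(d: List[int], radix: int = 2) -> int:
    """Numeric value of a signed-digit vector."""
    return sum(di * radix ** i for i, di in enumerate(d))


def binary_digits(x: int, n: int) -> List[int]:
    """Plain binary digits of a non-negative x (n digits)."""
    return [(x >> i) & 1 for i in range(n)]


def to_csd(b: int, n: int) -> List[int]:
    """CSD recoding D_i = B_i + C_i - 2*C_(i+1) of an n-bit unsigned B (n+1 digits)."""
    bits = binary_digits(b, n) + [0, 0]            # B_n = B_(n+1) = 0
    d, c = [], 0                                   # c is C_i, starting with C_0 = 0
    for i in range(n + 1):
        c_next = 1 if bits[i + 1] + bits[i] + c >= 2 else 0   # carry of B_(i+1)+B_i+C_i
        d.append(bits[i] + c - 2 * c_next)
        c = c_next
    return d


def booth_radix4(b: int, n: int) -> List[int]:
    """Radix-4 Booth digits E_j = -2*B_(2j+1) + B_(2j) + B_(2j-1) of n-bit two's-complement b."""
    n += n % 2                                     # sign-extend to an even width
    bits = [(b >> i) & 1 for i in range(n)]         # >> on a negative int sign-extends
    return [-2 * bits[2 * j + 1] + bits[2 * j] + (bits[2 * j - 1] if j else 0)
            for j in range(n // 2)]


def rb_add(a: List[int], b: List[int]) -> List[int]:
    """Redundant binary addition using the intermediate sum/carry table (no carry chain)."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    s = [0] * n                      # intermediate sum at position i
    c = [0] * (n + 1)                # c[i + 1] = intermediate carry out of position i
    for i in range(n):
        p = a[i] + b[i]
        lower_ok = i == 0 or (a[i - 1] >= 0 and b[i - 1] >= 0)   # A[i-1], B[i-1] are 0/1
        if p == 2:
            s[i], c[i + 1] = 0, 1
        elif p == 1:
            s[i], c[i + 1] = (-1, 1) if lower_ok else (1, 0)
        elif p == -1:
            s[i], c[i + 1] = (-1, 0) if lower_ok else (1, -1)
        elif p == -2:
            s[i], c[i + 1] = 0, -1
        # p == 0 leaves s[i] = 0 and c[i + 1] = 0
    # Each final digit s[i] + c[i] stays in {-1, 0, 1}, so nothing propagates further.
    return [s[i] + c[i] for i in range(n)] + [c[n]]


if __name__ == "__main__":
    d = to_csd(23, 8)
    print(d[::-1], value(d))                      # MSB first; value 23
    e = booth_radix4(-23, 6)
    print(e[::-1], value(e, radix=4))             # MSB first; value -23
    print(value(rb_add(to_csd(87, 8), binary_digits(101, 8))))   # 188
```

The conditions "A[i–1] = 0/1 and B[i–1] = 0/1" let each position predict the range of the carry arriving from below, so every final digit is formed from one intermediate sum and one intermediate carry, with no ripple.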
Residue arithmetic in the (5, 3) system:
4 [4, 1] + 7 [2, 1] = 11 [1, 2]
12 [2, 0] – 4 [4, 1] = 8 [3, 2]
3 [3, 0] × 4 [4, 1] = 12 [2, 0]

The 5, 3 residue number system
n | residue 5 | residue 3 | n | residue 5 | residue 3 | n | residue 5 | residue 3
0 | 0 | 0 | 5 | 0 | 2 | 10 | 0 | 1
1 | 1 | 1 | 6 | 1 | 0 | 11 | 1 | 2
2 | 2 | 2 | 7 | 2 | 1 | 12 | 2 | 0
3 | 3 | 0 | 8 | 3 | 2 | 13 | 3 | 1
4 | 4 | 1 | 9 | 4 | 0 | 14 | 4 | 2

2.6.6 Other Datapath Operators
Figure: Symbols for datapath elements, (a) to (h), including a register (CLK, PRE), a multiplexer, an adder (Σ), an adder/subtracter (+/–), and an incrementer/decrementer (+/–1).
Full subtracter:
DIFF = A ⊕ NOT(B) ⊕ NOT(BIN) = SUM(A, NOT(B), NOT(BIN))
NOT(BOUT) = A · NOT(B) + A · NOT(BIN) + NOT(B) · NOT(BIN) = MAJ(A, NOT(B), NOT(BIN))
Keywords: adder/subtracter • barrel shifter • normalizer • denormalizer • leading-one detector • priority encoder • exponent correcter • accumulator • multiplier–accumulator (MAC) • incrementer • decrementer • incrementer/decrementer • all-zeros detector • all-ones detector • register file • first-in first-out register (FIFO) • last-in first-out register (LIFO)

2.7 I/O Cells
Keywords: Tri-State® (a registered trademark of National Semiconductor) • drivers • contention • bus keeper or bus-hold cell (TI calls this Bus-Friendly logic) • slew rate • power-supply bounce • simultaneously switching outputs (SSOs) • quiet-I/O • bidirectional I/O • open-drain • level shifter • electrostatic discharge, or ESD • electrical overstress (EOS) • ESD implant • human-body model (HBM) • machine model (MM) • charge-device model (CDM, also called device charge–discharge) • latch-up • undershoot • overshoot • guard rings
Figure: A three-state bidirectional output buffer: I/O pad; elements M1, M2, ND1, NR1, I1, and I2; output enable (OE) and DATAout from the core logic, DATAin to the core logic.

2.8 Cell Compilers
Keywords: silicon compilers • RAM compiler • multiplier compiler • single-port RAM • dual-port RAMs • multiport RAMs • asynchronous • synchronous • model compiler • netlist compiler • correct by construction

2.9 Summary
• The use of transistors as switches
• The difference between a flip-flop and a latch
• The meaning of setup time and hold time
• Pipelines and latency
• The difference between datapath, standard-cell, and gate-array logic cells
• Strong and weak logic levels
• Pushing bubbles
• Ratio of logic
• Resistance per square of layers and their relative values in CMOS
• Design rules and λ

2.10 Problems
Suggested homework: 2.1, 2.2, 2.38, 2.39 (from ASICs... the book)