ASICs...THE COURSE (1 WEEK)
PROGRAMMABLE
ASIC LOGIC
CELLS
5.1 Actel ACT
5.1.1 ACT 1 Logic Module
Key concepts: basic logic cell • multiplexer-based cell • look-up table (LUT) • programmable
array logic (PAL) • influence of programming technology • timing • worst-case design
The Actel ACT architecture
(a) Organization of the basic logic cells
(b) The ACT 1 Logic Module (LM, the Actel basic logic cell). The ACT 1 family uses just
one type of LM. ACT 2 and ACT 3 FPGA families both use two different types of LM
(c) An example LM implementation using pass transistors (without any buffering)
(d) An example logic macro. Connect logic signals to some or all of the LM inputs, and the
remaining inputs to VDD or GND
[Figure: panel (a) shows the Actel ACT chip as an array of Logic Modules; panels (b) and (c) show the LM inputs A0, A1, SA, B0, B1, SB, S0, and S1, the internal signals F1, F2, and S3 (the output of the OR gate O1), the 2:1 MUXes M1, M2, and M3, and the output F; panel (d) shows the example macro F=(A·B) + (B'·C) + D.]
5.1.2 Shannon’s Expansion Theorem
• We can use the Shannon expansion theorem to expand F =A·F(A='1') + A'·F(A='0')
Example: F =A'·B + A·B·C' + A'·B'·C = A·(B·C') + A'·(B + B'·C)
• F(A='1')=B·C' is the cofactor of F with respect to (wrt) A, or F_A
• If we expand F wrt B, F =A'·B + A·B·C' + A'·B'·C = B·(A' + A·C') + B'·(A'·C)
• Eventually we reach the unique canonical form, which uses only minterms
• (A minterm is a product term that contains all the variables of F, such as A·B'·C)
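The expansion above can be checked by brute force. The following is a minimal sketch, assuming Python booleans for logic values; the helper names `cofactor`, `shannon`, and `minterms` are illustrative and not part of the original notes. It rebuilds F from its two cofactors for each variable in turn and lists the minterms of the canonical form (numbered with A as the most significant bit).

```python
from itertools import product

def f(a, b, c):
    """F = A'·B + A·B·C' + A'·B'·C, the example function from the text."""
    return bool((not a and b) or (a and b and not c) or (not a and not b and c))

def cofactor(func, index, value):
    """Return func with the argument at position `index` fixed to `value`."""
    def fixed(*args):
        full = list(args)
        full.insert(index, value)
        return func(*full)
    return fixed

def shannon(func, index):
    """Rebuild func as X·F(X='1') + X'·F(X='0') for the variable at `index`."""
    f1 = cofactor(func, index, True)
    f0 = cofactor(func, index, False)
    def rebuilt(*args):
        x = args[index]
        rest = args[:index] + args[index + 1:]
        return bool((x and f1(*rest)) or (not x and f0(*rest)))
    return rebuilt

def minterms(func, nvars):
    """List the minterm numbers (first variable is the MSB) for which func is true."""
    rows = product([False, True], repeat=nvars)
    return [i for i, row in enumerate(rows) if func(*row)]

if __name__ == "__main__":
    for index, name in enumerate("ABC"):
        same = all(shannon(f, index)(*v) == f(*v)
                   for v in product([False, True], repeat=3))
        print("expansion wrt", name, "matches F:", same)
    print("minterms of F:", minterms(f, 3))
```

Expanding with respect to any of the three variables gives back the same function, which is why repeated expansion ends in the same set of minterms regardless of the order chosen.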
Another example: F=(A·B) + (B'·C) + D
• Expand F wrt B: F=B·(A + D) + B'·(C + D) =B·F2 + B'·F1
• F = 2:1 MUX, with B selecting between two inputs: F2=F(B='1') and F1=F(B='0')
• F also describes the output of the ACT 1 LM
• Now we need to split up F1 and F2
• Expand F2 wrt A, and F1 wrt C: F2=A + D=(A·1) + (A'·D); F1=C + D=(C·1) + (C'·D)
• A, B, C connect to the select lines and '1' and D are the inputs of the MUXes in the ACT 1 LM
• Connections: A0=D, A1='1', B0=D, B1='1', SA=C, SB=A, S0='0', and S1=B
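The connection list can be verified with a small behavioural model. This is a sketch under the assumption that the ACT 1 LM behaves as three ideal 2:1 MUXes and an OR gate, as described in these notes; the function names `mux`, `act1_lm`, `macro`, and `target` are chosen here for illustration.

```python
from itertools import product

def mux(d0, d1, sel):
    """Ideal 2:1 MUX: returns d0 when sel is 0 and d1 when sel is 1."""
    return d1 if sel else d0

def act1_lm(a0, a1, sa, b0, b1, sb, s0, s1):
    """Behavioural model of the ACT 1 Logic Module (no buffering, no delays)."""
    f1 = mux(a0, a1, sa)      # first MUX, selected by SA
    f2 = mux(b0, b1, sb)      # second MUX, selected by SB
    s3 = s0 | s1              # OR gate drives the final select
    return mux(f1, f2, s3)    # output MUX chooses F1 (S3=0) or F2 (S3=1)

def macro(a, b, c, d):
    """The connections from the text: A0=D, A1='1', B0=D, B1='1', SA=C, SB=A, S0='0', S1=B."""
    return act1_lm(a0=d, a1=1, sa=c, b0=d, b1=1, sb=a, s0=0, s1=b)

def target(a, b, c, d):
    """F = (A·B) + (B'·C) + D written directly."""
    return int((a and b) or ((not b) and c) or d)

if __name__ == "__main__":
    ok = all(macro(*v) == target(*v) for v in product((0, 1), repeat=4))
    print("LM connections implement F:", ok)
```

All 16 input combinations agree, so a single Logic Module is enough for this four-input function.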
5.1.3 Multiplexer Logic as Function Generators
Example of using the WHEEL functions to implement F=NAND(A, B)=(A·B)'
1. First express F as the output of a 2:1 MUX: we do this by expanding F wrt A (or wrt
B, since F is symmetric): F=A·(B') + A'·('1')
2. Assign WHEEL1 (selected when the MUX select is '0') to implement '1', and WHEEL2 (selected when the select is '1') to implement INV(B)
3. Set the select input to the MUX connecting WHEEL1 and WHEEL2, S0+S1=A. We
can do this using S0=A, S1='0'
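The NAND recipe can be tried out on a model of the Logic Module. The sketch below makes two assumptions, both consistent with the MUX definition MUX(A0, A1, SA)=A0·SA' + A1·SA: the wheel on the '0' side of the output MUX supplies '1' and the wheel on the '1' side supplies INV(B), and S1 is tied to '0' so that S0+S1=A. The names `resolve`, `act1_lm`, `NAND_CONNECTIONS`, and `truth_table` are illustrative only.

```python
from itertools import product

def mux(d0, d1, sel):
    """Ideal 2:1 MUX: returns d0 when sel is 0 and d1 when sel is 1."""
    return d1 if sel else d0

def resolve(name, a, b):
    """Turn a connection name ('A', 'B', '0', or '1') into a logic value."""
    return {"A": a, "B": b, "0": 0, "1": 1}[name]

def act1_lm(conn, a, b):
    """Evaluate the Logic Module for one input row, given a pin-to-signal map."""
    v = {pin: resolve(name, a, b) for pin, name in conn.items()}
    wheel1 = mux(v["A0"], v["A1"], v["SA"])
    wheel2 = mux(v["B0"], v["B1"], v["SB"])
    return mux(wheel1, wheel2, v["S0"] | v["S1"])

# Assumed wiring for F = NAND(A, B) = A·(B') + A'·('1')
NAND_CONNECTIONS = {
    "A0": "1", "A1": "1", "SA": "0",   # WHEEL1 = '1'    (used when A = 0)
    "B0": "1", "B1": "0", "SB": "B",   # WHEEL2 = INV(B) (used when A = 1)
    "S0": "A", "S1": "0",              # select = S0 + S1 = A
}

def truth_table(conn):
    """Outputs for (A, B) = 00, 01, 10, 11."""
    return [act1_lm(conn, a, b) for a, b in product((0, 1), repeat=2)]

if __name__ == "__main__":
    print(truth_table(NAND_CONNECTIONS))
```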
The 16 logic functions of 2 variables:
• 2 of the 16 functions are not very interesting (F='0', and F='1')
• There are 10 functions that we can implement using just one 2:1 MUX
• 6 functions are useful: INV, BUF, AND, OR, AND1-1, NOR1-1
Boolean functions using a 2:1 MUX
                                                                       Minterm  Function  M1
     Function, F     F=        Canonical form               Minterms    code     number    A0  A1  SA
 1   '0'             '0'       '0'                          none        0000     0         0   0   0
 2   NOR1-1(A, B)    (A+B')'   A'·B                         1           0010     2         B   0   A
 3   NOT(A)          A'        A'·B' + A'·B                 0, 1        0011     3         1   0   A
 4   AND1-1(A, B)    A·B'      A·B'                         2           0100     4         A   0   B
 5   NOT(B)          B'        A'·B' + A·B'                 0, 2        0101     5         1   0   B
 6   BUF(B)          B         A'·B + A·B                   1, 3        1010     10        0   B   1
 7   AND(A, B)       A·B       A·B                          3           1000     8         0   B   A
 8   BUF(A)          A         A·B' + A·B                   2, 3        1100     12        0   A   1
 9   OR(A, B)        A+B       A'·B + A·B' + A·B            1, 2, 3     1110     14        B   1   A
10   '1'             '1'       A'·B' + A'·B + A·B' + A·B    0, 1, 2, 3  1111     15        1   1   1
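The minterm columns of the table can be derived mechanically from the MUX connections. The following sketch assumes the definition MUX(A0, A1, SA)=A0·SA' + A1·SA, numbers minterms with A as the most significant bit, and takes the function number to be the decimal value of the minterm code; the function name `wheel` is illustrative.

```python
from itertools import product

def wheel(a0, a1, sa):
    """Return (minterms, minterm code, function number) for one 2:1 MUX wiring.

    Each argument is one of the strings 'A', 'B', '0', or '1'.
    """
    minterms = []
    for index, (a, b) in enumerate(product((0, 1), repeat=2)):
        value = {"A": a, "B": b, "0": 0, "1": 1}
        out = value[a1] if value[sa] else value[a0]   # A0·SA' + A1·SA
        if out:
            minterms.append(index)
    number = sum(1 << m for m in minterms)            # bit m set for minterm m
    return minterms, format(number, "04b"), number

if __name__ == "__main__":
    for name, pins in [("AND(A, B)", ("0", "B", "A")),
                       ("NOR1-1(A, B)", ("B", "0", "A")),
                       ("AND1-1(A, B)", ("A", "0", "B"))]:
        print(name, pins, wheel(*pins))
```

Running the same function over every row is a quick way to cross-check a hand-built table of this kind.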
[Figure: 14 functions of 2 variables (and F='0', F='1' makes 16). In the four-entry map of A and B there are 4 ways to arrange one '1', 6 ways to arrange two '1's, and 4 ways to arrange one '0'.]
5.1.4 ACT 2 and ACT 3 Logic Modules
• ACT 1 requires 2 LMs per flip-flop, joined by interconnect of unknown capacitance
• ACT 2 and ACT 3 use two types of LMs; one includes a D flip-flop
• The ACT 2 C-Module is similar to the ACT 1 LM but can implement five-input logic functions
• The C-Module (combinatorial module) implements combinational logic (blame MMI for the misuse of terms)
• The ACT 2 S-Module (sequential module) contains a C-Module and a sequential element
The ACT 1 Logic Module as a Boolean function generator
(a) A 2:1 MUX viewed as a function wheel
(b) The ACT 1 Logic Module is two function wheels, an OR gate, and a 2:1 MUX
• A 2:1 MUX is a function wheel that can generate BUF, INV, AND-11, AND1-1, OR, AND
• WHEEL(A, B) =MUX(A0, A1, SA)
• MUX(A0, A1, SA)=A0·SA' + A1·SA
• Each of the inputs (A0, A1, and SA) may be A, B, '0', or '1'
• The ACT 1 LM is built from two function wheels, a 2:1 MUX, and a two-input OR gate
• ACT 1 LM =MUX [WHEEL1, WHEEL2, OR(S0, S1)]
[Figure: (a) the 2:1 MUX M1, with inputs A0 and A1 and select SA, drawn as a function wheel; (b) the ACT 1 Logic Module drawn as two function wheels, WHEEL1 and WHEEL2 (inputs labeled A, B and C, D), feeding the MUX M3, whose select S3 is the OR of S0 and S1. A two-input MUX can implement the functions BUF, INV, AND-11, NOR1-1, AND1-1, NOR-11, OR, and AND, selected by A0, A1, and SA. The ACT 1 Logic Module can implement these functions.]
5.1.5 Timing Model and Critical Path
Example of timing calculations (a rather complex examination of internal module timing):
• The setup and hold times, measured inside (not outside) the S-Module, are t'SUD and t'H (a prime denotes parameters that are measured inside the S-Module)
• The clock–Q propagation delay is t'CO
• The parameters t'SUD, t'H, and t'CO are measured using the internal clock signal CLKi
• The propagation delay of the combinational logic inside the S-Module is t'PD
• The delay of the combinational logic that drives the flip-flop clock signal is t'CLKD
• From outside the S-Module, with reference to the outside clock signal CLK1:
tSUD=t'SUD + (t'PD – t'CLKD), tH=t'H + (t'PD – t'CLKD), tCO=t'CO + t'CLKD
• We do not know the internal parameters t'SUD, t'H, and t'CO, but assume reasonable values: t'SUD=0.4ns, t'H=0.1ns, t'CO=0.4ns
• t'PD (combinational logic inside the S-Module) is equal to the C-Module delay, so t'PD=3ns for the ACT 3
• We do not know t'CLKD; assume a value of t'CLKD=2.6ns (the exact value does not matter)
• Thus the external S-Module parameters are: tSUD=0.8ns, tH=0.5ns, tCO=3.0ns
• These are the same as the ACT 3 S-Module parameters (I chose t'CLKD so they would be)
• Of the 3.0ns combinational logic delay: 0.4ns increases the setup time and 2.6ns increases the clock–output delay, tCO
• Actel says that the combinational logic delay is buried in the flip-flop setup time. But this is borrowed money: you have to pay it back.
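The arithmetic in this example is easy to reproduce. The sketch below simply applies the three equations from these notes to the assumed internal values (t'SUD=0.4ns, t'H=0.1ns, t'CO=0.4ns, t'PD=3.0ns, t'CLKD=2.6ns); the class and function names are illustrative and all times are in nanoseconds.

```python
from dataclasses import dataclass

@dataclass
class InternalTiming:
    """Parameters measured inside the S-Module (the primed quantities), in ns."""
    t_sud: float   # t'SUD, internal setup time
    t_h: float     # t'H, internal hold time
    t_co: float    # t'CO, internal clock-to-Q delay
    t_pd: float    # t'PD, combinational logic delay in front of the flip-flop
    t_clkd: float  # t'CLKD, delay of the logic driving the flip-flop clock

def external_timing(p):
    """Apply the equations from the notes to get the externally visible parameters."""
    shift = p.t_pd - p.t_clkd          # data-path delay minus clock-path delay
    return {
        "tSUD": p.t_sud + shift,       # tSUD = t'SUD + (t'PD - t'CLKD)
        "tH": p.t_h + shift,           # tH   = t'H   + (t'PD - t'CLKD)
        "tCO": p.t_co + p.t_clkd,      # tCO  = t'CO  + t'CLKD
    }

if __name__ == "__main__":
    assumed = InternalTiming(t_sud=0.4, t_h=0.1, t_co=0.4, t_pd=3.0, t_clkd=2.6)
    for name, value in external_timing(assumed).items():
        print(f"{name} = {value:.1f} ns")
```

Changing `t_clkd` shows the trade described above: whatever part of the 3.0ns logic delay is not added to the setup time reappears in the clock-to-output delay.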
Keywords and concepts: timing model • deals only with internal logic • estimates delays • before place-and-route step • nondeterministic architecture • find slowest register–register delay or critical path
5.1.6 Speed Grading
• Speed grading (or speed binning) uses a binning circuit
• Measure tPD=(tPLH + tPHL)/2, and use the fact that properties match across a chip
• Actel speed grades are based on 'Std' speed grade
• '1' speed grade is approximately 15 percent faster than 'Std'
• '2' speed grade is approximately 25 percent faster than 'Std'
• '3' speed grade is approximately 35 percent faster than 'Std'
Actel ACT 2 and ACT 3 Logic Modules
(a) The C-Module for combinational logic
(b) The ACT 2 S-Module
(c) The ACT 3 S-Module
(d) The equivalent circuit (without buffering) of the SE (sequential element)
(e) The SE configured as a positive-edge–triggered D flip-flop
[Figure: panels (a) to (c) show the C-Module, the ACT 2 S-Module, and the ACT 3 S-Module. Signal labels: data inputs D00, D01, D10, and D11; select inputs A0, B0, A1, B1, S0, and S1; outputs Y and OUT; and, for the S-Modules, the SE with CLK, CLR, and Q. Panel (d) shows the SE as a master latch and a slave latch with inputs D, C1, C2, and CLR, plus combinational logic for clock and clear. Panel (e) shows the SE as a flip-flop macro with D, CLK, CLR, and Q.]
Timing views from inside and outside the Actel ACT S-module
(a) Timing parameters for a 'Std' speed grade ACT 3
(b) Flip-flop timing
(c) An example of flip-flop timing based on ACT 3 parameters
[Figure: (a) Timing model with typical figures: C-Module combinational logic delay tPD=3.0ns; S-Module setup time tSUD=0.8ns, hold time tH=0.5ns, and clock-to-output delay tCO=3.0ns; the modules are separated by variable routing delays, and the internal clock arrives from the clock pad through a clock buffer.
(b) Flip-flop timing, contrasting the view from inside the S-Module looking out with the view from outside looking in: tSUD = t'SUD + t'PD – t'CLKD; tH = t'H + t'PD – t'CLKD; tCO = t'CO + t'CLKD.
(c) Example with t'PD=3ns, t'CLKD=2.6ns, t'SUD=0.4ns, t'H=0.1ns, and t'CO=0.4ns: tSUD = (0.4 + 3.0 – 2.6) = 0.8ns; tH = (0.1 + 3.0 – 2.6) = 0.5ns; tCO = (0.4 + 2.6) = 3.0ns.]
5.1.7 Worst-Case Timing
Keywords and concepts: Using synchronous design you worry about how slow your circuit may be, not how fast • ambient temperature, TA • package case temperature, TC (military) • temperature of the chip, the junction temperature, TJ • nominal operating conditions: VDD=5.0V, and TJ=25°C • worst-case commercial conditions: VDD=4.75V, and TJ=+70°C • always design using worst-case timing • derating factors • critical path delay between registers • process corner (slow–slow, fast–fast, slow–fast, fast–slow) • Commercial: VDD=5V ± 5%, TA (ambient)=0 to +70°C • Industrial: VDD=5V ± 10%, TA (ambient)=–40 to +85°C • Military: VDD=5V ± 10%, TC (case)=–55 to +125°C • Military: Standard MIL-STD-883C Class B • Military extended: unmanned spacecraft
5.1.8 Actel Logic Module Analysis
• Actel uses a fine-grain architecture which allows you to use almost all of the FPGA
• Synthesis can map logic efficiently to a fine-grain architecture
ACT 3 timing parameters (delays in ns)
                                      Fanout
Family                  Delay       1     2     3     4     8
ACT 3-3 (data book)     tPD         2.9   3.2   3.4   3.7   4.8
ACT 3-2 (calculated)    tPD/0.85    3.41  3.76  4.00  4.35  5.65
ACT 3-1 (calculated)    tPD/0.75    3.87  4.27  4.53  4.93  6.40
ACT 3-Std (calculated)  tPD/0.65    4.46  4.92  5.23  5.69  7.38
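The calculated rows of the timing table can be regenerated from the data-book row. This sketch uses the divisors exactly as they appear in the table's Delay column (0.85, 0.75, and 0.65) and rounds to two decimal places; the dictionary and function names are illustrative.

```python
# Data-book tPD in ns for the '3' speed grade, keyed by fanout
GRADE3_TPD = {1: 2.9, 2: 3.2, 3: 3.4, 4: 3.7, 8: 4.8}

# Divisors taken from the Delay column of the table
DIVISORS = {"ACT 3-2": 0.85, "ACT 3-1": 0.75, "ACT 3-Std": 0.65}

def calculated_rows():
    """Return {family: {fanout: delay in ns}} for the calculated speed grades."""
    return {
        family: {fanout: round(tpd / divisor, 2) for fanout, tpd in GRADE3_TPD.items()}
        for family, divisor in DIVISORS.items()
    }

if __name__ == "__main__":
    for family, row in calculated_rows().items():
        cells = "  ".join(f"{delay:.2f}" for delay in row.values())
        print(f"{family:<10} {cells}")
```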
ACT 3 derating factors
           Temperature TJ (junction)/°C
VDD/V      –55    –40    0      25     70     85     125
4.5        0.72   0.76   0.85   0.90   1.04   1.07   1.17
4.75       0.70   0.73   0.82   0.87   1.00   1.03   1.12
5.00       0.68   0.71   0.79   0.84   0.97   1.00   1.09
5.25       0.66   0.69   0.77   0.82   0.94   0.97   1.06
5.5        0.63   0.66   0.74   0.79   0.90   0.93   1.01
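A derating table like this one is applied by multiplication. The sketch below assumes that the quoted delays refer to worst-case commercial conditions (VDD=4.75V, TJ=70°C), where the factor is 1.00, and that only the tabulated voltage and temperature points are looked up (no interpolation); the names `derating_factor` and `derated_delay` are illustrative.

```python
# Junction temperatures (°C) for the columns of the derating table
TEMPS = (-55, -40, 0, 25, 70, 85, 125)

# ACT 3 derating factors from the table, keyed by VDD in volts
DERATING = {
    4.5:  (0.72, 0.76, 0.85, 0.90, 1.04, 1.07, 1.17),
    4.75: (0.70, 0.73, 0.82, 0.87, 1.00, 1.03, 1.12),
    5.0:  (0.68, 0.71, 0.79, 0.84, 0.97, 1.00, 1.09),
    5.25: (0.66, 0.69, 0.77, 0.82, 0.94, 0.97, 1.06),
    5.5:  (0.63, 0.66, 0.74, 0.79, 0.90, 0.93, 1.01),
}

def derating_factor(vdd, tj):
    """Look up the factor for a tabulated supply voltage and junction temperature."""
    return DERATING[vdd][TEMPS.index(tj)]

def derated_delay(delay_ns, vdd, tj):
    """Scale a delay quoted at the 1.00 point (assumed 4.75 V, 70 °C) to other conditions."""
    return delay_ns * derating_factor(vdd, tj)

if __name__ == "__main__":
    std_fanout1 = 4.46  # ns, 'Std' grade at a fanout of 1 from the timing table
    print(f"nominal (5.0 V, 25 C):        {derated_delay(std_fanout1, 5.0, 25):.2f} ns")
    print(f"worst military (4.5 V, 125 C): {derated_delay(std_fanout1, 4.5, 125):.2f} ns")
```

The spread between the two printed figures is the reason for designing with worst-case rather than nominal timing.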
• Physical symmetry simplifies place-and-route (swapping equivalent pins on opposite sides of the LM to ease routing)
• Matched to small antifuse programming technology
• LMs balance efficiency of implementation and efficiency of utilization
• A simple LM reduces performance, but allows fast and robust place-and-route
5.2 Xilinx LCA
5.2.1 XC3000 CLB
• A 32-bit look-up table (LUT)
• CLB propagation delay is fixed (the LUT access time) and independent of the logic function
• 7 inputs to the XC3000 CLB: 5 CLB inputs (A–E), and 2 flip-flop outputs (QX and QY)
• 2 outputs from the LUT (F and G). Since a 32-bit LUT requires only five variables to form a unique address (32=2^5), there are several ways to use the LUT:
• Use 5 of the 7 possible inputs (A–E, QX, QY) with the entire 32-bit LUT (the CLB outputs (F and G) are then identical)
• Split the 32-bit LUT in half to implement 2 functions of 4 variables each; choose 4 input variables from the 7 inputs (A–E, QX, QY). You have to choose 2 of the inputs from the 5 CLB inputs (A–E); then one function output connects to F and the other output connects to G.
• You can split the 32-bit LUT in half, using one of the 7 input variables as a select input to a 2:1 MUX that switches between F and G (to implement some functions of 6 and 7 variables).
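The LUT modes listed above can be illustrated with a behavioural model. This is only a sketch: the class `Lut` and the example functions are hypothetical, the address ordering (first input as most significant bit) is an arbitrary choice, and nothing here reflects the actual Xilinx configuration format.

```python
from itertools import product

class Lut:
    """A look-up table: one stored bit per input combination."""

    def __init__(self, n_inputs, func):
        self.n_inputs = n_inputs
        # Fill the table by evaluating func once for every address.
        self.bits = [int(bool(func(*row))) for row in product((0, 1), repeat=n_inputs)]

    def read(self, *inputs):
        """Form an address from the inputs (first input is the MSB) and read one bit."""
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.bits[address]

# Whole 32-bit LUT: any function of 5 variables costs the same single read.
nand5 = Lut(5, lambda a, b, c, d, e: not (a and b and c and d and e))
inverter = Lut(5, lambda a, b, c, d, e: not a)

# Split LUT: two 16-bit halves, each a function of 4 variables.
def f4(a, b, c, d):
    return a ^ b ^ c ^ d

def g4(a, b, c, d):
    return a and b and c and d

half_f = Lut(4, f4)
half_g = Lut(4, g4)

def split_lut(select, a, b, c, d):
    """A 2:1 MUX on the two halves, giving one function of 5 variables."""
    return half_g.read(a, b, c, d) if select else half_f.read(a, b, c, d)

if __name__ == "__main__":
    print("bits in NAND5 table:", len(nand5.bits))
    print("bits in inverter table:", len(inverter.bits))
    print("NAND5(1,1,1,1,1) =", nand5.read(1, 1, 1, 1, 1))
```

Both the inverter and the five-input NAND occupy all 32 bits and take one table read, which is the sense in which the CLB delay is independent of the logic function.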
5.2.2 XC4000 Logic Block
Keywords and concepts: Xilinx LCA (a trademark, logic cell array) • configurable logic block
• coarse-grain architecture
The Xilinx XC3000 CLB (configurable logic block)
(Source: Xilinx.)
[Figure: the combinational function (LUT) takes the CLB inputs A to E and the flip-flop outputs QX and QY and produces F and G. Programmable MUXes (M) route F, G, and DI (data in) to two D flip-flops and to the CLB outputs X and Y. Control inputs: EC (enable clock, with '1' to enable), K (clock), and RD (reset direct, with '0' to inhibit, plus the global reset).]
The Xilinx XC4000 family CLB (configurable logic block). (Source: Xilinx.)
[Figure: three LUTs (inputs F1:F4 and G1:G4, outputs F', G', and H') feed programmable MUXes (M) that select among F', G', H', and DIN for two D flip-flops (outputs QX and QY, each with EC, SD, and RD inputs and SET/RST control) and for the CLB outputs X and Y. Four control lines per CLB (C1 to C4, carrying H1, DIN, EC, and S/R) provide internal control or SRAM control; K is the global clock; carry logic with carry in and carry out connects to and from the adjacent CLBs.]
5.2.3 XC5200 Logic Block
5.2.4 Xilinx CLB Analysis
The use of a LUT has advantages and disadvantages:
• An inverter is as slow as a five-input NAND
• A LUT simplifies timing of synchronous logic
• Matched to large SRAM programming technology
Xilinx uses two speed-grade systems:
• Maximum guaranteed toggle rate of a CLB flip-flop (in MHz) as a suffix: higher is faster
• Example: Xilinx XC3020-125 has a toggle frequency of 125MHz
• Delay time of the combinational logic in a CLB in ns: lower is faster
• Example: XC4010-6 has tILO=6.0ns
• Correspondence between grade and tILO is fairly accurate for the XC2000, XC4000, and XC5200 but not for the XC3000
The Xilinx XC5200 family Logic Cell (LC) and configurable logic block (CLB). (Source: Xilinx.)
[Figure: each Logic Cell (LC) contains a four-input LUT (combinational function, inputs F4:F1), a D flip-flop or latch (output Q, with CE, CK, and CLR shared across the CLB), a data input DI, outputs DO and X, and a carry chain (carry in CI, carry out CO). There are 4 LCs (LC0 to LC3) in a CLB; the F5_MUX is marked "LC0 to LC1 and LC2 to LC3 only". M = programmable MUX.]
Xilinx LCA timing model (XC5210-6)
(Source: Xilinx.)
[Figure: three CLBs (CLB1 to CLB3) separated by variable routing delays and driven by an internal clock. Parameters shown: combinational logic delay tILO=5.6ns; clock-to-output delay tCKO=5.8ns; setup times tICK and tDICK (2.3ns and 0.8ns).]
5.3 Altera FLEX
The Altera FLEX architecture
(a) Chip floorplan
(b) Logic Array Block (LAB)
(c) Details of the Logic Element (LE)
(Source: Altera (adapted with permission).)
[Figure: (a) the chip floorplan is an array of LABs; (b) each LAB holds 8 LEs joined by local interconnect; (c) each LE contains a four-input LUT (inputs D4:D1), a carry chain (carry in CRYI, carry out CRYO), a cascade chain (cascade in CASCI, cascade out CASCO), and a D flip-flop with PRE, CLR, and CLK driving OUT through a programmable MUX (M).]
5.4 Altera MAX
A registered PAL with i inputs, j product terms, and k macrocells. (Source: Altera (adapted with
permission).)
Features and keywords:
• product-term line
• programmable array logic
• bit line
• word line
• programmable-AND array (or product-term array)
• pull-up resistor
• wired-logic
• wired-AND
• macrocell
• 22V10 PLD
[Figure: i inputs (A, B, C, ...) drive a programmable AND array of size 2i × jk. Each of the k macrocells combines j product terms in a j-wide OR array and feeds a D flip-flop, clocked by CLK, that drives OUT.]
5.4.1 Logic Expanders
The Altera MAX architecture (the macrocell details vary between the MAX families; the functions shown here are closest to those of the MAX 9000 family macrocells) (Source: Altera (adapted with permission).) (a) Organization of logic and interconnect (b) LAB (Logic Array Block) (c) Macrocell
Features:
• Logic expanders and expander terms (helper terms) increase term efficiency
• Shared logic expander (shared expander, intranet) and parallel expander (internet)
• Deterministic architecture allows deterministic timing before logic assignment
• Any use of two-pass logic breaks deterministic timing
• Programmable inversion increases term efficiency
[Figure: (a) LABs connected by a chipwide interconnect; (b) each LAB contains 16 macrocells and a local array (LA); (c) each macrocell has 5 product terms with product-term select, a shared expander, a parallel expander to the next macrocell, programmable inversion, and a flip-flop with clock, clear, preset, and enable (system clocks and system clear are available), with a macrocell output and macrocell feedback to the other macrocells in the LAB.]
5.4.2 Timing Model
Altera MAX timing model (ns for the MAX 9000 series, '15' speed grade) (Source: Altera.)
(a) A direct path through the logic array and a register
(b) Timing for the direct path
(c) Using a parallel expander
(d) Parallel expander timing
(e) Making two passes through the logic array to use a shared expander
(f) Timing for the shared expander (there is no register in this path)
[Figure: timing parameters shown (in ns): local array tLOCAL=0.5; logic array tLAD=4.0; register setup tSU=3.0; register delay tRD=1.0; parallel expander tPEXP=1.0; shared expander tSEXP=5.0; combinational tCOMB=1.0.
(b) Direct path: tLOCAL + tLAD + tSU + tRD, total=8.5ns
(d) Parallel expander: tLOCAL + tLAD + tPEXP + tSU + tRD, total=9.5ns
(f) Shared expander (two passes): tLOCAL + tSEXP + tLOCAL + tLAD + tCOMB, total=11ns]
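The three path totals follow from adding the stage delays in order. A minimal sketch, using the MAX 9000 '15' speed grade figures from the timing model (tLOCAL=0.5, tLAD=4.0, tSU=3.0, tRD=1.0, tPEXP=1.0, tSEXP=5.0, tCOMB=1.0, all in ns); the path names and the `path_delay` helper are illustrative, and the stage order within each path is an assumption that does not affect the sum.

```python
# Stage delays in ns for the MAX 9000 series, '15' speed grade
DELAYS = {
    "tLOCAL": 0.5,  # local array
    "tLAD": 4.0,    # logic array
    "tSU": 3.0,     # register setup
    "tRD": 1.0,     # register delay
    "tPEXP": 1.0,   # parallel expander
    "tSEXP": 5.0,   # shared expander
    "tCOMB": 1.0,   # combinational output
}

# Stages traversed by each path in the timing model
PATHS = {
    "direct": ["tLOCAL", "tLAD", "tSU", "tRD"],
    "parallel expander": ["tLOCAL", "tLAD", "tPEXP", "tSU", "tRD"],
    "shared expander": ["tLOCAL", "tSEXP", "tLOCAL", "tLAD", "tCOMB"],
}

def path_delay(stages):
    """Sum the stage delays along one path."""
    return sum(DELAYS[stage] for stage in stages)

if __name__ == "__main__":
    for name, stages in PATHS.items():
        print(f"{name}: {' + '.join(stages)} = {path_delay(stages):.1f} ns")
```

The shared-expander path passes through the local array twice, which is the two-pass behaviour that makes its timing depend on how the logic is assigned.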
5.4.3 Power Dissipation in Complex PLDs
Key points: static power • Turbo Bit
5.5 Summary
Key points: The use of multiplexers, look-up tables, and programmable logic arrays • The difference between fine-grain and coarse-grain FPGA architectures • Worst-case timing design • Flip-flop timing • Timing models • Components of power dissipation in programmable ASICs • Deterministic and nondeterministic FPGA architectures
5.6 Problems