? 1997, VLSI Technology 1
The Ten Commandments of Excellent Design
Dowload from: http:// www.fpga.com.cn
Peter Chambers
Engineering Fellow
VLSI Technology
This report will give you some pointers that
will help you design synchronous circuits
that work first time. Ten commandments
that shouldalways be followed!
Using Synchronous Circuits
Synchronous digital systems are pervasive in today’s designs. Engineers create
clocked circuits for every conceivable application, with frequencies from DC to
GHz. Every synchronous system employs certain common characteristics, and
is prone to a group of common faults. These faults can cause instability and
unreliability, and may not be uncovered in the typical design process. The net
result is a poor product that fails to meet the design criteria, and the engineer
has to go through the suffering of design modification and revision. This is time-
consuming and costly. However, by applying a few simple rules, you can avoid
synchronous design faults in your designs and achieve consistent first-pass
success. In this article you’ll learn the sources of the most common problems
and their solutions, and how to apply these ideas to your designs.
Digital Systems 101
2 The Ten Commandments of Excellent Design
Digital Systems 101
We’ll begin by describing a typical synchronous circuit. Many variations are pos-
sible, but a simple example will be adequate to illustrate the sources of error.
Figure 1 shows the circuit and timing for one clocked element of the example.
One issue that deserves mention is this: Why use synchronous logic at all?
Wouldn’t asynchronous logic be faster? The answers to these questions could
take a book, but here are some reasons to use synchronous designs:
? Synchronous designs eliminate the problems associated with speed varia-
tions through different paths of logic. By sampling signals at well-defined
time intervals, fast paths and slow paths can be handled in a simple manner.
? Synchronous designs work well under variations of temperature, voltage and
process. This stability is key for high-volume manufacturing.
? Many designs must be portable—that is, they must be easy to migrate to a
new and improved technology (say, moving from .6 micron to .35 micron).
The deterministic behavior of synchronous designs makes them much more
straightforward to move to a new technology.
? Interfacing between two blocks of logic is simplified by defining standardized
synchronous behavior. Asynchronous interfaces demand elaborate hand-
shaking or token passing to ensure integrity of information; synchronous
designs with known timing characteristics can guarantee correct reception of
data.
Heck, I Know What a Flip-
Flop Is!
Synchronous circuits are made with a mixture of combinatorial logic and
clocked elements, such as flip flops or registers. The clocked elements share a
common clock, and all transition from one state to another on the rising edge of
Digital Systems 101
The Ten Commandments of Excellent Design 3
the clock. When the rising edge occurs, the registers propagate the logic levels
at their D inputs to their Q outputs.
FIGURE 1. Simple Example of a Synchronous Circuit
In Figure 1, two important timing parameters are defined:
? Setup Time—Tsu
Setup time is the time that the D input to a register must be valid before the
clock transitions.
? Hold Time—Th
Hold time is the period that the D input to a register must be maintained valid
after the clock has transitioned.
If the setup or hold time parameters are violated terrible things happen. We’ll
discuss this later in the section on synchronization.
D
Clock
Q
Combinatorial
Logic
Inputs Output
Clock
D
Q
Th
Tsu
Clock Distribution (Yawn)
4 The Ten Commandments of Excellent Design
Clock Distribution (Yawn)
The distribution of clocks throughout a design has received considerable atten-
tion with the increase in logic speed. Common-or-garden personal computers
have bus speeds of 66 MHz, and processor clocks run at 300 MHz or greater. In
this article we’re concerned more with the possible pitfalls in the synchronous
logic itself, not with the production of decent clocks. However, for completeness,
here are the important parameters necessary for a good clock distribution sys-
tem design:
? Skew Minimization
Clock skew is the variation in time of the clock’s active transition being
detected by different devices within a system. Skew must be kept to a mini-
mum to ensure that setup and hold times are not violated at any one device.
Methods for managing skew include equal-length traces, zero-delay PLL-
based buffers, and additional logic for extending hold times.
? Clock Fidelity
The clock’s waveform must be as clean and deterministic as possible. Tech-
niques used to guarantee consistent clock behavior include transmission line
termination, ground-bounce minimization, and the use of identical clock driv-
ers.
Good State Machine Design
The Ten Commandments of Excellent Design 5
Good State Machine Design
One of the designer’s most powerful constructs for synchronous design is the
state machine. Combining combinatorial logic and a number of registers, the
state machine is capable of making decisions based on its inputs and its current
state. The behavior of the state machine is entirely synchronous, with all deci-
sions taken at the time of the clock transition. There are two conventional forms
of state machine: Mealy and Moore. The characteristics of these machines are
shown in Figure 2.
FIGURE 2. Characteristics of Mealy and Moore Machines
? Moore Machines
Moore machines are the simpler of the two standard types. The output is a
function only of the current state of the machine.
? Mealy Machines
The outputs of Mealy machines are a function of the current state of the
machine plus the inputs. This additional path provides more flexibility, but
may complicate the understanding of the machine.
Clock
State
Combinatorial
Logic
Inputs
Outputs
Combinatorial
Logic
Register
Mealy Machine
Clock
State
Combinatorial
Logic
Inputs OutputsCombinatorial
Logic
Register
Moore Machine
Good State Machine Design
6 The Ten Commandments of Excellent Design
Books on high-level design languages (HDLs) expound at great length on the
construction of state machines. The results are frequently disappointing. If you
define your state machine in an HDL and run your design through a synthesizer,
you may find spaghetti logic that no self-respecting designer would ever put
together.
What’s Wrong with Mealy/
Moore?
Figure 2 shows that the outputs of both the Mealy and Moore forms of state
machine are combinatorial decodes of the current state and, in the Mealy form,
the inputs. While this is fine in principle, there are pitfalls here waiting to trap the
unwary.
The outputs of the state machine may include the following types of function:
? Latch enables (low- or high-going pulses to open or close latches)
? Tristate enables (signals to turn on and off drivers onto on-chip or off-chip
buses)
? Register enables (enables to synchronously clocked registers)
? Other general control signals, such as counter enables, flags, and so on.
Most of these signals have one characteristic in common—glitches are abso-
lutely unacceptable at any time. As the state registers and inputs of the Mealy or
Moore state machines transition and settle, the combinatorial gates are quite
capable of generating glitches as a consequence of the varying gate propaga-
tion delays. These transitory glitches may well contain enough energy to open
latches, clock registers, and other highly undesirable effects.
Wouldn’t Gray Code Fix the
Problem?
We all learn at an early age that gray code counters are wonderful since only
one bit changes at a time. When fed to an asynchronous decoder, theory sug-
gests that the outputs should settle to their new state without noise. Your author
is suspicious of this when the implementation is created by synthesized logic;
unclocked feed-forward paths might well negate the advantage of gray code.
There is, however, a greater challenge to the use of gray code. The sequence of
transitions taken by a state machine as it does its stuff is likely to be quite elab-
orate; many state machines are very complex with many branches between the
possible states. Since gray code-driven decodes are only glitch free when a sin-
gle bit changes at each clock edge, the designer must assure that all possible
state transitions result in only a single bit change of the state variable. This is
practical in only the simplest of state machines.
A Much Better State Machine Figure 3 shows a much better design for a state machine. By adding an output
register (with cleanly clocked D-type flip-flops) that is reloaded at each clock
edge, the outputs of the state machine are guaranteed to be always glitch-free.
Feeding Inputs and Resets to Your State Machine
The Ten Commandments of Excellent Design 7
It is suggested that all state machines be implemented in this form, since the
quality of the outputs is independent of the number of states or outputs.
FIGURE 3. A Much Better State Machine
Feeding Inputs and Resets to Your State Machine
Reset signals are traditionally asynchronous and are routed directly to the clear
inputs of state machine register elements. When the reset is asserted, all regis-
ters (state and output bits) are cleared immediately. All well and good, but what
happens when the reset is deasserted? Consider a state machine that will tran-
sition from the reset state to some other state directly after the reset is deas-
serted. If the reset deasserts close to a clock edge, some of the state bits will
assume their new states, while others might not. The state machine ends up in
an undefined error state, and, yet again, you have egg on your face.
The solution? Synchronize that darned reset! That way, the reset will be
removed well before the clock edge, and all register elements will correctly tran-
sition to their new states.
Synchronize All State
Machine Inputs
In fact, every input to your state machine must be synchronous. At the very
least, you must be absolutely certain that no input will violate the setup and hold
times of the state machine’s state and output registers.
Clock
State
Combinatorial
Logic
Inputs
Outputs
Register
Output
Register
Dead States—The Purgatory of State Machines
8 The Ten Commandments of Excellent Design
Dead States—The Purgatory of State Machines
State machines with encoded state bits don’t always use all possible states. For
example, if you have a 20-state state machine, you would use a five-bit state
register. This would leave 12 unused state values. Since states are usually
counted incrementally from zero, our example would look like this:
If the state machine ever enters a state 20-31, errors are likely; worse, the
machine may lock up totally, with the state machine forever in one of these ille-
gal states. It may require a hard reset to recover from this condition.
Clearly, it’s best to ensure your state machine never reaches a dead state. How-
ever, a robust design will at a minimum ensure that if the state machine does
enter a dead state, it will exit the dead state immediately and then perhaps enter
a quiescent state.
States What The States are Used For
0-19 Normal operation.
20-31 Not used: these are “dead” states.
Crossing Clock Domains
The Ten Commandments of Excellent Design 9
Crossing Clock Domains
Moving information from one clock domain to another is rather like descending
into Dante’s inferno. All sorts of evils lie in wait to beset the naive. Setup and
hold violations, metastability conditions, unreliable data, and other perils are
manifest when moving from one clock domain to another. Indeed, the whole
issue of synchronization might merit its own article. Here, a few tips will be pre-
sented which might help in resolving the block-to-block synchronization issues.
First, let’s define the problem; please see Figure 4.
FIGURE 4. Crossing Clock Domains
We have two blocks of logic, A and B. Block A operates with Clock A, while
Block B operates with Clock B. We make no assumptions at all about the fre-
quencies of Clock A and Clock B; nor do we assume any integer or multiple
relationship between the two. The two clocks are totally independent.
We need to send a strobe from Block A to Block B (Strobe A-B), and also some
data, Data A-B. In response, Strobe B-A returns, together with Data B-A. The
transmission of information between the blocks must be absolutely reliable. To
accomplish this, we will look at several aspects of the cross-domain problem.
Clock Domain A
Clock A
Clock Domain B
Clock B
Strobe A-B
Strobe B-A
Data A-B
Data B-A
Block A Block B
Crossing Clock Domains
10 The Ten Commandments of Excellent Design
Synchronization 101 Crossing between clock domains is a similar issue to managing asynchronous
inputs. Since no relationship between the multiple clock domains can be
assumed, the inputs from Block A to Block B must be assumed to be asynchro-
nous inputs. The traditional way of synchronizing an asynchronous input signal
is shown in Figure 5:
FIGURE 5. Synchronizing an Asynchronous Input
Two D-type flip-flops are used; two synchronization stages are usually sufficient.
Only the rarest applications might demand three stages of synchronization. If
your silicon library supports metastable-hardened flip-flops, then the first stage
should use such a device. Typically, metastable-hardened flip-flops guarantee
that their Q outputs will settle after a given maximum time, no matter how close
the data transition is to the flip-flop’s clock edge.
This method of information interchange has one drawback. If the strobe has the
form of a pulse, it may not be seen by the destination block if the pulse width is
less than the destination block’s clock (sampling) frequency. This is not a prob-
lem if the two blocks exchange levels instead of pulses; however, this is slow, as
typically four level exchanges must occur for a two-way handshake. The toggle
method described later is an excellent solution to this problem.
Single-Point Information Imagine that Block A needs to send two bits of information to Block B. We could
simply duplicate the circuit in Figure 5, with one synchronization circuit for each
bit. There is a serious problem which should be clear: occasionally, the circum-
stance will arise when one bit gets through the two-stage synchronization cir-
cuit, while the other does not. The result is ambiguous information and errors.
The solution is shown back in Figure 4—use a single strobe from Block A to
Block B, and send the rest of the information separately. The single-point strobe
from A to B informs the destination block that the Data A-B is valid; the originat-
ing block ensures that there is adequate setup time.
D
Clock B
Q
Input from
Output to
DQ
Block A
Block B’s
Logic
Crossing Clock Domains
The Ten Commandments of Excellent Design 11
Toggleo, Toggleas, Toggleat A nifty way of doing a two-way handshake without worrying about levels and
pulse widths is to use a toggle exchange protocol. This is illustrated in Figure 6.
FIGURE 6. Using Toggle Signals to Cross Clock Domains
In this case, the signal from Block A to Block B that indicates the data (Data A-
B) is valid is a transition of the signal Toggle A-B. This transition may be low-to-
Clock Domain A
Clock A
Clock Domain B
Clock B
Toggle A-B
Toggle B-A
Data A-B
Data B-A
Block A Block B
Crossing Clock Domains
12 The Ten Commandments of Excellent Design
high or high-to-low. Both transitions have the same meaning: the Data A-B bus
is valid. This is illustrated in Figure 7
FIGURE 7. Toggle Signal Timing: One Edge Does It All
It may be seen that each transfer is complete with only two events: a toggle of
each of the two Toggle strobes. While each toggle must, of course, be synchro-
nized carefully at the receiving end, this method guarantees successful trans-
mission and reception of wide data busses across clock domains of arbitrary
frequency. From gigahertz to kilohertz, the toggle method is predictable and reli-
able.
Toggle A-B
Data A-B
Valid Data Valid Data
Transfer 1 Transfer 2
Toggle B-A
Latches Look Lovely!
The Ten Commandments of Excellent Design 13
Latches Look Lovely!
When creating a set of clocked elements, there is often a compelling reason to
use latch-based designs. A single-bit register implemented with a latch may use
just 60% of the gates that a conventional D-type flip-flop requires. If your design
uses great numbers of configuration registers, FIFOs, or has elaborate data
paths, the savings when using latches might be considerable. And since the
latch control might be the same signal as the clock enable to a D-type flip-flop
with a clock enable, why not use latches? Look at Figure 8, which shows how a
latch works.
FIGURE 8. How a Latch Works.
The latch’s Q output is stable while the latch is closed. When the latch is open,
the input is continuously copied to the output. Two potential pitfalls exist with
latches:
1. Noisy Inputs
Any glitches on the latch’s D input are propagated directly through to the
output. This is, of course, manageable by ensuring that there aren’t any
glitches on the input. However, in a synchronous system, busses tend to
switch states at clock edges, and the latch enable typically straddles a clock
DQ
Input
Output
Input
Output
Latch EnableLatch Open/Close#
Latch Open/Close#
OpenClosed Closed
Nefarious Glitch
Glitch gets through
the latch, darn it
Latches Look Lovely!
14 The Ten Commandments of Excellent Design
edge, requiring that the D input be perfectly clean right through the same
clock edge. This is the worst time for switching noise, particularly on wide
busses. What’s more, the latch needs the D input to be stable for two clock
periods (so it’s clean through the clock edge). If you change the D input with
the same edge that closes the latch, you have a race which you’re bound to
lose (Murphy and his law, you know).
2. Noisy Latch Enable
Perhaps worse than noise on latch inputs is noise on the enable line. If a
latch enable glitches as a result of an asynchronous decode, your design is
toast. The first part of this article discussed how to eliminate glitches on
decoded signals; but if you get it wrong, a register-based design is still likely
to be robust, since glitches on clock enables don’t matter except when the
clock transitions. Glitches on latch enables always mean instant death
whenever they occur.
Registers Rule! Register-based designs suffer from none of the disadvantages listed above.
Race conditions are rare to non-existent, glitches on the control or D signals are
unlikely to cause harm, and signals can be reliably latched in one clock period.
A register-based design may be larger than its latch-based equivalent, but it will
be more robust and will contribute toward first-silicon success.
Bottom line: If you absolutely have to use latches, beware!
The Fast Path to Disaster
The Ten Commandments of Excellent Design 15
The Fast Path to Disaster
What’s wrong with the circuit in Figure 9?
FIGURE 9. Fast Paths and Race Conditions.
This is a classic example of a race condition; the transition as the output of the
first flip-flop changes might well violate the hold time on the D input of the sec-
ond flip-flop. This situation can be worsened if there is skew between the clocks
to each of the two flip-flops; if flip-flop B’s clock lags A’s, then B’s output might
actually replicate the output of A, rather than add the extra clock delay that is
required. Figure 10 shows how to fix the problem:
FIGURE 10. After an Application of Fast Paths-B-Gone.
D
Clock
QInput OutputDQ
AB
D
Clock
Q
Input OutputDQ
Delay Element
AB
The Fast Path to Disaster
16 The Ten Commandments of Excellent Design
The delay element ensures that there is sufficient time for flip-flop B to complete
its transition before the result of A’s transition reaches B.
Some synthesizer tools have a “fix hold” option which claims to take care of this
situation. But if your design fails, who gets the blame: the designer or some well-
hidden option in a synthesizer? Check carefully for fast paths.
Have Sympathy for the Test Engineer
The Ten Commandments of Excellent Design 17
Have Sympathy for the Test Engineer
If all goes well, your chip will enter mass production and the world will rejoice (or
at least the shareholders). To do this, your design must be testable. Testability is
a much-neglected aspect of many designs; here are a few tips to help test engi-
neers sleep better at night.
? Break long counters into bite-size chunks
Counters require lots of test vectors to ensure that all bits toggle correctly,
and that carry bits are generated as they should be. To help keep the number
of test vectors to a reasonable number, provide the ability to partition a
counter into multiple smaller (for example, four bits each) counters. Then
provide visibility of the most significant bit of each stage. That way the test
sequence can verify that every counter stage works by observing the most
significant bit’s low-to-high and high-to-low transitions, and can reasonably
conclude that the counter will work as a unit.
? Asynchronous feedback paths are a federal offense
Even without considering the effect it has on a test engineer’s disposition,
logic that uses asynchronous feedback is generally bad for a number of rea-
sons. It is hard to simulate, it may well be dependent on voltage, tempera-
ture, and process, it may be very susceptible to transients. Just as bad, it
may be impossible to test on a fixed-frequency tester. If there are unclocked
feedback paths in your design, make sure that they can be broken and ana-
lyzed from the tester. Better still, get rid of them altogether.
Simulators Seduce the Unwary...
18 The Ten Commandments of Excellent Design
Simulators Seduce the Unwary...
It is easy and tempting to say “I’ll just design it quickly, then find the bugs in sim-
ulation.” This is a bad idea and is doomed from the start. Simulators are notori-
ous for hiding the quirky details of your design. Examples include:
? Clock Synchronization
Synchronizing flip-flops constantly battle metastability and glitching inputs.
Their behavior is not even closely approximated by your average simulator;
all you see is a clean transition at the clock edge. Crossing clock domains
must always be correct by design from the earliest stages.
? Asynchronous Logic
In a similar way, asynchronous logic is often simulated poorly. Certainly, fast
paths and race conditions may be hidden. Some environments will deter-
mine (and optionally correct) hold-time violations, but this is not a universal
panacea for correct asynchronous logic.
Correct by Design
and
Correct by Inspection
When designing logic that is outside the protected realm of clock-to-clock regis-
ter-to-register implementations, the only solution for robust design is to do it
right from the start. Your logic must be:
? Correct by Design
Each gate, each line of VHDL or Verilog, must be understood completely.
Don’t hope that some set of simulations will find your bugs; you may neglect
to test a part of your design, and if it was designed sloppily, it will fail.
? Correct by Inspection
Disciplined layout will also make your design more robust, comprehensible,
and maintainable. It should not be necessary to sort through a mass of ugly
code or spaghetti gates to understand the operation of your function. Orga-
nized gates, commented code, and thorough accompanying documentation
will provide a basis for a reliable design.
Peter’s Provocative Pontifications— The Ten Commandments for Successful Design
The Ten Commandments of Excellent Design 19
Peter’s Provocative Pontifications—
The Ten Commandments for Successful Design
1. All state machine outputs shall always be registered
2. Thou shalt use registers, never latches
3. Thy state machine inputs, including resets, shall be synchronous
4. Beware fast paths lest they bite thine ankles
5. Minimize skew of thine clocks
6. Cross clock domains with the greatest of caution. Synchronize thy sig-
nals!
7. Have no dead states in thy state machines
8. Have no logic with unbroken asynchronous feedback lest the fleas of
myriad Test Engineers infest thee
9. All decode logic must be crafted carefully—eschew asynchronicity
10. Trust not thy simulator—it may beguile thee when thy design is junk
Latches, Schmatches
Since this material first appeared, the second commandment, Thou shalt use
registers, never latches, has been somewhat controversial (to say the least).
Dyed-in-the-wool latch users have been squealing that latches are wondrous
things, and are the solution to good designs, compact chips, and peace on
earth. Three clear advantages of latches are:
? Considerably smaller than D-type flip-flops
? Provide anticipation of the data (for example, the decode of a latched
address can begin before the latch is closed)
? Lower power, compared with continuously clocked flip-flops.
If you do insist on a latch-based design, watch out for the following:
? A glitch-free enable—remember that glitches on the enable can corrupt the
latch’s data. If you are synthesizing the code to create the enable, consider
seriously the direct instantiation of the gate that drives the enable to the
latch. Don’t trust optimized equations!
? Data input hold time—ensure that the data is held for long enough as you
close the latch. If your latch enable is derived from a clock, the latch will lag
the clock, requiring the latch’s D inputs to be held valid after the clock edge
Contact Information
20 The Ten Commandments of Excellent Design
Contact Information
Here’s how to contact the author:
Peter Chambers
VLSI Technology, Inc.
8375 South River Parkway, M/S 250
Tempe, Arizona 85284
Phone: 602 752 6395
Email: peter.chambers@vlsi.com