Verilog reg, Verilog wire, SystemVerilog logic. What’s the difference?

The difference between Verilog reg and Verilog wire confuses many programmers just starting with the language (it certainly confused me!). As a beginner, I was told to follow these guidelines, which seemed to generally work:

  • Use Verilog reg on the left hand side (LHS) of assignments inside always blocks
  • Use Verilog wire on the LHS of assignments outside always blocks

Then when I adopted SystemVerilog for writing RTL designs, I was told everything can now be “type logic”. That again generally worked, but every now and then I would run into a cryptic error message about variables, nets, and assignment.

So I decided to find out exactly how these data types work, and to write this article. I dug into the language reference manual, searched for the now-defunct Verilog-2005 standard document, and got a bit of a history lesson. Read on for my discovery of the differences between Verilog reg, Verilog wire, and SystemVerilog logic.

Verilog data types, Verilog reg, Verilog wire

Verilog data types are divided into two main groups: nets and variables. The distinction comes from how they are intended to represent different hardware structures.

A net data type represents a physical connection between structural entities (think of a plain wire), such as between gates or between modules. It does not store any value; its value is derived from what its driver(s) drive onto it. Verilog wire is probably the most common net data type, although there are many others, such as tri, wand, and supply0.

A variable data type generally represents a piece of storage. It holds a value assigned to it until the next assignment. Verilog reg is probably the most common variable data type. Verilog reg is generally used to model hardware registers (although it can also represent combinatorial logic, such as inside an always @(*) block). Other variable data types include integer, time, real, and realtime.

Almost all Verilog data types are 4-state, which means they can take on 4 values:

  • 0 represents a logic zero, or a false condition
  • 1 represents a logic one, or a true condition
  • X represents an unknown logic value
  • Z represents a high-impedance state

Verilog rule of thumb 1: use Verilog reg when you want to represent a piece of storage, and use Verilog wire when you want to represent a physical connection.

Assigning values to Verilog reg, Verilog wire

Verilog net data types can only be assigned values by continuous assignments. This means using constructs like the continuous assignment statement (the assign statement), or driving the net from a module output port. A continuous assignment drives a net similar to how a gate drives a net: the expression on the right hand side can be thought of as a combinatorial circuit that drives the net continuously.

Verilog variable data types can only be assigned values using procedural assignments. This means inside an always block, an initial block, a task, a function. The assignment occurs on some kind of trigger (like the posedge of a clock), after which the variable retains its value until the next assignment (at the next trigger). This makes variables ideal for modeling storage elements like flip-flops.

Verilog rule of thumb 2: drive a Verilog wire with an assign statement or a port output, and drive a Verilog reg from an always block. If you want to drive a physical connection with combinatorial logic inside an always @(*) block, then you have to declare the physical connection as Verilog reg.
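A minimal sketch illustrating both rules of thumb (module and signal names are my own invention):

```systemverilog
module reg_wire_demo (
  input  wire clk,
  input  wire a,
  input  wire b,
  output wire sum,    // driven by an assign statement -> declare as wire
  output reg  sum_q   // driven from an always block   -> declare as reg
);

  // Continuous assignment: the net "sum" is driven continuously,
  // like the output of a gate.
  assign sum = a ^ b;

  // Procedural assignment: "sum_q" retains its value between clock
  // edges, so it must be declared as a variable (reg).
  always @(posedge clk) begin
    sum_q <= a ^ b;
  end

endmodule
```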

SystemVerilog logic, data types, and data objects

SystemVerilog introduces new 2-state data types—which can only take the values 0 and 1, never X or Z—primarily for testbench modeling. To describe the familiar Verilog 4-state behaviour generically, SystemVerilog adds the new logic data type.

What used to be data types in Verilog, like wire, reg, and wand, are now called data objects in SystemVerilog. wire, reg, wand (and almost all previous Verilog data types) are 4-state data objects. bit, byte, shortint, int, and longint are the new SystemVerilog 2-state data objects.

There are still the two main groups of data objects: nets and variables. Since all the familiar Verilog data types (now data objects) are 4-state, strictly speaking their declarations should now also include the SystemVerilog logic keyword (for example, wire logic my_net;).

There is a new way to declare variables, beginning with the keyword var. If the data type (2-state or 4-state) is not specified, then it is implicitly declared as logic. Below are some variable declaration examples, although some do not seem to be fully supported by all tools.
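A sketch of such declarations (reconstructed from the var examples in the SystemVerilog LRM; the names are illustrative only):

```systemverilog
var byte my_byte;     // equivalent to "byte my_byte;"
var v;                // equivalent to "var logic v;"  (implicit logic)
var [15:0] vw;        // equivalent to "var logic [15:0] vw;"
var logic [7:0] vl;   // fully explicit 4-state variable
logic [7:0] bus;      // no var keyword: still declares a variable
```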

Don’t worry too much about the var keyword. It was added for language preciseness (it’s what happens as a language evolves and language gurus strive to maintain backward-compatibility), and you’ll likely not see it in an RTL design.

I’m confused… Just tell me how I should use SystemVerilog logic!

After all that technical specification gobbledygook, I have good news if you’re using SystemVerilog for RTL design. For everyday usage in RTL design, you can pretty much forget all of that!

The SystemVerilog logic keyword on its own declares a variable, but the rules have been rewritten such that you can use a variable almost everywhere in RTL design. Hence, in my example code from other articles, you will see that I use SystemVerilog logic to declare variables and ports.

When you use SystemVerilog logic standalone this way, there is another advantage: improved checking for unintended multiple drivers. Multiple assignments, or a mix of continuous and procedural (always block) assignments, to a SystemVerilog variable are an error, which means you will most likely see a compile time error. Mixed and multiple assignments are allowed for a net, so if you really want a multiply-driven net you will need to declare it a wire.
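A sketch of this driver checking (names are my own; the commented-out line shows the kind of assignment a compiler would reject for a variable but accept for a net):

```systemverilog
module multi_drive_demo (
  input  logic sel,
  input  logic a,
  input  logic b
);

  logic y;  // a variable: only one driver allowed
  wire  w;  // a net: multiple drivers are resolved (possibly to X)

  assign y = a;           // OK: a single continuous assignment
  // always_comb y = b;   // ERROR if uncommented: mixing continuous and
                          // procedural assignments to a variable

  assign w = sel ? a : 1'bz;   // OK: nets may legally
  assign w = sel ? 1'bz : b;   // have multiple drivers

endmodule
```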

In Verilog it was legal to have an assignment to a module output port (declared as Verilog wire or Verilog reg) from outside the module, or to have an assignment inside the module to a net declared as an input port. Both of these are frequently unintended wiring mistakes, causing contention. With SystemVerilog, an output port declared as SystemVerilog logic variable prohibits multiple drivers, and an assignment to an input port declared as SystemVerilog logic variable is also illegal. So if you make this kind of wiring mistake, you will likely again get a compile time error.

SystemVerilog rule of thumb 1: if using SystemVerilog for RTL design, use SystemVerilog logic to declare:

  • All point-to-point nets. If you specifically need a multi-driver net, then use one of the traditional net types like wire
  • All variables (logic driven by always blocks)
  • All input ports
  • All output ports
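A minimal sketch of a module following this rule, with logic used for every port and internal signal (names are my own invention):

```systemverilog
module counter (
  input  logic       clk,
  input  logic       rst_n,
  input  logic       en,      // input port: logic
  output logic [7:0] count    // output port driven by an always block: logic
);

  logic [7:0] count_next;     // internal point-to-point signal: logic

  always_comb count_next = count + 8'd1;

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n)  count <= '0;
    else if (en) count <= count_next;
  end

endmodule
```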

If you follow this rule, you can pretty much forget about the differences between Verilog reg and Verilog wire! (well, most of the time)

Conclusion

When I first wondered why it was possible to always write RTL using SystemVerilog logic keyword, I never expected it to become a major undertaking that involved reading and interpreting two different specifications, understanding complex language rules, and figuring out their nuances. At least I can say that the recommendations are easy to remember.

I hope this article gives you a good summary of Verilog reg, Verilog wire, SystemVerilog logic, their history, and a useful set of recommendations for RTL coding. I do not claim to be a Verilog or SystemVerilog language expert, so please do correct me if you feel I have misinterpreted anything in the specifications.


Sample Source Code

The accompanying source code for this article is a SystemVerilog design and testbench toy example that demonstrates the difference between using Verilog reg, Verilog wire, and SystemVerilog logic to code design modules. Download the code to see how it works!

Download Article Companion Source Code

Get the FREE, EXECUTABLE test bench and source code for this article, notification of new articles, and more!

Jason Yu

SoC Design Engineer at Intel Corporation
Jason has more than 8 years' experience in the semiconductor industry, designing and verifying Solid State Drive controller SoCs. His areas of work include RTL design, verification with UVM, and low power verification with UPF. Thoughts and opinions expressed in articles are personal and do not reflect those of Intel Corporation in any way.

SystemVerilog Struct and Union – for Designers too

SystemVerilog struct(ure) and union are very similar to their C programming counterparts, so you may already have a good idea of how they work. But have you tried using them in your RTL design? When used effectively, they can simplify your code and save a lot of typing. Recently, I tried incorporating SystemVerilog struct and union in new ways that I had not done before with surprisingly (or not surprisingly?) good effect. In this post I would like to share with you some tips on how you can also use them in your RTL design.

What is a SystemVerilog Struct(ure)?

A SystemVerilog struct is a way to group several data types. The entire group can be referenced as a whole, or each individual member can be referenced by name. It is handy in RTL coding when you have a collection of signals you need to pass around the design together, but want to retain the readability and accessibility of each separate signal.

When used in RTL code, a packed SystemVerilog struct is the most useful. A packed struct is treated as a single vector, and each data type in the structure is represented as a bit field. The entire structure is then packed together in memory without gaps. Only packed data types and integer data types are allowed in a packed struct. Because it is defined as a vector, the entire structure can also be used as a whole with arithmetic and logical operators.

An unpacked SystemVerilog struct, on the other hand, does not define a packing of the data types; it is tool-dependent how the structure is laid out in memory. An unpacked struct probably will not be accepted by your synthesis tool, so I would avoid it in RTL code. It is, however, the default mode of a structure if the packed keyword is not used when defining the structure.

SystemVerilog struct is often defined with the typedef keyword to give the structure type a name so it can be more easily reused across multiple files. Here is an example:
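Since the original example did not survive here, the following is a sketch of such a typedef'd packed struct placed in a package (the opcode_pkg name and field layout are my own invention, chosen to total 64 bits):

```systemverilog
package opcode_pkg;

  // A hypothetical 64-bit opcode grouped into named fields
  typedef struct packed {
    logic [7:0]  cmd;
    logic [7:0]  flags;
    logic [15:0] length;
    logic [31:0] address;
  } opcode_t;

endpackage

// Usage in a module:
//   import opcode_pkg::*;
//   opcode_t opcode;
//   assign opcode.cmd = 8'h3C;      // reference one field by name
//   assign opcode     = 64'h0;      // or operate on the whole vector
```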

What is a SystemVerilog Union?

A SystemVerilog union allows a single piece of storage to be represented in different ways using different named member types. Because there is only a single storage, only one of the data types can be used at a time. Unions can also be packed and unpacked, similarly to structures. Only packed data types and integer data types can be used in a packed union. All members of a packed (and untagged, which I'll get to later) union must be the same size. Like a packed structure, a packed union can be used as a whole with arithmetic and logical operators, and bit fields can be extracted as from a packed array.

A tagged union is a type-checked union. That means you can no longer write to the union using one member type and read it back using another. A tagged union enforces type checking by inserting additional bits into the union to record which member was most recently written. Due to the added bits, and the inability to freely refer to the same storage using different union members, I think this makes it less useful in RTL coding.

Take a look at the following example, where I expand the earlier SystemVerilog struct into a union to provide a different way to access that same piece of data.
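Since the original example did not survive here, this is a sketch of such a packed union (field names and widths are my own invention; both members are 64 bits wide, as a packed untagged union requires):

```systemverilog
// The same 64 bits of storage, viewed either as named fields
// or as an array of two 32-bit dwords
typedef struct packed {
  logic [7:0]  cmd;
  logic [7:0]  flags;
  logic [15:0] length;
  logic [31:0] address;
} opcode_fields_t;

typedef union packed {
  opcode_fields_t   fields;  // "fields view"
  logic [1:0][31:0] dword;   // "dword view"
} opcode_t;

// opcode_t op;
// op.fields.cmd and op.dword[1] refer to overlapping bits
```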

Ways to Use SystemVerilog Struct in a Design

There are many ways to incorporate SystemVerilog struct into your RTL code. Here are some common usages.

Encapsulate Fields of a Complex Type

One of the simplest uses of a structure is to encapsulate signals that are commonly used together into a single unit that can be passed around the design more easily, like the opcode structure example above. It both simplifies the RTL code and makes it more readable. Simulators like Synopsys VCS will display the fields of a structure separately on a waveform, making the structure easily readable.

If you need to use the same structure in multiple modules, a tip is to put the definition of the structure (defined using typedef) into a SystemVerilog package, then import the package into each RTL module that requires the definition. This way you will only need to define the structure once.

SystemVerilog Struct as a Module Port

A module port can have a SystemVerilog struct type, which makes it easy to pass the same bundle of signals into and out of multiple modules, and keep the same encapsulation throughout a design. For example a wide command bus between two modules with multiple fields can be grouped into a structure to simplify the RTL code, and to avoid having to manually decode the bits of the command bus when viewing it on a waveform (a major frustration!).

Using SystemVerilog Struct with Parameterized Data Type

A structure can be used effectively with modules that support parameterized data type. For example if a FIFO module supports parameterized data type, the entire structure can be passed into the FIFO with no further modification to the FIFO code.
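A sketch of how this might look, assuming a hypothetical FIFO with a type parameter (module and names are my own; the pointer logic is elided as it is not the point here):

```systemverilog
// Hypothetical FIFO that accepts its payload type as a parameter
module fifo #(
  parameter type DATA_T = logic [31:0],
  parameter int  DEPTH  = 16
) (
  input  logic  clk,
  input  logic  push,
  input  logic  pop,
  input  DATA_T din,
  output DATA_T dout
);
  DATA_T mem [DEPTH];
  // ... read/write pointer logic omitted for brevity ...
endmodule

// The entire struct passes through with no change to the FIFO code:
//   fifo #(.DATA_T(opcode_t), .DEPTH(8)) cmd_fifo (.*);
```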

Ways to Use SystemVerilog Union in a Design

Until very recently, I had not found a useful way to use a SystemVerilog union in RTL code. But I finally did in my last project! The best way to think about a SystemVerilog union is that it can give you alternative views of a common data structure. The packed union opcode example above has a “fields view” and a “dword view”, which can be referred to in different parts of a design depending on which is more convenient. For example, if the opcode needs to be buffered in a 64-bit buffer comprised of two 32-bit wide memories, then you can assign one dword from the “dword view” as the input to each memory, like this:
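A sketch of this assignment, assuming a 64-bit opcode union with a two-element "dword view" (signal names are my own):

```systemverilog
opcode_t opcode;  // packed union with .fields and .dword views

logic [31:0] mem_lo_din;  // input to the low 32-bit memory
logic [31:0] mem_hi_din;  // input to the high 32-bit memory

// Feed one dword from the "dword view" into each 32-bit memory
assign mem_lo_din = opcode.dword[0];
assign mem_hi_din = opcode.dword[1];
```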

In my last project, I used a union this way to store a wide SystemVerilog struct into multiple 39-bit memories in parallel (32-bit data plus 7-bit SECDED encoding). The memories were divided this way so that each 32-bit dword is individually protected by SECDED encoding and individually accessible by a CPU. I used the "dword view" of the union in a generate loop to feed the data into the SECDED encoders and memories. It eliminated a lot of copying and pasting, and made the code much more concise!

Conclusion

SystemVerilog struct and union are handy constructs that can encapsulate data types and simplify your RTL code. They are most effective when the structure or union types can be used throughout a design, including as module ports, and with modules that support parameterized data types.

Do you have another novel way of using SystemVerilog struct and union? Leave a comment below!




Clock Domain Crossing Design – Part 3

In Clock Domain Crossing (CDC) Design – Part 2, I discussed potential problems with passing multiple signals across a clock domain, and one effective and safe way to do so. That circuit, however, does not handle the case when the destination side logic cannot accept data and needs to back-pressure the source side. The two feedback schemes in this article add this final piece.

The concepts in this article are again taken from Cliff Cummings’ very comprehensive paper Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog. I highly recommend taking an hour or two to read through it.

Multi-cycle path (MCP) formulation with feedback

How can the source domain logic know when it is safe to send the next piece of data to the synchronizer? It can wait a fixed number of cycles, which can be determined from the synchronizer circuit. But a better way is to have logic in the synchronizer to indicate this to the source domain. The following figure illustrates how this can be done. Compared to the MCP without feedback circuit, it adds a 1-bit finite state machine (FSM) to indicate to the source domain whether the synchronizer is ready to accept a new piece of data.

MCP synchronizer with feedback

The 1-bit FSM has 2 states, 2 inputs, and 1 output (besides clock and reset).

  • States: Idle, Busy
  • Inputs: src_send, src_ack
  • Output: src_rdy

The logic is simple: src_send causes the FSM to transition to Busy, and src_ack causes it to transition back to Idle. The src_rdy output is 1 when Idle and 0 when Busy. User logic outside the synchronizer monitors these signals to determine when it is safe to send a new piece of data.
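The FSM described above could be sketched like this (the module wrapper and state encoding are my own; the signal names follow the article):

```systemverilog
module mcp_src_fsm (
  input  logic src_clk,
  input  logic src_rst_n,
  input  logic src_send,  // source asserts to send a new piece of data
  input  logic src_ack,   // synchronized acknowledge from the destination
  output logic src_rdy    // 1 = Idle (safe to send), 0 = Busy
);

  typedef enum logic {IDLE = 1'b0, BUSY = 1'b1} state_t;
  state_t state;

  always_ff @(posedge src_clk or negedge src_rst_n) begin
    if (!src_rst_n)                     state <= IDLE;
    else if (state == IDLE && src_send) state <= BUSY;
    else if (state == BUSY && src_ack)  state <= IDLE;
  end

  assign src_rdy = (state == IDLE);

endmodule
```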

Multi-cycle path (MCP) formulation with feedback acknowledge

What if the destination domain further needs to back-pressure data from the source domain? To do so, the destination domain will need control over when to send feedback acknowledgement to the source domain. This can be accomplished by adding a similar 1-bit FSM in the destination side of the synchronizer. This addition allows the destination clock domain to throttle the source domain to not send any data until the destination domain logic is ready. The following figure illustrates this design.

MCP synchronizer with acknowledge feedback

And there you have it. A complete multi-bit synchronizer solution with handshaking. Note that this particular design is slightly different from the design described in Cliff Cummings’ paper. I have changed the destination side to output data on dest_data_out as soon as available, rather than waiting for an external signal to load the data like the bload signal in Cliff Cummings’ circuit. It doesn’t seem efficient to me to incur another cycle to make the data available.

I apologize that the diagram is a bit messy (if anyone knows of a better circuit drawing tool, please let me know). The source code will be provided below so you can study that in detail.

1-Deep Asynchronous FIFO

Obviously the dual-clock asynchronous FIFO can also be used to pass multiple bits across a clock domain crossing (CDC). What is interesting, however, is that a 1-deep asynchronous FIFO (which actually has space for 2 entries, but will only ever fill 1 at a time) provides the same feedback acknowledge capability as the MCP formulation circuit above, but with 1 cycle lower latency on both the send and feedback paths.

In this configuration, pointers to the FIFO become single bit only, and toggle between 0 and 1. Read and write pointer comparison is redefined to produce a single bit value that indicates either ready to receive data (FIFO is empty, pointers equal) or not ready to receive data (FIFO is not empty, pointers not equal). Notice that when the logic indicates not ready, the write pointer has already incremented and is pointing to the next entry of the 2-entry storage to store data. Also, the FIFO read/write pointer circuitry essentially replaces the 1-bit state machine in the MCP formulation feedback scheme, therefore requiring 1 less step and 1 less cycle.
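A sketch of the source-side pointer handling in this configuration (all names are my own; the two flip-flop synchronizers that carry each pointer across the CDC are assumed but not shown):

```systemverilog
logic wptr;       // 1-bit write pointer, source clock domain
logic rptr_sync;  // 1-bit read pointer, synchronized into source domain
logic src_rdy;

// Ready to receive when the FIFO is empty (pointers equal)
assign src_rdy = (wptr == rptr_sync);

// On a push, toggle the write pointer; it now points at the other
// of the 2 storage entries
always_ff @(posedge src_clk or negedge src_rst_n) begin
  if (!src_rst_n)    wptr <= 1'b0;
  else if (src_push) wptr <= ~wptr;
end
```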

Conclusion

Over this 3-part series, we have looked at potential problems and proven design techniques to handle clock domain crossing (CDC) logic for single-bit signals (Part 1), and multi-bit signals (Part 2, and Part 3). Cliff Cummings gives a good summary from his paper:

Recommended 1-bit CDC techniques

  • Register the signal in the source domain to remove combinational settling (glitches)
  • Synchronize the signal into the destination domain

Recommended multi-bit CDC techniques

  • Consolidate multiple signals into a 1-bit representation, then synchronize the 1-bit signal
  • Use multi-cycle path (MCP) formulation to pass multi-bit data buses
  • Use asynchronous FIFOs to pass multi-bit data or control buses
  • Use gray code counters

Writing about this topic has been a much bigger (but also more rewarding) undertaking than I imagined. I must confess I’m still relatively new to asynchronous designs. So if you have any experience or design techniques to share, please leave a comment below! How about investigating clock domain crossing verification next?


Sample Source Code

The accompanying source code for this article is the MCP feedback acknowledge synchronizer design and testbench, which generates the following waveform when run. Download and run the code to see how it works!

Multi-bit MCP synchronizer with feedback ack wave



Clock Domain Crossing Design – Part 2

In Clock Domain Crossing (CDC) Techniques – Part 1, I briefly discussed metastability and two methods to safely synchronize a single bit. While those techniques are commonly used, in many applications we need to synchronize multiple control or data bits, like an encoded state or a data bus. Synchronizing multiple bits brings a host of other potential problems that need to be carefully examined, and solutions that build upon the basic blocks we discussed in part 1.

The concepts in this article are again mostly taken from Cliff Cummings' very comprehensive paper Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog. I highly recommend taking an hour or two to read through it.

Problems With Passing Multiple Control Signals Across Clock Domain Crossing (CDC)

The fundamental problem with passing multiple bits is if they are synchronized individually, they cannot be guaranteed to arrive in the destination clock domain on the same clock edge. When the individual bits are launched from the source clock domain, they may be skewed relative to each other due to trace length, process variation, etc. Since in an asynchronous clock domain crossing (CDC) the destination clock can have every possible alignment relative to the source clock (and relative to the skewed data bits), the destination clock can (and will) sample at a time when not all the bits are at their stable final values. Therefore synchronizing individual bits of a multi-bit signal is not sufficient! Let’s look at several potential problems.

Two simultaneously required signals

The waveform below shows how data skew from the source clock domain can cause two signals to arrive in different clock cycles in the destination domain, if they are synchronized individually using two flip-flop synchronizers. Don’t do this!

Problematic clock domain crossing (CDC): two simultaneously required signals

Two sequenced signals

Individually synchronizing two signals that require precise sequencing across an asynchronous clock domain crossing (CDC) is a recipe for disaster. In fact a recent ASIC project at work had a problem like this that resulted in a chip that only booted 50% of the time, months of debug, and finally a respin (we make mistakes too).

The waveform below shows how two separate signals that are intended to arrive 1 cycle apart in the destination domain, can actually arrive either 1 or 2 cycles apart depending on data skew. It’s difficult to even analyze the frequency difference from the source to destination clock domain and come up with a potential sequence that may work… Just don’t do this. There are better ways!

Problematic clock domain crossing (CDC): two sequenced signals

Encoded control signals

There are many scenarios where you may want to pass a multi-bit signal across a clock domain crossing (CDC), such as an encoded signal. By now we understand the potential problem, right? Due to data skew, the different bits may take different numbers of cycles to synchronize, and we may not be able to read the same encoded value on the destination clock domain. You may get away with using a simple two flip-flop synchronizer if you know there will be sufficient time for the signal to settle before reading the synchronized value (like a relatively static encoded status signal). But it's still not the best practice.

Solutions For Passing Multiple Signals Across Clock Domain Crossing (CDC)

So how do we deal with synchronizing multiple signals? There are at least several solutions with different levels of complexity:

  1. Multi-bit signal consolidation
  2. Multi-cycle path (MCP) formulation without feedback
  3. Multi-cycle path (MCP) formulation with feedback acknowledge
  4. Dual-Clock Asynchronous FIFO
  5. Two-Deep FIFO

Multi-cycle path (MCP) formulation is a particularly interesting and widely applicable solution. It refers to sending unsynchronized data from the source clock domain to the destination clock domain, paired with a synchronized control (e.g. a load enable) signal. The data and control signals are sent simultaneously from the source clock domain. The data signals do not go through any synchronization, but go straight into a multi-bit flip-flop in the destination clock domain. The control signal is synchronized through a two-flip-flop synchronizer, then used to load the unsynchronized data into the flip-flops in the destination clock domain. This allows the data signals to settle (while the control signal is being synchronized), and captured together on a single clock edge. We will get into two variations of this technique in later sections.

Multi-bit signal consolidation

Consolidating multiple bits across a clock domain crossing (CDC) into one is more of a best practice than a technique. It's always good to reduce as much as possible the number of signals that need to cross a clock domain crossing (CDC). However, this can be applied directly to the problem of sequencing two signals into the destination clock domain. A single signal can be synchronized across the clock domain crossing (CDC), and the two sequenced signals can be recreated in the destination clock domain once the synchronizing signal is received.

Multi-cycle path (MCP) formulation without feedback

The multi-cycle path (MCP) synchronizer is composed of several components:

  1. Logic that converts a synchronization event from source clock domain to a toggle to pass across the clock domain crossing (CDC)
  2. Logic that converts the toggle into a load pulse in the destination domain
  3. Flip-flops to capture the unsynchronized data bits

One key idea in this design is that the synchronization event (a pulse) is converted into a single toggle (either low to high, or high to low) before being synchronized into the destination clock domain. Each toggle represents one event. The advantage of synchronizing a toggle is it eliminates the problem of a missing pulse when crossing from a fast clock to a slow clock domain. However, you need to be careful when resetting the synchronizer such that no unintended events are generated (i.e. if the source domain is reset on its own, and the toggle signal goes from high to low due to reset).

Source clock domain event to toggle generator

The following circuit resides in the source clock domain, and converts an event that needs to traverse the clock domain crossing (CDC) into a toggle, which cannot be missed due to sampling in the destination clock domain.
CDC pulse to toggle generator (source clock) diagram

CDC pulse to toggle generator (source clock) wave

Destination clock domain toggle to load pulse generator

Next, we need a circuit in the destination clock domain to convert the toggle back into a pulse to capture the multi-bit signal.
CDC toggle to pulse generator (destination clock) diagram

CDC toggle to pulse generator (destination clock) wave

Putting it together

Finally, putting the entire synchronizer circuit together, we get the following.
MCP synchronizer without feedback

Notice the multi-bit data signal passes straight from source (clock) flip-flop to destination (clock) flip-flop to avoid problems with synchronizing multiple bits. A single control signal is synchronized to allow time for the multi-bit data to settle from possible metastable state. The load pulse from the source clock domain first gets converted into a toggle. The toggle is synchronized across the clock domain crossing (CDC), then gets converted back to a load pulse in the destination clock domain. Finally that load pulse is used to load the multi-bit data signal into flip-flops in the destination clock domain.
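The pieces described above can be sketched as follows (signal names, widths, and reset handling are my own; the fragments belong to two different clock domains, as marked):

```systemverilog
logic        src_toggle;           // source domain: one toggle per event
logic        q1, q2, q3;           // destination domain synchronizer stages
logic        dest_load;
logic [31:0] src_data, dest_data;  // unsynchronized data path

// Source domain: convert each send pulse into a single toggle
always_ff @(posedge src_clk or negedge src_rst_n) begin
  if (!src_rst_n)    src_toggle <= 1'b0;
  else if (src_send) src_toggle <= ~src_toggle;
end

// Destination domain: two flip-flop synchronizer plus one extra stage
always_ff @(posedge dest_clk or negedge dest_rst_n) begin
  if (!dest_rst_n) {q3, q2, q1} <= 3'b000;
  else             {q3, q2, q1} <= {q2, q1, src_toggle};
end

// Any edge on the synchronized toggle becomes a one-cycle load pulse
assign dest_load = q2 ^ q3;

// The load pulse captures the unsynchronized (but now settled) data
always_ff @(posedge dest_clk) begin
  if (dest_load) dest_data <= src_data;
end
```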

Conclusion

Passing multiple signals across an asynchronous clock domain crossing (CDC) can become a recipe for disaster if not done properly. This article described some potential pitfalls, and one very effective technique called multi-cycle path (MCP) formulation to synchronize multiple bits across a clock domain crossing (CDC). There is one missing piece, however. How does logic in the source clock domain know when it is safe to send another piece of data? In Part 3 of the series, I will put in the final piece and enhance the multi-cycle path (MCP) synchronizer with feedback acknowledgement.


Sample Source Code

The accompanying source code for this article is the multi-bit MCP synchronizer without feedback design and testbench, which generates the following waveform when run. Download and run the code to see how it works!

Multi-bit MCP synchronizer without feedback wave


Clock Domain Crossing Design – 3 Part Series

Thank you for all your interest in my last post on Dual-Clock Asynchronous FIFO in SystemVerilog! I decided to continue the theme of clock domain crossing (CDC) design techniques, and look at several other methods for passing control signals and data between asynchronous clock domains. This is perfect timing because I’m just about to create a new revision of one of my design blocks at work, which incorporates many of these concepts. I, too, can use a refresher 🙂

The concepts in this article are mostly taken from Cliff Cummings' very comprehensive paper Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog. I've broken the topics into 3 parts:

  • Part 1 – metastability and challenges with passing single bit signals across a clock domain crossing (CDC), and single-bit synchronizer
  • Part 2 – challenges with passing multi-bit signals across a CDC, and multi-bit synchronizer
  • Part 3 – design of a complete multi-bit synchronizer with feedback acknowledge

Let’s get right to it!

What is Metastability?

Any discussion of clock domain crossing (CDC) should start with a basic understanding of metastability and synchronization. In layman's terms, metastability refers to an unstable intermediate state, where the slightest disturbance will cause the system to resolve to a stable state. When applied to flip-flops in digital circuits, it means a state where the flip-flop's output has not settled to its final expected value.

One of the ways a flip-flop can enter a metastable state is if its setup or hold time is violated. In an asynchronous clock domain crossing (CDC), where the source and destination clocks have no frequency relationship, a signal from the source domain has a non-zero probability of changing within the setup or hold time of a destination flip-flop it drives. Synchronization failure occurs when the output of the destination flip-flop goes metastable and does not converge to a legal state by the time its output must be sampled again (by the next flip-flop in the destination domain). Worse yet, that next flip-flop may also go metastable, causing metastability to propagate through the design!

Synchronizers for Clock Domain Crossing (CDC)

A synchronizer is a circuit whose purpose is to minimize the probability of a synchronization failure. We want the metastability to resolve within a synchronization period (a period of the destination clock) so that we can safely sample the output of the flip-flop in the destination clock domain. It is possible to calculate the failure rate of a synchronizer, expressed as the mean time between failures (MTBF).

Without going into the math, the takeaway is that the probability of hitting a metastable state in a clock domain crossing (CDC) is proportional to:

  1. Frequency of the destination domain
  2. Rate of data crossing the clock boundary

This result gives us some ideas on how to design a good synchronizer. Interested readers can refer to Metastability and Synchronizers: A Tutorial for a tutorial on the topic of metastability, and some interesting graphs of how flip-flops can become metastable.

Two flip-flop synchronizer

Two flip-flop synchronizer for clock domain crossing (CDC)

The most basic synchronizer is two flip-flops in series, both clocked by the destination clock. This simple and unassuming circuit is called a two flip-flop synchronizer. If the input data changes very close to the receiving clock edge (within setup/hold time), the first flip-flop in the synchronizer may go metastable, but there is still a full clock period for the signal to become stable before being sampled by the second flip-flop. The destination domain logic then uses the output of the second flip-flop. Theoretically the signal can still be metastable by the time it is clocked into the second flip-flop (statistically, about once every MTBF). In that case a synchronization failure occurs and the design would likely malfunction.

The two flip-flop synchronizer is sufficient for many applications. Very high speed designs may require a three flip-flop synchronizer to give sufficient MTBF. To further increase MTBF, two flip-flop synchronizers are sometimes built from fast library cells (low threshold voltage) that have better setup/hold time characteristics.
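In RTL, a two flip-flop synchronizer might be sketched as follows. This is a minimal illustration; the module and signal names are my own, not from the article's companion download:

```systemverilog
// Two flip-flop synchronizer: asynchronous input into the destination domain.
module sync_2ff (
  input  logic clk_dest,   // destination domain clock
  input  logic rst_n,      // active-low asynchronous reset
  input  logic d_async,    // signal arriving from the source clock domain
  output logic q_sync      // synchronized output, safe to use in clk_dest
);
  logic meta;  // first stage; this flop may go metastable

  always_ff @(posedge clk_dest or negedge rst_n) begin
    if (!rst_n) begin
      meta   <= 1'b0;
      q_sync <= 1'b0;
    end else begin
      meta   <= d_async;  // may violate setup/hold; has one clock to settle
      q_sync <= meta;     // destination logic uses only this output
    end
  end
endmodule
```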

Registering source signals into the synchronizer

It is generally good practice to register signals in the source clock domain before sending them across the clock domain crossing (CDC) into synchronizers. This eliminates combinational glitches, which would otherwise effectively increase the rate of data crossing the clock boundary and reduce the MTBF of the synchronizer.

Synchronizing Slow Signals Into Fast Clock Domain

The easy case is passing signals from a slow clock domain to a fast clock domain. This is generally not a problem as long as the faster clock is > 1.5x frequency of the slow clock. The fast destination clock will simply sample the slow signal more than once. In these cases, a simple two-flip-flop synchronizer may suffice.

If the fast clock is < 1.5x frequency of the slow clock, then there can be a potential problem, and you should use one of the solutions in the next section.

Synchronizing Fast Signals Into Slow Clock Domain

The more difficult case is, of course, passing a fast signal into a slow clock domain. The obvious problem is if a pulse on the fast signal is shorter than the period of the slow clock, then the pulse can disappear before being sampled by the slow clock. This scenario is shown in the waveform below.

Fast source pulse missed by slow clock in clock domain crossing (CDC)

A less obvious problem is even if the pulse was just slightly wider than the period of the slow clock, the signal can change within the setup/hold time of the destination flip-flop (on the slow clock), violating timing and causing metastability.

Before deciding on how to handle this clock domain crossing (CDC), you should first ask yourself whether every value of the source signal is needed in the destination domain. If you don’t (meaning it is okay to drop some values) then it may suffice to use an “open-loop” synchronizer without acknowledgement. An example is the gray code pointer from my Dual-Clock Asynchronous FIFO in SystemVerilog; it needs to be accurate when read, but the FIFO pointer may advance several times before a read occurs and the value is used. If, on the other hand, you need every value in the destination domain, then a “closed-loop” synchronizer with acknowledgement may be needed.

Single bit — two flip-flop synchronizer

A simple two flip-flop synchronizer is the fastest way to pass signals across a clock domain crossing. It can be sufficient in many applications, as long as the signal generated in the fast clock domain is wider than the cycle time of the slow clock. For example, if you just need to synchronize a slow changing status signal, this may work. A safe rule of thumb is the signal must be wider than 1.5x the cycle width of the destination clock (“three receiving clock edge” requirement coined in Mark Litterick’s paper on clock domain crossing verification). This guarantees the signal will be sampled by at least one (but maybe more) clock edge of the destination clock. The requirement can be easily checked using SystemVerilog Assertions (SVA).
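One possible formulation of that check in SVA is sketched below. This is a hypothetical assertion of my own, assuming sig is the CDC signal and clk_dest is the destination clock:

```systemverilog
// Once 'sig' rises, require it to remain high for at least two further
// destination clock samples, i.e. it spans three receiving clock edges.
property p_three_dest_edges;
  @(posedge clk_dest) $rose(sig) |=> sig [*2];
endproperty

a_three_dest_edges: assert property (p_three_dest_edges)
  else $error("CDC signal narrower than three destination clock edges");
```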

The 1.5x cycle width is easy to enforce when the relative clock frequencies of the source and destination are fixed. But there are real world scenarios where they won’t be. In a memory controller design I worked on, the destination clock can take on three different frequencies, which can be either faster/slower/same as the source clock. In that situation, it was not easy to design the clock domain crossing signals to meet the 1.5x cycle width of the slowest destination clock.

Single bit — synchronizer with feedback acknowledge

A synchronizer with feedback acknowledge is slightly more involved, but not much. The figure below illustrates how it works.

Single bit feedback synchronizer for clock domain crossing (CDC) diagram

The source domain sends the signal to the destination clock domain through a two flip-flop synchronizer, then passes the synchronized signal back to the source clock domain through another two flip-flop synchronizer as a feedback acknowledgement. The figure below shows a waveform of the synchronizer.

Single bit feedback synchronizer for clock domain crossing (CDC) waveform

This solution is very safe, but it does come at a cost of increased delay due to synchronizing in both directions before allowing the signal to change again. This solution would work in my memory controller design to handle the varying clock frequency relationship.
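A sketch of this closed-loop structure is shown below, assuming a level-based handshake; the module and signal names are illustrative, not the article's companion code:

```systemverilog
// Single-bit feedback (closed-loop) synchronizer sketch.
// Domain A holds a level until domain B's synchronized copy comes back
// as an acknowledge, at which point a new value may be sent.
module sync_feedback (
  input  logic clk_a, clk_b, rst_n,
  input  logic a_pulse,      // request pulse generated in domain A
  output logic b_sig_sync,   // synchronized signal seen in domain B
  output logic a_ack         // feedback: safe to send the next value
);
  logic       a_sig;         // level held in domain A
  logic [1:0] b_ff, a_ff;    // the two 2FF synchronizers

  // Hold the level in domain A until the acknowledge returns
  always_ff @(posedge clk_a or negedge rst_n)
    if (!rst_n)       a_sig <= 1'b0;
    else if (a_pulse) a_sig <= 1'b1;
    else if (a_ack)   a_sig <= 1'b0;

  // Forward path: domain A level -> domain B
  always_ff @(posedge clk_b or negedge rst_n)
    if (!rst_n) b_ff <= '0;
    else        b_ff <= {b_ff[0], a_sig};

  // Feedback path: synchronized level -> back to domain A
  always_ff @(posedge clk_a or negedge rst_n)
    if (!rst_n) a_ff <= '0;
    else        a_ff <= {a_ff[0], b_ff[1]};

  assign b_sig_sync = b_ff[1];  // domain B typically edge-detects this
  assign a_ack      = a_ff[1];
endmodule
```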

Conclusion

Even though we would all like to live in a purely synchronous world, in real world applications you will undoubtedly run into designs that require multiple asynchronous clocks. This article described two basic techniques to pass a single control signal across a clock domain crossing (CDC). Clock domain crossing (CDC) logic bugs are elusive and extremely difficult to debug, so it is imperative to design synchronization logic correctly from the start!

Passing a single control signal across a clock domain crossing (CDC) isn’t very exciting. In Clock Domain Crossing Techniques – Part 2, I will discuss the difficulties with passing multiple control signals, and some possible solutions.


Sample Source Code

The accompanying source code for this article is the single-bit feedback synchronizer and testbench, which generates the following waveform when run. Download and run it to see how it works!

Single bit feedback synchronizer simulation

Download Article Companion Source Code


Dual-Clock Asynchronous FIFO in SystemVerilog

Apologies for the lack of updates, but I have been on a rather long vacation to Asia and am slowly getting back into the rhythm of work and blogging. One of my first tasks after returning to work was to check over the RTL of an asynchronous FIFO in Verilog. What better way to relearn a topic than to write about it! Note that this asynchronous FIFO design is based entirely on Cliff Cummings’ paper Simulation and Synthesis Techniques for Asynchronous FIFO Design. Please treat this as a concise summary of the design described in his paper (specifically, FIFO style #1 with Gray code counter style #2). This article assumes knowledge of basic synchronous FIFO concepts.

Metastability and synchronization is an extremely complex topic that has been the subject of over 60 years of research. There are many known design methods to safely pass data asynchronously from one clock domain to another, one of which is using an asynchronous FIFO. An asynchronous FIFO refers to a FIFO where data is written from one clock domain, read from a different clock domain, and the two clocks are asynchronous to each other.

Clock domain crossing logic is inherently difficult to design, and even more difficult to verify. An almost correct design may function 99% of the time, but the 1% failure will cost you countless hours of debugging, or worse, a respin. Therefore, it is imperative to design them correctly from the beginning! This article describes one proven method to design an asynchronous FIFO.

Asynchronous FIFO Pointers

In a synchronous FIFO design, one way to determine whether the FIFO is full or empty is to use a separate count register to track the number of entries in the FIFO. This requires the ability to both increment and decrement the counter, potentially on the same clock. The same technique cannot be used in an asynchronous FIFO, however, because two different clocks would be needed to control the counter.

Instead, the asynchronous FIFO design uses a different technique (also derived from synchronous FIFO design): an additional bit in the FIFO pointers to detect full and empty. In this scheme, full and empty are determined by comparing the read and write pointers. The write pointer always points to the next location to be written; the read pointer always points to the current FIFO entry to be read. On reset, both pointers are reset to zero. The FIFO is empty when the two pointers (including the extra bit) are equal. It is full when the MSBs of the pointers differ but the remaining bits are equal. This FIFO pointer convention has the added benefit of low access latency. As soon as data has been written, the FIFO drives read data onto the FIFO data output port, so the receive side does not need to spend two clock cycles (first set a read enable, then read the data) to read out the data.

Synchronizing Pointers Across Clock Domains

Synchronizing a binary count (pointer) across clock domains is going to pose a difficulty, however. All bits of a binary counter can change simultaneously, for example a 4-bit count changing from 7->8 (4’b0111->4’b1000). To pass this value safely across a clock domain crossing requires careful synchronization and handshaking such that all bits are sampled and synchronized on the same edge (otherwise the value will be incorrect). It can be done with some difficulty, but a simpler method that bypasses this problem altogether is to use Gray code to encode the pointers.

Gray codes are named after Frank Gray, who first patented the codes. The code distance between any two adjacent Gray code words is 1, meaning only one bit changes from one Gray count to the next. Using Gray code to encode pointers eliminates the problem of synchronizing multiple changing bits on a clock edge. The most common Gray code is a reflected code, where the bits in any column except the MSB are symmetrical about the sequence mid-point. An example 4-bit Gray code counter is shown below. Notice the MSB differs between the first and second halves, but otherwise the remaining bits are mirrored about the mid-point. The Gray code never changes by more than one bit in a transition.

Decimal Count   Binary Count   Gray Code Count
      0          4'b0000        4'b0000
      1          4'b0001        4'b0001
      2          4'b0010        4'b0011
      3          4'b0011        4'b0010
      4          4'b0100        4'b0110
      5          4'b0101        4'b0111
      6          4'b0110        4'b0101
      7          4'b0111        4'b0100
      8          4'b1000        4'b1100
      9          4'b1001        4'b1101
     10          4'b1010        4'b1111
     11          4'b1011        4'b1110
     12          4'b1100        4'b1010
     13          4'b1101        4'b1011
     14          4'b1110        4'b1001
     15          4'b1111        4'b1000

Gray Code Counter

The Gray code counter used in this design is “Style #2” as described in Cliff Cummings’ paper. The FIFO counter consists of an n-bit binary counter, of which bits [n-2:0] are used to address the FIFO memory, and an n-bit Gray code register that stores the Gray count value to be synchronized into the opposite clock domain. One important aspect of a Gray code counter is that it generally must have a power-of-2 count, which means a Gray code pointer FIFO will have a power-of-2 number of entries. The binary count value can be used to implement FIFO “almost full” or “almost empty” conditions.

Asynchronous FIFO in SystemVerilog binary and gray pointers

Converting Binary to Gray

To convert a binary number to Gray code, notice that the MSB is always the same. All other bits are the XOR of pairs of binary bits:
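This pairwise XOR is equivalent to XORing the value with itself shifted right by one. A minimal sketch (4-bit width assumed for illustration):

```systemverilog
// Binary-to-Gray: the MSB passes through unchanged; every other Gray bit
// is the XOR of adjacent binary bits. Equivalent to (bin >> 1) ^ bin.
function automatic logic [3:0] bin2gray(input logic [3:0] bin);
  return (bin >> 1) ^ bin;
endfunction

// Example from the table above: bin2gray(4'b0111) == 4'b0100 (decimal 7)
```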

Converting Gray to Binary

To convert a Gray code to a binary number, notice again that the MSB is always the same. Each other binary bit is the XOR of all of the more significant Gray code bits:
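In other words, binary bit i is the reduction XOR of Gray bits [MSB:i]. A minimal sketch (again assuming a 4-bit width):

```systemverilog
// Gray-to-binary: each binary bit is the XOR of all Gray bits at or
// above that bit position.
function automatic logic [3:0] gray2bin(input logic [3:0] gray);
  logic [3:0] bin;
  for (int i = 0; i < 4; i++)
    bin[i] = ^(gray >> i);  // reduction XOR of gray[3:i]
  return bin;
endfunction

// Example from the table above: gray2bin(4'b0100) == 4'b0111 (decimal 7)
```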

Generating full & empty conditions

The FIFO is empty when the read pointer and the synchronized write pointer, including the extra bit, are equal. In order to efficiently register the rempty output, the synchronized write pointer is actually compared against the rgraynext (the next Gray code to be registered into rptr).

The full flag is trickier to generate. Dissecting the Gray code sequence, you can come up with the following conditions that all need to be true for the FIFO to be full:

  1. MSB of wptr and synchronized rptr are not equal
  2. Second MSB of wptr and synchronized rptr are not equal
  3. Remaining bits of wptr and synchronized rptr are equal

Similarly, in order to efficiently register the wfull output, the synchronized read pointer is compared against the wgnext (the next Gray code that will be registered in the wptr).
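Putting the two comparisons into code, the flag logic might look like the sketch below. The names roughly follow the article (rgraynext, wgnext); rq2_wptr and wq2_rptr are assumed names for the synchronized opposite-domain pointers, and ADDRSIZE is the memory address width, so the pointers are ADDRSIZE+1 bits wide:

```systemverilog
// Empty: the next read Gray pointer equals the synchronized write pointer.
assign rempty_val = (rgraynext == rq2_wptr);

always_ff @(posedge rclk or negedge rrst_n)
  if (!rrst_n) rempty <= 1'b1;      // FIFO is empty after reset
  else         rempty <= rempty_val;

// Full: top two bits differ, remaining bits equal (Gray code full test).
assign wfull_val = (wgnext == {~wq2_rptr[ADDRSIZE:ADDRSIZE-1],
                                wq2_rptr[ADDRSIZE-2:0]});

always_ff @(posedge wclk or negedge wrst_n)
  if (!wrst_n) wfull <= 1'b0;
  else         wfull <= wfull_val;
```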

Asynchronous FIFO (Style #1) – Putting It Together

Here is the complete asynchronous FIFO put together in a block diagram.

Asynchronous FIFO in SystemVerilog block diagram

The design is partitioned into the following modules.

  • fifo1 – top level wrapper module
  • fifomem – the FIFO memory buffer that is accessed by the write and read clock domains
  • sync_r2w – 2 flip-flop synchronizer to synchronize read pointer to write clock domain
  • sync_w2r – 2 flip-flop synchronizer to synchronize write pointer to read clock domain
  • rptr_empty – synchronous logic in the read clock domain to generate FIFO empty condition
  • wptr_full – synchronous logic in the write clock domain to generate FIFO full condition

Sample source code can be downloaded at the end of this article.

Conclusion

An asynchronous FIFO is a proven design technique to pass multi-bit data across a clock domain crossing. This article describes one known good method to design an asynchronous FIFO by synchronizing Gray code pointers across the clock domain crossing to determine full and empty conditions.

Whew! This has been one of the longer articles. I’m simultaneously surprised that 1) this article took 1300 words, and that 2) it only took 1300 words to explain an asynchronous FIFO. Do you have other asynchronous FIFO design techniques? Please share in the comments below!


Sample Source Code

The accompanying source code for this article is the dual-clock asynchronous FIFO design with testbench, which generates the following waveform when run. Download and run the code to see how it works!

Asynchronous FIFO 1 waveform


One-hot State Machine in SystemVerilog – Reverse Case Statement

The finite state machine (FSM) is one of the first topics taught in any digital design course, yet coding one is not as easy as it first appears. There are Moore and Mealy state machines, encoded and one-hot state encoding, and one, two, or three always block coding styles. Recently I was reviewing a coworker’s RTL code and came across a SystemVerilog one-hot state machine coding style that I was not familiar with. Needless to say, it became a mini research topic resulting in this blog post.

When coding state machines in Verilog or SystemVerilog, there are a few general guidelines that can apply to any state machine:

  1. If coding in Verilog, use parameters to define state encodings instead of `define macro definitions. Verilog `define macros have global scope; a macro defined in one module can easily be redefined by a macro with the same name in a different module compiled later, leading to macro redefinition warnings and unexpected bugs.
  2. If coding in SystemVerilog, use enumerated types to define state encodings.
  3. Always define a parameter or enumerated type value for each state so you don’t leave it to the synthesis tool to choose a value for you. Otherwise it can make for a very difficult ECO when it comes time to reverse engineer the gate level netlist.
  4. Make curr_state and next_state declarations right after the parameter or enumerated type assignments. This is simply clean coding style.
  5. Code all sequential always blocks using nonblocking assignments (<=). This helps guard against simulation race conditions.
  6. Code all combinational always blocks using blocking assignments (=). This helps guard against simulation race conditions.

SystemVerilog enumerated types are especially useful for coding state machines. An example of using an enumerated type as the state variable is shown below.
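A declaration along these lines illustrates the idea; the state names and encodings here are my own invention:

```systemverilog
// State variable declared with an enumerated type. Explicit encodings keep
// synthesis predictable, and the XXX member shows that enumerated types
// allow X assignments (handy for flagging illegal states in simulation).
typedef enum logic [2:0] {
  IDLE  = 3'b000,
  READ  = 3'b001,
  WRITE = 3'b010,
  DONE  = 3'b011,
  XXX   = 'x
} state_e;

state_e curr_state, next_state;
```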

Notice that enumerated types allow X assignments. Enumerated types can also be displayed as names in simulator waveforms, which eliminates the need for the old Verilog trick of displaying the state name in a waveform as an ASCII-encoded variable.

One-hot refers to how each of the states is encoded in the state vector. In a one-hot state machine, the state vector has as many bits as number of states. Each bit represents a single state, and only one bit can be set at a time—one-hot. A one-hot state machine is generally faster than a state machine with encoded states because of the lack of state decoding logic.

SystemVerilog and Verilog have a unique (pun intended) and efficient coding style for coding one-hot state machines. This coding style uses what is called a reverse case statement to test whether a case item is true, using a case header of the form case (1'b1). Example code is shown below:
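A sketch of what such a reverse case state machine might look like; the state names and transition inputs (start, stop, clk, rst_n) are illustrative, not from the article's code:

```systemverilog
// Enumerated values are bit INDICES into the one-hot state vector.
typedef enum { IDLE = 0, LOAD = 1, RUN = 2, DONE = 3 } state_idx_e;

logic [3:0] state, next;

always_comb begin
  next = '0;
  unique case (1'b1)                     // "reverse case" header
    state[IDLE]: next[start ? LOAD : IDLE] = 1'b1;
    state[LOAD]: next[RUN]  = 1'b1;
    state[RUN] : next[stop  ? DONE : RUN]  = 1'b1;
    state[DONE]: next[IDLE] = 1'b1;
  endcase
end

always_ff @(posedge clk or negedge rst_n)
  if (!rst_n) begin
    state       <= '0;
    state[IDLE] <= 1'b1;                 // one-hot reset state
  end else begin
    state <= next;
  end
```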

In this one-hot state machine coding style, the state parameters or enumerated type values represent indices into the state and next vectors. Synthesis tools interpret this coding style efficiently and generate output-assignment and next-state logic that performs only a 1-bit comparison against the state vectors. Notice also the use of the always_comb and always_ff SystemVerilog always statements, and of unique case to add some run-time checking.

An alternate one-hot state machine coding style to the “index-parameter” style is to completely specify the one-hot encoding for the state vectors, as shown below:
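A hedged sketch of this fully specified style, using the same hypothetical four states as above:

```systemverilog
// Each state value is a complete one-hot encoding; the case items compare
// the entire state vector. State names and inputs are illustrative.
typedef enum logic [3:0] {
  IDLE = 4'b0001,
  LOAD = 4'b0010,
  RUN  = 4'b0100,
  DONE = 4'b1000
} state_e;

state_e state, next;

always_comb begin
  next = state;
  unique case (state)                 // full 4-bit comparison of the vector
    IDLE: if (start) next = LOAD;
    LOAD:            next = RUN;
    RUN : if (stop)  next = DONE;
    DONE:            next = IDLE;
  endcase
end
```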

According to Cliff Cummings’ 2003 paper, this coding style yields poor performance because Design Compiler infers a full 4-bit comparison against the state vector, in effect defeating the speed advantage of a one-hot state machine. However, the experiments in that paper were conducted in 2003, and I suspect synthesis tools have become smarter since then.

State machines may look easy on paper, but are often not so easy in practice. Given how frequently state machines appear in designs, it is important for every RTL designer to develop a consistent and efficient style for coding them. One-hot state machines are generally preferred in applications that can trade-off area for a speed advantage. This article demonstrated how they can be coded in Verilog and SystemVerilog using a unique and very efficient “reverse case statement” coding style. It is a technique that should be in every RTL designer’s arsenal.

What are your experiences with coding one-hot state machines? Do you have another coding style or synthesis results to share? Leave a comment below!



SystemVerilog always_comb, always_ff. New and Improved.

Verilog engineers will be familiar with using always to code recurring procedures like sequential logic, and most will have used always @(*) to code combinational logic. SystemVerilog defines four forms of always procedures: always, always_comb, always_ff, always_latch. What do the three new always procedures bring, and should you be using them? This article will attempt to answer these questions.

Verilog always @*

Verilog-2001 first introduced always @* as a shortcut for coding combinational logic. The always @* construct is intended to infer a complete sensitivity list for both RTL simulation and synthesis from the contents of the block, saving the designer from manually specifying a sensitivity list. However, it does not infer a complete sensitivity list when the always @* block calls functions or tasks: it will not infer sensitivity to signals that are externally referenced inside a function or task called from the always block, only to the signals passed in as arguments. Here is an example that illustrates the behaviour:
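The original example isn't reproduced here, so below is a reconstruction of the behaviour under discussion; the module and function names are my own:

```systemverilog
module tb;
  logic a, b, c, y;

  // a and b are referenced from the enclosing scope, not passed in,
  // so always @* will not become sensitive to them.
  function automatic logic or_with_ab(input logic in);
    return a | b | in;
  endfunction

  always @* begin
    y = or_with_ab(c);
    $display("%0t: always @* evaluated, y=%b", $time, y);
  end

  initial begin
    a = 0; b = 0; c = 0;
    #10 a = 1;  // does NOT retrigger the always @* block
    #10 b = 1;  // does NOT retrigger the always @* block
    #10 c = 1;  // DOES retrigger: c is in the inferred sensitivity list
    #10 $finish;
  end
endmodule
```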

When the code is executed, a change on a or b does not trigger the always @* block to be reevaluated, but a change on c does.

SystemVerilog always_comb

SystemVerilog always_comb solves this limitation. It descends into functions to infer a complete sensitivity list, including signals referenced inside the bodies of called functions, as illustrated in the example above. However, it does not infer sensitivity from tasks (if a task takes zero time, you can use a void function instead to infer sensitivity). It improves upon always @* in some other ways as well:

  • Variables written on the left-hand side of assignments within always_comb cannot be written to by any other process
  • The procedure is automatically triggered once after all always and initial blocks have executed to ensure the outputs of the always_comb block match the input conditions before advancing simulation time; always @* waits until a change occurs on a signal in the inferred sensitivity list, which can sometimes cause unexpected behaviour due to race conditions at time zero
  • To ensure it models combinational logic, always_comb cannot include blocking timing or event controls, or fork-join
  • Extending usage to SystemVerilog assertions, always_comb is sensitive to expressions in immediate assertions within the procedure

In short, SystemVerilog always_comb is a better version of always @* and should always (pun intended) be used.

SystemVerilog always_ff

The SystemVerilog always_ff procedure is used to model sequential flip-flop logic. It brings similar improvements over the plain Verilog always. An always_ff procedure adds the restriction that it can contain one and only one event control and no blocking timing controls. Variables written on the left-hand side of assignments within always_ff, including variables written inside called functions, cannot be written by other processes. It is also recommended that software tools perform additional checks to ensure the code within the procedure models flip-flop behaviour, although these checks are not defined in the SystemVerilog LRM.

SystemVerilog always_latch

Finally, SystemVerilog always_latch is used to model latch logic. It has identical rules to always_comb, and the SystemVerilog LRM recommends software tools perform additional checks to ensure code within the procedure models latch behaviour.

The three new SystemVerilog always procedures bring some enhanced capabilities. SystemVerilog always_comb, in particular, improves upon the Verilog always @* in several positive ways, and is undoubtedly the most useful of the three. I would recommend using all three in all newly written SystemVerilog code, if not for their new features, at least to convey design intent.

Do you have other experiences or gotchas with using (or not using) the new SystemVerilog always procedures? Leave a comment below!



SystemVerilog Unique And Priority – How Do I Use Them?

Improperly coded Verilog case statements can frequently cause unintended synthesis optimizations or unintended latches. These problems, if not caught in pre-silicon simulations or gate level simulations, can easily lead to a non-functional chip. The new SystemVerilog unique and priority keywords are designed to address these coding traps. In this article, we will take a closer look at how to use these new SystemVerilog keywords in RTL coding. The reader is assumed to have knowledge of how Verilog case statements work. Those who are not familiar can refer to my previous post “Verilog twins: case, casez, casex. Which Should I Use?”

The SystemVerilog unique and priority modifiers are placed before an if, case, casez, or casex statement, like this:
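For example (illustrative fragments, assuming sel, a, b, c, and y are declared elsewhere):

```systemverilog
// Modifier placed before a case statement
always_comb begin
  unique case (sel)
    2'b00: y = a;
    2'b01: y = b;
    2'b10: y = c;
  endcase
end

// Modifier placed before the first if only; it governs the whole chain
always_comb begin
  priority if (sel == 2'b00) y = a;
  else if   (sel == 2'b01)   y = b;
  else                       y = c;
end
```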

With the if…else statement, the SystemVerilog unique or priority keyword is placed only before the first if, but affects all subsequent else if and else statements.

SystemVerilog Unique Keyword

The unique keyword tells all software tools that support SystemVerilog, including those for simulation, synthesis, lint-checking, and formal verification, that each selection item in a series of decisions is unique from any other selection item in that series, and that all legal cases have been listed. In other words, each item is mutually exclusive, and the if…else or case statement specifies all valid selection items.

It is easier to illustrate the effects of SystemVerilog unique using a case statement. unique case causes a simulator to add run-time checks that will report a warning if any of the following conditions are true:

  1. More than one case item matches the case expression
  2. No case item matches the case expression, and there is no default case

To illustrate how SystemVerilog unique affects simulation of case statements, let’s look at a wildcard casez statement:
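The original snippet isn't shown here, but it would look something like this (signal names assumed):

```systemverilog
// Wildcard casez that looks like a priority decoder; the unique keyword
// asserts that irq is one-hot, so only one item can ever match.
logic [3:0] irq, int_out;

always_comb begin
  unique casez (irq)
    4'b???1: int_out = 4'b0001;
    4'b??1?: int_out = 4'b0010;
    4'b?1??: int_out = 4'b0100;
    4'b1???: int_out = 4'b1000;
  endcase
end
```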

You may recognize that this code resembles the priority decoder example from my previous post “Verilog twins: case, casez, casex. Which Should I Use?” However, by adding the SystemVerilog unique keyword, the behaviour is now completely different.

Firstly, by adding the SystemVerilog unique keyword, the designer asserts that only one case item can match at a time. If more than one bit of irq is set in simulation, the simulator will generate a warning, flagging that the assumption of irq being one-hot has been violated. Secondly, the unique keyword tells synthesis tools that all valid case items have been specified and can be evaluated in parallel. Synthesis is free to optimize away the case items that are not listed.

Read the second point again; it is paramount! In a unique case statement (without a default case), the post-synthesis output for any unlisted case item is indeterminate. In simulation you may see deterministic behaviour, maybe even an output that looks correct (along with an easy-to-miss warning), but that may not match what you see in silicon. I have personally seen a chip that did not work because of this coding error.

Back to the example, because of the unique keyword, synthesis will remove the priority logic. Thus, this code example is actually a decoder with no priority logic. Eliminating unnecessary priority logic typically results in smaller and faster logic, but only if it is indeed the designer’s intention.

The SystemVerilog unique keyword can be applied similarly to an if…else statement to convey the same uniqueness properties. For a unique if statement, a simulator will generate a run-time warning if either of the following occurs:

  1. Two or more of the if conditions are true at the same time
  2. All of the if conditions (including else if) are false, and there is no final else branch

SystemVerilog 2012 adds the keyword unique0 which, when used with a case or if statement, generates a warning only for the first condition above.

SystemVerilog Priority Keyword

The priority keyword instructs all tools that support SystemVerilog that each selection item in a series of decisions must be evaluated in the order in which they are listed, and all legal cases have been listed. A synthesis tool is free to optimize the logic assuming that all other unlisted conditions are don’t cares. If the priority case statement includes a case default statement, however, then the effect of the priority keyword is disabled because the case statement has then listed all possible conditions. In other words, the case statement is full.

Since the designer asserts that all conditions have been listed, a priority case will cause simulators to add run-time checks that will report a warning for the following condition:

  1. If the case expression does not match any of the case item expressions, and there is no default case

A priority if will cause simulators to report a warning if all of the if…else if conditions are false and there is no final else branch. A final else branch disables the effect of the priority if.

When to Use Them

SystemVerilog unique and priority should especially be used in case statements that infer priority or non-priority logic. Using these keywords helps convey design intent, guides synthesis tools to the correct result, and adds simulation and formal verification assertions that check for violations of design assumptions. One suggestion from “full_case parallel_case”, the Evil Twins of Verilog Synthesis is to code intentional priority encoders using if…else if statements rather than case statements, as it is easier for the typical engineer to recognize a priority encoder coded that way.

SystemVerilog unique and priority do not guarantee the removal of unwanted latches. Any case statement that makes assignments to more than one output in each case item statement can still generate latches if one or more output assignments are missing from other case item statements. One of the easiest ways to avoid these unwanted latches is by making a default assignment to the outputs before the case statement.
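
One way to sketch that default-assignment style (names here are illustrative):

```systemverilog
always_comb begin
  // Default assignments: every output is assigned on every path, so
  // no latch is inferred even if a case item omits an output.
  valid = 1'b0;
  data  = '0;
  case (sel)
    2'b00: data = a;
    2'b01: begin data = b; valid = 1'b1; end
    2'b10: valid = 1'b1;   // data keeps its default, yet no latch
  endcase
end
```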

The unique and priority keywords should not be blindly added to any case and if statements either. Below is an example where the priority keyword will cause a design to break. The hardware that is intended is a decoder with enable en. When en=0, the decoder should output 4’b0000 on y.
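
The original snippet is not reproduced here; a minimal sketch of the intended decoder, with the problematic priority keyword, might look like this:

```systemverilog
module decoder (
  input  logic       en,
  input  logic [1:0] a,
  output logic [3:0] y
);
  always_comb begin
    y = 4'b0000;              // intended output when en == 0
    priority case ({en, a})   // BUG: asserts the en == 0 cases never occur
      3'b1_00: y = 4'b0001;
      3'b1_01: y = 4'b0010;
      3'b1_10: y = 4'b0100;
      3'b1_11: y = 4'b1000;
    endcase
  end
endmodule
```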

The logic will synthesize to something like this:
[Figure: SystemVerilog priority case incorrect usage]

Here the priority keyword indicates that all unlisted case items are don’t cares and can be optimized. As a result, the synthesis tool will simply optimize away en, which results in hardware different from what was intended. A simulator will report a warning whenever en=0, which should raise an alarm that something is wrong. The unique keyword has the same effect here.

Conclusion

SystemVerilog unique and priority help avoid bugs from incorrectly coded case and if…else statements. They are part of the SystemVerilog language, which means all tools that support SystemVerilog, including those for simulation, lint checking, formal verification, and synthesis, must implement the same specification of these keywords. Using these keywords helps convey design intent, guides synthesis tools to the correct result, and adds simulation and formal verification checks for violations of design assumptions.

In this post I have purposely tried to avoid discussing the Verilog pragmas full_case and parallel_case to write a more stand-alone discussion of the SystemVerilog unique and priority keywords. Those who are interested in the historical development of these keywords from Verilog full_case and parallel_case can refer to “full_case parallel_case”, the Evil Twins of Verilog Synthesis.

Do you have other experiences or examples of when to use or not use unique and priority? Leave a comment below!

References

  1. “full_case parallel_case”, the Evil Twins of Verilog Synthesis

Quiz and Sample Source Code

Now it’s time for a quiz! How will each of the following variations of case statement behave when the case expression

  1. matches one of the non-default case items
  2. does not match any non-default case item
  3. contains all X’s (e.g. if the signal comes from an uninitialized memory)
  4. contains all Z’s (e.g. if the signal is unconnected)
  5. contains a single bit X
  6. contains a single bit Z?

Case statement variations:

  1. Plain case statement
  2. Plain case with default case
  3. Casez
  4. Casez with default case
  5. Casex
  6. Casex with default case
  7. Unique case
  8. Unique case with default case
  9. Unique casez
  10. Unique casex

Stumped? Download the executable source code to find out the answer!

Download Article Companion Source Code

Get the FREE, EXECUTABLE test bench and source code for this article, notification of new articles, and more!
Jason Yu

SoC Design Engineer at Intel Corporation
Jason has more than 8 years' experience in the semiconductor industry, designing and verifying Solid State Drive controller SoCs. His areas of work include RTL design, verification with UVM, and low power verification with UPF. Thoughts and opinions expressed in articles are personal and do not reflect those of Intel Corporation in any way.

Verilog twins: case, casez, casex. Which Should I Use?

The Verilog case statement is a convenient structure for coding logic such as decoders, encoders, and one-hot state machines. Verilog defines three versions of the case statement: case, casez, and casex. Not only is it easy to confuse them, but there are subtleties between them that can trip up even experienced coders. In this article I will highlight the identifying features of each of the twins and discuss when each should be used.

Basic Verilog Case Statement

Let’s start by reviewing the basic case statement:
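
The original post’s snippet is not reproduced here; a representative example might be:

```verilog
always @* begin
  case (sel)                  // case statement header with case expression (sel)
    2'b00:   y = a;           // case item : case item statement
    2'b01:   y = b;
    2'b10:   begin            // multiple statements require begin...end
               y = c;
             end
    default: y = 1'b0;        // case default
  endcase
end
```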

A case statement has the following parts:

  • Case statement header—consists of the case, casez, or casex keyword followed by case expression
  • Case expression—the expression in parentheses immediately following the case keyword. Valid expressions include constants (e.g. 1’b1), an expression that evaluates to a constant, or a vector
  • Case item—the expression that is compared against the case expression. Unlike C, execution does not fall through: a break is implied after each case item statement
  • Case item statement—one or more statements that are executed if the case item matches the current case expression. If more than one statement is required, they must be enclosed in begin…end
  • Case default—optional, but can include statements to be executed if none of the defined case items match the current case expression

Wildcard Case Statement: casez

The plain case statement is simple but rigid—everything must be explicitly coded. In some situations, you may want a single case item that can match multiple case expressions. This is where the “wildcard” case statements casez and casex come in. casez allows “Z” and “?” to be treated as don’t-care values in either the case expression or the case item (or both) when doing case comparison. For example, a case item 2’b1? (or 2’b1Z) in a casez statement can match a case expression of 2’b10, 2’b11, 2’b1X, or 2’b1Z. It is generally recommended to use “?” rather than “Z” in the case item to indicate don’t-care bits.

Verilog “wildcard” case statements can have overlapping case items. If more than one case item can match a case expression, the first matching case item has priority. Thus, priority logic can be inferred from a case statement. The following code snippet illustrates how casez can be used to code priority logic. It simulates and synthesizes correctly as a priority decoder.
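
The original snippet is not reproduced here; a sketch consistent with the surrounding discussion (using the irq and int0 names referenced below) might look like this:

```verilog
// Priority interrupt decoder: earlier case items win on overlap.
always @* begin
  {int2, int1, int0} = 3'b000;
  casez (irq)
    3'b1??: int2 = 1'b1;  // highest priority: irq[2]
    3'b01?: int1 = 1'b1;
    3'b001: int0 = 1'b1;  // lowest priority: irq[0]
  endcase
end
```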

The logic will look something like this:
[Figure: Verilog casez priority decoder]

Even though this may seem an elegant way to code a priority decoder, the priority intention may only be apparent to the most experienced coders. Therefore, it is generally recommended to code priority logic using the more explicit if…else statement to clearly convey the intention.

While wildcard case comparison can be useful, it also has its dangers. Imagine a potentially dangerous casez statement where the case expression is a vector and one bit resolves to a “Z”, perhaps due to a mistakenly unconnected input. That expression will match a case item with any value for the “Z” bit! To put it in more concrete terms, if the LSB of irq in the above code snippet is unconnected such that the case expression evaluates to 3’b00Z, the third case item will still match and int0 will be set to 1, potentially masking a bug!

Even wilder: casex

Now that we understand the usage and dangers of casez, it is straightforward to extend the discussion to casex. casex allows “Z”, “?”, and “X” to be treated as don’t-care values in either the case expression or the case item (or both) when doing case comparison. That means everything we discussed for casez also applies to casex, plus “X” is now also a wildcard. In my previous article on Verilog X Optimism I discussed how X’s can propagate around a design and mask design issues. These propagated X’s can easily cause problems when combined with casex statements. To avoid these problems, the recommendation from RTL Coding Styles That Yield Simulation and Synthesis Mismatches is to not use casex at all in synthesizable code.
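
A small illustration of the hazard (signal names are illustrative):

```verilog
// If state is uninitialized (2'bxx), the X bits act as wildcards and
// match the FIRST case item, silently hiding the X instead of
// flagging it the way a plain case statement would.
always @* begin
  casex (state)
    2'b00:   next = 2'b01;   // 2'bxx matches here
    2'b01:   next = 2'b10;
    default: next = 2'b00;
  endcase
end
```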

Verilog case, casez, casex each has its place and use cases. Understanding the differences between them is key to using them correctly and avoiding bugs. It may also help you in your next job interview 🙂

Have you come across any improper usage of these constructs? What are your recommendations for using them, especially casex? Leave a comment below!

SystemVerilog Unique Case

SystemVerilog adds a possible unique modifier to case statements. How does that change the behaviour? Head over to my post SystemVerilog Unique And Priority – How Do I Use Them?

References

  1. RTL Coding Styles That Yield Simulation and Synthesis Mismatches
  2. “full_case parallel_case”, the Evil Twins of Verilog Synthesis
