SystemVerilog Struct and Union – for Designers too

SystemVerilog struct(ure) and union are very similar to their C programming counterparts, so you may already have a good idea of how they work. But have you tried using them in your RTL design? When used effectively, they can simplify your code and save a lot of typing. Recently, I tried incorporating SystemVerilog struct and union in new ways that I had not done before, with surprisingly (or not surprisingly?) good effect. In this post I would like to share some tips on how you can also use them in your RTL design.

What is a SystemVerilog Struct(ure)?

A SystemVerilog struct is a way to group several data types. The entire group can be referenced as a whole, or the individual data type can be referenced by name. It is handy in RTL coding when you have a collection of signals you need to pass around the design together, but want to retain the readability and accessibility of each separate signal.

When used in RTL code, a packed SystemVerilog struct is the most useful. A packed struct is treated as a single vector, and each data type in the structure is represented as a bit field. The entire structure is then packed together in memory without gaps. Only packed data types and integer data types are allowed in a packed struct. Because it is defined as a vector, the entire structure can also be used as a whole with arithmetic and logical operators.

An unpacked SystemVerilog struct, on the other hand, does not define how the data types are packed; the memory layout is tool-dependent. An unpacked struct probably will not be synthesizable by your synthesis tool, so I would avoid it in RTL code. It is, however, the default mode of a structure if the packed keyword is not used when defining the structure.

SystemVerilog struct is often defined with the typedef keyword to give the structure type a name so it can be more easily reused across multiple files. Here is an example:
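The original snippet was lost from this copy of the post, but a minimal sketch of such a typedef might look like the following. The field names and widths are illustrative, chosen to total 64 bits so the struct maps cleanly onto two 32-bit dwords later:

```systemverilog
// Hypothetical 64-bit opcode grouped as one packed struct.
// Field names and widths are illustrative only.
typedef struct packed {
  logic [7:0]  opcode;   // operation code
  logic [7:0]  flags;    // modifier flags
  logic [15:0] length;   // transfer length
  logic [31:0] address;  // target address
} opcode_t;              // 64 bits total, usable as a single vector
```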

What is a SystemVerilog Union?

A SystemVerilog union allows a single piece of storage to be represented in different ways using different named member types. Because there is only a single storage, only one of the data types can be used at a time. Unions can be packed or unpacked, similarly to structures. Only packed data types and integer data types can be used in a packed union, and all members of a packed (and untagged, which I’ll get to later) union must be the same size. Like a packed structure, a packed union can be used as a whole with arithmetic and logical operators, and bit fields can be extracted as from a packed array.

A tagged union is a type-checked union: you can no longer write to the union using one member type and read it back using another. A tagged union enforces type checking by inserting additional bits into the union to record which member was last written. Because of the added bits, and the inability to freely refer to the same storage through different union members, I find tagged unions less useful in RTL coding.

Take a look at the following example, where I expand the earlier SystemVerilog struct into a union to provide a different way to access that same piece of data.
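The original code isn't reproduced here, but expanding the 64-bit opcode struct into a packed union might be sketched like this (member names are illustrative):

```systemverilog
// Two views of the same 64-bit storage.
typedef union packed {
  struct packed {
    logic [7:0]  opcode;
    logic [7:0]  flags;
    logic [15:0] length;
    logic [31:0] address;
  } fields;                  // "fields view": access by named field
  logic [1:0][31:0] dword;   // "dword view": two 32-bit dwords
} opcode_u;
```

Both members are 64 bits wide, satisfying the equal-size requirement of an untagged packed union.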

Ways to Use SystemVerilog Struct in a Design

There are many ways to incorporate SystemVerilog struct into your RTL code. Here are some common usages.

Encapsulate Fields of a Complex Type

One of the simplest uses of a structure is to encapsulate signals that are commonly used together into a single unit that can be passed around the design more easily, like the opcode structure example above. It both simplifies the RTL code and makes it more readable. Simulators like Synopsys VCS will display the fields of a structure separately on a waveform, making each field easy to inspect.

If you need to use the same structure in multiple modules, a tip is to put the definition of the structure (defined using typedef) into a SystemVerilog package, then import the package into each RTL module that requires the definition. This way you will only need to define the structure once.
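As a sketch (the package and module names here are hypothetical), the pattern looks like this:

```systemverilog
// opcode_pkg.sv -- define the type once in a package
package opcode_pkg;
  typedef struct packed {
    logic [7:0]  opcode;
    logic [23:0] operand;
    logic [31:0] address;
  } opcode_t;
endpackage

// decoder.sv -- import the package in each module that needs the type
module decoder
  import opcode_pkg::*;
(
  input  opcode_t op_in,
  output logic    is_nop
);
  assign is_nop = (op_in.opcode == 8'h00);  // illustrative decode
endmodule
```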

SystemVerilog Struct as a Module Port

A module port can have a SystemVerilog struct type, which makes it easy to pass the same bundle of signals into and out of multiple modules, and keep the same encapsulation throughout a design. For example a wide command bus between two modules with multiple fields can be grouped into a structure to simplify the RTL code, and to avoid having to manually decode the bits of the command bus when viewing it on a waveform (a major frustration!).
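For example (module and signal names are hypothetical, and opcode_t is assumed to be a packed struct type made visible to all three modules, e.g. through a package):

```systemverilog
module cmd_master (input logic clk, output opcode_t cmd);
  // Drive a fixed command for illustration
  assign cmd = '{opcode: 8'h01, flags: 8'h00, length: 16'd4, address: 32'h1000};
endmodule

module cmd_slave (input logic clk, input opcode_t cmd, output logic do_write);
  assign do_write = (cmd.opcode == 8'h01);  // decode one field by name
endmodule

module top (input logic clk);
  opcode_t cmd_bus;  // one typed signal carries the whole command bus
  cmd_master u_master (.clk(clk), .cmd(cmd_bus));
  cmd_slave  u_slave  (.clk(clk), .cmd(cmd_bus), .do_write());
endmodule
```

On a waveform, cmd_bus shows up with its named fields rather than as an anonymous vector, which is exactly what avoids the manual bit-decoding frustration.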

Using SystemVerilog Struct with Parameterized Data Type

A structure can be used effectively with modules that support parameterized data type. For example if a FIFO module supports parameterized data type, the entire structure can be passed into the FIFO with no further modification to the FIFO code.
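As a sketch, assuming a FIFO module with a parameterized data type and these hypothetical port and parameter names:

```systemverilog
// Assumed FIFO interface:
// module fifo #(parameter type DATA_T = logic [7:0], parameter int DEPTH = 16) (...);
fifo #(
  .DATA_T (opcode_t),   // pass the whole struct type as the element type
  .DEPTH  (16)
) u_cmd_fifo (
  .clk     (clk),
  .rst_n   (rst_n),
  .wr_en   (cmd_valid),
  .wr_data (cmd_in),    // the entire opcode_t is pushed as one unit
  .rd_en   (cmd_pop),
  .rd_data (cmd_out),   // pops out still typed as opcode_t
  .full    (cmd_full),
  .empty   (cmd_empty)
);
```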

Ways to Use SystemVerilog Union in a Design

Until very recently, I had not found a useful way to use a SystemVerilog union in RTL code. But I finally did in my last project! The best way to think about a SystemVerilog union is that it can give you alternative views of a common data structure. The packed union opcode example above has a “fields view” and a “dword view”, which can be referred to in different parts of a design depending on which is more convenient. For example, if the opcode needs to be buffered in a 64-bit buffer comprised of two 32-bit wide memories, then you can assign one dword from the “dword view” as the input to each memory, like this:
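The snippet referenced here didn't survive, but with a union type like the one above (a fields view plus a two-element dword view, call it opcode_u), the assignment might be sketched as:

```systemverilog
opcode_u op;  // 64-bit union: op.fields (named fields) and op.dword[1:0]

// Feed one 32-bit dword of the opcode to each memory's write port
assign mem0_wdata = op.dword[0];  // lower dword -> memory 0
assign mem1_wdata = op.dword[1];  // upper dword -> memory 1
```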

In my last project, I used a union this way to store a wide SystemVerilog struct into multiple 39-bit memories in parallel (32-bit data plus 7-bit SECDED encoding). The memories were divided this way such that each 32-bit dword can be individually protected by SECDED encoding, so it is individually accessible by a CPU. I used a “dword view” of the union in a generate loop to feed the data into the SECDED encoders and memories. It eliminated a lot of copying and pasting, and made the code much more concise!
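A sketch of that generate-loop structure, assuming a union with a dword view wide enough for NUM_DWORDS dwords; secded_encoder, mem_39, and all port names are hypothetical:

```systemverilog
// One SECDED encoder and one 39-bit memory per 32-bit dword.
for (genvar i = 0; i < NUM_DWORDS; i++) begin : g_lane
  logic [6:0] ecc;

  secded_encoder u_enc (
    .data_in (op.dword[i]),
    .ecc_out (ecc)
  );

  mem_39 u_mem (
    .clk   (clk),
    .we    (wr_en),
    .addr  (wr_addr),
    .wdata ({ecc, op.dword[i]})  // 7-bit ECC + 32-bit data = 39 bits
  );
end
```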

Conclusion

SystemVerilog struct and union are handy constructs that can encapsulate data types and simplify your RTL code. They are most effective when the structure or union types can be used throughout a design, including as module ports, and with modules that support parameterized data types.

Do you have another novel way of using SystemVerilog struct and union? Leave a comment below!


Sample Source Code

I’d love to get in touch with you to learn more about what you do, your major challenges, and potential topics that may be of interest to you! Please fill in the form below to get access to the source code of a SystemVerilog design and testbench toy example that utilizes structures and unions expanded from the above code snippets. You will also get my articles as soon as they are published, delivered directly to your inbox. I may occasionally send out questionnaires to better understand your needs, or newsletters to bring you valuable information in addition to this blog. You will always have a chance to unsubscribe from any communication if you find it is not useful to you.


Jason Yu

SoC Design Engineer at Intel Corporation
Jason has more than 8 years' experience in the semiconductor industry, designing and verifying Solid State Drive controller SoCs. His areas of work include RTL design, verification with UVM, and low-power verification with UPF. Thoughts and opinions expressed in articles are personal and do not reflect those of Intel Corporation in any way.

Clock Domain Crossing Design – Part 3

In Clock Domain Crossing (CDC) Design – Part 2, I discussed potential problems with passing multiple signals across a clock domain, and one effective and safe way to do so. That circuit, however, does not handle the case when the destination side logic cannot accept data and needs to back-pressure the source side. The two feedback schemes in this article add this final piece.

The concepts in this article are again taken from Cliff Cummings’ very comprehensive paper Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog. I highly recommend taking an hour or two to read through it.

Multi-cycle path (MCP) formulation with feedback

How can the source domain logic know when it is safe to send the next piece of data to the synchronizer? It can wait a fixed number of cycles, which can be determined from the synchronizer circuit. But a better way is to have logic in the synchronizer to indicate this to the source domain. The following figure illustrates how this can be done. Compared to the MCP without feedback circuit, it adds a 1-bit finite state machine (FSM) to indicate to the source domain whether the synchronizer is ready to accept a new piece of data.

MCP synchronizer with feedback

The 1-bit FSM has 2 states, 2 inputs, and 1 output (besides clock and reset).

  • States: Idle, Busy
  • Inputs: src_send, src_ack
  • Output: src_rdy

The logic is simple. src_send causes the FSM to transition to Busy, src_ack causes the FSM to transition back to Idle. src_rdy output is 1 when Idle, and 0 when Busy. The user logic outside the synchronizer needs to monitor these signals to determine when it is safe to send a new piece of data.
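A minimal sketch of that FSM in SystemVerilog (signal names follow the description above; the reset polarity is an assumption):

```systemverilog
typedef enum logic {IDLE = 1'b0, BUSY = 1'b1} state_t;
state_t state;

always_ff @(posedge src_clk or negedge src_rst_n) begin
  if (!src_rst_n)
    state <= IDLE;
  else if (state == IDLE && src_send)
    state <= BUSY;   // data handed off; wait for acknowledge
  else if (state == BUSY && src_ack)
    state <= IDLE;   // acknowledge received; ready again
end

assign src_rdy = (state == IDLE);  // 1 = safe to send new data
```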

Multi-cycle path (MCP) formulation with feedback acknowledge

What if the destination domain further needs to back-pressure data from the source domain? To do so, the destination domain will need control over when to send feedback acknowledgement to the source domain. This can be accomplished by adding a similar 1-bit FSM in the destination side of the synchronizer. This addition allows the destination clock domain to throttle the source domain to not send any data until the destination domain logic is ready. The following figure illustrates this design.

MCP synchronizer with acknowledge feedback

And there you have it. A complete multi-bit synchronizer solution with handshaking. Note that this particular design is slightly different from the design described in Cliff Cummings’ paper. I have changed the destination side to output data on dest_data_out as soon as available, rather than waiting for an external signal to load the data like the bload signal in Cliff Cummings’ circuit. It doesn’t seem efficient to me to incur another cycle to make the data available.

I apologize that the diagram is a bit messy (if anyone knows of a better circuit drawing tool, please let me know). The source code will be provided below so you can study that in detail.

1-Deep Asynchronous FIFO

Obviously the dual-clock asynchronous FIFO can also be used to pass multiple bits across a clock domain crossing (CDC). What is interesting, however, is that a 1-deep asynchronous FIFO (which actually has space for 2 entries, but will only ever fill 1 at a time) provides the same feedback acknowledge capability as the MCP formulation circuit above, but with 1 cycle lower latency on both the send and feedback paths.

In this configuration, pointers to the FIFO become single bit only, and toggle between 0 and 1. Read and write pointer comparison is redefined to produce a single bit value that indicates either ready to receive data (FIFO is empty, pointers equal) or not ready to receive data (FIFO is not empty, pointers not equal). Notice that when the logic indicates not ready, the write pointer has already incremented and is pointing to the next entry of the 2-entry storage to store data. Also, the FIFO read/write pointer circuitry essentially replaces the 1-bit state machine in the MCP formulation feedback scheme, therefore requiring 1 less step and 1 less cycle.
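A sketch of the single-bit pointer handshake on the write side (signal names are illustrative; rptr_sync is assumed to be the read pointer after a two flip-flop synchronizer into the source domain):

```systemverilog
logic wptr;        // single-bit write pointer, source clock domain
logic rptr_sync;   // read pointer synchronized into the source domain

always_ff @(posedge src_clk or negedge src_rst_n)
  if (!src_rst_n)
    wptr <= 1'b0;
  else if (src_send && src_rdy)
    wptr <= ~wptr;  // writing one entry toggles the pointer

// Pointers equal: FIFO empty, ready for data. Unequal: entry in flight.
assign src_rdy = (wptr == rptr_sync);
```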

Conclusion

Over this 3-part series, we have looked at potential problems and proven design techniques to handle clock domain crossing (CDC) logic for single-bit signals (Part 1), and multi-bit signals (Part 2 and Part 3). Cliff Cummings gives a good summary in his paper:

Recommended 1-bit CDC techniques

  • Register the signal in the source domain to remove combinational settling (glitches)
  • Synchronize the signal into the destination domain

Recommended multi-bit CDC techniques

  • Consolidate multiple signals into a 1-bit representation, then synchronize the 1-bit signal
  • Use multi-cycle path (MCP) formulation to pass multi-bit data buses
  • Use asynchronous FIFOs to pass multi-bit data or control buses
  • Use gray code counters

Writing about this topic has been a much bigger (but also more rewarding) undertaking than I imagined. I must confess I’m still relatively new to asynchronous designs. So if you have any experience or design techniques to share, please leave a comment below! How about investigating clock domain crossing verification next?


Sample Source Code

Full sample SystemVerilog source code of the MCP feedback acknowledge synchronizer design and testbench described in this article is available for download.

Multi-bit MCP synchronizer with feedback ack wave



Clock Domain Crossing Design – Part 2

In Clock Domain Crossing (CDC) Techniques – Part 1, I briefly discussed metastability and two methods to safely synchronize a single bit. While those techniques are commonly used, in many applications we need to synchronize multiple control or data bits, like an encoded state or a data bus. Synchronizing multiple bits brings a host of other potential problems that need to be carefully examined, and solutions that build upon the basic blocks we discussed in part 1.

The concepts in this article are again mostly taken from Cliff Cummings’ very comprehensive paper Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog. I highly recommend taking an hour or two to read through it.

Problems With Passing Multiple Control Signals Across Clock Domain Crossing (CDC)

The fundamental problem with passing multiple bits is if they are synchronized individually, they cannot be guaranteed to arrive in the destination clock domain on the same clock edge. When the individual bits are launched from the source clock domain, they may be skewed relative to each other due to trace length, process variation, etc. Since in an asynchronous clock domain crossing (CDC) the destination clock can have every possible alignment relative to the source clock (and relative to the skewed data bits), the destination clock can (and will) sample at a time when not all the bits are at their stable final values. Therefore synchronizing individual bits of a multi-bit signal is not sufficient! Let’s look at several potential problems.

Two simultaneously required signals

The waveform below shows how data skew from the source clock domain can cause two signals to arrive in different clock cycles in the destination domain, if they are synchronized individually using two flip-flop synchronizers. Don’t do this!

Problematic clock domain crossing (CDC): two simultaneously required signals

Two sequenced signals

Individually synchronizing two signals that require precise sequencing across an asynchronous clock domain crossing (CDC) is a recipe for disaster. In fact a recent ASIC project at work had a problem like this that resulted in a chip that only booted 50% of the time, months of debug, and finally a respin (we make mistakes too).

The waveform below shows how two separate signals that are intended to arrive 1 cycle apart in the destination domain, can actually arrive either 1 or 2 cycles apart depending on data skew. It’s difficult to even analyze the frequency difference from the source to destination clock domain and come up with a potential sequence that may work… Just don’t do this. There are better ways!

Problematic clock domain crossing (CDC): two sequenced signals

Encoded control signals

There are many scenarios where you may want to pass a multi-bit signal across a clock domain crossing (CDC), such as an encoded signal. By now we understand the potential problem, right? Due to data skew, the different bits may take different numbers of cycles to synchronize, and we may not be able to read the same encoded value on the destination clock domain. You may get away with using a simple two flip-flop synchronizer if you know there will be sufficient time for the signal to settle before reading the synchronized value (like a relatively static encoded status signal). But it’s still not the best practice.

Solutions For Passing Multiple Signals Across Clock Domain Crossing (CDC)

So how do we deal with synchronizing multiple signals? There are at least several solutions with different levels of complexity:

  1. Multi-bit signal consolidation
  2. Multi-cycle path (MCP) formulation without feedback
  3. Multi-cycle path (MCP) formulation with feedback acknowledge
  4. Dual-Clock Asynchronous FIFO
  5. Two-Deep FIFO

Multi-cycle path (MCP) formulation is a particularly interesting and widely applicable solution. It refers to sending unsynchronized data from the source clock domain to the destination clock domain, paired with a synchronized control (e.g. a load enable) signal. The data and control signals are sent simultaneously from the source clock domain. The data signals do not go through any synchronization, but go straight into a multi-bit flip-flop in the destination clock domain. The control signal is synchronized through a two flip-flop synchronizer, then used to load the unsynchronized data into the flip-flops in the destination clock domain. This allows the data signals to settle (while the control signal is being synchronized) and be captured together on a single clock edge. We will get into two variations of this technique in later sections.

Multi-bit signal consolidation

Consolidating multiple bits across clock domain crossing (CDC) into one is more of a best practice, than a technique. It’s always good to reduce as much as possible the number of signals that need to cross a clock domain crossing (CDC). However, this can be applied directly to the problem of sequencing two signals into the destination clock domain. A single signal can be synchronized across the clock domain crossing (CDC), and the two sequenced signals can be recreated in the destination clock domain once the synchronizing signal is received.

Multi-cycle path (MCP) formulation without feedback

The multi-cycle path (MCP) synchronizer is comprised of several components:

  1. Logic that converts a synchronization event from source clock domain to a toggle to pass across the clock domain crossing (CDC)
  2. Logic that converts the toggle into a load pulse in the destination domain
  3. Flip-flops to capture the unsynchronized data bits

One key idea in this design is that the synchronization event (a pulse) is converted into a single toggle (either low to high, or high to low) before being synchronized into the destination clock domain. Each toggle represents one event. The advantage of synchronizing a toggle is it eliminates the problem of a missing pulse when crossing from a fast clock to a slow clock domain. However, you need to be careful when resetting the synchronizer such that no unintended events are generated (e.g. if the source domain is reset on its own and the toggle signal goes from high to low due to reset).

Source clock domain event to toggle generator

The following circuit resides in the source clock domain, and converts an event that needs to traverse the clock domain crossing (CDC) into a toggle, which cannot be missed due to sampling in the destination clock domain.
CDC pulse to toggle generator (source clock) diagram

CDC pulse to toggle generator (source clock) wave
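A sketch of the pulse-to-toggle circuit shown in the figure (signal names are illustrative):

```systemverilog
// Source clock domain: each single-cycle src_pulse flips src_toggle,
// producing one edge per event that cannot be missed downstream.
logic src_toggle;

always_ff @(posedge src_clk or negedge src_rst_n)
  if (!src_rst_n)
    src_toggle <= 1'b0;
  else if (src_pulse)
    src_toggle <= ~src_toggle;
```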

Destination clock domain toggle to load pulse generator

Next, we need a circuit in the destination clock domain to convert the toggle back into a pulse to capture the multi-bit signal.
CDC toggle to pulse generator (destination clock) diagram

CDC toggle to pulse generator (destination clock) wave
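A sketch of the toggle-to-pulse circuit: a two flip-flop synchronizer followed by an edge detector (signal names are illustrative):

```systemverilog
// Destination clock domain: synchronize the toggle, then detect edges.
logic [2:0] sync_q;  // [0],[1] = synchronizer stages, [2] = edge-detect delay

always_ff @(posedge dest_clk or negedge dest_rst_n)
  if (!dest_rst_n)
    sync_q <= 3'b000;
  else
    sync_q <= {sync_q[1:0], src_toggle};

// Any change on the synchronized toggle is one load pulse
assign dest_load = sync_q[2] ^ sync_q[1];
```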

Putting it together

Finally, putting the entire synchronizer circuit together, we get the following.
MCP synchronizer without feedback

Notice the multi-bit data signal passes straight from source (clock) flip-flop to destination (clock) flip-flop to avoid problems with synchronizing multiple bits. A single control signal is synchronized to allow time for the multi-bit data to settle from possible metastable state. The load pulse from the source clock domain first gets converted into a toggle. The toggle is synchronized across the clock domain crossing (CDC), then gets converted back to a load pulse in the destination clock domain. Finally that load pulse is used to load the multi-bit data signal into flip-flops in the destination clock domain.

Conclusion

Passing multiple signals across an asynchronous clock domain crossing (CDC) can become a recipe for disaster if not done properly. This article described some potential pitfalls, and one very effective technique called multi-cycle path (MCP) formulation to synchronize multiple bits across a clock domain crossing (CDC). There is one missing piece, however. How does logic in the source clock domain know when it is safe to send another piece of data? In Part 3 of the series, I will put in the final piece and enhance the multi-cycle path (MCP) synchronizer with feedback acknowledgement.


Sample Source Code

You can download sample source code of the multi-bit MCP synchronizer without feedback design and testbench by clicking the following link. The waveform below is generated by running the testbench and design. Do you have any comments on the source code or anything described in this article? Please leave a comment!

multi_bit_mcp_synchronizer_no_feedbacktb.zip

Multi-bit MCP synchronizer without feedback wave


Clock Domain Crossing Design – 3 Part Series

Thank you for all your interest in my last post on Dual-Clock Asynchronous FIFO in SystemVerilog! I decided to continue the theme of clock domain crossing (CDC) design techniques, and look at several other methods for passing control signals and data between asynchronous clock domains. This is perfect timing because I’m just about to create a new revision of one of my design blocks at work, which incorporates many of these concepts. I, too, can use a refresher 🙂

The concepts in this article are mostly taken from Cliff Cummings’ very comprehensive paper Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog. I’ve broken the topics into 3 parts:

  • Part 1 – metastability and challenges with passing single bit signals across a clock domain crossing (CDC), and single-bit synchronizer
  • Part 2 – challenges with passing multi-bit signals across a CDC, and multi-bit synchronizer
  • Part 3 – design of a complete multi-bit synchronizer with feedback acknowledge

Let’s get right to it!

What is Metastability?

Any discussion of clock domain crossing (CDC) should start with a basic understanding of metastability and synchronization. In layman’s terms, metastability refers to an unstable intermediate state, where the slightest disturbance will cause a resolution to a stable state. When applied to flip-flops in digital circuits, it means a state where the flip-flop’s output may not have settled to the final expected value.

One of the ways a flip-flop can enter a metastable state is if its setup or hold time is violated. In an asynchronous clock domain crossing (CDC), where the source and destination clocks have no frequency relationship, a signal from the source domain has a non-zero probability of changing within the setup or hold time of a destination flip-flop it drives. Synchronization failure occurs when the output of the destination flip-flop goes metastable and does not converge to a legal state by the time its output must be sampled again (by the next flip-flop in the destination domain). Worse yet, that next flip-flop may also go metastable, causing metastability to propagate through the design!

Synchronizers for Clock Domain Crossing (CDC)

A synchronizer is a circuit whose purpose is to minimize the probability of a synchronization failure. We want the metastability to resolve within a synchronization period (a period of the destination clock) so that we can safely sample the output of the flip-flop in the destination clock domain. It is possible to calculate the failure rate of a synchronizer, and this is called the mean time between failure (MTBF).

Without going into the math, the takeaway is that the probability of hitting a metastable state in a clock domain crossing (CDC) is proportional to:

  1. Frequency of the destination domain
  2. Rate of data crossing the clock boundary

This result gives us some ideas on how to design a good synchronizer. Interested readers can refer to Metastability and Synchronizers: A Tutorial for a tutorial on the topic of metastability, and some interesting graphs of how flip-flops can become metastable.

Two flip-flop synchronizer

Two flip-flop synchronizer for clock domain crossing (CDC)

The most basic synchronizer is two flip-flops in series, both clocked by the destination clock. This simple and unassuming circuit is called a two flip-flop synchronizer. If the input data changes very close to the receiving clock edge (within setup/hold time), the first flip-flop in the synchronizer may go metastable, but there is still a full clock period for the signal to become stable before it is sampled by the second flip-flop. The destination domain logic then uses the output of the second flip-flop. Theoretically it is possible for the signal to still be metastable by the time it is clocked into the second flip-flop (every MTBF years). In that case a synchronization failure occurs and the design would likely malfunction.
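A minimal two flip-flop synchronizer might be written as follows (a sketch; real projects often use vendor-provided synchronizer cells instead, and signal names here are illustrative):

```systemverilog
module sync_2ff (
  input  logic dest_clk,
  input  logic dest_rst_n,
  input  logic async_in,   // signal from the source clock domain
  output logic sync_out    // safe to consume in the destination domain
);
  logic meta;  // first stage: the one allowed to go metastable

  always_ff @(posedge dest_clk or negedge dest_rst_n)
    if (!dest_rst_n)
      {sync_out, meta} <= 2'b00;
    else
      {sync_out, meta} <= {meta, async_in};
endmodule
```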

The two flip-flop synchronizer is sufficient for many applications. Very high speed designs may require a three flip-flop synchronizer to give sufficient MTBF. To further increase MTBF, two flip-flop synchronizers are sometimes built from fast library cells (low threshold voltage) that have better setup/hold time characteristics.

Registering source signals into the synchronizer

It is a generally good practice to register signals in the source clock domain before sending them across the clock domain crossing (CDC) into synchronizers. This eliminates combinational glitches, which can effectively increase the rate of data crossing the clock boundary, reducing MTBF of the synchronizer.

Synchronizing Slow Signals Into Fast Clock Domain

The easy case is passing signals from a slow clock domain to a fast clock domain. This is generally not a problem as long as the faster clock is > 1.5x frequency of the slow clock. The fast destination clock will simply sample the slow signal more than once. In these cases, a simple two-flip-flop synchronizer may suffice.

If the fast clock is < 1.5x frequency of the slow clock, then there can be a potential problem, and you should use one of the solutions in the next section.

Synchronizing Fast Signals Into Slow Clock Domain

The more difficult case is, of course, passing a fast signal into a slow clock domain. The obvious problem is if a pulse on the fast signal is shorter than the period of the slow clock, then the pulse can disappear before being sampled by the slow clock. This scenario is shown in the waveform below.

Fast source pulse missed by slow clock in clock domain crossing (CDC)

A less obvious problem is even if the pulse was just slightly wider than the period of the slow clock, the signal can change within the setup/hold time of the destination flip-flop (on the slow clock), violating timing and causing metastability.

Before deciding on how to handle this clock domain crossing (CDC), you should first ask yourself whether every value of the source signal is needed in the destination domain. If not (meaning it is okay to drop some values), then it may suffice to use an “open-loop” synchronizer without acknowledgement. An example is the gray code pointer from my Dual-Clock Asynchronous FIFO in SystemVerilog; it needs to be accurate when read, but the FIFO pointer may advance several times before a read occurs and the value is used. If, on the other hand, you need every value in the destination domain, then a “closed-loop” synchronizer with acknowledgement may be needed.

Single bit — two flip-flop synchronizer

A simple two flip-flop synchronizer is the fastest way to pass signals across a clock domain crossing. It can be sufficient in many applications, as long as the signal generated in the fast clock domain is wider than the cycle time of the slow clock. For example, if you just need to synchronize a slow changing status signal, this may work. A safe rule of thumb is the signal must be wider than 1.5x the cycle width of the destination clock (“three receiving clock edge” requirement coined in Mark Litterick’s paper on clock domain crossing verification). This guarantees the signal will be sampled by at least one (but maybe more) clock edge of the destination clock. The requirement can be easily checked using SystemVerilog Assertions (SVA).
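As a sketch, that check might be written with an assertion like this: once the source signal is seen high at a destination clock edge, it must stay high long enough to be observed on three consecutive edges. Signal names are illustrative, not from any specific design:

```systemverilog
property p_three_edge_rule;
  @(posedge dest_clk) $rose(src_signal) |-> src_signal[*3];
endproperty

a_three_edge_rule: assert property (p_three_edge_rule)
  else $error("CDC signal too narrow for destination clock");
```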

The 1.5x cycle width is easy to enforce when the relative clock frequencies of the source and destination are fixed. But there are real world scenarios where they won’t be. In a memory controller design I worked on, the destination clock can take on three different frequencies, which can be either faster/slower/same as the source clock. In that situation, it was not easy to design the clock domain crossing signals to meet the 1.5x cycle width of the slowest destination clock.

Single bit — synchronizer with feedback acknowledge

A synchronizer with feedback acknowledge is slightly more involved, but not much. The figure below illustrates how it works.

Single bit feedback synchronizer for clock domain crossing (CDC) diagram

The source domain sends the signal to the destination clock domain through a two flip-flop synchronizer, then passes the synchronized signal back to the source clock domain through another two flip-flop synchronizer as a feedback acknowledgement. The figure below shows a waveform of the synchronizer.

Single bit feedback synchronizer for clock domain crossing (CDC) waveform

This solution is very safe, but it does come at a cost of increased delay due to synchronizing in both directions before allowing the signal to change again. This solution would work in my memory controller design to handle the varying clock frequency relationship.

Conclusion

Even though we would all like to live in a purely synchronous world, in real world applications you will undoubtedly run into designs that require multiple asynchronous clocks. This article described two basic techniques to pass a single control signal across a clock domain crossing (CDC). Clock domain crossing (CDC) logic bugs are elusive and extremely difficult to debug, so it is imperative to design synchronization logic correctly from the start!

Passing a single control signal across a clock domain crossing (CDC) isn’t very exciting. In Clock Domain Crossing Techniques – Part 2, I will discuss the difficulties with passing multiple control signals, and some possible solutions.

Sample Source Code

I’d love to get in touch with you to learn more about what you do, your major challenges, and potential topics that may be of interest to you! Please fill in the form below to get access to the source code of the single-bit feedback synchronizer and testbench. You will also get my articles as soon as they are published, delivered directly to your inbox. I may occasionally send out questionnaires to better understand your needs, or newsletters to bring you valuable information in addition to this blog. You will always have a chance to unsubscribe from any communication if you find it is not useful to you.

Single bit feedback synchronizer simulation

Please provide your name and email address for your free download.


Jason Yu

SoC Design Engineer at Intel Corporation
Jason has more than 8 years’ experience in the semiconductor industry, designing and verifying Solid State Drive controller SoCs. His areas of work include RTL design, verification with UVM, and low power verification with UPF. Thoughts and opinions expressed in articles are personal and do not reflect those of Intel Corporation in any way.

Dual-Clock Asynchronous FIFO in SystemVerilog

Apologies for the lack of updates, but I have been on a rather long vacation in Asia and am slowly getting back into the rhythm of work and blogging. One of my first tasks after returning to work was to check over the RTL of an asynchronous FIFO in Verilog. What better way to relearn a topic than to write about it! Note that this asynchronous FIFO design is based entirely on Cliff Cummings’ paper Simulation and Synthesis Techniques for Asynchronous FIFO Design. Please treat this as a concise summary of the design described in his paper (specifically, FIFO style #1 with Gray code counter style #2). This article assumes knowledge of basic synchronous FIFO concepts.

Metastability and synchronization are extremely complex topics that have been the subject of over 60 years of research. There are many known design methods to safely pass data asynchronously from one clock domain to another, one of which is using an asynchronous FIFO. An asynchronous FIFO refers to a FIFO where data is written from one clock domain, read from a different clock domain, and the two clocks are asynchronous to each other.

Clock domain crossing logic is inherently difficult to design, and even more difficult to verify. An almost correct design may function 99% of the time, but the 1% failure will cost you countless hours of debugging, or worse, a respin. Therefore, it is imperative to design them correctly from the beginning! This article describes one proven method to design an asynchronous FIFO.

Asynchronous FIFO Pointers

In a synchronous FIFO design, one way to determine whether the FIFO is full or empty is to use a separate count register to track the number of entries in the FIFO. This requires the ability to both increment and decrement the counter, potentially on the same clock. The same technique cannot be used in an asynchronous FIFO, however, because two different clocks would be needed to control the counter.

Instead, the asynchronous FIFO design uses a different technique (also derived from synchronous FIFO design): an additional bit in the FIFO pointers to detect full and empty. In this scheme, full and empty are determined by comparing the read and write pointers. The write pointer always points to the next location to be written; the read pointer always points to the current FIFO entry to be read. On reset, both pointers are reset to zero. The FIFO is empty when the two pointers (including the extra bit) are equal. It is full when the MSBs of the pointers differ but the remaining bits are equal. This pointer convention has the added benefit of low access latency: as soon as data has been written, the FIFO drives the read data onto its data output port, so the receive side does not need two clock cycles (first assert a read enable, then read the data) to read out the data.

Synchronizing Pointers Across Clock Domains

Synchronizing a binary count (pointer) across clock domains is going to pose a difficulty, however. All bits of a binary counter can change simultaneously, for example a 4-bit count changing from 7->8 (4’b0111->4’b1000). To pass this value safely across a clock domain crossing requires careful synchronization and handshaking such that all bits are sampled and synchronized on the same edge (otherwise the value will be incorrect). It can be done with some difficulty, but a simpler method that bypasses this problem altogether is to use Gray code to encode the pointers.

Gray codes are named after Frank Gray, who first patented the codes. The code distance between any two adjacent Gray code words is 1, which means only 1 bit changes from one Gray count to the next. Using Gray code to encode pointers eliminates the problem of synchronizing multiple changing bits on a clock edge. The most common Gray code is a reflected code where the bits in any column except the MSB are symmetrical about the sequence mid-point. An example 4-bit Gray code counter is shown below. Notice the MSB differs between the first and second half, but otherwise the remaining bits are mirrored about the mid-point. The Gray code never changes by more than 1 bit in a transition.

Decimal Count    Binary Count    Gray Code Count
 0               4'b0000         4'b0000
 1               4'b0001         4'b0001
 2               4'b0010         4'b0011
 3               4'b0011         4'b0010
 4               4'b0100         4'b0110
 5               4'b0101         4'b0111
 6               4'b0110         4'b0101
 7               4'b0111         4'b0100
 8               4'b1000         4'b1100
 9               4'b1001         4'b1101
10               4'b1010         4'b1111
11               4'b1011         4'b1110
12               4'b1100         4'b1010
13               4'b1101         4'b1011
14               4'b1110         4'b1001
15               4'b1111         4'b1000

Gray Code Counter

The Gray code counter used in this design is “Style #2” as described in Cliff Cummings’ paper. The FIFO counter consists of an n-bit binary counter, of which bits [n-2:0] are used to address the FIFO memory, and an n-bit Gray code register for storing the Gray count value to synchronize to the opposite clock domain. One important aspect of a Gray code counter is that it generally must have a power-of-2 count sequence, which means a Gray code pointer FIFO will have a power-of-2 number of entries. The binary count value can be used to implement FIFO “almost full” or “almost empty” conditions.

Asynchronous FIFO in SystemVerilog binary and gray pointers

Converting Binary to Gray

To convert a binary number to Gray code, notice that the MSB is always the same. All other bits are the XOR of pairs of binary bits:
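In RTL this reduces to a single XOR expression. A sketch, with an illustrative ADDRSIZE parameter for the pointer width:

```systemverilog
// Binary-to-Gray conversion: the MSB passes through unchanged,
// every other Gray bit is the XOR of adjacent binary bits, i.e.
// gray[i] = bin[i+1] ^ bin[i], which collapses to one expression:
logic [ADDRSIZE:0] bin, gray;
assign gray = bin ^ (bin >> 1);
```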

Converting Gray to Binary

To convert a Gray code to a binary number, notice again that the MSB is always the same. Each other binary bit is the XOR of all of the more significant Gray code bits:
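A reduction XOR of the right-shifted Gray value computes exactly this. Again a sketch with an illustrative ADDRSIZE:

```systemverilog
// Gray-to-binary conversion: bin[i] is the XOR of gray[n-1] down to gray[i].
logic [ADDRSIZE:0] gray, bin;
always_comb
  for (int i = 0; i <= ADDRSIZE; i++)
    bin[i] = ^(gray >> i);   // reduction XOR of all higher Gray bits
```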

Generating full & empty conditions

The FIFO is empty when the read pointer and the synchronized write pointer, including the extra bit, are equal. In order to efficiently register the rempty output, the synchronized write pointer is actually compared against the rgraynext (the next Gray code to be registered into rptr).

The full flag is trickier to generate. Dissecting the Gray code sequence, you can come up with the following conditions that all need to be true for the FIFO to be full:

  1. MSB of wptr and synchronized rptr are not equal
  2. Second MSB of wptr and synchronized rptr are not equal
  3. Remaining bits of wptr and synchronized rptr are equal

Similarly, in order to efficiently register the wfull output, the synchronized read pointer is compared against wgraynext (the next Gray code that will be registered into wptr).
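Following the naming in Cliff Cummings’ paper (rq2_wptr and wq2_rptr are the synchronized opposite-domain Gray pointers, ADDRSIZE the FIFO address width, so pointers are ADDRSIZE+1 bits), the flag generation can be sketched as:

```systemverilog
// Empty: next read Gray pointer equals the synchronized write pointer
assign rempty_val = (rgraynext == rq2_wptr);

always_ff @(posedge rclk or negedge rrst_n)
  if (!rrst_n) rempty <= 1'b1;          // FIFO is empty out of reset
  else         rempty <= rempty_val;

// Full: top two bits differ, remaining bits equal
assign wfull_val = (wgraynext == {~wq2_rptr[ADDRSIZE:ADDRSIZE-1],
                                   wq2_rptr[ADDRSIZE-2:0]});

always_ff @(posedge wclk or negedge wrst_n)
  if (!wrst_n) wfull <= 1'b0;
  else         wfull <= wfull_val;
```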

Asynchronous FIFO (Style #1) – Putting It Together

Here is the complete asynchronous FIFO put together in a block diagram.

Asynchronous FIFO in SystemVerilog block diagram

The design is partitioned into the following modules.

  • fifo1 – top level wrapper module
  • fifomem – the FIFO memory buffer that is accessed by the write and read clock domains
  • sync_r2w – 2 flip-flop synchronizer to synchronize read pointer to write clock domain
  • sync_w2r – 2 flip-flop synchronizer to synchronize write pointer to read clock domain
  • rptr_empty – synchronous logic in the read clock domain to generate FIFO empty condition
  • wptr_full – synchronous logic in the write clock domain to generate FIFO full condition

Sample source code can be downloaded at the end of this article.

Conclusion

An asynchronous FIFO is a proven design technique to pass multi-bit data across a clock domain crossing. This article describes one known good method to design an asynchronous FIFO by synchronizing Gray code pointers across the clock domain crossing to determine full and empty conditions.

Whew! This has been one of the longer articles. I’m simultaneously surprised that 1) this article took 1300 words, and that 2) it only took 1300 words to explain an asynchronous FIFO. Do you have other asynchronous FIFO design techniques? Please share in the comments below!

Sample Source Code

Please fill in the form below to get access to the full sample SystemVerilog source code of the dual-clock asynchronous FIFO design with testbench.

Here is the waveform generated by the sample source code.

Asynchronous FIFO 1 waveform



One-hot State Machine in SystemVerilog – Reverse Case Statement

The finite state machine (FSM) is one of the first topics taught in any digital design course, yet coding one is not as easy as it first meets the eye. There are Moore and Mealy state machines, encoded and one-hot state encodings, and one-, two-, or three-always-block coding styles. Recently I was reviewing a coworker’s RTL code and came across a SystemVerilog one-hot state machine coding style that I was not familiar with. Needless to say, it became a mini research topic resulting in this blog post.

When coding state machines in Verilog or SystemVerilog, there are a few general guidelines that can apply to any state machine:

  1. If coding in Verilog, use parameters to define state encodings instead of `define macro definitions. Verilog `define macros have global scope; a macro defined in one module can easily be redefined by a macro of the same name in a different module compiled later, leading to macro redefinition warnings and unexpected bugs.
  2. If coding in SystemVerilog, use enumerated types to define state encodings.
  3. Always define a parameter or enumerated type value for each state so you don’t leave it to the synthesis tool to choose a value for you. Otherwise it can make for a very difficult ECO when it comes time to reverse engineer the gate level netlist.
  4. Make curr_state and next_state declarations right after the parameter or enumerated type assignments. This is simply clean coding style.
  5. Code all sequential always block using nonblocking assignments (<=). This helps guard against simulation race conditions.
  6. Code all combinational always block using blocking assignments (=). This helps guard against simulation race conditions.

SystemVerilog enumerated types are especially useful for coding state machines. An example of using an enumerated type as the state variable is shown below.
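A minimal sketch of such a state variable (the state names and encodings here are illustrative):

```systemverilog
// Declaring the enum with a 4-state base type (logic) permits an
// explicit X value, handy for default/error assignments.
typedef enum logic [1:0] {IDLE = 2'b00,
                          READ = 2'b01,
                          DONE = 2'b10,
                          XXX  = 2'bxx} state_t;

state_t curr_state, next_state;
```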

Notice that an enumerated type declared with a 4-state base type (such as logic) allows X assignments. Enumerated types can also be displayed by name in simulator waveforms, which eliminates the need for the old Verilog trick of storing the state name in an ASCII-encoded variable just for waveform display.

One-hot refers to how each of the states is encoded in the state vector. In a one-hot state machine, the state vector has as many bits as number of states. Each bit represents a single state, and only one bit can be set at a time—one-hot. A one-hot state machine is generally faster than a state machine with encoded states because of the lack of state decoding logic.

SystemVerilog and Verilog have a unique (pun intended) and efficient coding style for coding one-hot state machines. This coding style uses what is called a reverse case statement to test whether a case item is true, using a case header of the form case (1’b1). Example code is shown below:
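A sketch of the style (the signals start and last, and the state names, are illustrative). The enum values serve as bit indices into the state vector, not as state encodings:

```systemverilog
typedef enum {IDLE_BIT, READ_BIT, DONE_BIT} state_idx_t;
logic [2:0] state, next;

always_comb begin
  next = '0;
  unique case (1'b1)                   // reverse case header
    state[IDLE_BIT]: if (start) next[READ_BIT] = 1'b1;
                     else       next[IDLE_BIT] = 1'b1;
    state[READ_BIT]: if (last)  next[DONE_BIT] = 1'b1;
                     else       next[READ_BIT] = 1'b1;
    state[DONE_BIT]:            next[IDLE_BIT] = 1'b1;
  endcase
end

always_ff @(posedge clk or negedge rst_n)
  if (!rst_n) begin
    state           <= '0;
    state[IDLE_BIT] <= 1'b1;           // reset into IDLE, one-hot
  end else
    state <= next;
```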

In this one-hot state machine coding style, the state parameters or enumerated type values represent indices into the state and next vectors. Synthesis tools interpret this coding style efficiently and generate output assignment and next state logic that performs only 1-bit comparisons against the state vector. Notice also the use of the always_comb and always_ff SystemVerilog procedures, and unique case to add some run-time checking.

An alternate one-hot state machine coding style to the “index-parameter” style is to completely specify the one-hot encoding for the state vectors, as shown below:
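A sketch of this fully specified style, using illustrative state names and transitions:

```systemverilog
// One-hot encodings are spelled out explicitly in the enum values.
enum logic [2:0] {IDLE = 3'b001,
                  READ = 3'b010,
                  DONE = 3'b100} state, next;

always_comb begin
  unique case (state)
    IDLE:    next = start ? READ : IDLE;
    READ:    next = last  ? DONE : READ;
    DONE:    next = IDLE;
    default: next = IDLE;
  endcase
end

always_ff @(posedge clk or negedge rst_n)
  if (!rst_n) state <= IDLE;
  else        state <= next;
```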

According to Cliff Cummings’ 2003 paper, this coding style yields poor performance because Design Compiler infers a full 4-bit comparison against the state vector, in effect defeating the speed advantage of a one-hot state machine. However, those experiments were conducted in 2003, and I suspect synthesis tools have become smarter since then.

State machines may look easy on paper, but are often not so easy in practice. Given how frequently state machines appear in designs, it is important for every RTL designer to develop a consistent and efficient style for coding them. One-hot state machines are generally preferred in applications that can trade-off area for a speed advantage. This article demonstrated how they can be coded in Verilog and SystemVerilog using a unique and very efficient “reverse case statement” coding style. It is a technique that should be in every RTL designer’s arsenal.

What are your experiences with coding one-hot state machines? Do you have another coding style or synthesis results to share? Leave a comment below!


SystemVerilog always_comb, always_ff. New and Improved.

Verilog engineers will be familiar with using always to code recurring procedures like sequential logic, and most will have used always @(*) to code combinational logic. SystemVerilog defines four forms of always procedures: always, always_comb, always_ff, always_latch. What do the three new always procedures bring, and should you be using them? This article will attempt to answer these questions.

Verilog always @*

Verilog 2001 first introduced always @* as a shortcut to code combinational logic. The always @* is intended to infer a complete sensitivity list for both RTL simulation and synthesis from the contents of the block, saving a designer from having to manually specify a sensitivity list to code combinational logic. However, it does not infer a complete sensitivity list when the always @* block contains functions. Namely, it will not infer sensitivity to signals that are externally referenced in a function or a task that is called from the always block. It will only be sensitive to the signals passed into the function or task. Here is an example that illustrates the behaviour:
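An illustrative example (the function and signal names are made up):

```systemverilog
module func_sensitivity;
  logic a, b, c, y;

  // a and b are referenced inside the function body,
  // not passed in as arguments
  function logic or3 (input logic c_in);
    return a | b | c_in;
  endfunction

  always @* y = or3(c);   // inferred sensitivity: c only

  initial begin
    {a, b, c} = 3'b000;
    #1 a = 1;  // y is NOT reevaluated
    #1 b = 1;  // y is NOT reevaluated
    #1 c = 1;  // y IS reevaluated
    #1 $finish;
  end
endmodule
```

Changing always @* to always_comb makes the block sensitive to a and b as well.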

When the code is executed, a change on a or b does not trigger the always @* block to be reevaluated, but a change on c does.

SystemVerilog always_comb

SystemVerilog always_comb solves this limitation. It descends into called functions and infers sensitivity to the signals they reference, as illustrated in the example above. However, it doesn’t infer sensitivity from tasks (if a task takes zero time, you can use a void function instead to infer sensitivity). It improves upon always @* in some other ways as well:

  • Variables written on the left-hand side of assignments within always_comb cannot be written to by any other process
  • The procedure is automatically triggered once after all always and initial blocks have executed to ensure the outputs of the always_comb block match the input conditions before advancing simulation time; always @* waits until a change occurs on a signal in the inferred sensitivity list, which can sometimes cause unexpected behaviour due to race conditions at time zero
  • To ensure it models combinational logic, always_comb cannot include blocking timing or event controls, or fork-join
  • Extending usage to SystemVerilog assertions, always_comb is sensitive to expressions in immediate assertions within the procedure

In short, SystemVerilog always_comb is a better version of always @* and should always (pun intended) be used.

SystemVerilog always_ff

The SystemVerilog always_ff procedure is used to model sequential flip-flop logic. It has similar improvements over the plain Verilog always. An always_ff procedure adds the restriction that it can contain one and only one event control and no blocking timing controls. Variables written on the left-hand side of assignments within always_ff, including variables written inside called functions, cannot be written by any other process. It is also recommended that software tools perform additional checks to ensure code within the procedure models flip-flop behaviour; however, these checks are not defined in the SystemVerilog LRM.

SystemVerilog always_latch

Finally, SystemVerilog always_latch is used to model latch logic. It has identical rules to always_comb, and the SystemVerilog LRM recommends software tools perform additional checks to ensure code within the procedure models latch behaviour.

The three new SystemVerilog always procedures bring some enhanced capabilities. SystemVerilog always_comb, in particular, improves upon the Verilog always @* in several positive ways, and is undoubtedly the most useful of the three. I would recommend using all three in all newly written SystemVerilog code, if not for their new features, at least to convey design intent.

Do you have other experiences or gotchas with using (or not using) the new SystemVerilog always procedures? Leave a comment below!


SystemVerilog Unique And Priority – How Do I Use Them?

Improperly coded Verilog case statements can frequently cause unintended synthesis optimizations or unintended latches. These problems, if not caught in pre-silicon simulations or gate level simulations, can easily lead to a non-functional chip. The SystemVerilog unique and priority keywords are designed to address these coding traps. In this article, we will take a closer look at how to use these new SystemVerilog keywords in RTL coding. The reader is assumed to have knowledge of how Verilog case statements work. Those who are not familiar can refer to my previous post “Verilog twins: case, casez, casex. Which Should I Use?”

The SystemVerilog unique and priority modifiers are placed before an if, case, casez, or casex statement, like this:
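For example (signal names are illustrative):

```systemverilog
// unique before a case statement
always_comb begin
  unique case (sel)
    2'b00:   y = a;
    2'b01:   y = b;
    default: y = c;
  endcase
end

// priority before an if...else chain
always_comb begin
  priority if (req[2]) grant = 3'b100;
  else if  (req[1])    grant = 3'b010;
  else                 grant = 3'b001;
end
```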

With the if…else statement, the SystemVerilog unique or priority keyword is placed only before the first if, but affects all subsequent else if and else statements.

SystemVerilog Unique Keyword

The unique keyword tells all software tools that support SystemVerilog, including those for simulation, synthesis, lint-checking, and formal verification, that each selection item in a series of decisions is unique from any other selection item in that series, and that all legal cases have been listed. In other words, the items are mutually exclusive, and the if…else or case statement specifies all valid selection items.

It is easier to illustrate the effects of SystemVerilog unique using a case statement. unique case causes a simulator to add run-time checks that will report a warning if any of the following conditions are true:

  1. More than one case item matches the case expression
  2. No case item matches the case expression, and there is no default case

To illustrate how SystemVerilog unique affects simulation of case statements, let’s look at a wildcard casez statement:
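A sketch consistent with the discussion that follows, where irq is assumed to be one-hot:

```systemverilog
// unique asserts irq is one-hot and that all valid items are listed
always_comb begin
  {int2, int1, int0} = 3'b000;
  unique casez (irq)
    3'b1??: int2 = 1'b1;
    3'b?1?: int1 = 1'b1;
    3'b??1: int0 = 1'b1;
  endcase
end
```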

You may recognize that this code resembles the priority decoder example from my previous post “Verilog twins: case, casez, casex. Which Should I Use?” However, by adding the SystemVerilog unique keyword, the behaviour is now completely different.

Firstly, by adding the SystemVerilog unique keyword, the designer asserts that only one case item can match at a time. If more than one bit of irq is set in simulation, the simulator will generate a warning, flagging that the assumption of irq being one-hot has been violated. Secondly, to synthesis tools, the unique keyword tells the tool that all valid case items have been specified, and can be evaluated in parallel. Synthesis is free to optimize the case items that are not listed.

Read the second point again; it is paramount! In a unique case statement without a default case, the output after synthesis for any unlisted case item is indeterminate. In simulation you may see deterministic behaviour, maybe even output that looks correct (along with an easy-to-miss warning), but that may not match what you see in silicon. I have personally seen a chip that did not work because of this coding error.

Back to the example, because of the unique keyword, synthesis will remove the priority logic. Thus, this code example is actually a decoder with no priority logic. Eliminating unnecessary priority logic typically results in smaller and faster logic, but only if it is indeed the designer’s intention.

The SystemVerilog unique keyword can be applied similarly to an if…else statement to convey the same uniqueness properties. For a unique if statement, a simulator will generate a run-time warning if either of the following occurs:

  1. If two or more of the if conditions are true at the same time
  2. If all of the if conditions (including else if) are false, and there is no final else branch

SystemVerilog 2012 adds the keyword unique0 which, when used with a case or if statement, generates a warning only for the first condition above.

SystemVerilog Priority Keyword

The priority keyword instructs all tools that support SystemVerilog that each selection item in a series of decisions must be evaluated in the order in which they are listed, and all legal cases have been listed. A synthesis tool is free to optimize the logic assuming that all other unlisted conditions are don’t cares. If the priority case statement includes a case default statement, however, then the effect of the priority keyword is disabled because the case statement has then listed all possible conditions. In other words, the case statement is full.

Since the designer asserts that all conditions have been listed, a priority case will cause simulators to add run-time checks that will report a warning for the following condition:

  1. If the case expression does not match any of the case item expressions, and there is no default case

A priority if will cause simulators to report a warning if all of the if…if else conditions are false, and there is no final else branch. An else branch will disable the effect of the priority if.

When to Use Them

SystemVerilog unique and priority should especially be used in case statements that infer priority or non-priority logic. Using these keywords helps convey design intent, guides synthesis tools to the correct result, and adds simulation and formal verification assertions that check for violations of design assumptions. One suggestion from “full_case parallel_case”, the Evil Twins of Verilog Synthesis is to code intentional priority encoders using if…else if statements rather than case statements, as it is easier for the typical engineer to recognize a priority encoder coded that way.

SystemVerilog unique and priority do not guarantee the removal of unwanted latches. Any case statement that makes assignments to more than one output in each case item statement can still generate latches if one or more output assignments are missing from other case item statements. One of the easiest ways to avoid these unwanted latches is by making a default assignment to the outputs before the case statement.

The unique and priority keywords should not be blindly added to any case and if statements either. Below is an example where the priority keyword will cause a design to break. The hardware that is intended is a decoder with enable en. When en=0, the decoder should output 4’b0000 on y.
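A sketch of the broken usage (en and y follow the text; sel is an illustrative select input). Note that no case item covers en = 0:

```systemverilog
always_comb begin
  priority case ({en, sel})
    3'b1_00: y = 4'b0001;
    3'b1_01: y = 4'b0010;
    3'b1_10: y = 4'b0100;
    3'b1_11: y = 4'b1000;
  endcase   // en = 0 unlisted: declared a don't care!
end
```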

The logic will synthesize to something like this:
SystemVerilog priority case incorrect usage

Here the priority keyword indicates that all unlisted case items are don’t cares and can be optimized. As a result, the synthesis tool will simply optimize away en, resulting in different hardware than what was intended. A simulator will report a warning whenever en=0, which should raise an alarm that something is wrong. The unique keyword would have the same result here.

Conclusion

SystemVerilog unique and priority help avoid bugs from incorrectly coded case and if…else statements. They are part of the SystemVerilog language, which means all tools that support SystemVerilog, including those for simulation, lint-checking, formal verification, and synthesis, have to implement the same specification of these keywords. Using these keywords helps convey design intent, guides synthesis tools to the correct result, and adds simulation and formal verification checks for violations of design assumptions.

In this post I have purposely tried to avoid discussing the Verilog pragmas full_case and parallel_case to write a more stand-alone discussion of the SystemVerilog unique and priority keywords. Those who are interested in the historical development of these keywords from Verilog full_case and parallel_case can refer to “full_case parallel_case”, the Evil Twins of Verilog Synthesis.

Do you have other experiences or examples of when to use or not use unique and priority? Leave a comment below!

Quiz and Sample Source Code

Now it’s time for a quiz! How will each of the following variations of case statement behave when the case expression a) matches one of the non-default case items, b) does not match any non-default case item, c) contains all X’s (e.g. if the signal comes from an uninitialized memory), d) contains all Z’s (e.g. if the signal is unconnected), e) contains a single bit X, f) contains a single bit Z?

  • Plain case statement
  • Plain case with default case
  • Casez
  • Casez with default case
  • Casex
  • Casex with default case
  • Unique case
  • Unique case with default case
  • Unique casez
  • Unique casex

Confused? Download the sample source code that you can run to give you the answer! Simply fill in the form below to get access.


Verilog twins: case, casez, casex. Which Should I Use?

The Verilog case statement is a convenient structure for coding logic like decoders, encoders, and one-hot state machines. Verilog defines three versions of the case statement: case, casez, and casex. Not only is it easy to confuse them, but there are subtleties between them that can trip up even experienced coders. In this article I will highlight the identifying features of each of the twins and discuss when each should be used.

Basic Verilog Case Statement

Let’s start by reviewing the basic case statement:
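For example, a simple 2-to-4 decoder (signal names are illustrative):

```systemverilog
always @* begin
  case (sel)              // case expression: sel
    2'b00:   y = 4'b0001; // case item : case item statement
    2'b01:   y = 4'b0010;
    2'b10:   y = 4'b0100;
    2'b11:   y = 4'b1000;
    default: y = 4'b0000; // optional case default
  endcase
end
```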

A case statement has the following parts:

  • Case statement header—consists of the case, casez, or casex keyword followed by case expression
  • Case expression—the expression in parentheses immediately following the case keyword. Valid expressions include constants (e.g. 1’b1), an expression that evaluates to a constant, or a vector
  • Case item—the expression that is compared against the case expression. Note that unlike C, there is no fall-through; a break is implied after each case item statement
  • Case item statement—one or more statements that are executed if the case item matches the current case expression. If more than one statement is required, they must be enclosed with begin…end
  • Case default—optional, but can include statements to be executed if none of the defined case items match the current case expression

Wildcard Case Statement: casez

The plain case statement is simple but rigid—everything must be explicitly coded. In some situations, you may want to specify a case item that can match multiple case expressions. This is where the “wildcard” case statements casez and casex come in. casez allows “Z” and “?” to be treated as don’t-care values in the case expression and/or the case items during case comparison. For example, a case item of 2’b1? (or 2’b1Z) in a casez statement can match case expressions 2’b10, 2’b11, 2’b1X, and 2’b1Z. It is generally recommended to use “?” instead of “Z” in case items to indicate don’t-care bits.

Verilog “wildcard” case statements can have overlapping case items. If more than one case item can match a case expression, the first matching case item has priority. Thus, priority logic can be inferred from a case statement. The following code snippet illustrates how casez can be used to code priority logic. It simulates and synthesizes correctly as a priority decoder.
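A sketch of such a priority decoder, with signal names matching the discussion that follows (highest priority given to the MSB of irq):

```systemverilog
always @* begin
  {int2, int1, int0} = 3'b000;
  casez (irq)
    3'b1??:  int2 = 1'b1;   // highest priority
    3'b?1?:  int1 = 1'b1;
    3'b??1:  int0 = 1'b1;   // lowest priority (third case item)
    default: ;
  endcase
end
```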

The logic will look something like this:
[Figure: Verilog casez priority decoder]

Even though this may seem an elegant way to code a priority decoder, the priority intention may only be apparent to the most experienced coders. Therefore, it is generally recommended to code priority logic using the more explicit if…else statement to clearly convey the intention.
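For example, a three-level interrupt priority can be expressed explicitly like this (a sketch; the irq and intN names follow this article's discussion, and the declarations are assumed):

```verilog
// Explicit priority using if...else; the ordering makes the intent obvious:
// irq[2] wins over irq[1], which wins over irq[0].
// Assumed declarations: input [2:0] irq; output reg int2, int1, int0;
always @(*) begin
  {int2, int1, int0} = 3'b000;
  if (irq[2])      int2 = 1'b1;  // highest priority
  else if (irq[1]) int1 = 1'b1;
  else if (irq[0]) int0 = 1'b1;  // lowest priority
end
```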

While wildcard case comparison can be useful, it also has its dangers. Imagine a potentially dangerous casez statement where the case expression is a vector and one bit resolves to “Z”, perhaps due to a mistakenly unconnected input. That expression will match a case item with any value in the “Z” bit position! To put it in more concrete terms, if the LSB of irq in the above code snippet is unconnected, such that the case expression evaluates to 3’b00Z, the third case item will still match and int0 will be set to 1, potentially masking a bug!

Even wilder: casex

Now that we understand the usage and dangers of casez, it is straightforward to extend the discussion to casex. casex treats “Z”, “?”, and “X” as don’t-care values in both the case expression and the case item during case comparison. That means everything we discussed for casez also applies to casex, plus “X” is now also a wildcard. In my previous article on Verilog X Optimism I discussed how X’s can propagate around a design and mask design issues. These propagated X’s can easily cause problems when combined with casex statements. To avoid these problems, the recommendation from RTL Coding Styles That Yield Simulation and Synthesis Mismatches is to not use casex at all in synthesizable code.
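To illustrate the hazard, consider this hypothetical sketch (signal names are placeholders):

```verilog
// Why casex is risky: with casex, an X in the case EXPRESSION also acts
// as a wildcard and can match a fully specified case item.
// Assumed declarations: input [2:0] irq; output reg [1:0] grant;
always @(*) begin
  casex (irq)
    3'b1??:  grant = 2'd2;
    3'b01?:  grant = 2'd1;
    3'b001:  grant = 2'd0;
    default: grant = 2'd3;
  endcase
end
// If irq resolves to 3'bXXX (e.g. driven by uninitialized logic), the FIRST
// item matches and grant silently becomes 2'd2, hiding the X instead of
// propagating it.
```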

Verilog case, casez, casex each has its place and use cases. Understanding the differences between them is key to using them correctly and avoiding bugs. It may also help you in your next job interview 🙂

Have you come across any improper usage of these constructs? What are your recommendations for using them, especially casex? Leave a comment below!

SystemVerilog Unique Case

SystemVerilog adds an optional unique modifier to case statements. How does that change the behaviour? Head over to my post SystemVerilog Unique And Priority – How Do I Use Them?

References

  1. RTL Coding Styles That Yield Simulation and Synthesis Mismatches
  2. “full_case parallel_case”, the Evil Twins of Verilog Synthesis

Quiz and Sample Source Code

Now it’s time for a quiz! How will each of the following variations of case statement behave when the case expression a) matches one of the non-default case items, b) does not match any non-default case item, c) contains all X’s (e.g. if the signal comes from an uninitialized memory), d) contains all Z’s (e.g. if the signal is unconnected), e) contains a single bit X, f) contains a single bit Z?

  • Plain case statement
  • Plain case with default case
  • Casez
  • Casez with default case
  • Casex
  • Casex with default case
  • Unique case
  • Unique case with default case
  • Unique casez
  • Unique casex

Confused? Download the sample source code that you can run to give you the answer! Simply fill in the form below to get access. You will also get my articles as soon as they are published, delivered directly to your inbox. I may occasionally send out questionnaires to better understand your needs, or newsletters to bring you valuable information in addition to this blog. You will always have a chance to unsubscribe from any communication if you find it is not useful to you.

Jason Yu

SoC Design Engineer at Intel Corporation
Jason has more than 8 years' experience in the semiconductor industry, designing and verifying Solid State Drive controller SoCs. His areas of work include RTL design, verification with UVM, and low power verification with UPF. Thoughts and opinions expressed in articles are personal and do not reflect those of Intel Corporation in any way.

SystemVerilog and Verilog X Optimism – Hardware-like X Propagation with Xprop

In part 2 of this series, SystemVerilog and Verilog X Optimism – What About X Pessimism?, I discussed several coding styles that help to reduce the risk of missing design bugs due to Verilog X optimism. In part 3, we will take a look at how proprietary simulator features help avoid the problem by smartly doing X propagation. Specifically, we will look at Synopsys VCS Xprop.

As the name suggests, X propagation means propagating an X at the input of some logic to its outputs. Synopsys VCS Xprop can do so smartly, in many cases avoiding X optimism and making simulation behave closer to real hardware.

Xprop has three modes: xmerge, tmerge, and vmerge. Xmerge simply assigns X to the outputs whenever any of the inputs is X. This behaviour is similar to what would be observed in gate level simulations, but can sometimes be even more pessimistic. With tmerge, when an input is X, the simulator traverses both code paths, assuming the input is 0 and then 1, and compares the results. If the results are the same, the determinate result is assigned to the output. If the results differ, X is assigned to the output to propagate the X. Vmerge basically disables Xprop, allowing classic Verilog X optimism.

Consider the if…else statement again from part 1 of the series:
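Presumably this is the familiar 2-to-1 multiplexer (reconstructed here for reference; the names a, b, cond, and c match the truth table below):

```verilog
// 2-to-1 mux from part 1. When cond is X, classic Verilog simulation
// optimistically evaluates the if condition as false and assigns c = b.
// Assumed declarations: input a, b, cond; output reg c;
always @(*) begin
  if (cond)
    c = a;
  else
    c = b;
end
```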

Let’s also add a gate level implementation of this multiplexer for comparison.
[Figure: MUX21NAND gate-level implementation of the multiplexer]

The truth table comparing Xprop Tmerge and Xmerge is as follows. You can see that Tmerge closely resembles actual hardware behaviour (though not always: depending on how the statement is implemented in gates, the last row may also return X in gates).

a | b | cond | Classic Verilog Simulation | Gate Level Simulation | Actual Hardware | Xprop Xmerge | Xprop Tmerge
--|---|------|----------------------------|-----------------------|-----------------|--------------|-------------
0 | 0 | X    | 0                          | 0                     | 0               | X            | 0
0 | 1 | X    | 1                          | X                     | 0/1             | X            | X
1 | 0 | X    | 0                          | X                     | 0/1             | X            | X
1 | 1 | X    | 1                          | X                     | 1               | X            | 1

Xprop does similar X propagation on sequential logic with ambiguous clock transitions. Recall from part 1 of this series that Verilog normally treats ambiguous clock transitions as valid clock transitions (for example, 0->1, 0->X, 0->Z, X->1, Z->1 are all considered posedge), which can trigger sequential logic when real hardware may not. With Xprop enabled, an indeterminate clock transition will corrupt the outputs of sequential logic triggered by that clock.
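As a hypothetical sketch of the sequential case:

```verilog
// Classic Verilog: a 0->X transition on clk satisfies @(posedge clk),
// so this flop "captures" d even though real silicon may not have clocked.
// Assumed declarations: input clk, d; output reg q;
always @(posedge clk)
  q <= d;
// With Xprop enabled, an ambiguous edge on clk instead corrupts q to X.
```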

Xprop can be even more useful in low power simulations, where powered-off logic drives X’s on its outputs to indicate it is off. One way to detect unisolated paths is to ensure these X’s can propagate to somewhere visible to your test environment.

Enabling Xprop takes relatively little effort. You simply need to provide some additional options to VCS, and an Xprop configuration file. The configuration file lists the module name, module hierarchy, or instance name to turn Xprop on or off. From the VCS User Guide:

vcs -xprop[=tmerge|xmerge] [-xprop=xprop_config_file] [-xprop=unifiedInference] source_files

A sample Xprop configuration file is as follows:
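For instance, something along these lines (the module and instance names are placeholders, and the exact syntax should be confirmed against the VCS User Guide for your tool version):

```
merge = tmerge;
module {block_a} {xpropOn};
instance {top.dut.legacy_blk} {xpropOff};
tree {top.dut.cluster0} {xpropOn};
```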

In a recent project I worked on, after turning on Xprop and debugging the resulting X’s, two design bugs were immediately found. The first bug was a list of flip-flops without reset that fed into the control path, causing a block to hang and not respond after reset. The block simulated fine without Xprop due to X optimism (in fact, block level verification was already complete). But once Xprop was enabled, it did not respond after reset until all the control path flip-flops were fixed and properly reset. The second bug was an incorrectly coded arbiter that could select an out of range bit out of a bus. Without Xprop, the arbiter returned 0 when the index was out of range of the bus width. With Xprop enabled, the arbiter assigned X to the output when the select became out of range and the bug surfaced.

One caveat when using Xprop is that it can distort code coverage scores, because the simulator really does execute multiple branches of code when an X input causes divergent code paths. To work around this, since Xprop generally catches reset and testbench issues that are pervasive throughout an environment, you may be able to run only a small subset of tests with Xprop enabled to catch X optimism bugs, and collect code coverage from the remaining tests with Xprop disabled. Xprop may also not solve all your X optimism woes: X-Propagation: An Alternative to Gate Level Simulation discusses some false negatives from actual designs that Xprop stumbled on. Lastly, Xprop is not guaranteed to propagate X’s through your entire design, so you should review the Xprop reports to see which parts of your design it was able to instrument.

Xprop is a relatively easy way to make Verilog simulations behave closer to actual hardware. Provided that all block verification environments are Xprop clean, you will likely see diminishing returns using it in a cluster or full-chip environment. However, given the relatively low effort to set up and debug Xprop, it is definitely a simulator option recommended for any block level verification environment.

What are your experiences with using xprop in your simulations, especially in low power simulations with UPF? Have you found bugs you otherwise wouldn’t have? Leave a comment below!
