Clock Domain Crossing Design – 3 Part Series

Thank you for all your interest in my last post on Dual-Clock Asynchronous FIFO in SystemVerilog! I decided to continue the theme of clock domain crossing (CDC) design techniques, and look at several other methods for passing control signals and data between asynchronous clock domains. This is perfect timing because I’m just about to create a new revision of one of my design blocks at work, which incorporates many of these concepts. I, too, can use a refresher 🙂

The concepts in this article are mostly taken from Cliff Cummings' very comprehensive paper Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog. I’ve broken the topics into 3 parts:

  • Part 1 – metastability and challenges with passing single bit signals across a clock domain crossing (CDC), and single-bit synchronizer
  • Part 2 – challenges with passing multi-bit signals across a CDC, and multi-bit synchronizer
  • Part 3 – design of a complete multi-bit synchronizer with feedback acknowledge

Let’s get right to it!

What is Metastability?

Any discussion of clock domain crossing (CDC) should start with a basic understanding of metastability and synchronization. In layman’s terms, metastability refers to an unstable intermediate state, where the slightest disturbance will cause a resolution to a stable state. When applied to flip-flops in digital circuits, it means a state where the flip-flop’s output may not have settled to the final expected value.

One of the ways a flip-flop can enter a metastable state is if its setup or hold time is violated. In an asynchronous clock domain crossing (CDC), where the source and destination clocks have no frequency relationship, a signal from the source domain has a non-zero probability of changing within the setup or hold time of a destination flip-flop it drives. Synchronization failure occurs when the output of the destination flip-flop goes metastable and does not converge to a legal state by the time its output must be sampled again (by the next flip-flop in the destination domain). Worse yet, that next flip-flop may also go metastable, causing metastability to propagate through the design!

Synchronizers for Clock Domain Crossing (CDC)

A synchronizer is a circuit whose purpose is to minimize the probability of a synchronization failure. We want the metastability to resolve within a synchronization period (a period of the destination clock) so that we can safely sample the output of the flip-flop in the destination clock domain. It is possible to calculate the failure rate of a synchronizer; its reciprocal is called the mean time between failures (MTBF).

Without going into the math, the takeaway is that the probability of hitting a metastable state in a clock domain crossing (CDC) is proportional to:

  1. Frequency of the destination domain
  2. Rate of data crossing the clock boundary

This result gives us some ideas on how to design a good synchronizer. Interested readers can refer to Metastability and Synchronizers: A Tutorial for a deeper treatment of metastability, and some interesting graphs of how flip-flops can become metastable.

Two flip-flop synchronizer

Figure: Two flip-flop synchronizer for clock domain crossing (CDC)

The most basic synchronizer is two flip-flops in series, both clocked by the destination clock. This simple and unassuming circuit is called a two flip-flop synchronizer. If the input data changes very close to the receiving clock edge (within setup/hold time), the first flip-flop in the synchronizer may go metastable, but there is still a full clock period for the signal to become stable before being sampled by the second flip-flop. The destination domain logic then uses the output from the second flip-flop. Theoretically it is possible for the signal to still be metastable by the time it is clocked into the second flip-flop (on average, once every MTBF). In that case a synchronization failure occurs and the design would likely malfunction.
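As a rough illustration of the structure described above, here is a purely logical Python model of two flip-flops in series. The class and method names are hypothetical, and the model does not simulate metastability itself; it only shows the shift-register behavior and the resulting two-edge latency in the destination domain.

```python
class TwoFlopSynchronizer:
    """Logical (0/1) model of a two flip-flop synchronizer.

    Both flops are clocked by the destination clock. Metastability is an
    analog effect and is NOT modeled here.
    """

    def __init__(self):
        self.ff1 = 0  # first flop: samples the asynchronous input
        self.ff2 = 0  # second flop: output used by destination logic

    def clock(self, async_in):
        """Apply one rising edge of the destination clock.

        Both flops update at the same edge, so ff2 captures the value ff1
        held BEFORE the edge (like nonblocking assignments in RTL).
        """
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


sync = TwoFlopSynchronizer()
inputs = [0, 1, 1, 1, 0, 0]
outputs = [sync.clock(bit) for bit in inputs]
print(outputs)  # [0, 0, 1, 1, 1, 0]
```

The input change appears at the output on the second destination clock edge after it was first sampled, which is the price paid for giving the first flop a full period to settle.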

The two flip-flop synchronizer is sufficient for many applications. Very high speed designs may require a three flip-flop synchronizer to give sufficient MTBF. To further increase MTBF, two flip-flop synchronizers are sometimes built from fast library cells (low threshold voltage) that have better setup/hold time characteristics.

Registering source signals into the synchronizer

It is generally good practice to register signals in the source clock domain before sending them across the clock domain crossing (CDC) into synchronizers. This eliminates combinational glitches, which can effectively increase the rate of data crossing the clock boundary, reducing MTBF of the synchronizer.

Synchronizing Slow Signals Into Fast Clock Domain

The easy case is passing signals from a slow clock domain to a fast clock domain. This is generally not a problem as long as the fast clock runs at more than 1.5x the frequency of the slow clock. The fast destination clock will simply sample the slow signal more than once. In these cases, a simple two flip-flop synchronizer may suffice.

If the fast clock runs at less than 1.5x the frequency of the slow clock, then there can be a potential problem, and you should use one of the solutions in the next section.

Synchronizing Fast Signals Into Slow Clock Domain

The more difficult case is, of course, passing a fast signal into a slow clock domain. The obvious problem is if a pulse on the fast signal is shorter than the period of the slow clock, then the pulse can disappear before being sampled by the slow clock. This scenario is shown in the waveform below.

Figure: Fast source pulse missed by slow clock in clock domain crossing (CDC)

A less obvious problem is even if the pulse was just slightly wider than the period of the slow clock, the signal can change within the setup/hold time of the destination flip-flop (on the slow clock), violating timing and causing metastability.
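The missed-pulse case can be reproduced with a few lines of Python. This is a minimal sketch with idealized sampling (no setup/hold window) and integer time units; the function name and the numbers are assumptions chosen for illustration.

```python
def sample_pulse(pulse_start, pulse_width, dest_period, num_edges, first_edge=0):
    """Return the value a destination flop sees at each rising clock edge.

    The source pulse is high during [pulse_start, pulse_start + pulse_width).
    Destination edges occur at first_edge, first_edge + dest_period, ...
    All times are integers in the same arbitrary unit.
    """
    samples = []
    for k in range(num_edges):
        t = first_edge + k * dest_period
        is_high = pulse_start <= t < pulse_start + pulse_width
        samples.append(1 if is_high else 0)
    return samples


# Fast-to-slow: a 3-unit pulse falls between slow clock edges (period 10).
print(sample_pulse(pulse_start=2, pulse_width=3, dest_period=10, num_edges=4))
# -> [0, 0, 0, 0]  (pulse is never seen)

# Same slow clock, but the pulse is stretched to 1.5 periods.
print(sample_pulse(pulse_start=2, pulse_width=15, dest_period=10, num_edges=4))
# -> [0, 1, 0, 0]

# Slow-to-fast: a 10-unit pulse sampled by a clock with period 4.
print(sample_pulse(pulse_start=2, pulse_width=10, dest_period=4, num_edges=5))
# -> [0, 1, 1, 0, 0]  (sampled more than once)
```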

Before deciding on how to handle this clock domain crossing (CDC), you should first ask yourself whether every value of the source signal is needed in the destination domain. If not (meaning it is okay to drop some values), then it may suffice to use an “open-loop” synchronizer without acknowledgement. An example is the gray code pointer from my Dual-Clock Asynchronous FIFO in SystemVerilog; it needs to be accurate when read, but the FIFO pointer may advance several times before a read occurs and the value is used. If, on the other hand, you need every value in the destination domain, then a “closed-loop” synchronizer with acknowledgement may be needed.

Single bit — two flip-flop synchronizer

A simple two flip-flop synchronizer is the fastest way to pass signals across a clock domain crossing. It can be sufficient in many applications, as long as the signal generated in the fast clock domain is wider than the cycle time of the slow clock. For example, if you just need to synchronize a slow changing status signal, this may work. A safe rule of thumb is the signal must be wider than 1.5x the cycle width of the destination clock (“three receiving clock edge” requirement coined in Mark Litterick’s paper on clock domain crossing verification). This guarantees the signal will be sampled by at least one (but maybe more) clock edge of the destination clock. The requirement can be easily checked using SystemVerilog Assertions (SVA).
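The article suggests checking this requirement with SVA in the RTL testbench. As a language-neutral illustration of why extra width is needed, the Python sketch below sweeps every clock phase and counts the destination edges that sample the pulse with setup and hold time respected. The function name, the integer time grid, and the setup/hold values are all assumptions made for this example.

```python
def min_clean_samples(pulse_width, dest_period, setup=0, hold=0):
    """Worst case, over all clock phases, of the number of destination edges
    that sample the pulse cleanly.

    The pulse is high during [0, pulse_width). An edge at time t samples it
    cleanly if the signal is stable from t - setup to t + hold, i.e.
    t - setup >= 0 and t + hold <= pulse_width. Times are integers.
    """
    worst = None
    for phase in range(dest_period):
        count = 0
        t = phase  # first destination edge at or after the pulse start
        while t < pulse_width:
            if t >= setup and t + hold <= pulse_width:
                count += 1
            t += dest_period
        worst = count if worst is None else min(worst, count)
    return worst


# Pulse exactly one destination period wide: some phase misses it cleanly.
print(min_clean_samples(pulse_width=10, dest_period=10, setup=1, hold=1))  # 0

# Pulse 1.5x the destination period: every phase gets at least one clean edge.
print(min_clean_samples(pulse_width=15, dest_period=10, setup=1, hold=1))  # 1
```

In this idealized model a pulse only slightly wider than one period plus the setup and hold times would already pass; the 1.5x figure is the safe rule of thumb, leaving margin for effects the sketch ignores such as clock jitter and duty-cycle variation.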

The 1.5x cycle width is easy to enforce when the relative clock frequencies of the source and destination are fixed. But there are real world scenarios where they won’t be. In a memory controller design I worked on, the destination clock can take on three different frequencies, which can be either faster/slower/same as the source clock. In that situation, it was not easy to design the clock domain crossing signals to meet the 1.5x cycle width of the slowest destination clock.

Single bit — synchronizer with feedback acknowledge

A synchronizer with feedback acknowledge is slightly more involved, but not much. The figure below illustrates how it works.

Figure: Single bit feedback synchronizer for clock domain crossing (CDC) diagram

The source domain sends the signal to the destination clock domain through a two flip-flop synchronizer. The synchronized signal is then passed back to the source clock domain through another two flip-flop synchronizer as a feedback acknowledgement, and the source holds the signal steady until it sees that acknowledgement. The figure below shows a waveform of the synchronizer.

Figure: Single bit feedback synchronizer for clock domain crossing (CDC) waveform

This solution is very safe, but it does come at a cost of increased delay due to synchronizing in both directions before allowing the signal to change again. This solution would work in my memory controller design to handle the varying clock frequency relationship.
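To see the round trip and its latency cost, here is a behavioral Python sketch of the handshake. It is not the accompanying SystemVerilog source; the function name, the event names, and the integer clock periods are assumptions, and metastability is not modeled. The source asserts its signal at time 0 and releases it only after the acknowledgement has come back through the second synchronizer.

```python
def simulate_feedback_sync(src_period, dest_period, end_time):
    """Simulate one assert/acknowledge round trip; return event times.

    Rising edges of each clock occur at integer multiples of its period.
    """
    src_data = 1   # source-domain signal, asserted at time 0 and held
    d1 = d2 = 0    # two-flop synchronizer in the destination domain
    s1 = s2 = 0    # two-flop feedback synchronizer in the source domain
    events = {}
    for t in range(1, end_time):
        # Snapshot pre-edge values so coincident clock edges do not see
        # each other's updates.
        src_data_pre, d2_pre, s2_pre = src_data, d2, s2
        if t % dest_period == 0:          # destination clock edge
            d2, d1 = d1, src_data_pre
        if t % src_period == 0:           # source clock edge
            s2, s1 = s1, d2_pre
            if s2_pre:                    # acknowledgement already visible
                src_data = 0              # safe to release the signal
        if d2 and "dest_high" not in events:
            events["dest_high"] = t
        if s2 and "ack_high" not in events:
            events["ack_high"] = t
        if not src_data and "src_released" not in events:
            events["src_released"] = t
    return events


# Fast source (period 2) into slow destination (period 7).
print(simulate_feedback_sync(src_period=2, dest_period=7, end_time=100))
# Slow source (period 7) into fast destination (period 2).
print(simulate_feedback_sync(src_period=7, dest_period=2, end_time=100))
```

In both directions the source signal stays asserted until the destination has seen it and the acknowledgement has returned, independent of the clock ratio. The delay between assertion and release is the cost mentioned above.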

Conclusion

Even though we would all like to live in a purely synchronous world, in real world applications you will undoubtedly run into designs that require multiple asynchronous clocks. This article described two basic techniques to pass a single control signal across a clock domain crossing (CDC). Clock domain crossing (CDC) logic bugs are elusive and extremely difficult to debug, so it is imperative to design synchronization logic correctly from the start!

Passing a single control signal across a clock domain crossing (CDC) isn’t very exciting. In Clock Domain Crossing Techniques – Part 2, I will discuss the difficulties with passing multiple control signals, and some possible solutions.

Sample Source Code

The accompanying source code for this article is the single-bit feedback synchronizer and testbench, which generates the following waveform when run. Download and run it to see how it works!

Figure: Single bit feedback synchronizer simulation

    15 thoughts on “Clock Domain Crossing Design – 3 Part Series”

    1. Hi Jason,

      I have some questions concerning this single bit CDC.

      1) In the feedback design, the purpose is to successfully sample the src_data even if it is sent from fast to slow domain, but how do you make sure that the src_data is wide enough to be sampled by the slow dest_clk?

      2) After the feedback acknowledge is sent to the src_domain, how is it used ?

      I can see from the TB code, you manually de-assert the src_data after receiving the acknowledge signal.
      @(posedge srcclk iff src_qdata == 1'b1);
      src_data <= 1'b0;

      Can you help clarify my confusion?

      • Hi Pan. Yes that’s correct. To ensure src_data is wide enough to be sampled by a slow dest_clk, your source sending logic will need to handle this and hold src_data (and I presume hold off any further upstream logic as well) until the feedback acknowledgement has been received. In my testbench code I mimic this by waiting until I see the feedback acknowledgement (src_qdata) on the source side before changing src_data.

    2. Thanks, Jason

      I have one more question on the metastability which confuses me a lot.

      What is the analog output voltage of a FF in meta state, and what is the logic value?

      In your post, you say it hovers somewhere between ground (logic 0) and the drain voltage (logic 1).

      But in your ref [Metastability and Synchronizers: A Tutorial] by Ran Ginosar, he says “What happens at the flip-flop during metastability, and what can we see at its output? It’s been said that we can see a wobbling signal that hovers around half VDD, or that it can oscillate. Well, this is not exactly the case. If node A in Figure 2 is around VDD/2, the chance that we can still see the same value at Q, three inverters later (or even one inverter, if the slave latch is metastable) is practically zero. Instead, the output will most likely be either 0 or 1, and as VA resolves, the output may (or may not) toggle at some later time. If indeed that toggle happens later than the nominal tpCQ, then we know that the flip-flop was metastable. And this is exactly what we want to mask with the synchronizer.”

    3. What happens if I have two consecutive zeroes or ones to transmit? How can a synchronizer with a feedback mechanism help in this case?

      The feedback/acknowledgement mechanism seems to be based on edge transitions.

      • Hi Promach. Let’s take the Multi-cycle path (MCP) formulation with feedback design. It requires the source side logic to assert and hold src_send to indicate sending a data bit (on src_data_in) until src_rdy is asserted. After that, src_send must de-assert, then re-assert in a later cycle to send the next data bit. Therefore, sending two consecutive bits of zeros or ones would require two src_send/src_rdy handshakes, with at least one idle cycle (src_send=0) in between.

    4. May I know if you have any blogs (did I miss anything) on formal verification instead of simulation of asynchronous FIFO ?

      • That’s a really good question, but unfortunately I don’t have any information on that. Formal property verification of asynchronous designs is, I think, kind of a hack, as there is no equivalent way to express asynchronous designs in formal property verification tools (you can kind of fake it by allowing one clock to toggle on any edge of the other clock). To properly verify CDC, I think linting or a CDC verification tool (which can be formal based as well) will yield better results.

    5. Hi Jason,

      I am considering clock gating for power saving.

      May I know how your synchronizer pair circuit would work in a clock gating situation?

      • Hi. I assume you’re talking about coarse grained clock gating, where you are writing new custom logic to gate the clock when the logic is idle (and not fine grained clock gating, which is “automatically” inserted by the synthesis tool). For coarse grained clock gating, you can imagine there is a tradeoff between how much logic you gate at once versus how complex the clock gating logic is. The more you gate at once, the simpler the logic, and vice versa. For designs I work on, our policy/guideline is to gate at a “design block” level, which consists of several to tens of modules. What that means is the clock gating will generally encompass a lot more logic than just the synchronizer pair. You would determine an idle condition for all that logic (including the synchronizer, and across the two clock domains), then instantiate a clock gate surrounding all that logic. That’s just the guideline that my team uses. Clock gating design is very dependent on your overall low power methodology, which may be different from company to company, from team to team.

    6. Found these lines in the post.

      “In a memory controller design I worked on, the destination clock can take on three different frequencies, which can be either faster/slower/same as the source clock. In that situation, it was not easy to design the clock domain crossing signals to meet the 1.5x cycle width of the slowest destination clock.”

      So what was your approach? I hope you were looking for a generic solution for slow to fast and fast to slow CDC. A toggle synchronizer (which has a pulse to level converter at the source side and edge detector at the receiver side) will work here I think.

      • A toggle synchronizer will work, but you have to ensure the signal from the source will not change too quickly to be captured in the destination domain, in the fast to slow synchronization case. In that design it was a datapath that I needed to synchronize across the clock domain, so I used this asynchronous FIFO. I think for any CDC, there will always need to be assumptions about the range of ratios between the two clocks that has to be taken into account in the design.

