TL;DR: Vivado 2024.1 can fail to infer FSMs from Chisel-generated SystemVerilog because of optimizations applied during Chisel/CIRCT code generation. As of writing, the author knows of no workaround other than manually patching the generated code.


1. Background

Finite state machines (FSMs), or more precisely deterministic finite state machines, are central to digital circuit design. In this context, an FSM can be described as a sequential logic unit whose next state is given by $Q^{n+1} = F_Q(Q^n, \boldsymbol{x}) \in S$, where $Q^n$ is the current state, $\boldsymbol{x}$ is a Boolean input vector, and $S$ is the finite set of all valid states. The output $\boldsymbol{y}$ of such a unit is either $\boldsymbol{y} = F_y(Q^n)$ (Moore machine) or $\boldsymbol{y} = F_y(Q^n, \boldsymbol{x})$ (Mealy machine).
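To make the two equations concrete, here is a minimal, library-free Scala model of them. It is plain Scala rather than Chisel, and the names Moore, Mealy, fQ, fY and run are our own. A machine is just an initial state plus the functions $F_Q$ and $F_y$, and running it over an input sequence is a left scan over the state:

```scala
// Plain Scala model of Q^{n+1} = F_Q(Q^n, x) with Moore or Mealy outputs (not Chisel).
// S = state type, I = input type, O = output type.
final case class Moore[S, I, O](init: S, fQ: (S, I) => S, fY: S => O) {
  // Moore: y = F_y(Q^n); one output per visited state, including the initial one.
  def run(xs: Seq[I]): Seq[O] =
    xs.scanLeft(init)(fQ).map(fY)
}

final case class Mealy[S, I, O](init: S, fQ: (S, I) => S, fY: (S, I) => O) {
  // Mealy: y = F_y(Q^n, x); one output per input, computed before the state advances.
  def run(xs: Seq[I]): Seq[O] =
    xs.scanLeft(init)(fQ).zip(xs).map { case (q, x) => fY(q, x) }
}

object FsmModelDemo {
  // Example Moore machine: running parity of the input bits.
  val parity = Moore[Boolean, Boolean, Boolean](false, (q, x) => q ^ x, q => q)
}
```

For instance, FsmModelDemo.parity.run(Seq(true, true, true)) visits the states false, true, false, true, and because this is a Moore machine it emits one output per state rather than one per input.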

When implementing an FSM in SystemVerilog, we usually write code in the following style:

localparam S0 = ...;
localparam S1 = ...;
...
localparam Sn = ...;

reg [W:0] y, Y;
always_ff @(posedge clock) begin
    y <= rst ? S0 : Y;
end
always_comb begin
    case (y)
        S0: begin Y = ...; end
        S1: begin Y = ...; end
        ...
        Sn: begin Y = ...; end
        default: begin Y = S0; end
    endcase
end

In the snippet above, y holds the current state and Y the next state. On each triggering edge, y is either reset to S0 or loaded with Y.

Most synthesizers have heuristics to detect, and possibly optimize, FSMs written in this pattern. Roughly speaking, if an edge-triggered, synchronously reset flip-flop y is assigned a new value Y that is always chosen from a set of constants Sx, then y is likely to be an FSM state register. The synthesizer may then re-encode the Sx values (e.g. as one-hot) to optimize the design further.
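As an illustration of such a re-encoding, the following sketch computes the binary and one-hot codes for states numbered in declaration order. StateEncoding and its methods are hypothetical helper names in plain Scala. The sketch reproduces the mapping Vivado reports for the hand-written detector in Section 2.1, although the actual bit assignment is the synthesizer's own choice:

```scala
object StateEncoding {
  // Binary encoding: state i is stored as the number i in ceil(log2(n)) bits (at least 1).
  def binary(i: Int, n: Int): String = {
    val width = math.max(1, 32 - Integer.numberOfLeadingZeros(n - 1))
    pad(i.toBinaryString, width)
  }

  // One-hot encoding: state i sets only bit i of an n-bit register.
  def oneHot(i: Int, n: Int): String = pad((1 << i).toBinaryString, n)

  // Left-pad a bit string with zeros to the given width.
  private def pad(bits: String, width: Int): String = "0" * (width - bits.length) + bits

  // (state name, one-hot code, binary code) rows, states numbered in declaration order.
  def table(names: Seq[String]): Seq[(String, String, String)] =
    names.zipWithIndex.map { case (name, i) =>
      (name, oneHot(i, names.size), binary(i, names.size))
    }
}
```

One-hot trades flip-flops for simpler next-state and output decoding: six states need six FFs instead of three, but each state test becomes a single-bit check.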

Things can quickly become more complicated when additional layers of abstraction are introduced, because they can keep the synthesizer from inferring a suitable next-state Y. Chisel, for example, is a hardware construction language embedded in Scala 2. It lets users apply modern design ideas and advanced metaprogramming techniques, and it optimizes the design itself before emitting HDL artifacts such as SystemVerilog. Below we show that even an FSM written in the officially recommended style is not detected by Vivado 2024.1's heuristics.

2. Experiments and results

The example we chose is a 5-bit burst detector: once a synchronous, active-high input has been asserted for at least 5 consecutive cycles, the circuit pulls its output high until the input deasserts. We write it both in plain SystemVerilog and in Chisel 7.0.0-M2 (the latest version as of writing), and synthesize each with Vivado 2024.1 targeting the XC7A100TFGG484-2L. We use the synthesis strategy Flow_PerfOptimized_high, which tries to re-encode inferred FSMs as one-hot. We then inspect the synthesis log to determine whether the heuristics found an FSM, and record the resulting LUT and FF counts.
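For reference, the intended behaviour can be captured in a small cycle-level golden model. The sketch below is plain Scala, and Burst5Model and its methods are hypothetical names. It numbers the states 0 (S_Idle) through 5 (S_5High), matching the listings below, and treats dout as a Moore output of the registered state:

```scala
object Burst5Model {
  // State numbering used by all listings: 0 = S_Idle, k = S_kHigh for k = 1..5.
  val Idle = 0
  val FiveHigh = 5

  // Next-state function F_Q: count consecutive high samples, saturating at 5;
  // any low sample returns to S_Idle.
  def next(y: Int, din: Boolean): Int =
    if (!din) Idle else math.min(y + 1, FiveHigh)

  // Moore output F_y: dout depends only on the current (registered) state.
  def out(y: Int): Boolean = y == FiveHigh

  // Simulate from reset; element n of the result is dout during cycle n.
  def simulate(din: Seq[Boolean]): Seq[Boolean] =
    din.scanLeft(Idle)((y, x) => next(y, x)).init.map(out)
}
```

Because dout is decoded from the state register, it rises in the cycle after the fifth consecutive high sample. For example, six high inputs give five false outputs followed by one true.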

The Chisel project boilerplate can be found in Appendix A. Note also that every CIRCT-generated SystemVerilog file begins with a common prolog of macros used to randomly initialize registers in simulation. It is omitted from the listings below for brevity and reproduced in Appendix B.

2.1. Vanilla SystemVerilog

The hand-written SystemVerilog code is as follows:

module burst5detector(
    input  clock,
    input  reset,
    input  din,
    output dout
);
    localparam S_Idle  = 3'b000;
    localparam S_1High = 3'b001;
    localparam S_2High = 3'b010;
    localparam S_3High = 3'b011;
    localparam S_4High = 3'b100;
    localparam S_5High = 3'b101;

    reg [2:0] y, Y;
    always_ff @(posedge clock) begin
        y <= reset ? S_Idle : Y;
    end
    always_comb begin
        case (y)
            S_Idle:  begin Y = din ? S_1High : S_Idle; end
            S_1High: begin Y = din ? S_2High : S_Idle; end
            S_2High: begin Y = din ? S_3High : S_Idle; end
            S_3High: begin Y = din ? S_4High : S_Idle; end
            S_4High: begin Y = din ? S_5High : S_Idle; end
            S_5High: begin Y = din ? S_5High : S_Idle; end
            default: begin Y = S_Idle; end
        endcase
    end

    assign dout = (y == S_5High);
endmodule

module top(
    input  clock,
    input  reset,
    input  din,
    output dout
);
    burst5detector detector(
        .clock(clock),
        .reset(reset),
        .din(din),
        .dout(dout)
    );
endmodule

Running it through the synthesizer produces the following log, showing that Vivado successfully recognized the FSM pattern and re-encoded its states:

---------------------------------------------------------------------------------
Finished applying 'set_property' XDC Constraints : Time (s): cpu = 00:00:06 ; elapsed = 00:00:10 . Memory (MB): peak = 1580.703 ; gain = 619.957
---------------------------------------------------------------------------------
INFO: [Synth 8-802] inferred FSM for state register 'y_reg' in module 'burst5detector'
---------------------------------------------------------------------------------------------------
                   State |                     New Encoding |                Previous Encoding 
---------------------------------------------------------------------------------------------------
                  S_Idle |                           000001 |                              000
                 S_1High |                           000010 |                              001
                 S_2High |                           000100 |                              010
                 S_3High |                           001000 |                              011
                 S_4High |                           010000 |                              100
                 S_5High |                           100000 |                              101
---------------------------------------------------------------------------------------------------
INFO: [Synth 8-3354] encoded FSM with state register 'y_reg' using encoding 'one-hot' in module 'burst5detector'
---------------------------------------------------------------------------------
Finished RTL Optimization Phase 2 : Time (s): cpu = 00:00:06 ; elapsed = 00:00:10 . Memory (MB): peak = 1580.703 ; gain = 619.957
---------------------------------------------------------------------------------
Resource Utilization
LUT 7
FF 6
IO 4

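2.2. Chisel with switch/is

Next, we write the same circuit in Chisel, following the approach recommended in the Chisel Cookbook (a ChiselEnum for the states and switch/is for the transitions):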

import chisel3._
import chisel3.util._

class Burst5Detector extends Module {
  class Port extends Bundle {
    val din  = Input(Bool())
    val dout = Output(Bool())
  }
  val io = IO(new Port)

  private object State extends ChiselEnum {
    val S_Idle  = Value
    val S_1High = Value
    val S_2High = Value
    val S_3High = Value
    val S_4High = Value
    val S_5High = Value
  }
  import State._

  private val y = RegInit(S_Idle)
  switch(y) {
    is(S_Idle) {
      when(io.din) { y := S_1High }
    }
    is(S_1High) {
      when(io.din) { y := S_2High }.otherwise { y := S_Idle }
    }
    is(S_2High) {
      when(io.din) { y := S_3High }.otherwise { y := S_Idle }
    }
    is(S_3High) {
      when(io.din) { y := S_4High }.otherwise { y := S_Idle }
    }
    is(S_4High) {
      when(io.din) { y := S_5High }.otherwise { y := S_Idle }
    }
    is(S_5High) {
      when(~io.din) { y := S_Idle }
    }
  }

  io.dout := y === S_5High
}

class Top extends Module {
  val detector = Module(new Burst5Detector)
  val io       = IO(new detector.Port)

  io <> detector.io
}

The above code generates the following SystemVerilog:

// Generated by CIRCT firtool-1.77.0

// [Common prolog omitted]

module Burst5Detector(
  input  clock,
         reset,
         io_din,
  output io_dout
);

  reg  [2:0] y;
  reg  [2:0] casez_tmp;
  wire [2:0] _GEN = y == 3'h5 & ~io_din ? 3'h0 : y;
  always_comb begin
    casez (y)
      3'b000:
        casez_tmp = io_din ? 3'h1 : y;
      3'b001:
        casez_tmp = {1'h0, io_din, 1'h0};
      3'b010:
        casez_tmp = io_din ? 3'h3 : 3'h0;
      3'b011:
        casez_tmp = {io_din, 2'h0};
      3'b100:
        casez_tmp = io_din ? 3'h5 : 3'h0;
      3'b101:
        casez_tmp = _GEN;
      3'b110:
        casez_tmp = _GEN;
      default:
        casez_tmp = _GEN;
    endcase
  end // always_comb
  always @(posedge clock) begin
    if (reset)
      y <= 3'h0;
    else
      y <= casez_tmp;
  end // always @(posedge)
  `ifdef ENABLE_INITIAL_REG_
    `ifdef FIRRTL_BEFORE_INITIAL
      `FIRRTL_BEFORE_INITIAL
    `endif // FIRRTL_BEFORE_INITIAL
    logic [31:0] _RANDOM[0:0];
    initial begin
      `ifdef INIT_RANDOM_PROLOG_
        `INIT_RANDOM_PROLOG_
      `endif // INIT_RANDOM_PROLOG_
      `ifdef RANDOMIZE_REG_INIT
        _RANDOM[/*Zero width*/ 1'b0] = `RANDOM;
        y = _RANDOM[/*Zero width*/ 1'b0][2:0];
      `endif // RANDOMIZE_REG_INIT
    end // initial
    `ifdef FIRRTL_AFTER_INITIAL
      `FIRRTL_AFTER_INITIAL
    `endif // FIRRTL_AFTER_INITIAL
  `endif // ENABLE_INITIAL_REG_
  assign io_dout = y == 3'h5;
endmodule

module Top(
  input  clock,
         reset,
         io_din,
  output io_dout
);

  Burst5Detector detector (
    .clock   (clock),
    .reset   (reset),
    .io_din  (io_din),
    .io_dout (io_dout)
  );
endmodule

Synthesis produced the following log. The Synth 8-802 message is absent, meaning the heuristics did not discover an FSM:

---------------------------------------------------------------------------------
Finished applying 'set_property' XDC Constraints : Time (s): cpu = 00:00:07 ; elapsed = 00:00:09 . Memory (MB): peak = 1580.090 ; gain = 620.133
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
Finished RTL Optimization Phase 2 : Time (s): cpu = 00:00:08 ; elapsed = 00:00:09 . Memory (MB): peak = 1580.090 ; gain = 620.133
---------------------------------------------------------------------------------
Resource Utilization
LUT 4
FF 3
IO 4

2.3. Chisel with MuxLookup

Finally, we write the detector in Chisel using MuxLookup, which is the style we prefer:

import chisel3._
import chisel3.util._

class Burst5Detector extends Module {
  class Port extends Bundle {
    val din  = Input(Bool())
    val dout = Output(Bool())
  }
  val io = IO(new Port)

  private object State extends ChiselEnum {
    val S_Idle  = Value
    val S_1High = Value
    val S_2High = Value
    val S_3High = Value
    val S_4High = Value
    val S_5High = Value
  }
  import State._

  private val y = RegInit(S_Idle)
  y := MuxLookup(y, S_Idle)(
    Seq(
      S_Idle  -> Mux(io.din, S_1High, S_Idle),
      S_1High -> Mux(io.din, S_2High, S_Idle),
      S_2High -> Mux(io.din, S_3High, S_Idle),
      S_3High -> Mux(io.din, S_4High, S_Idle),
      S_4High -> Mux(io.din, S_5High, S_Idle),
      S_5High -> Mux(io.din, S_5High, S_Idle)
    )
  )

  io.dout := y === S_5High
}

class Top extends Module {
  val detector = Module(new Burst5Detector)
  val io       = IO(new detector.Port)

  io <> detector.io
}

The generated SystemVerilog code is listed below:

// Generated by CIRCT firtool-1.77.0

// [Common prolog omitted]

module Burst5Detector(
  input  clock,
         reset,
         io_din,
  output io_dout
);

  reg [2:0] y;
  always @(posedge clock) begin
    if (reset)
      y <= 3'h0;
    else
      y <=
        y == 3'h5 | y == 3'h4
          ? (io_din ? 3'h5 : 3'h0)
          : y == 3'h3
              ? {io_din, 2'h0}
              : {1'h0,
                 y == 3'h2
                   ? {2{io_din}}
                   : y == 3'h1 ? {io_din, 1'h0} : {1'h0, y == 3'h0 & io_din}};
  end // always @(posedge)
  `ifdef ENABLE_INITIAL_REG_
    `ifdef FIRRTL_BEFORE_INITIAL
      `FIRRTL_BEFORE_INITIAL
    `endif // FIRRTL_BEFORE_INITIAL
    logic [31:0] _RANDOM[0:0];
    initial begin
      `ifdef INIT_RANDOM_PROLOG_
        `INIT_RANDOM_PROLOG_
      `endif // INIT_RANDOM_PROLOG_
      `ifdef RANDOMIZE_REG_INIT
        _RANDOM[/*Zero width*/ 1'b0] = `RANDOM;
        y = _RANDOM[/*Zero width*/ 1'b0][2:0];
      `endif // RANDOMIZE_REG_INIT
    end // initial
    `ifdef FIRRTL_AFTER_INITIAL
      `FIRRTL_AFTER_INITIAL
    `endif // FIRRTL_AFTER_INITIAL
  `endif // ENABLE_INITIAL_REG_
  assign io_dout = y == 3'h5;
endmodule

module Top(
  input  clock,
         reset,
         io_din,
  output io_dout
);

  Burst5Detector detector (
    .clock   (clock),
    .reset   (reset),
    .io_din  (io_din),
    .io_dout (io_dout)
  );
endmodule

Again, Vivado failed to infer the FSM:

---------------------------------------------------------------------------------
Finished applying 'set_property' XDC Constraints : Time (s): cpu = 00:00:07 ; elapsed = 00:00:09 . Memory (MB): peak = 1518.992 ; gain = 558.602
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
Finished RTL Optimization Phase 2 : Time (s): cpu = 00:00:07 ; elapsed = 00:00:09 . Memory (MB): peak = 1518.992 ; gain = 558.602
---------------------------------------------------------------------------------
Resource Utilization
LUT 4
FF 3
IO 4
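Before looking for a cause, it is worth checking that the CIRCT output is functionally correct, so that the difference really is structural. The sketch below is a hand transcription of the next-state logic in the Section 2.2 and 2.3 listings into plain Scala. The transcriptions and the object name NextStateCheck are ours, not tool output. Both are compared against the intended transition table for every valid state and input:

```scala
object NextStateCheck {
  // Intended transition table (valid states 0..5): count highs, saturate at 5, reset on low.
  def reference(y: Int, din: Boolean): Int =
    if (!din) 0 else math.min(y + 1, 5)

  private def b(x: Boolean): Int = if (x) 1 else 0

  // Transcription of the casez block emitted for the switch/is version.
  def switchVersion(y: Int, din: Boolean): Int = {
    val gen = if (y == 5 && !din) 0 else y // _GEN
    y match {
      case 0 => if (din) 1 else y
      case 1 => b(din) << 1 // {1'h0, io_din, 1'h0}
      case 2 => if (din) 3 else 0
      case 3 => b(din) << 2 // {io_din, 2'h0}
      case 4 => if (din) 5 else 0
      case _ => gen // 3'b101, 3'b110 and default
    }
  }

  // Transcription of the nested conditional emitted for the MuxLookup version.
  def muxLookupVersion(y: Int, din: Boolean): Int =
    if (y == 5 || y == 4) (if (din) 5 else 0)
    else if (y == 3) b(din) << 2 // {io_din, 2'h0}
    else if (y == 2) b(din) * 3 // {1'h0, {2{io_din}}}
    else if (y == 1) b(din) << 1 // {1'h0, {io_din, 1'h0}}
    else b(y == 0 && din) // {1'h0, {1'h0, y == 3'h0 & io_din}}

  // Every (state, input) pair on which either transcription disagrees with the table.
  def mismatches: Seq[(Int, Boolean)] =
    for {
      y <- 0 to 5
      din <- Seq(false, true)
      if switchVersion(y, din) != reference(y, din) || muxLookupVersion(y, din) != reference(y, din)
    } yield (y, din)
}
```

Both versions agree with the table on the six valid states. They differ only on the unreachable encodings 3'b110 and 3'b111: there, the switch version holds the state because no is branch assigns y, while the MuxLookup version falls back to the S_Idle default.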

3. Speculated cause

According to UG901 (2024.1), Vivado's FSM inference relies on recognizing three clearly separated parts of an FSM: the state register, the next-state function, and the output function. In the Chisel-generated code, however, some next-state assignments have been optimized into concatenations, namely casez_tmp = {1'h0, io_din, 1'h0} and casez_tmp = {io_din, 2'h0} (the MuxLookup version in Section 2.3 contains similar concatenations). These right-hand sides likely prevent Vivado from deducing that the two branches select among constant next states. Indeed, once we rewrite the generated code from Section 2.2 to use conditional expressions instead, Vivado successfully detects the FSM:

diff --git a/Top.sv b/Top.patched.sv
index 414fda3..fd4cc5b 100644
--- a/Top.sv
+++ b/Top.patched.sv
@@ -1,4 +1,4 @@
-// Generated by CIRCT firtool-1.77.0
+// Generated by CIRCT firtool-1.77.0; manually patched by Mantle
 
 // Include register initializers in init blocks unless synthesis is set
 `ifndef RANDOMIZE
@@ -58,11 +58,11 @@ module Burst5Detector(
       3'b000:
         casez_tmp = io_din ? 3'h1 : y;
       3'b001:
-        casez_tmp = {1'h0, io_din, 1'h0};
+        casez_tmp = io_din ? 3'b010 : 3'b000;
       3'b010:
         casez_tmp = io_din ? 3'h3 : 3'h0;
       3'b011:
-        casez_tmp = {io_din, 2'h0};
+        casez_tmp = io_din ? 3'b100 : 3'b000;
       3'b100:
         casez_tmp = io_din ? 3'h5 : 3'h0;
       3'b101:

The output confirmed our speculation:

---------------------------------------------------------------------------------
Finished applying 'set_property' XDC Constraints : Time (s): cpu = 00:00:07 ; elapsed = 00:00:09 . Memory (MB): peak = 1579.922 ; gain = 619.777
---------------------------------------------------------------------------------
INFO: [Synth 8-802] inferred FSM for state register 'y_reg' in module 'Burst5Detector'
---------------------------------------------------------------------------------------------------
                   State |                     New Encoding |                Previous Encoding 
---------------------------------------------------------------------------------------------------
                 iSTATE4 |                           000001 |                              000
                 iSTATE0 |                           000010 |                              001
                 iSTATE1 |                           000100 |                              010
                 iSTATE2 |                           001000 |                              011
                 iSTATE3 |                           010000 |                              100
                 iSTATE5 |                           100000 |                              101
---------------------------------------------------------------------------------------------------
INFO: [Synth 8-3354] encoded FSM with state register 'y_reg' using encoding 'one-hot' in module 'Burst5Detector'
---------------------------------------------------------------------------------
Finished RTL Optimization Phase 2 : Time (s): cpu = 00:00:08 ; elapsed = 00:00:09 . Memory (MB): peak = 1579.922 ; gain = 619.777
---------------------------------------------------------------------------------
Resource Utilization
LUT 8
FF 6
IO 4
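The manual patch above follows a simple rule: a concatenation of sized constants and a single 1-bit signal is replaced by a conditional on that signal. The following is a sketch of how that rule could be mechanized, with several assumptions. The ConcatToTernary helper is hypothetical. It works on one expression string at a time rather than parsing SystemVerilog, it understands only 'h constants, and it assumes every bare identifier is 1 bit wide. It reproduces the two right-hand sides from the diff:

```scala
object ConcatToTernary {
  private val Concat = """\{([^{}]*)\}""".r
  private val Sized  = """(\d+)'h([0-9a-fA-F]+)""".r
  private val Ident  = """([A-Za-z_][A-Za-z0-9_]*)""".r

  // One field of the concatenation, MSB first: a sized constant or a signal name.
  private sealed trait Part
  private final case class Const(width: Int, value: BigInt) extends Part
  private final case class Sig(name: String) extends Part // ASSUMED to be 1 bit wide

  private def parse(s: String): Option[Part] = s.trim match {
    case Sized(w, v) => Some(Const(w.toInt, BigInt(v, 16)))
    case Ident(n)    => Some(Sig(n))
    case _           => None
  }

  // Value of the concatenation with the signal replaced by `bit` (0 or 1).
  private def eval(parts: List[Part], bit: Int): BigInt =
    parts.foldLeft(BigInt(0)) {
      case (acc, Const(w, v)) => (acc << w) | v
      case (acc, Sig(_))      => (acc << 1) | BigInt(bit)
    }

  // Zero-padded binary literal such as 3'b010.
  private def binLit(width: Int, v: BigInt): String = {
    val s = v.toString(2)
    s"$width'b" + "0" * (width - s.length) + s
  }

  // Some(ternary) for {const..., sig, const...}; None for anything else.
  def rewrite(concat: String): Option[String] = concat.trim match {
    case Concat(body) =>
      val parsed = body.split(",").toList.map(parse)
      if (parsed.exists(_.isEmpty)) None
      else {
        val parts = parsed.flatten
        parts.collect { case Sig(n) => n } match {
          case List(name) =>
            val width = parts.map { case Const(w, _) => w; case Sig(_) => 1 }.sum
            Some(s"$name ? ${binLit(width, eval(parts, 1))} : ${binLit(width, eval(parts, 0))}")
          case _ => None // zero or several signals: leave untouched
        }
      }
    case _ => None
  }
}
```

Anything it does not understand, such as {1'h0, {2{io_din}}} or a field like y == 3'h0 & io_din, is left untouched (None). It is therefore at most an aid to the manual patching described above, not a general workaround.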

4. Conclusion

If you use Vivado 2024.1 or earlier to synthesize Chisel-generated SystemVerilog and rely on its FSM re-encoding, check the synthesis log for the INFO: [Synth 8-802] message. This confirms that the inference heuristics were not defeated by optimizations performed upstream by Chisel/CIRCT.
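A minimal way to automate that check is to scan the synthesis log for the INFO: [Synth 8-802] lines shown in the logs above. The sketch below is plain Scala, and FsmLogCheck and its methods are hypothetical names. How you obtain the log text, for example by reading the run's log file, is left open. It returns the module and register of every FSM Vivado reports as inferred:

```scala
object FsmLogCheck {
  // Matches lines such as:
  // INFO: [Synth 8-802] inferred FSM for state register 'y_reg' in module 'burst5detector'
  private val Inferred =
    """INFO: \[Synth 8-802\] inferred FSM for state register '([^']+)' in module '([^']+)'""".r

  // (module, register) pairs for every FSM reported as inferred, in log order.
  def inferredFsms(log: String): Seq[(String, String)] =
    log.linesIterator.flatMap { line =>
      Inferred.findFirstMatchIn(line).map(m => (m.group(2), m.group(1)))
    }.toSeq

  // True if Vivado inferred an FSM on `register` in `module`.
  def hasFsm(log: String, module: String, register: String): Boolean =
    inferredFsms(log).contains((module, register))
}
```

A build script could, for instance, fail when hasFsm(log, "Burst5Detector", "y_reg") is false for a module that is expected to contain an FSM.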

Appendix A. Chisel project boilerplate

build.mill:

package build

import mill._
import mill.define.Sources
import mill.modules.Util
import mill.scalalib.scalafmt.ScalafmtModule
import mill.scalalib._
import mill.bsp._

object chisel_fsm_test extends ScalaModule with ScalafmtModule { m =>
  override def scalaVersion   = "2.13.14"
  override def scalacOptions = Seq(
    "-language:reflectiveCalls",
    "-deprecation",
    "-feature",
    "-Xcheckinit"
  )
  override def ivyDeps = Agg(
    ivy"org.chipsalliance::chisel:7.0.0-M2"
  )
  override def scalacPluginIvyDeps = Agg(
    ivy"org.chipsalliance:::chisel-plugin:7.0.0-M2"
  )

  def repositoriesTask = T.task {
    Seq(
      coursier.MavenRepository("https://repo.scala-sbt.org/scalasbt/maven-releases"),
      coursier.MavenRepository("https://oss.sonatype.org/content/repositories/releases"),
      coursier.MavenRepository("https://oss.sonatype.org/content/repositories/snapshots")
    ) ++ super.repositoriesTask()
  }
}

Elaborate.scala:

object Elaborate extends App {
  val firtoolOptions = Array(
    "--lowering-options=" + Seq(
      "disallowLocalVariables",
      "disallowPackedArrays",
      "locationInfoStyle=none"
    ).reduce(_ + "," + _)
  )
  circt.stage.ChiselStage.emitSystemVerilogFile(new Top(), args, firtoolOptions)
}

Appendix B. CIRCT-generated prolog

// Generated by CIRCT firtool-1.77.0

// Include register initializers in init blocks unless synthesis is set
`ifndef RANDOMIZE
  `ifdef RANDOMIZE_REG_INIT
    `define RANDOMIZE
  `endif // RANDOMIZE_REG_INIT
`endif // not def RANDOMIZE
`ifndef SYNTHESIS
  `ifndef ENABLE_INITIAL_REG_
    `define ENABLE_INITIAL_REG_
  `endif // not def ENABLE_INITIAL_REG_
`endif // not def SYNTHESIS

// Standard header to adapt well known macros for register randomization.

// RANDOM may be set to an expression that produces a 32-bit random unsigned value.
`ifndef RANDOM
  `define RANDOM $random
`endif // not def RANDOM

// Users can define INIT_RANDOM as general code that gets injected into the
// initializer block for modules with registers.
`ifndef INIT_RANDOM
  `define INIT_RANDOM
`endif // not def INIT_RANDOM

// If using random initialization, you can also define RANDOMIZE_DELAY to
// customize the delay used, otherwise 0.002 is used.
`ifndef RANDOMIZE_DELAY
  `define RANDOMIZE_DELAY 0.002
`endif // not def RANDOMIZE_DELAY

// Define INIT_RANDOM_PROLOG_ for use in our modules below.
`ifndef INIT_RANDOM_PROLOG_
  `ifdef RANDOMIZE
    `ifdef VERILATOR
      `define INIT_RANDOM_PROLOG_ `INIT_RANDOM
    `else  // VERILATOR
      `define INIT_RANDOM_PROLOG_ `INIT_RANDOM #`RANDOMIZE_DELAY begin end
    `endif // VERILATOR
  `else  // RANDOMIZE
    `define INIT_RANDOM_PROLOG_
  `endif // RANDOMIZE
`endif // not def INIT_RANDOM_PROLOG_

// [Actual module content starts here]