Finite state machine pitfalls with Chisel, SystemVerilog and Vivado
TL;DR: Vivado 2024.1 fails to infer FSMs from Chisel-generated SystemVerilog files because of optimizations applied during generation; at the time of writing, the author knows of no fix other than manually patching the generated code.
- 1. Background
- 2. Experiments and results
- 3. Speculation of cause
- 4. Conclusion
- Appendix A. Chisel project boilerplate
- Appendix B. CIRCT-generated prolog
1. Background
Finite state machines (FSMs), or more specifically deterministic finite state machines, are crucial to digital circuits. In this context, an FSM can be modelled as a sequential logic unit whose next state is given by the equation $Q^{n+1} = F_Q(Q^n, \boldsymbol{x}) \in S$, where $Q^n$ is the current state, $\boldsymbol{x}$ is a Boolean input vector and $S$ is the finite set of all valid states. The output $\boldsymbol{y}$ of such a unit is either $\boldsymbol{y} = F_y(Q^n)$ (Moore machine) or $\boldsymbol{y} = F_y(Q^n, \boldsymbol{x})$ (Mealy machine).
When implementing an FSM in SystemVerilog, we would usually write code in the following style:
localparam S0 = ...;
localparam S1 = ...;
...
localparam Sn = ...;
reg [W:0] y, Y;
always_ff @(posedge clock) begin
y <= rst ? S0 : Y;
end
always_comb begin
case (y)
S0: begin Y = ...; end
S1: begin Y = ...; end
...
Sn: begin Y = ...; end
default: begin Y = S0; end
endcase
end
In the snippet above, `y` is the current state; on each triggering clock edge it is either reset to `S0` or loaded with the next state `Y`.
Most synthesizers have heuristics to detect, and possibly optimize, FSMs written in this pattern. Generally, if an edge-triggered, synchronously reset flip-flop `y` is assigned a new value `Y` that is chosen from constant values `Sx`, it is likely to be an FSM state variable. The synthesizer may then decide to re-encode the `Sx` values (e.g. as one-hot) to optimize the circuit further.
Things can quickly become complicated when more layers of abstraction are introduced, which may cause the synthesizer to fail to infer a suitable next-state `Y`. Chisel, for example, is a circuit generator framework that allows users to apply modern design ideas, complex metaprogramming techniques, and pre-optimizations. It is built on Scala 2 and emits various HDL artifacts, including SystemVerilog. We will show that even when an FSM is written in an officially recommended style, the heuristics in Vivado 2024.1 still fail to detect it.
2. Experiments and results
The example we chose is a 5-bit burst detector: when a synchronous, active-high input signal has been asserted for at least 5 cycles, the circuit pulls its output high until the input deasserts. We write it in both vanilla SystemVerilog and Chisel 7.0.0-M2 (the latest version as of writing), and synthesize both using Vivado 2024.1 targeting XC7A100TFGG484-2L. We use the synthesis preset `Flow_PerfOptimized_high`, which tries to rewrite FSMs into one-hot encoding. We then capture the output of the synthesizer to determine whether the heuristics have found an FSM, and record the synthesized LUT and FF counts.
Boilerplate for the Chisel programs can be found in Appendix A. Note also that all CIRCT-generated SystemVerilog files contain a prolog that is used to randomly initialize module contents before simulation. It has been omitted for brevity and can be found in Appendix B.
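As a behavioural reference for the detector specified above, the following Python sketch models it as a Moore machine. It is not part of the original experiments; `next_state` and `simulate` are hypothetical helper names, and only the six reachable states (0 to 5) are modelled.

```python
# Behavioural reference model of the burst detector (a Moore machine).
# State k means "din was high on the last k consecutive clock edges", capped at 5.
# Only the reachable states 0..5 are modelled here.
S_IDLE, S_5HIGH = 0, 5


def next_state(state: int, din: bool) -> int:
    """Next-state function: count up while din is high, otherwise go idle."""
    if not din:
        return S_IDLE
    return min(state + 1, S_5HIGH)


def simulate(din_trace):
    """Return dout as observed after each clock edge, starting from reset."""
    state = S_IDLE
    dout_trace = []
    for din in din_trace:
        state = next_state(state, bool(din))
        dout_trace.append(int(state == S_5HIGH))  # Moore output: depends on state only
    return dout_trace


if __name__ == "__main__":
    print(simulate([1, 1, 1, 1, 1, 1, 0, 1]))
```

Feeding the same input trace to a simulation of any of the hardware versions below should give the same output sequence, provided the output is sampled after each clock edge.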
2.1. Vanilla SystemVerilog
The hand-written vanilla SystemVerilog code is listed below:
module burst5detector(
input clock,
input reset,
input din,
output dout
);
localparam S_Idle = 3'b000;
localparam S_1High = 3'b001;
localparam S_2High = 3'b010;
localparam S_3High = 3'b011;
localparam S_4High = 3'b100;
localparam S_5High = 3'b101;
reg [2:0] y, Y;
always_ff @(posedge clock) begin
y <= reset ? S_Idle : Y;
end
always_comb begin
case (y)
S_Idle: begin Y = din ? S_1High : S_Idle; end
S_1High: begin Y = din ? S_2High : S_Idle; end
S_2High: begin Y = din ? S_3High : S_Idle; end
S_3High: begin Y = din ? S_4High : S_Idle; end
S_4High: begin Y = din ? S_5High : S_Idle; end
S_5High: begin Y = din ? S_5High : S_Idle; end
default: begin Y = S_Idle; end
endcase
end
assign dout = (y == S_5High);
endmodule
module top(
input clock,
input reset,
input din,
output dout
);
burst5detector detector(
.clock(clock),
.reset(reset),
.din(din),
.dout(dout)
);
endmodule
Running it through the synthesizer gives the following output, indicating that Vivado has successfully spotted an FSM pattern and re-encoded its states:
---------------------------------------------------------------------------------
Finished applying 'set_property' XDC Constraints : Time (s): cpu = 00:00:06 ; elapsed = 00:00:10 . Memory (MB): peak = 1580.703 ; gain = 619.957
---------------------------------------------------------------------------------
INFO: [Synth 8-802] inferred FSM for state register 'y_reg' in module 'burst5detector'
---------------------------------------------------------------------------------------------------
State | New Encoding | Previous Encoding
---------------------------------------------------------------------------------------------------
S_Idle | 000001 | 000
S_1High | 000010 | 001
S_2High | 000100 | 010
S_3High | 001000 | 011
S_4High | 010000 | 100
S_5High | 100000 | 101
---------------------------------------------------------------------------------------------------
INFO: [Synth 8-3354] encoded FSM with state register 'y_reg' using encoding 'one-hot' in module 'burst5detector'
---------------------------------------------------------------------------------
Finished RTL Optimization Phase 2 : Time (s): cpu = 00:00:06 ; elapsed = 00:00:10 . Memory (MB): peak = 1580.703 ; gain = 619.957
---------------------------------------------------------------------------------
Resource | Utilization |
---|---|
LUT | 7 |
FF | 6 |
IO | 4 |
2.2. Chisel with recommended `switch` statements
Then we write the same circuit in Chisel, using the recommended approach in their Cookbook:
import chisel3._
import chisel3.util._
class Burst5Detector extends Module {
class Port extends Bundle {
val din = Input(Bool())
val dout = Output(Bool())
}
val io = IO(new Port)
private object State extends ChiselEnum {
val S_Idle = Value
val S_1High = Value
val S_2High = Value
val S_3High = Value
val S_4High = Value
val S_5High = Value
}
import State._
private val y = RegInit(S_Idle)
switch(y) {
is(S_Idle) {
when(io.din) { y := S_1High }
}
is(S_1High) {
when(io.din) { y := S_2High }.otherwise { y := S_Idle }
}
is(S_2High) {
when(io.din) { y := S_3High }.otherwise { y := S_Idle }
}
is(S_3High) {
when(io.din) { y := S_4High }.otherwise { y := S_Idle }
}
is(S_4High) {
when(io.din) { y := S_5High }.otherwise { y := S_Idle }
}
is(S_5High) {
when(~io.din) { y := S_Idle }
}
}
io.dout := y === S_5High
}
class Top extends Module {
val detector = Module(new Burst5Detector)
val io = IO(new detector.Port)
io <> detector.io
}
The above code generates the following SystemVerilog:
// Generated by CIRCT firtool-1.77.0
// [Common prolog omitted]
module Burst5Detector(
input clock,
reset,
io_din,
output io_dout
);
reg [2:0] y;
reg [2:0] casez_tmp;
wire [2:0] _GEN = y == 3'h5 & ~io_din ? 3'h0 : y;
always_comb begin
casez (y)
3'b000:
casez_tmp = io_din ? 3'h1 : y;
3'b001:
casez_tmp = {1'h0, io_din, 1'h0};
3'b010:
casez_tmp = io_din ? 3'h3 : 3'h0;
3'b011:
casez_tmp = {io_din, 2'h0};
3'b100:
casez_tmp = io_din ? 3'h5 : 3'h0;
3'b101:
casez_tmp = _GEN;
3'b110:
casez_tmp = _GEN;
default:
casez_tmp = _GEN;
endcase
end // always_comb
always @(posedge clock) begin
if (reset)
y <= 3'h0;
else
y <= casez_tmp;
end // always @(posedge)
`ifdef ENABLE_INITIAL_REG_
`ifdef FIRRTL_BEFORE_INITIAL
`FIRRTL_BEFORE_INITIAL
`endif // FIRRTL_BEFORE_INITIAL
logic [31:0] _RANDOM[0:0];
initial begin
`ifdef INIT_RANDOM_PROLOG_
`INIT_RANDOM_PROLOG_
`endif // INIT_RANDOM_PROLOG_
`ifdef RANDOMIZE_REG_INIT
_RANDOM[/*Zero width*/ 1'b0] = `RANDOM;
y = _RANDOM[/*Zero width*/ 1'b0][2:0];
`endif // RANDOMIZE_REG_INIT
end // initial
`ifdef FIRRTL_AFTER_INITIAL
`FIRRTL_AFTER_INITIAL
`endif // FIRRTL_AFTER_INITIAL
`endif // ENABLE_INITIAL_REG_
assign io_dout = y == 3'h5;
endmodule
module Top(
input clock,
reset,
io_din,
output io_dout
);
Burst5Detector detector (
.clock (clock),
.reset (reset),
.io_din (io_din),
.io_dout (io_dout)
);
endmodule
Synthesis produced the following log, indicating no FSMs have been discovered by heuristics:
---------------------------------------------------------------------------------
Finished applying 'set_property' XDC Constraints : Time (s): cpu = 00:00:07 ; elapsed = 00:00:09 . Memory (MB): peak = 1580.090 ; gain = 620.133
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
Finished RTL Optimization Phase 2 : Time (s): cpu = 00:00:08 ; elapsed = 00:00:09 . Memory (MB): peak = 1580.090 ; gain = 620.133
---------------------------------------------------------------------------------
Resource | Utilization |
---|---|
LUT | 4 |
FF | 3 |
IO | 4 |
2.3. Chisel with MuxLookup
Finally, we write the detector in Chisel with `MuxLookup`, which is the style we prefer.
import chisel3._
import chisel3.util._
class Burst5Detector extends Module {
class Port extends Bundle {
val din = Input(Bool())
val dout = Output(Bool())
}
val io = IO(new Port)
private object State extends ChiselEnum {
val S_Idle = Value
val S_1High = Value
val S_2High = Value
val S_3High = Value
val S_4High = Value
val S_5High = Value
}
import State._
private val y = RegInit(S_Idle)
y := MuxLookup(y, S_Idle)(
Seq(
S_Idle -> Mux(io.din, S_1High, S_Idle),
S_1High -> Mux(io.din, S_2High, S_Idle),
S_2High -> Mux(io.din, S_3High, S_Idle),
S_3High -> Mux(io.din, S_4High, S_Idle),
S_4High -> Mux(io.din, S_5High, S_Idle),
S_5High -> Mux(io.din, S_5High, S_Idle)
)
)
io.dout := y === S_5High
}
class Top extends Module {
val detector = Module(new Burst5Detector)
val io = IO(new detector.Port)
io <> detector.io
}
The generated SystemVerilog code is listed below:
// Generated by CIRCT firtool-1.77.0
// [Common prolog omitted]
module Burst5Detector(
input clock,
reset,
io_din,
output io_dout
);
reg [2:0] y;
always @(posedge clock) begin
if (reset)
y <= 3'h0;
else
y <=
y == 3'h5 | y == 3'h4
? (io_din ? 3'h5 : 3'h0)
: y == 3'h3
? {io_din, 2'h0}
: {1'h0,
y == 3'h2
? {2{io_din}}
: y == 3'h1 ? {io_din, 1'h0} : {1'h0, y == 3'h0 & io_din}};
end // always @(posedge)
`ifdef ENABLE_INITIAL_REG_
`ifdef FIRRTL_BEFORE_INITIAL
`FIRRTL_BEFORE_INITIAL
`endif // FIRRTL_BEFORE_INITIAL
logic [31:0] _RANDOM[0:0];
initial begin
`ifdef INIT_RANDOM_PROLOG_
`INIT_RANDOM_PROLOG_
`endif // INIT_RANDOM_PROLOG_
`ifdef RANDOMIZE_REG_INIT
_RANDOM[/*Zero width*/ 1'b0] = `RANDOM;
y = _RANDOM[/*Zero width*/ 1'b0][2:0];
`endif // RANDOMIZE_REG_INIT
end // initial
`ifdef FIRRTL_AFTER_INITIAL
`FIRRTL_AFTER_INITIAL
`endif // FIRRTL_AFTER_INITIAL
`endif // ENABLE_INITIAL_REG_
assign io_dout = y == 3'h5;
endmodule
module Top(
input clock,
reset,
io_din,
output io_dout
);
Burst5Detector detector (
.clock (clock),
.reset (reset),
.io_din (io_din),
.io_dout (io_dout)
);
endmodule
Still, Vivado failed to pick up the FSM.
---------------------------------------------------------------------------------
Finished applying 'set_property' XDC Constraints : Time (s): cpu = 00:00:07 ; elapsed = 00:00:09 . Memory (MB): peak = 1518.992 ; gain = 558.602
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
Finished RTL Optimization Phase 2 : Time (s): cpu = 00:00:07 ; elapsed = 00:00:09 . Memory (MB): peak = 1518.992 ; gain = 558.602
---------------------------------------------------------------------------------
Resource | Utilization |
---|---|
LUT | 4 |
FF | 3 |
IO | 4 |
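Before blaming the generator, it is worth confirming that the nested conditional expression in the generated code of Section 2.3 still implements the intended transition table. The Python sketch below transcribes that expression by hand and compares it with the `MuxLookup` semantics over all eight 3-bit values of `y`; the function names are hypothetical, and the transcription itself is an assumption that should be rechecked against the actual output.

```python
# Hand transcription of the next-state expression emitted in Section 2.3,
# compared against the transition table written in the Chisel source.


def generated_next(y: int, din: int) -> int:
    """Mirror of the nested conditional assigned to y in the generated code."""
    if y == 5 or y == 4:
        return 5 if din else 0
    if y == 3:
        return din << 2                  # {io_din, 2'h0}
    if y == 2:
        low = (din << 1) | din           # {2{io_din}}
    elif y == 1:
        low = din << 1                   # {io_din, 1'h0}
    else:
        low = int(y == 0 and din == 1)   # {1'h0, y == 3'h0 & io_din}
    return low                           # the leading {1'h0, ...} keeps bit 2 clear


def intended_next(y: int, din: int) -> int:
    """MuxLookup semantics: states 0..5 follow the table, anything else maps to 0."""
    if y > 5 or not din:
        return 0
    return min(y + 1, 5)


def mismatches():
    """List every (y, din) pair on which the two functions disagree."""
    return [(y, d) for y in range(8) for d in (0, 1)
            if generated_next(y, d) != intended_next(y, d)]


if __name__ == "__main__":
    print(mismatches())
```

An empty mismatch list means the two descriptions are functionally identical, so any failure to infer the FSM comes from the form of the expression rather than from its behaviour.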
3. Speculation of cause
Vivado’s FSM inference, as per UG901-2024.1, relies on detecting three clear stages of the FSM’s state management. In the Chisel-generated code, some next-state assignments have been optimized into concatenations: `casez_tmp = {1'h0, io_din, 1'h0};` and `casez_tmp = {io_din, 2'h0};`. These right-hand sides likely prevent Vivado from deducing a constant next-state for the two branches. Indeed, once we rewrite the generated code from Section 2.2 to use conditional expressions instead, Vivado successfully detects the FSM:
diff --git a/Top.sv b/Top.patched.sv
index 414fda3..fd4cc5b 100644
--- a/Top.sv
+++ b/Top.patched.sv
@@ -1,4 +1,4 @@
-// Generated by CIRCT firtool-1.77.0
+// Generated by CIRCT firtool-1.77.0; manually patched by Mantle
// Include register initializers in init blocks unless synthesis is set
`ifndef RANDOMIZE
@@ -58,11 +58,11 @@ module Burst5Detector(
3'b000:
casez_tmp = io_din ? 3'h1 : y;
3'b001:
- casez_tmp = {1'h0, io_din, 1'h0};
+ casez_tmp = io_din ? 3'b010 : 3'b000;
3'b010:
casez_tmp = io_din ? 3'h3 : 3'h0;
3'b011:
- casez_tmp = {io_din, 2'h0};
+ casez_tmp = io_din ? 3'b100 : 3'b000;
3'b100:
casez_tmp = io_din ? 3'h5 : 3'h0;
3'b101:
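The patch is meant to change only the form of the two assignments, not their value. A quick Python sketch can check this, assuming the usual Verilog concatenation semantics (fields packed MSB first); `concat` and `check_patch` are illustrative helpers, not part of any tool.

```python
# Check that the two patched right-hand sides compute the same 3-bit value
# as the concatenations they replace, for both values of io_din.


def concat(*fields):
    """Model a Verilog concatenation; each field is a (value, width) pair, MSB first."""
    result = 0
    for value, width in fields:
        result = (result << width) | (value & ((1 << width) - 1))
    return result


def check_patch():
    for din in (0, 1):
        # 3'b001 branch: {1'h0, io_din, 1'h0}  vs  io_din ? 3'b010 : 3'b000
        assert concat((0, 1), (din, 1), (0, 1)) == (0b010 if din else 0b000)
        # 3'b011 branch: {io_din, 2'h0}  vs  io_din ? 3'b100 : 3'b000
        assert concat((din, 1), (0, 2)) == (0b100 if din else 0b000)
    return True


if __name__ == "__main__":
    print(check_patch())
```

Both forms agree for either input value, so the patched file describes the same circuit; only the syntactic shape seen by the synthesizer differs.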
The output confirmed our speculation:
---------------------------------------------------------------------------------
Finished applying 'set_property' XDC Constraints : Time (s): cpu = 00:00:07 ; elapsed = 00:00:09 . Memory (MB): peak = 1579.922 ; gain = 619.777
---------------------------------------------------------------------------------
INFO: [Synth 8-802] inferred FSM for state register 'y_reg' in module 'Burst5Detector'
---------------------------------------------------------------------------------------------------
State | New Encoding | Previous Encoding
---------------------------------------------------------------------------------------------------
iSTATE4 | 000001 | 000
iSTATE0 | 000010 | 001
iSTATE1 | 000100 | 010
iSTATE2 | 001000 | 011
iSTATE3 | 010000 | 100
iSTATE5 | 100000 | 101
---------------------------------------------------------------------------------------------------
INFO: [Synth 8-3354] encoded FSM with state register 'y_reg' using encoding 'one-hot' in module 'Burst5Detector'
---------------------------------------------------------------------------------
Finished RTL Optimization Phase 2 : Time (s): cpu = 00:00:08 ; elapsed = 00:00:09 . Memory (MB): peak = 1579.922 ; gain = 619.777
---------------------------------------------------------------------------------
Resource | Utilization |
---|---|
LUT | 8 |
FF | 6 |
IO | 4 |
4. Conclusion
If you use Vivado (2024.1 or earlier) to synthesize Chisel-generated SystemVerilog files and rely on its FSM re-encoding optimizations, double-check that its inference heuristics are not hindered by optimizations applied in the upper layers.
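One way to automate that check is to scan the synthesis log for the `Synth 8-802` message that appears in the successful runs above. The following Python sketch does this on the log text; the helper names are hypothetical, and how the log is obtained (file name, location) depends on your project setup and is left to the caller.

```python
import re

# Matches the "inferred FSM" message shown in the successful synthesis logs.
FSM_INFERRED = re.compile(
    r"INFO: \[Synth 8-802\] inferred FSM for state register "
    r"'(?P<register>[^']+)' in module '(?P<module>[^']+)'"
)


def inferred_fsms(log_text: str):
    """Return (module, register) pairs for every FSM the log reports as inferred."""
    return [(m.group("module"), m.group("register"))
            for m in FSM_INFERRED.finditer(log_text)]


def missing_fsms(log_text: str, expected_modules):
    """Return the expected modules for which no FSM inference was reported."""
    found = {module for module, _ in inferred_fsms(log_text)}
    return sorted(set(expected_modules) - found)


if __name__ == "__main__":
    sample = ("INFO: [Synth 8-802] inferred FSM for state register "
              "'y_reg' in module 'burst5detector'\n")
    print(inferred_fsms(sample))
    print(missing_fsms(sample, ["burst5detector", "Burst5Detector"]))
```

A non-empty result from `missing_fsms` is a hint to inspect the generated SystemVerilog for next-state assignments that are no longer plain constants.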
Appendix A. Chisel project boilerplate
`build.mill`:
package build
import mill._
import mill.define.Sources
import mill.modules.Util
import mill.scalalib.scalafmt.ScalafmtModule
import mill.scalalib._
import mill.bsp._
object chisel_fsm_test extends ScalaModule with ScalafmtModule { m =>
override def scalaVersion = "2.13.14"
override def scalacOptions = Seq(
"-language:reflectiveCalls",
"-deprecation",
"-feature",
"-Xcheckinit"
)
override def ivyDeps = Agg(
ivy"org.chipsalliance::chisel:7.0.0-M2"
)
override def scalacPluginIvyDeps = Agg(
ivy"org.chipsalliance:::chisel-plugin:7.0.0-M2"
)
def repositoriesTask = T.task {
Seq(
coursier.MavenRepository("https://repo.scala-sbt.org/scalasbt/maven-releases"),
coursier.MavenRepository("https://oss.sonatype.org/content/repositories/releases"),
coursier.MavenRepository("https://oss.sonatype.org/content/repositories/snapshots")
) ++ super.repositoriesTask()
}
}
`Elaborate.scala`:
object Elaborate extends App {
val firtoolOptions = Array(
"--lowering-options=" + Seq(
"disallowLocalVariables",
"disallowPackedArrays",
"locationInfoStyle=none"
).reduce(_ + "," + _)
)
circt.stage.ChiselStage.emitSystemVerilogFile(new Top(), args, firtoolOptions)
}
Appendix B. CIRCT-generated prolog
// Generated by CIRCT firtool-1.77.0
// Include register initializers in init blocks unless synthesis is set
`ifndef RANDOMIZE
`ifdef RANDOMIZE_REG_INIT
`define RANDOMIZE
`endif // RANDOMIZE_REG_INIT
`endif // not def RANDOMIZE
`ifndef SYNTHESIS
`ifndef ENABLE_INITIAL_REG_
`define ENABLE_INITIAL_REG_
`endif // not def ENABLE_INITIAL_REG_
`endif // not def SYNTHESIS
// Standard header to adapt well known macros for register randomization.
// RANDOM may be set to an expression that produces a 32-bit random unsigned value.
`ifndef RANDOM
`define RANDOM $random
`endif // not def RANDOM
// Users can define INIT_RANDOM as general code that gets injected into the
// initializer block for modules with registers.
`ifndef INIT_RANDOM
`define INIT_RANDOM
`endif // not def INIT_RANDOM
// If using random initialization, you can also define RANDOMIZE_DELAY to
// customize the delay used, otherwise 0.002 is used.
`ifndef RANDOMIZE_DELAY
`define RANDOMIZE_DELAY 0.002
`endif // not def RANDOMIZE_DELAY
// Define INIT_RANDOM_PROLOG_ for use in our modules below.
`ifndef INIT_RANDOM_PROLOG_
`ifdef RANDOMIZE
`ifdef VERILATOR
`define INIT_RANDOM_PROLOG_ `INIT_RANDOM
`else // VERILATOR
`define INIT_RANDOM_PROLOG_ `INIT_RANDOM #`RANDOMIZE_DELAY begin end
`endif // VERILATOR
`else // RANDOMIZE
`define INIT_RANDOM_PROLOG_
`endif // RANDOMIZE
`endif // not def INIT_RANDOM_PROLOG_
// [Actual module content starts here]