
Techniques for Improving QoR Using analyze_datapath_extraction

By Mahurshi Akilla, Corporate Applications Engineer, Synopsys and Lakshmi Gopalakrishnan, Corporate Applications Engineer, Synopsys

Introduction

Efficient datapath extraction is essential to achieving good quality of results (QoR), particularly in designs with large datapath content. This article highlights the importance of RTL coding style, describes how coding style influences datapath extraction, and explains some of the datapath analysis techniques that designers can use to improve the QoR of their designs.

Analyze Datapath Extraction

To get the maximum benefit from datapath extraction, designers should follow the coding guidelines that allow Design Compiler to extract the largest possible datapath block. A large extracted datapath contains more arithmetic operators, which allows high-level optimizations to occur and results in a better optimized design during synthesis. To find out more about the coding guidelines that provide good QoR, refer to SolvNet article 015771, "Coding Guidelines for Datapath Synthesis" (SolvNet login required).

The analyze_datapath_extraction command in Design Compiler (DC) gives feedback whenever datapath extraction is blocked due to RTL coding style. This feedback is provided as HDL messages. Making changes to the coding style based on this feedback can greatly improve how efficiently DC can optimize the design and give better QoR.

In DC version I-2013.12, the analyze_datapath_extraction command was enhanced to quickly determine and prioritize the datapath leakage issues in large hierarchical designs. To find out more about these enhancements, refer to "What's New with DesignWare Building Blocks and minPower Components in I-2013.12."

Table 1 shows different kinds of HDL messages reported by the analyze_datapath_extraction command. Designers should first focus on the HDL messages that identify arithmetic operators on the timing critical blocks or on the blocks that consume relatively higher power, as fixing those cases provides the greatest QoR benefit. In addition, note that HDL-120 and HDL-132 messages provide good QoR benefit with relatively simple RTL changes, and these messages also tend to occur more frequently in typical designs.

HDL message                            QoR gain      RTL fix      Frequency  Priority
-------------------------------------------------------------------------------------
HDL-120: leakage                       speed (high)  easy         High       8 (high)
HDL-132: unsigned subtractor           speed (high)  easy         Medium     7 (high)
HDL-121: sequential cell               area (low)    hard         Medium     1 (low)
HDL-122: internal saturation           speed (high)  medium       Medium     4 (medium)
HDL-125: instantiated DW               speed (high)  medium       Medium     3 (low)
HDL-126: hierarchical boundary         speed (high)  medium-hard  High       2 (low)
HDL-129: gtech adder chains            speed (high)  easy         Low        6 (medium)
HDL-133: gtech adder between operator  speed (high)  easy         Low        5 (medium)

Table 1: HDL messages reported by the analyze_datapath_extraction command

Datapath Leakage

Datapath leakage happens when an internal operand is not wide enough to store the result of an operation, but the full result is required later for another operation.

In the "Bad QoR" example in Table 2, the output of the operation a * b + c should be 17 bits wide, but it is truncated to a 16-bit value in the assign statement in line 5. In line 6, there is an extension in the addition operation leading to a 17-bit value, causing datapath leakage. The analyze_datapath_extraction command identifies this case and issues an HDL-120 message. Correcting the bit width of the intermediate value 't' as shown in the "Good QoR" example below resolves this issue and provides better QoR.

Bad QoR:

    1 module bad (a, b, c, d, z);
    2 input [7:0] a, b, c, d;
    3 output [16:0] z;
    4 wire [15:0] t;
    5 assign t = a * b + c;
    6 assign z = t + d;
    7 endmodule

Output from analyze_datapath_extraction:

    Information: Operator associated with resources 'add_5 (top.v:5)' in design 'top' breaks the datapath extraction because there is leakage due to truncation on its fanout. (HDL-120)

Good QoR:

    1 module good (a, b, c, d, z);
    2 input [7:0] a, b, c, d;
    3 output [16:0] z;
    4 wire [16:0] t;
    5 assign t = a * b + c;
    6 assign z = t + d;
    7 endmodule

There is no leakage here, so DC is able to implement a smaller and faster design.

Table 2: Improving QoR by increasing bit width
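
The truncate-then-extend pattern behind HDL-120 can be illustrated with simple width bookkeeping. The Python sketch below is only a model under assumed unsigned full-precision width rules (a product is as wide as the sum of its operand widths, a sum is one carry bit wider than its widest operand). The helper names are hypothetical, and this is not a description of how Design Compiler detects leakage internally.

```python
def mul_width(wa, wb):
    """Assumed full-precision width of an unsigned product."""
    return wa + wb


def add_width(wa, wb):
    """Assumed full-precision width of an unsigned sum (one carry bit)."""
    return max(wa, wb) + 1


def has_leakage(full_width, stored_width, consumer_width):
    """True when a result is cut to stored_width and then widened again
    by a later operator (the truncate-then-extend pattern)."""
    return stored_width < full_width and consumer_width > stored_width


# Table 2: t = a * b + c with 8-bit inputs, then z = t + d with a 17-bit z.
full_t = add_width(mul_width(8, 8), 8)    # full-precision width of a*b + c
bad_leaks = has_leakage(full_t, 16, 17)   # wire [15:0] t
good_leaks = has_leakage(full_t, 17, 17)  # wire [16:0] t

# Largest value a*b + c can take for 8-bit unsigned inputs.
max_t = 255 * 255 + 255

print(full_t, bad_leaks, good_leaks, max_t)
```

For 8-bit unsigned inputs the largest value of a * b + c is 65280, which still fits in 16 bits. Under that assumption, widening t to 17 bits does not change the computed result in this example; it only removes the truncation that blocks extraction.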

For large designs, there may be multiple HDL leakage messages (HDL-120 and HDL-132) to review. To make this task easier, analyze_datapath_extraction sorts the messages when the -sort or -max switches are used. The messages are sorted based on the priority of the potential QoR impact. This priority is determined based on the following criteria:

  1. The width of the operand that has leakage detected
  2. The size and operation type of the operator driving the operand with leakage
  3. The sizes and operation types of the operators driven by the leaked operand

Improving QoR Using the analyze_datapath_extraction Command

When RTL coding guidelines are not followed, designers can use the HDL messages provided by the analyze_datapath_extraction command to modify RTL and improve QoR.

Example: A Verilog design consisting of datapath logic that does not follow the RTL coding guidelines:

module logicblock (
  input [15:0] a5, b5,
  output [31:0] c5
);
  assign c5 = a5 * b5;
endmodule

module top (
  input clk, en,
  input  [15:0] a1, a2, a3, a4, a5, a6, a7,
  input  [15:0] b1, b2, b3, b4, b5, b6, b7,
  output [32:0] z1,
  output [32:0] z2,
  output reg [127:0] z3);

  wire [32:0] c1 = a1*b1 + b2;
  wire [32:0] c2 = a2*b2 + b3;
  wire [31:0] c3 = a3*b3 + b4; //truncating result
  wire [32:0] c4 = a5*b5 + b6;
  wire [31:0] c5;
  wire [31:0] c6;
  reg [63:0] c8, c9;

  //part of datapath is at a different hierarchy
  logicblock i_logicblock (a5, b5, c5);

  //simple multiplication operation done using DW02_mult instead of * operator
  DW02_mult #(16, 16) U1 ( .A(a6), .B(b6), .TC(1'b0), .PRODUCT(c6) );

  //extension of truncated 32 bit result in c3 to 33 bits
  assign z1 = c1 + c2 + c3 + c4 + c5 + c6 + b7;

  wire [28:0] c7_t = 0 - a7; //output of operation is treated as signed
  assign z2   = c7_t + b7;   //driver(signed)/load(unsigned) mismatch at fanin

  //operation broken down with manual pipelining
  always @(posedge clk) begin
      if (en) begin
           c8 <= c1*c2;
           c9 <= c3*c4;
           z3 <= c8*c9;
      end
  end
endmodule

Addressing analyze_datapath_extraction Messages

The analyze_datapath_extraction command is used in the compile script after the RTL elaboration stage and prior to the compile command, and helps designers understand the issues that block datapath extraction. In a large design that may contain multiple HDL messages from analyze_datapath_extraction, understanding which messages provide the best QoR improvement enables designers to address those messages first. The following section suggests the process to fix the analyze_datapath_extraction messages generated for the above design. 

Step 1: Address messages related to timing violations where datapath logic is in the critical path

First, look at the timing report of the design and determine if datapath logic is present in critical paths. If HDL messages identify operators that happen to be on the timing critical paths, address them first. Pipelining is used to improve the throughput in high performance designs. However, datapath optimization may be prevented if manual pipelining is done by sub-optimally placing the pipeline registers in between datapath logic. When pipelining occurs around datapath logic, the retiming feature should be used instead of manual pipelining, as retiming tends to provide better QoR benefit as shown:

a. Look at the output of the report_qor command to see if there are any paths that violate timing.

Timing Path Group 'clk'
  Levels of Logic         :      31.00
  Critical Path Length    :       4.54
  Critical Path Slack     :      -0.13
  Critical Path Clk Period:       4.66
  Total Negative Slack    :      -8.22
  No. of Violating Paths  :      81.00

b. Once the violating paths are clear, the next step is to check if any of these paths contain datapath components. Use the report_timing command to look at the top violating timing paths. Extracted datapath blocks can be identified in a timing path because their cell names have the prefix "DP_OP_". It is harder to identify singleton datapath components because they get ungrouped during synthesis, so it is essential to look at the actual cells in the critical path and analyze them, as shown:

Startpoint: c8_reg_41_ (rising edge-triggered flip-flop clocked by clk)
Endpoint: z3_reg_114_ (rising edge-triggered flip-flop clocked by clk)
Path Group: clk
Path Type: max

Des/Clust/Port     Wire Load Model     Library
------------------------------------------------------------
Top                B0.2X0.2            ts28nphhpmc_ss0p9vn40c

Point                                   Incr      Path
------------------------------------------------------------
clock clk (rise edge)                   0.00      0.00
clock network delay (ideal)             0.00      0.00
c8_reg_41_/CK (SEM_FDPHQ_2)             0.00      0.00 r
c8_reg_41_/Q (SEM_FDPHQ_2)              0.20      0.20 f
U472/X (SEM_INV_9)                      0.08      0.29 r
U7630/X (SEM_EN2_8)                     0.18      0.47 r
U5794/X (SEM_INV_6)                     0.15      0.62 f
U5704/X (SEM_ND2_12)                    0.08      0.70 r
U136/X (SEM_INV_18)                     0.04      0.74 f
U5279/X (SEM_INV_32)                    0.05      0.79 r
U8180/X (SEM_OAI22_1)                   0.13      0.91 f
U8260/CO (SEM_ADDF_V1_2)                0.34      1.25 f
U8432/S (SEM_ADDF_V1_2)                 0.50      1.76 r
U1748/X (SEM_EN2_8)                     0.21      1.96 r
U1198/X (SEM_EN2_8)                     0.22      2.19 r
..
..
..
..
data arrival time                                 4.54
clock clk (rise edge)                   4.66      4.66
clock network delay (ideal)             0.00      4.66
z3_reg_114_/CK (SEM_FDPRBQ_V2_2)        0.00      4.66 r
library setup time                     -0.25      4.41
data required time                                4.41
------------------------------------------------------------
data required time                                4.41
data arrival time                                -4.54
------------------------------------------------------------
slack (VIOLATED)                                 -0.13
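
As a quick cross-check of the report above, the slack can be recomputed from the numbers it lists. The following Python lines are a minimal worked example assuming an ideal clock with zero network delay, as the report shows; the variable names are illustrative only.

```python
# Values copied from the timing report above (in library time units).
clock_period = 4.66     # capture edge of clk
library_setup = 0.25    # setup time of z3_reg_114_
data_arrival = 4.54     # arrival time at the endpoint

# Required time is the capture edge minus the setup time.
data_required = clock_period - library_setup

# Negative slack means the path arrives later than required.
slack = data_required - data_arrival

print(f"data required time: {data_required:.2f}")
print(f"slack:              {slack:.2f}")
```

The result matches the Critical Path Slack reported by report_qor for path group clk.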

The timing path above shows full adders (SEM_ADDF_V1_2) in the timing-critical path. RTL Cross Probing in Design Vision (Figure 1) shows that these adders are part of the datapath block: the "Origin" field listed as "datapath" indicates that cell U8260 originated from datapath optimization. Similar results are seen for cell U8432. (For more information on RTL Cross Probing, refer to the Design Compiler User Guide.)

Figure 1: Using RTL Cross Probing in Design Vision

c. The next step is to check for analyze_datapath_extraction messages that point to logic in this register-to-register path (HDL-121 messages). The RTL code in the critical path uses manual pipelining. Using the retiming feature on these paths can help ease timing (refer to SolvNet article 015771, "Coding Guidelines for Datapath Synthesis"). Because the HDL-121 messages are directly related to the critical path, address them first.

There are two HDL-121 messages pointing to the registers that break the connection between DesignWare blocks:

Information: There is sequential cell between operator associated with 'mult_41 (top.v:41)' and 'mult_43 (top.v:43)' in design 'top'. (HDL-121)

Information: There is sequential cell between operator associated with 'mult_42 (top.v:42)' and 'mult_43 (top.v:43)' in design 'top'. (HDL-121)

Now that we have identified that manual pipelining is being done on logic that is in the critical path, the fix is to move the pipeline registers to the output of the datapath logic and do retiming using 'set_optimize_registers' in Design Compiler (see "Design Compiler Reference Manual: Register Retiming"). This will let the tool extract all relevant logic around the registers, choose the appropriate architectures and move the registers to the optimal locations to fix timing.

Bad QoR:

    output reg [127:0] z3;
    reg [63:0] c8, c9;
    //manual pipelining of datapath
    always @(posedge clk) begin
      if (en) begin
        c8 <= c1*c2;
        c9 <= c3*c4;
        z3 <= c8*c9;
      end
    end

Good QoR:

    output reg [127:0] z3;
    reg [127:0] c8;
    //move pipeline register to the output and use retiming
    //by doing set_optimize_registers before compile_ultra
    always @(posedge clk) begin
      if (en) begin
        c8 <= c1*c2*c3*c4;
        z3 <= c8;
      end
    end

Table 3: Addressing HDL-121 message in the design

Note: While addressing HDL-121 messages, it is important to make sure that

  1. The RTL contains pipelined registers in between datapath operators and the registers can be retimed
  2. The functionality of the RTL is not changed including latency on all its outputs
  3. analyze_datapath_extraction shows relevant DesignWare logic is extracted when registers are replaced with nets
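
The second point in the note (same function and same latency) can be sanity-checked with a small cycle-level model of the two versions in Table 3. The Python sketch below makes several assumptions that are not in the original: registers start at 0, en is held high, and c1 to c4 stay within the range that 16-bit unsigned inputs can produce. The function names are hypothetical, and the model says nothing about the timing or area that retiming achieves.

```python
import random

MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def run_manual(stimulus):
    """'Bad QoR' version: multipliers separated by registers c8 and c9."""
    c8 = c9 = z3 = 0
    trace = []
    for c1, c2, c3, c4 in stimulus:
        # Non-blocking assignments: every right-hand side uses the
        # register values from before the clock edge.
        c8, c9, z3 = (c1 * c2) & MASK64, (c3 * c4) & MASK64, (c8 * c9) & MASK128
        trace.append(z3)
    return trace


def run_retimable(stimulus):
    """'Good QoR' version: full product first, registers at the output."""
    c8 = z3 = 0
    trace = []
    for c1, c2, c3, c4 in stimulus:
        c8, z3 = (c1 * c2 * c3 * c4) & MASK128, c8
        trace.append(z3)
    return trace


# Largest value of a*b + c for 16-bit unsigned operands.
C_MAX = 65535 * 65535 + 65535

random.seed(0)
stimulus = [tuple(random.randint(0, C_MAX) for _ in range(4)) for _ in range(200)]
stimulus.append((C_MAX, C_MAX, C_MAX, C_MAX))  # corner case: all operands at maximum
stimulus.append((0, 0, 0, 0))                  # flush the last product through

same = run_manual(stimulus) == run_retimable(stimulus)
print("outputs match cycle by cycle:", same)
```

Both versions deliver each product two clock edges after its operands are applied, so moving the registers to the output keeps the latency seen at z3 unchanged in this model.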

Step 2: Address high-priority datapath leakage messages

Datapath leakage messages are high-priority messages that, when addressed, improve QoR. Address them early, especially for arithmetic operations on the critical paths. Shown below is the output of the analyze_datapath_extraction command run with the -sort option on the example design; it lists the leakage messages from high to low priority.

****
Leakage detections on design 'top'
****
Msg type         |Msg count        |Max lkg width    |Avg. lkg width
---------------------------------------------------------------------
HDL-120          |2                |32               |30
HDL-132          |1                |29               |29
***Sorted leakage messages

Information: Operator associated with resources 'add_18 (top.v:18)' in design 'top' breaks the datapath extraction because there is leakage due to truncation on its fanout to operator of resources 'add_32 (top.v:32) add_32_2 (top.v:32) add_32_3 (top.v:32) add_32_4 (top.v:32) add_32_5 (top.v:32) add_32_6 (top.v:32)'. (HDL-120)

Information: The output of subtractor associated with resources 'sub_34 (top.v:34)' is treated as signed signal. (HDL-132)

Information: Operator associated with resources 'add_35 (top.v:35)' in design 'top' breaks the datapath extraction because there is leakage due to driver(signed)/load(unsigned) sign mismatch on fanin from operator of resources 'sub_34 (top.v:34)'. (HDL-120)

Table 4 lists the code corresponding to the leakage message and the corresponding RTL fix for QoR improvement.

1. Bad QoR:

    input [15:0] a3, b3, b4;
    output [32:0] z1;
    wire [32:0] c1, c2, c4;
    wire [31:0] c5, c6;
    wire [31:0] c3 = a3*b3 + b4; //truncation of MSB
    //extension of truncated 32 bit result in c3 to 33 bits
    assign z1 = c1 + c2 + c3 + c4 + c5 + c6 + b7;

   Good QoR:

    input [15:0] a1, a2, a3, a4, a5, a6, a7;
    input [15:0] b1, b2, b3, b4, b5, b6, b7;
    wire [32:0] c1, c2, c4;
    wire [31:0] c5, c6;
    wire [32:0] c3 = a3*b3 + b4; //increase O/P width to ensure no truncation
    assign z1 = c1 + c2 + c3 + c4 + c5 + c6 + b7;

2. Bad QoR:

    input [15:0] a7, b7;
    wire [28:0] c7_t = 0 - a7;
    //output of operation is treated as signed
    assign z2 = c7_t + b7;
    //driver(signed)/load(unsigned) mismatch at fanin

   Good QoR:

    input [15:0] a7, b7;
    output signed [32:0] z2;
    wire signed [32:0] c7_t = 0 - a7;
    assign z2 = c7_t + $signed(b7);
    //ensure this is a signed operation and this change follows design intent

Table 4: Addressing HDL-120 and HDL-132 messages in the design 
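
The comment "this change follows design intent" in the second fix matters because the signed rewrite is not bit-for-bit identical to the original. The Python sketch below models both versions of z2 under my reading of Verilog expression sizing and signedness (the 29-bit unsigned wire is zero-extended in the original, while the rewrite treats c7_t and b7 as two's-complement values). The function names are hypothetical; treat it as an illustration to confirm against simulation, not as a reference model.

```python
def to_signed(value, width):
    """Interpret the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def z2_bad(a7, b7):
    """Original: wire [28:0] c7_t = 0 - a7; assign z2 = c7_t + b7;"""
    c7_t = (0 - a7) & ((1 << 29) - 1)     # wraps to an unsigned 29-bit value
    return (c7_t + b7) & ((1 << 33) - 1)  # zero-extended, unsigned 33-bit sum


def z2_good(a7, b7):
    """Rewrite: wire signed [32:0] c7_t = 0 - a7; z2 = c7_t + $signed(b7);"""
    c7_t = to_signed(0 - a7, 33)                    # negative value kept as negative
    return to_signed(c7_t + to_signed(b7, 16), 33)  # signed 33-bit sum


for a7, b7 in [(0, 7), (1, 2), (5, 3)]:
    print(a7, b7, z2_bad(a7, b7), z2_good(a7, b7))
```

With a7 = 1 and b7 = 2, the original yields 536870913 (the wrapped 29-bit value plus 2), while the signed version yields 1. The two agree only when no negative intermediate value is involved, which is why the rewrite should be applied only when signed arithmetic is what the design intends.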

Table 5 lists the QoR improvement seen due to fixing timing violations as well as the high priority datapath leakage issues in the design.

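S.No  Metric                     Before fixes  After fixing timing violations  Percentage
                                               and leakage messages            improvement
------------------------------------------------------------------------------------------
1     Critical Path Slack        -0.13         0
2     Critical Path Length       4.54          4.19
3     Number of violating Paths  81            0
4     Total Negative Slack       -8.22         0
5     Area                       31511.92      23840                           24.349%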

Table 5: QoR improvement seen after fixing timing violations and datapath leakage based on analyze_datapath_extraction messages 

*The runs were performed using a 28nm HP library using I-2013.12-SP2

Step 3: Addressing other lower priority analyze_datapath_extraction messages

Now that all the timing violations have been fixed and the area is reduced, the next step is to address the lower priority messages to improve QoR. The following section lists the lower priority messages issued in this design and the QoR benefit seen by fixing them.

Listed are the two lower priority analyze_datapath_extraction messages:

Information: Missing possible extraction across cell 'mult_5 (top.v:5)' in design 'logicblock' and cell 'add_32_4 (top.v:32)' in design 'top'. (HDL-126)

Information: Cell 'U1 (top.v:29)' in design 'top' cannot be extracted because it instantiates 'DW02_mult', which could be inferred with operator '*' instead. (HDL-125)

HDL-126, Bad QoR:

    module logicblock (
      input [15:0] a5, b5,
      output [31:0] c5
    );
      assign c5 = a5 * b5;
    endmodule

    module top (
      input [15:0] a5, b5
      ...);
      wire [31:0] c5;
      //part of datapath is at a different hierarchy
      logicblock i_logicblock (a5, b5, c5);
      ..
      ..
    endmodule

HDL-126, Good QoR:

    module top (
      input [15:0] a5, b5,
      ...);
      wire [31:0] c5;
      assign c5 = a5*b5;
      //move datapath logic into the same hierarchy to enable
      //datapath extraction
      ...
      ...
    endmodule

HDL-125, Bad QoR:

    module top (
      input [15:0] a6, b6,
      ...);
      wire [31:0] c6;
      //simple multiplication operation done using
      //DW02_mult instead of * operator
      DW02_mult #(16, 16) U1 ( .A(a6), .B(b6),
                               .TC(1'b0), .PRODUCT(c6) );

HDL-125, Good QoR:

    module top (
      input [15:0] a6, b6,
      ...);
      wire [31:0] c6;
      assign c6 = a6*b6;
      // use operator inference instead of DW instantiation (DW02_mult)
      ..
      ..
    endmodule

Table 6: Addressing HDL-126 and HDL-125 messages

Table 7 shows the QoR improvement after fixing the RTL based on lower priority HDL messages from analyze_datapath_extraction.

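S.No  Metric                     Before fixing lower  After fixing lower  Percentage
                                 priority messages    priority messages   improvement
-------------------------------------------------------------------------------------
1     Critical Path Slack        0                    0
2     Critical Path Length       4.19                 4.26
3     Number of violating Paths  0                    0
4     Total Negative Slack       0                    0
5     Area                       23840                22983.42            3.59%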

Table 7: QoR improvement seen after fixing the RTL based on lower priority HDL messages from analyze_datapath_extraction

*The runs were performed using a 28nm HP library using I-2013.12-SP2
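
The area improvement in Table 7 follows directly from the two area figures. The lines below are a small worked example of that arithmetic in Python; the variable names are illustrative only.

```python
# Area figures from Table 7 (library area units).
area_before = 23840.0    # after the Step 1 and Step 2 fixes
area_after = 22983.42    # after also fixing HDL-126 and HDL-125

# Relative reduction, expressed as a percentage of the starting area.
improvement_pct = (area_before - area_after) / area_before * 100.0

print(f"area improvement: {improvement_pct:.2f}%")
```

Note that the critical path length grows slightly (4.19 to 4.26) while the slack stays at 0, so the area saving here does not come at the cost of a timing violation.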

Using analyze_datapath_extraction Command to Improve QoR in Large Designs

In large designs, the analyze_datapath_extraction command may report a large number of HDL messages. In such scenarios, it is important to focus on the HDL messages that directly relate to critical blocks in the design and address them first.

The following guidelines can be useful in simplifying the process to address the issues identified by analyze_datapath_extraction:

  1. Identify the critical sub-designs with respect to timing, power, and arithmetic content (to run on a sub-design, pass [get_designs ] to the analyze_datapath_extraction command)
  2. Focus on the messages generated from the critical sub-designs
  3. Prioritize the operators based on their timing, power, and area (for HDL-120/HDL-132, use -sort)
  4. Modify RTL to improve datapath extraction on operators associated with the most critical messages first
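
When a log contains many messages, a short script can give a first overview before the steps above are applied. The Python sketch below is a hypothetical helper, not part of Design Compiler: it assumes the command output has been captured as text and that each message code appears unwrapped, as "(HDL-nnn)". It counts messages per code and orders them by the Priority column of Table 1. The sample log lines are abbreviated versions of the messages shown in this article.

```python
import re
from collections import Counter

# Priority column of Table 1 (larger number = higher default priority).
TABLE1_PRIORITY = {
    "HDL-120": 8, "HDL-132": 7, "HDL-129": 6, "HDL-133": 5,
    "HDL-122": 4, "HDL-125": 3, "HDL-126": 2, "HDL-121": 1,
}


def tally_messages(log_text):
    """Count HDL message codes and sort them by Table 1 priority."""
    codes = re.findall(r"\((HDL-\d+)\)", log_text)
    counts = Counter(codes)
    return sorted(counts.items(),
                  key=lambda item: TABLE1_PRIORITY.get(item[0], 0),
                  reverse=True)


# Abbreviated sample messages (text shortened for illustration).
sample_log = """
Information: Operator associated with resources 'add_18 (top.v:18)' ... (HDL-120)
Information: The output of subtractor associated with resources 'sub_34 (top.v:34)' ... (HDL-132)
Information: Operator associated with resources 'add_35 (top.v:35)' ... (HDL-120)
Information: There is sequential cell between operator ... 'mult_41 (top.v:41)' ... (HDL-121)
Information: There is sequential cell between operator ... 'mult_42 (top.v:42)' ... (HDL-121)
Information: Missing possible extraction across cell 'mult_5 (top.v:5)' ... (HDL-126)
Information: Cell 'U1 (top.v:29)' in design 'top' cannot be extracted ... (HDL-125)
"""

summary = tally_messages(sample_log)
for code, count in summary:
    print(f"{code}: {count}")
```

The ordering is only a starting point. As Step 1 shows, a message with a low default priority such as HDL-121 should still be handled first when it points to logic on the critical path.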

The other low-priority messages can provide additional marginal improvement if addressed. Refer to the man page of the analyze_datapath_extraction command for more information.

Conclusion

The analyze_datapath_extraction command provides valuable guidance to help identify the areas where RTL coding is not optimal for efficient datapath synthesis. Designers can use the output of this command and focus on the HDL messages that directly impact the critical blocks to get the most benefit. In the example above, small modifications to the RTL based on the HDL messages from the analyze_datapath_extraction command fixed the timing violations while reducing area. Designers should consider using the RTL coding guidelines and the HDL messages from the analyze_datapath_extraction command to get the best possible QoR in datapath designs.