ASIP Models

ASIP Designer comes with an extensive library of example processor models provided as nML source code. They can be used as a starting point for architectural exploration and customer-specific production designs, or partially reused as a reference implementation of selected architectural features. All these models come with a fully working toolset, SDK, and synthesizable RTL, but they are not to be considered verified IP.

Microcontrollers

Tmicro

Compact 16-bit RISC microcontroller

Tnano

Compact 16-bit RISC microcontroller with reduced hardware

Trv (Family)

Variants of microcontrollers with the RISC-V ISA

DLX (Family)

Variants of the Hennessy & Patterson 32-bit RISC microcontroller DLX

Generic DSPs

Tdsp

16/32-bit DSP with single MAC unit, dual load-store units with post-modify addressing, and 3-way instruction-level parallelism in 16/32-bit variable-length instructions

Educational Models

Tvec (Family)

Variants of a wide SIMD processor, with per-lane predication controlled by predicate registers, and gather/scatter-based vector addressing. An additional family member supports compilation of OpenCL C kernels.

Tvliw (Family)

Variants of a 4-slot VLIW processor, with predication of VLIW slots and instruction compaction

Tinycore2

Tutorial model used in basic processor modeling hands-on laboratory

Matmul

Workshop model: Matrix multiplication on a RISC-V scalar core (Trv32p5x) with SIMD vector and ILP extensions

Tctcore

Historic educational model used in manuals

Domain-Specific Accelerators

Tmotion

Video accelerator for motion estimation

Tgauss

Accelerator for Gaussian image filtering

Tcom8

SIMD vector processor for communication kernels, supporting complex-type operations

MXcore

Scalar accelerator for block matrix inversion

FFTcore

Scalar FFT accelerator

MMSE

Accelerator for 5G New Radio MMSE equalization using Cholesky decomposition

LDPC

Accelerator for 5G Low-Density Parity-Check (LDPC) decoding

SHA256

Accelerator for SHA256 hashing by extension of a RISC-V scalar core

Tsec

Accelerator for the Kyber key encapsulation mechanism (post-quantum cryptography) by extension of a RISC-V scalar core

Tmoby

AI accelerator for MobileNet Convolutional Neural Network

smarT

Medium-throughput AI accelerator supporting TFLM

Primecore *

ASIP for FFT and DFT computation in 4G/5G mobile devices, supporting:

  • FFT for all power-of-2 sizes ranging from 8 to 2048
  • DFT for all prime-factorizable sizes ranging from 6 to 1536

Tcrypt *

Accelerator for AES encryption and decryption

Tvox *

Accelerator for simultaneous localization and mapping (SLAM)

JEMA/JEMB *

Dual-ASIP design for JPEG encoding

* Available on demand. For more information, please contact Synopsys by sending your request to asipinfo@synopsys.com

Microcontrollers

Tmicro

16-bit microcontroller

  • 16-bit integer data path
  • 3-stage exposed pipeline
  • 8x16-bit general-purpose register file
  • 16-bit instruction width
  • 32-bit multi-cycle multi-word long immediate instructions
  • Single data memory
  • Separate AGU with indirect addressing and post-modify addressing modes
  • Additional features:
    • 16x16->32-bit multiplier
    • 16-bit serial divider
    • Zero-overhead loop support:
      • 3-level do-loop
    • Interrupt support
    • OCD support
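
The AGU addressing modes listed above can be illustrated with a small behavioral model. This is a Python sketch, not the nML description of Tmicro: the `Agu` class and `sum_with_stride` function are hypothetical names, a Python list stands in for the single data memory, and a 16-bit address space is assumed. Indirect addressing uses the pointer register as the address; post-modify addressing does the same and then adds a modifier to the pointer as a side effect.

```python
class Agu:
    """Hypothetical behavioral model of one AGU pointer register."""

    def __init__(self, pointer: int, addr_bits: int = 16) -> None:
        self.mask = (1 << addr_bits) - 1  # assumed 16-bit address space
        self.pointer = pointer & self.mask

    def indirect(self) -> int:
        """Indirect addressing: the address is the pointer; the pointer is unchanged."""
        return self.pointer

    def post_modify(self, modifier: int) -> int:
        """Post-modify addressing: use the pointer, then add the modifier to it."""
        address = self.pointer
        self.pointer = (self.pointer + modifier) & self.mask
        return address


def sum_with_stride(memory: list[int], start: int, stride: int, count: int) -> int:
    """Sum `count` words from `start` with a fixed stride. Each load also
    advances the pointer, so the loop needs no separate address update."""
    agu = Agu(start)
    total = 0
    for _ in range(count):
        total += memory[agu.post_modify(stride)]
    return total


print(sum_with_stride(list(range(16)), start=1, stride=2, count=4))  # 1 + 3 + 5 + 7 = 16
```

Because the pointer update is a side effect of the load, the loop body reduces to a load and an add; on the processor, the loop itself could then map onto the zero-overhead do-loop hardware.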

Tnano

16-bit microcontroller with reduced hardware (based on Tmicro)

Differences from Tmicro:
  • No HW multiplier
  • No serial divider
  • No separate AGU: address computations are performed on the ALU
  • No zero-overhead loop support
  • No 32-bit instructions

Trv (Family)

The Trv family is a collection of RISC-V processor models combining different data path widths, pipeline depths, and optional extensions. Model names follow the pattern Trv<ww>p<n>[f][x][c], where <ww> denotes the data path width (32 or 64) and <n> the pipeline depth (3 or 5). The base models support the integer and multiplication instructions; optional extensions are indicated by additional suffixes:

  • Suffix "f" denotes single-precision floating point extensions (32-bit only).
  • Suffix "x" denotes selected DSP extensions (can be combined with "f").
  • Suffix "c" denotes support for compressed 16-bit instruction format (Trv32p3 only).
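
The naming scheme can be made concrete with a small decoder. The sketch below is illustrative only: `parse_trv_name` and `TrvConfig` are hypothetical names, and the decoder enforces just the two restrictions stated in the suffix list, so it also accepts combinations (such as Trv32p3fxc) that do not appear in the model table. Names outside the scheme, such as Trv32p3sdx, are rejected.

```python
import re
from dataclasses import dataclass

# Pattern from the text above: Trv<ww>p<n>[f][x][c]
_TRV_NAME = re.compile(r"Trv(32|64)p([35])(f?)(x?)(c?)")


@dataclass(frozen=True)
class TrvConfig:
    width: int         # data path width in bits (32 or 64)
    stages: int        # pipeline depth (3 or 5)
    float_ext: bool    # suffix "f": single-precision floating point
    dsp_ext: bool      # suffix "x": selected DSP extensions
    compressed: bool   # suffix "c": 16-bit compressed instructions


def parse_trv_name(name: str) -> TrvConfig:
    """Decode a Trv model name; raise ValueError for names outside the scheme."""
    m = _TRV_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not a Trv<ww>p<n>[f][x][c] name: {name!r}")
    width, stages = int(m.group(1)), int(m.group(2))
    cfg = TrvConfig(width, stages, m.group(3) == "f", m.group(4) == "x", m.group(5) == "c")
    # Only the restrictions stated in the suffix list are checked
    if cfg.float_ext and width != 32:
        raise ValueError("suffix 'f' is only available for 32-bit models")
    if cfg.compressed and (width, stages) != (32, 3):
        raise ValueError("suffix 'c' is only available for Trv32p3")
    return cfg


print(parse_trv_name("Trv32p5fx"))
# TrvConfig(width=32, stages=5, float_ext=True, dsp_ext=True, compressed=False)
```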

A separate model, Trv32p3sdx ("sdx" stands for "simple data path extensions"), contains a low-barrier modeling skeleton for custom data path extensions and comes with a set of example implementations for different application domains, such as FFT, SHA256 hashing, and a neural network for keyword spotting.

The following table lists the features of the available Trv family models in detail.

Trv32p3 (base model):

32-bit RISC-V microcontroller with 3-stage pipeline

  • Supported ISA:
    • RV32IM: base integer instructions + multiplication + division
    • Zicsr: control and status register instructions
    • Zba: advanced address generation
    • Zbb: basic bit manipulation
    • Zbs: single-bit instructions
  • 32-bit integer data path
  • 3-stage protected pipeline
    • Bypasses & HW stalls
  • 32x32-bit general-purpose register file
  • 32-bit instruction width
  • Single data memory
  • Separate AGU with indirect addressing
  • Additional features:
    • 32x32->64-bit multiplier
    • 32-bit serial divider
    • Interrupt support
    • OCD support

Trv32p3x (variant):

Trv32p3 with DSP extensions

Features on top of Trv32p3:

  • 2-way static ILP:
    • arith/ctrl || move/load/store
  • Zero-overhead loop support:
    • 2-level do-loop
    • 1-level zloop
  • AGU with post-modify addressing modes

Trv32p3f (variant):

Trv32p3 with floating-point hardware support

Features on top of Trv32p3:

  • Supported ISA:
    • RV32IMFZfinx
  • FPU based on HardFloat [Hauser]
  • Single-precision serial division & square-root unit

Trv32p3fx (variant):

Trv32p3f with DSP extensions

Features on top of Trv32p3f:

  • 2-way static ILP:
    • arith/ctrl || move/load/store
  • Zero-overhead loop support:
    • 2-level do-loop
    • 1-level zloop
  • AGU with post-modify addressing modes

Trv32p3c (variant):

Trv32p3 with compressed instruction support

Features on top of/different from Trv32p3:

  • Supported ISA:
    • RVC: support for the 16-bit compressed instruction format
  • No interrupt support

Trv32p5 (variant):

32-bit RISC-V microcontroller with 5-stage pipeline

Features different from Trv32p3:

  • 5-stage protected pipeline (instead of 3)

Trv32p5x (variant):

Trv32p5 with DSP extensions

Features on top of Trv32p5:

  • 2-way static ILP:
    • arith/ctrl || move/load/store
  • Zero-overhead loop support:
    • 2-level do-loop
    • 1-level zloop
  • AGU with post-modify addressing modes

Trv32p5f (variant):

Trv32p5 with floating-point hardware support

Features on top of Trv32p5:

  • Supported ISA:
    • RV32IMFZfinx
  • FPU based on HardFloat [Hauser]
  • Single-precision serial division & square-root unit

Trv32p5fx (variant):

Trv32p5f with DSP extensions

Features on top of Trv32p5f:

  • 2-way static ILP:
    • arith/ctrl || move/load/store
  • Zero-overhead loop support:
    • 2-level do-loop
    • 1-level zloop
  • AGU with post-modify addressing modes

Trv64p3 (base model):

64-bit RISC-V microcontroller with 3-stage pipeline

  • Supported ISA:
    • RV64IM: base integer instructions + multiplication + division
  • 64-bit integer data path
  • 3-stage protected pipeline
    • Bypasses & HW stalls
  • 32x64-bit general-purpose register file
  • 32-bit instruction width
  • Single data memory
  • Separate AGU with indirect addressing
  • Additional features:
    • 64x64->128-bit multiplier
    • 64-bit serial divider
    • OCD support

Trv64p3x (variant):

Trv64p3 with DSP extensions

Features on top of Trv64p3:

  • 2-way static ILP:
    • arith/ctrl || move/load/store
  • Zero-overhead loop support:
    • 2-level do-loop
    • 1-level zloop
  • AGU with post-modify addressing modes

Trv64p5 (variant):

64-bit RISC-V microcontroller with 5-stage pipeline

Features different from Trv64p3:

  • 5-stage protected pipeline (instead of 3)

Trv64p5x (variant):

Trv64p5 with DSP extensions

Features on top of Trv64p5:

  • 2-way static ILP:
    • arith/ctrl || move/load/store
  • Zero-overhead loop support:
    • 2-level do-loop
    • 1-level zloop
  • AGU with post-modify addressing modes

Trv32p3sdx (variant):

Trv32p3c with skeleton for custom data path extensions

Features on top of Trv32p3c:

  • Model stubs for low-barrier modeling of extension instructions
  • Shared 32x32-bit / 16x64-bit register file to enable both 32-bit and 64-bit extensions
  • Zero-overhead loop support:
    • 2-level do-loop
  • AGU with post-modify addressing modes

DLX (Family)

DLX (base model):

32-bit microcontroller (Hennessy & Patterson DLX)

  • 32-bit integer data path
  • 5-stage protected pipeline
    • Bypasses & HW stalls
  • 32x32-bit general-purpose register file
  • 32-bit instruction width
  • Single data memory
  • Separate AGU with indirect addressing and post-modify addressing modes
  • Additional features:
    • 32x32->32-bit multiplier
    • 32-bit serial divider
    • Zero-overhead loop support:
      • 2-level do-loop
      • 1-level zloop
    • Interrupt support
    • OCD support

FLX (variant):

DLX with HW floating point unit

Features on top of DLX base model:

  • 32-bit floating-point unit
  • Floating-point multicycle divider and square-root
  • Variant with custom 24-bit non-IEEE floating-point type

TLX (variant):

DLX with reduced register file and shallower exposed pipeline

Features different from DLX base model:

  • Reduced register file (16 x 32-bit)
  • 3-stage exposed pipeline

MLX (variant):

DLX with two-stage fetch pipeline

Features different from DLX base model:

  • Two-cycle latency for program memory (PM) loads, resulting in a two-stage fetch pipeline

ILX (variant):

DLX with multi-threading support, exposed pipeline

Features different from DLX base model:

  • 4-way static multi-threading support
  • 4-fold instantiation of original DLX register set
  • 5-stage exposed pipeline

PLX (variant):

DLX with multi-threading support, protected pipeline

Features different from DLX base model:

  • 8-way static multi-threading support
  • 8-fold instantiation of original DLX register set

VLX (variant):

DLX with SIMD vector extensions

Features on top of DLX base model:

  • 4-lane SIMD vector ALU (4 x 32-bit)
  • 16 x 128-bit vector register file
  • Vector load/store (128-bit memory access)
  • 5-stage protected vector pipeline:
    • Bypassed vector registers

BLX (variant):

DLX with simple branch predictor

Features on top of DLX base model:

  • Branch prediction logic
  • Branch target buffer (BTB) with 64 entries, implemented as a fully associative content-addressable memory

Generic DSPs

Tdsp

16/32-bit DSP with single MAC unit, dual load-store units with post-modify addressing, and 3-way instruction-level parallelism in 16/32-bit variable-length instructions

  • 16/32-bit fractional data path
  • 3-stage exposed pipeline
    • Bypassed modifier registers only
  • Register files:
    • 8x16-bit data register file
    • 4x32-bit long-word register file
    • 8x20-bit pointer register file
    • 4x16-bit modifier register file
  • 16/32-bit instructions
  • Dual-port data memory
  • 2 AGUs with post-modify and cyclic addressing modes
  • Additional features:
    • 16x16->32-bit MAC unit
    • 32-bit serial divider
    • Zero-overhead loop support:
      • 3-level do-loop
    • Interrupt support
    • OCD support
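
Cyclic addressing lets a FIR filter keep its delay line in a fixed-size ring buffer without explicit wrap-around code. The sketch below models this in Python under stated assumptions: `CyclicPointer` and `fir_q15` are hypothetical names, samples and coefficients are Q15 fractions, products are accumulated at full precision and truncated back to Q15 with a right shift, and accumulator saturation is not modeled.

```python
class CyclicPointer:
    """Hypothetical model of a pointer with cyclic (modulo) post-modify addressing."""

    def __init__(self, base: int, length: int, offset: int = 0) -> None:
        self.base = base
        self.length = length
        self.offset = offset % length

    def post_modify(self, step: int) -> int:
        """Return the current address, then step the pointer, wrapping inside the buffer."""
        address = self.base + self.offset
        self.offset = (self.offset + step) % self.length
        return address


def fir_q15(samples: list[int], coeffs: list[int]) -> list[int]:
    """FIR filter on Q15 samples and coefficients with a circular delay line."""
    taps = len(coeffs)
    delay = [0] * taps                      # delay line kept at addresses 0..taps-1
    write = CyclicPointer(0, taps)
    outputs = []
    for x in samples:
        newest = write.post_modify(1)       # store the new sample, advance with wrap
        delay[newest] = x
        read = CyclicPointer(0, taps, newest)
        acc = 0                             # Q30 products summed at full precision
        for c in coeffs:
            acc += c * delay[read.post_modify(-1)]  # walk back in time, wrapping
        outputs.append(acc >> 15)           # Q30 -> Q15 by truncating shift
    return outputs


half = 1 << 14  # 0.5 in Q15
print(fir_q15([1000, 2000, 3000], [half, half]))  # two-tap moving average: [500, 1500, 2500]
```

The inner loop is a single multiply-accumulate per tap with a post-modified, wrapping read pointer, which is the access pattern that a MAC unit plus cyclic AGU modes are meant to serve.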

Educational Models

Tvec (Family)

Tvec1 (base model):

Scalar microcontroller with additional SIMD vector data path

  • Based on Tmicro
  • 16-bit integer scalar data path
  • 128-bit SIMD vector data path
  • 3-stage exposed pipeline
  • Register files:
    • 8x16-bit scalar register file
    • 4x128-bit vector register file
  • 16-bit instruction width
  • Single data memory with support for both scalar and wide vector access
  • Single AGU with indirect and post-modify addressing modes
  • 8-lane SIMD vector ALU
    • (additive arithmetic, logic, min/max, vector sum)
  • Additional features:
    • 16x16->32-bit scalar multiplier/MAC unit
    • No hardware divider
    • Zero-overhead loop support:
      • 3-level do-loop
    • Interrupt support
    • OCD support

Tvec2 (variant):

Tvec1 with vector predication

Features on top of Tvec1:

  • 4x8-bit vector condition register file
  • Guarded SIMD instructions via vector predication (lane-enables)
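
Lane-enable predication replaces a per-element branch with a compare that produces a mask, followed by a guarded operation. The sketch below models that pattern for the 8 x 16-bit lanes of a 128-bit vector; `vcmp_gt` and `vadd_guarded` are hypothetical names, and the choice that disabled lanes keep their old destination value (rather than being zeroed) is an assumption.

```python
LANES = 8  # 128-bit vector = 8 x 16-bit lanes (Tvec1/Tvec2 SIMD ALU)


def wrap16(x: int) -> int:
    """Wrap to a signed 16-bit value (modular arithmetic, no saturation assumed)."""
    return ((x + 0x8000) & 0xFFFF) - 0x8000


def vcmp_gt(a: list[int], b: list[int]) -> int:
    """Lane-wise a > b, returned as an 8-bit lane-enable mask (bit i = lane i)."""
    mask = 0
    for i in range(LANES):
        if a[i] > b[i]:
            mask |= 1 << i
    return mask


def vadd_guarded(dst: list[int], a: list[int], b: list[int], enable: int) -> list[int]:
    """Guarded vector add: lane i gets a[i] + b[i] only if bit i of `enable` is set;
    disabled lanes keep their previous destination value (assumed merge semantics)."""
    return [wrap16(a[i] + b[i]) if (enable >> i) & 1 else dst[i] for i in range(LANES)]


# Branch-free "if a[i] > 0: a[i] += 10" over all lanes
a = [3, -1, 4, -1, 5, -9, 2, -6]
mask = vcmp_gt(a, [0] * LANES)                 # 0b01010101
print(vadd_guarded(a, a, [10] * LANES, mask))  # [13, -1, 14, -1, 15, -9, 12, -6]
```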

Tvec3 (variant):

Tvec2 with vector-based vector addressing

Features on top of Tvec2:

  • Vector load/store instructions with vector-based vector addressing

Tvec4 (variant):

Tvec2 with scalar-based vector addressing

Features on top of Tvec2:

  • Vector load/store instructions with scalar-based vector addressing
  • Gather-scatter I/O interface to resolve memory bank access conflicts

Tvec5 (variant):

Tvec4 with support for multiple vector types on a shared vector ALU

Features on top of Tvec4:

  • Vector ALU supporting two vector types on shared hardware:
    • 8x16-bit SIMD data path
    • 4x32-bit SIMD data path
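
One way to share a single adder between the 8 x 16-bit and 4 x 32-bit lane configurations is to stop carries at the lane boundaries selected by the current vector type. The sketch below shows this with a well-known SWAR (SIMD within a register) formulation on a 128-bit value held in a Python int; the function names are hypothetical, and this illustrates the principle rather than how Tvec5's vector ALU is actually built.

```python
VEC_BITS = 128


def lane_msb_mask(lane_bits: int) -> int:
    """Mask with only the most significant bit of every lane set."""
    mask = 0
    for lane in range(VEC_BITS // lane_bits):
        mask |= 1 << (lane * lane_bits + lane_bits - 1)
    return mask


def vadd(a: int, b: int, lane_bits: int) -> int:
    """Lane-wise modular add of two 128-bit vectors packed into Python ints.
    Clearing each lane's MSB before the add keeps carries from crossing lane
    boundaries; XOR then restores the correct MSB of every lane."""
    msb = lane_msb_mask(lane_bits)
    low_sum = (a & ~msb) + (b & ~msb)
    return (low_sum ^ ((a ^ b) & msb)) & ((1 << VEC_BITS) - 1)


def pack(values: list[int], lane_bits: int) -> int:
    """Pack lane values (lane 0 in the least significant bits) into one vector."""
    vec = 0
    for i, v in enumerate(values):
        vec |= (v & ((1 << lane_bits) - 1)) << (i * lane_bits)
    return vec


def unpack(vec: int, lane_bits: int) -> list[int]:
    """Split a vector into unsigned lane values, lane 0 first."""
    lane_mask = (1 << lane_bits) - 1
    return [(vec >> (i * lane_bits)) & lane_mask for i in range(VEC_BITS // lane_bits)]


a = pack([0xFFFF] * 8, 16)
b = pack([1] * 8, 16)
print(unpack(vadd(a, b, 16), 16))  # 8x16-bit view: every lane wraps to 0
print(unpack(vadd(a, b, 32), 32))  # 4x32-bit view of the same bits: [65536, 65536, 65536, 65536]
```

The same operands give different results depending only on which lane boundaries block the carry, which is the essence of running two vector types on shared hardware.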


Tvliw (Family)

Tvliw1 (base model):

32-bit microprocessor with 4-slot VLIW instruction-level parallelism

  • 32-bit integer data path
  • 3-stage exposed pipeline
  • Register files:
    • 16x32-bit data register file
    • 8x32-bit pointer register file
    • 8x32-bit modifier register file
  • 96-bit instruction width
  • 4-way VLIW instruction-level parallelism
    • 2 arithmetic slots
    • 2 load/store/move slots
  • Dual-port data memory
  • 2 AGUs with post-modify addressing modes
  • Additional features:
    • 32x32->32-bit multiplier
    • Zero-overhead loop support:
      • 1-level do-loop

Tvliw2 (variant):

Tvliw1 with variable-length instruction-level parallelism

Features on top of Tvliw1:

  • Variable-length instruction formats with predecoding and expansion in the PCU:
    • 1 to 4 parallel instructions
    • 24/48/72/96-bit instruction width
  • Instruction predication based on up to 8 dynamic conditions
  • 8x1-bit condition register file for instruction predication
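
The 24/48/72/96-bit format widths suggest roughly 24 bits per operation, so packing only the occupied slots avoids storing explicit NOPs for a partially filled VLIW bundle. The sketch below estimates that effect. The 24-bit-per-operation encoding is an assumption read off the listed widths (real formats may also spend bits on format identification and predication), and the function names and example schedule are hypothetical.

```python
SLOT_BITS = 24   # assumed: 96-bit full bundle / 4 slots
MAX_SLOTS = 4


def bundle_bits(ops: int, variable_length: bool) -> int:
    """Encoded size of one instruction bundle holding `ops` operations (1..4)."""
    if not 1 <= ops <= MAX_SLOTS:
        raise ValueError("a bundle holds 1 to 4 operations")
    return SLOT_BITS * (ops if variable_length else MAX_SLOTS)


def program_bits(bundles: list[int], variable_length: bool) -> int:
    """Total program size for a sequence of bundles, given as operation counts."""
    return sum(bundle_bits(ops, variable_length) for ops in bundles)


schedule = [4, 1, 2, 1, 3]              # hypothetical schedule: operations per cycle
fixed = program_bits(schedule, False)   # Tvliw1-style: always 96 bits, NOPs padded
compact = program_bits(schedule, True)  # Tvliw2-style: 24/48/72/96 bits
print(fixed, compact)  # 480 264
```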

Tvliw3 (variant):

Tvliw2 with an additional 2-cycle program fetch pipeline

Features on top of Tvliw2:

  • Program fetch pipeline supporting program memory with 2-cycle load latency
  • Loop instruction buffer

Tinycore2

Tutorial model used in basic processor modeling hands-on laboratory

  • 16-bit integer data path
  • 4-stage exposed pipeline
  • 8x16-bit register file
  • 16-bit ALU
  • 14-bit instruction width
  • Single-port memory with indirect and post-increment addressing modes
  • No separate AGU, address computation performed on ALU
  • Zero-overhead loop support:
    • 1-level do-loop

Matmul

Workshop model: Matrix multiplication on a RISC-V scalar core (Trv32p5x) with SIMD vector and ILP extensions.

Features on top of/different from Trv32p5x:

  • 4-lane SIMD vector data path (4x32-bit)
    • Including 4-lane vector MAC unit (32x32->32-bit)
  • 8x128-bit vector register file
    • with exposed pipeline, partially bypassed
  • Unified vector/scalar memory

Tctcore

Historic educational model used in manuals

  • 16-bit integer data path
  • 4-stage exposed pipeline
  • Register files:
    • 8x16-bit distributed data register file
    • 4x10-bit distributed pointer register file
    • 4x10-bit distributed modifier register file
  • 18-bit instruction width
  • Dual-port data memory
  • 2 AGUs with post-modify addressing modes and separated pointer/modifier register subsets
  • 16-bit ALU with dedicated operand/result registers
  • Additional features:
    • 16x16->32-bit scalar multiplier/MAC unit with dedicated operand/result registers
    • Zero-overhead loop support:
      • 1-level do-loop

Domain-Specific Accelerators

Tmotion

Video accelerator for motion estimation

  • Based on Tmicro
  • 8/16-bit scalar data path
  • 128-bit SIMD vector data path
  • 3-stage exposed pipeline
  • Register files:
    • 8x16-bit scalar register file
    • 4x128-bit vector register file
  • 16/32/48-bit instructions
  • Shared scalar/vector data memory with unaligned 128-bit vector access
  • Dedicated coefficient memory with 128-bit vector access
  • 2 AGUs with post-modify addressing modes
  • 16-lane SIMD vector ALU with specialized vector absolute-difference instructions
  • Zero-overhead loop support:
    • 3-level do-loop
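
Motion estimation is dominated by sum-of-absolute-differences (SAD) computations between a block of the current frame and candidate positions in a reference frame. With 8-bit pixels, one 16-pixel row fits a 128-bit vector, which appears to be what the 16-lane absolute-difference instructions and unaligned vector access are designed for. The sketch below is a plain exhaustive-search reference in Python; the function names, the 16x16 block size, and the search-window layout are assumptions for illustration.

```python
BLOCK = 16  # 16 8-bit pixels = one 128-bit vector row


def sad_row(cur: list[int], ref: list[int]) -> int:
    """Sum of absolute differences of one 16-pixel row (one vector operation)."""
    return sum(abs(c - r) for c, r in zip(cur, ref))


def full_search(cur: list[list[int]], window: list[list[int]], search: int) -> tuple[int, int, int]:
    """Exhaustive block matching. `cur` is a 16x16 block; `window` is the
    (16 + 2*search)-square reference area centred on the block position.
    Returns (dx, dy, sad) of the best match; ties keep the first candidate."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0 = dx + search        # unaligned column offset into the window
            sad = sum(
                sad_row(cur[r], window[r + dy + search][x0:x0 + BLOCK])
                for r in range(BLOCK)
            )
            if best is None or sad < best[2]:
                best = (dx, dy, sad)
    return best


search = 2
size = BLOCK + 2 * search
window = [[(7 * x + 13 * y) % 256 for x in range(size)] for y in range(size)]
# Current block = reference content displaced by (dx, dy) = (1, -2)
cur = [window[r - 2 + search][1 + search:1 + search + BLOCK] for r in range(BLOCK)]
print(full_search(cur, window, search))  # (1, -2, 0)
```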

Tgauss

Accelerator for Gaussian image filtering

  • Based on TLX
  • 16/32-bit scalar data path
  • 48-bit SIMD vector data path
  • 4-stage exposed pipeline
  • Register files:
    • 16x32-bit scalar register file, split into separately accessible 16-bit low/high parts
    • Two distributed 10x48-bit vector register files with cyclic buffer access
    • 16x5-bit pointer register file for cyclic buffering
  • 32-bit instructions
  • 32-bit scalar data memory
  • Separate vector memories for the input/output image
  • Separate vector memory for line buffers
  • 2x24-bit vector data path (2 RGB pixels) with 6-lane bytewise multiply/accumulate unit
  • Additional features:
    • 32x32->32-bit pipelined multiplier (2 cycles)
    • 32-bit sequential divider performing 3 iterations in parallel
    • Zero-overhead loop support:
      • 3-level do-loop
    • OCD support
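
One plausible reading of "performing 3 iterations in parallel" is a radix-2 restoring divider whose loop body is unrolled three times per clock cycle, so a 32-bit quotient takes ceil(32/3) = 11 cycles instead of 32. The sketch below models that interpretation for unsigned operands; the function name and the cycle accounting are assumptions, not taken from the Tgauss model.

```python
def sequential_divide(dividend: int, divisor: int, bits: int = 32,
                      iters_per_cycle: int = 3) -> tuple[int, int, int]:
    """Unsigned restoring division producing one quotient bit per iteration,
    with `iters_per_cycle` iterations per clock. Returns (quotient, remainder, cycles)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = remainder = cycles = 0
    bit = bits - 1
    while bit >= 0:
        cycles += 1
        for _ in range(iters_per_cycle):
            if bit < 0:
                break
            remainder = (remainder << 1) | ((dividend >> bit) & 1)  # shift in next bit
            if remainder >= divisor:                                 # trial subtract
                remainder -= divisor
                quotient |= 1 << bit
            bit -= 1
    return quotient, remainder, cycles


print(sequential_divide(1_000_000, 7))  # (142857, 1, 11)
```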

Tcom8

SIMD vector processor for communication kernels, supporting complex-type operations

  • Based on Tmicro
  • 16/32-bit scalar data path
  • 128-bit SIMD vector data path
  • 4-stage exposed pipeline
  • Register files:
    • 8x16-bit scalar register file
    • 4x128-bit vector register file
    • 4x320-bit partitioned vector accumulator register file
    • 4x16-bit pointer register file
    • 4x16-bit modifier register file
  • 32-bit instruction width
  • 2-way static ILP for scalar/vector instructions
  • 4-way static ILP for custom FFT instructions
  • Dual-port vector memory and separate single-port vector coefficient memory
  • 3 AGUs supporting cyclic, bit-reverse and specialized next-butterfly addressing modes
  • 8x40-bit/4x80-bit shared vector ALU supporting 8-lane SIMD fixed-point or 4-lane SIMD complex-fixed-point operations:
    • Vector shift unit
    • Vector multiply/MAC
    • Vector butterfly (complex only)
  • Additional features:
    • 16x16->32-bit multiplier
    • 16-bit serial divider
    • Zero-overhead loop support:
      • 3-level do-loop
    • Interrupt support
    • OCD support
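
Bit-reverse addressing lets an AGU walk a buffer in the bit-reversed order needed to reorder FFT inputs or outputs, without computing each index explicitly. The sketch below contrasts computing a bit-reversed index directly with a reverse-carry pointer increment, a technique used in many DSP AGUs; both function names are hypothetical, and Tcom8's exact AGU implementation is not described here.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bitrev_post_increment(ptr: int, n: int) -> int:
    """Advance a bit-reversed pointer for an n-point buffer (n a power of two):
    add n/2 with the carry propagating toward the LSB instead of the MSB."""
    bit = n >> 1
    while ptr & bit:
        ptr ^= bit      # clear the bit and carry one position to the right
        bit >>= 1
    return ptr | bit


n = 8
ptr, walk = 0, []
for _ in range(n):
    walk.append(ptr)
    ptr = bitrev_post_increment(ptr, n)
print(walk)                                   # [0, 4, 2, 6, 1, 5, 3, 7]
print([bit_reverse(i, 3) for i in range(n)])  # same order, computed directly
```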

MXcore

Scalar accelerator for block matrix inversion

  • 32-bit integer/floating-point data path with 2x32-bit complex number support
  • 3-stage exposed pipeline
  • Register files:
    • 8x32-bit data register file
    • 4x16-bit pointer register file
    • 4x16-bit modifier register file
  • 16/32-bit instruction width
  • 2-way static ILP
    (arithmetic || memory/move/control)
  • Single AGU with post-modify addressing modes
  • 32-bit integer ALU
  • 32-bit floating-point ALU
  • Additional features:
    • 32x32->64-bit integer multiplier
    • 32-bit floating-point multiplier
    • 32-bit sequential divider (int and float) performing 3 iterations in parallel
    • Zero-overhead loop support:
      • 2-level do-loop
    • OCD support

FFTcore

Scalar FFT accelerator

(minimal core optimized for the FFT application kernel, without support for C built-in types or arbitrary C code)

  • 48-bit complex fixed-point data path
  • 3-stage exposed pipeline
  • Two distributed register files of 4x48-bit each
  • 20-bit instruction width
  • Up to 5-way ILP for FFT inner loop
    • (load || store || coef_load || mul || butterfly)
  • 2 data memories, 1 separate coefficient memory
  • 3 specialized AGUs with post-modify, circular, and custom butterfly addressing modes
  • 48-bit ALU with complex multiply and butterfly
  • Zero-overhead loop support:
    • 2-level do-loop
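
For orientation, the computation FFTcore targets can be written as an iterative radix-2 Cooley-Tukey FFT whose inner loop is one butterfly per iteration. The Python sketch below uses floating-point complex numbers for clarity, whereas the model uses a 48-bit complex fixed-point data path; the radix-2 decimation-in-time variant, the function names, and the mapping of the inner loop onto the listed 5-way ILP are assumptions.

```python
import cmath


def butterfly(a: complex, b: complex, w: complex) -> tuple[complex, complex]:
    """Radix-2 decimation-in-time butterfly: one complex multiply, one add, one subtract."""
    t = w * b
    return a + t, a - t


def fft(x: list[complex]) -> list[complex]:
    """Iterative radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    data = [0j] * n
    for i in range(n):                       # bit-reversed input ordering
        r, v = 0, i
        for _ in range(bits):
            r, v = (r << 1) | (v & 1), v >> 1
        data[r] = complex(x[i])
    size = 2
    while size <= n:                         # one pass per FFT stage
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j                       # twiddle factor (coefficient memory on the ASIP)
            for k in range(half):            # assumed ILP: load || coef load || mul || butterfly || store
                i, j = start + k, start + k + half
                data[i], data[j] = butterfly(data[i], data[j], w)
                w *= w_step
        size *= 2
    return data


print([round(abs(v), 3) for v in fft([1, 1, 1, 1, 0, 0, 0, 0])])
```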

MMSE

Accelerator for 5G New Radio MMSE equalization using Cholesky decomposition, with the FLX processor as scalar base

Features on top of FLX:

  • 64-bit complex floating-point data path
  • N-lane SIMD complex vector processing unit, with the vector size (number of lanes N) configurable at design time
  • 8 x (Nx64-bit) vector register file
  • 64-bit instruction width
  • 4-way ILP:
    • (scalar/memory || move || vector complex mul || vector complex add/sum)
  • Vector load/store with various application-specific post-modify addressing modes tuned for efficient access of matrix elements during Cholesky decomposition
  • Balanced data path to maximize memory bandwidth utilization
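
In MMSE equalization, the regularized Gram matrix H^H H + σ²I is Hermitian positive definite, so it can be factored as L L^H and the equalizer system solved with two triangular substitutions instead of an explicit inverse. The sketch below shows that textbook procedure in scalar Python floating point; the function names are hypothetical, and it does not reflect how the MMSE model maps the computation onto its N-lane vector unit and addressing modes.

```python
import math


def cholesky(a: list[list[complex]]) -> list[list[complex]]:
    """Factor a Hermitian positive-definite matrix as A = L L^H (L lower
    triangular), computed column by column."""
    n = len(a)
    l = [[0j] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] * l[j][k].conjugate() for k in range(j))
        if d.real <= 0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = complex(math.sqrt(d.real))     # diagonal is real and positive
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k].conjugate() for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def solve_hpd(a: list[list[complex]], b: list[complex]) -> list[complex]:
    """Solve A x = b via A = L L^H: forward, then backward substitution."""
    l = cholesky(a)
    n = len(a)
    y = [0j] * n
    for i in range(n):                    # L y = b
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0j] * n
    for i in reversed(range(n)):          # L^H x = y, with L^H[i][k] = conj(L[k][i])
        x[i] = (y[i] - sum(l[k][i].conjugate() * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


a = [[4, 2 + 2j], [2 - 2j, 6]]
print(solve_hpd(a, [10 + 6j, 16 - 6j]))  # approximately [1+1j, 2-1j]
```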

LDPC

Accelerator for 5G Low-Density Parity-Check (LDPC) decoding, by extension of a RISC-V scalar core (Trv32p5x)

Features on top of Trv32p5x:

  • 128-lane SIMD vector processing unit (128 x 8-bit) with specialized instructions for variable rotation, addsub, minimum detection and element selection
  • 8 x 1024-bit vector register file
  • 64-bit instruction width
  • Up to 4-way ILP:
    • Split arith/ctrl and move/ldst slots of the original Trv32p5x to support scalar ldst || vector ldst || dual vector arith
  • Dedicated 1024-bit wide vector memory
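
The minimum-detection and element-selection instructions listed above are the kind of operations a min-sum check-node update relies on: each outgoing message takes the product of the signs of all other incoming messages and the smallest of their magnitudes, so only the two smallest magnitudes need to be found. The sketch below is the plain textbook min-sum rule in scalar Python, not the model's 128-lane implementation; the function name is hypothetical, and the decoding variant the model actually uses (for example offset or normalized min-sum) is not stated here.

```python
def check_node_min_sum(llrs: list[int]) -> list[int]:
    """Min-sum check-node update. The message sent back on edge i has the
    product of the signs of all *other* inputs and the minimum of their magnitudes."""
    if len(llrs) < 2:
        raise ValueError("a check node needs at least two edges")
    mags = [abs(v) for v in llrs]
    # Minimum detection: only the two smallest magnitudes are needed
    min1_idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[min1_idx]
    min2 = min(m for i, m in enumerate(mags) if i != min1_idx)
    sign_all = 1
    for v in llrs:
        if v < 0:
            sign_all = -sign_all
    out = []
    for i, v in enumerate(llrs):
        sign_i = sign_all * (-1 if v < 0 else 1)  # divide out the edge's own sign
        out.append(sign_i * (min2 if i == min1_idx else min1))  # element selection
    return out


print(check_node_min_sum([3, -5, 2, 7]))  # [-2, 2, -3, -2]
```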

SHA256

Accelerator for SHA256 hashing by extension of a RISC-V scalar core

  • Based on Trv32p3 (RV32IM ISA), stripped of unused features:
    • No hardware multiplier
    • No hardware divider
    • No OCD support
    • Reduced register file (see below)
  • 3-stage protected pipeline:
    • Bypasses & HW stalls
  • 16x32-bit register file
  • Dedicated SHA256-step instructions
  • Separate data memory for K-table
  • 2 AGUs with post-modify addressing modes
  • 32-bit instruction width
  • 3-way ILP for critical loop instructions:
    • (SHA-step || load data || load K-table)
  • Zero-overhead loop support:
    • 1-level zloop
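
For reference, the computation that the dedicated SHA-step instructions accelerate is the SHA-256 compression function (FIPS 180-4): 64 rounds per 512-bit block, each consuming one message-schedule word and one round constant from the K-table. The Python sketch below is a plain software reference checked against `hashlib`. The `sha_step` function marks the work of one round, but how much of a round (or of the message schedule) the model's SHA-step instruction actually covers is an assumption. The K-table and initial hash values are derived from the primes as the standard defines them rather than listed.

```python
import hashlib
import math

MASK = 0xFFFFFFFF


def _primes(count: int) -> list[int]:
    """First `count` primes by trial division."""
    found: list[int] = []
    candidate = 2
    while len(found) < count:
        if all(candidate % p for p in found if p * p <= candidate):
            found.append(candidate)
        candidate += 1
    return found


def _icbrt(n: int) -> int:
    """Integer cube root: largest r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


_PRIMES = _primes(64)
# K-table: first 32 bits of the fractional parts of the cube roots of the first 64 primes
K = [_icbrt(p << 96) & MASK for p in _PRIMES]
# Initial hash: first 32 bits of the fractional parts of the square roots of the first 8 primes
H0 = [math.isqrt(p << 64) & MASK for p in _PRIMES[:8]]


def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK


def sha_step(state: list[int], k: int, w: int) -> list[int]:
    """One SHA-256 round on the working variables a..h."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + k + w) & MASK
    t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c))) & MASK
    return [(t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g]


def sha256(message: bytes) -> bytes:
    length = len(message)
    padded = message + b"\x80" + b"\x00" * ((55 - length) % 64) + (8 * length).to_bytes(8, "big")
    state = list(H0)
    for off in range(0, len(padded), 64):
        w = [int.from_bytes(padded[off + 4 * i:off + 4 * i + 4], "big") for i in range(16)]
        for t in range(16, 64):  # message schedule
            s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
        work = state
        for t in range(64):      # critical loop (assumed mapping): SHA-step || load w[t] || load K[t]
            work = sha_step(work, K[t], w[t])
        state = [(x + y) & MASK for x, y in zip(state, work)]
    return b"".join(x.to_bytes(4, "big") for x in state)


assert sha256(b"abc") == hashlib.sha256(b"abc").digest()
```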

Tsec

Accelerator for the Kyber key encapsulation mechanism (post-quantum cryptography) by extension of a RISC-V scalar core (Trv32p5x)

Features on top of Trv32p5x:

  • Support for SHA3 hashing
    • 25x64-bit dedicated state register file
    • 64-bit load/store instructions
    • Keccak hash unit with dedicated instruction
  • Dedicated instructions for Kyber modulo-based operations
    • 2-way packed SIMD
    • Montgomery and Barrett reduction
    • NTT butterfly
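
The modular arithmetic that Tsec's dedicated Kyber instructions target can be illustrated with the 16-bit formulation used in the Kyber reference implementation: Montgomery reduction for products, Barrett reduction to bring accumulated values back into range, and an NTT butterfly built from them (q = 3329). The Python sketch below models that formulation; the function names are hypothetical, and whether Tsec's instructions use the same constants, output ranges, or operand packing (for example, its 2-way packed SIMD) is an assumption.

```python
Q = 3329          # Kyber modulus
QINV = -3327      # q^-1 mod 2^16, as a signed 16-bit value
R_BITS = 16       # Montgomery factor R = 2^16


def int16(x: int) -> int:
    """Truncate to a signed 16-bit value, as a C cast to int16_t would."""
    return ((x + 0x8000) & 0xFFFF) - 0x8000


def montgomery_reduce(a: int) -> int:
    """Return r with r == a * 2^-16 (mod q) and |r| < q, for |a| < q * 2^15."""
    t = int16(a * QINV)                 # t == a * q^-1 (mod 2^16)
    return (a - t * Q) >> R_BITS        # exact: a - t*q is divisible by 2^16


def barrett_reduce(a: int) -> int:
    """Return r with r == a (mod q) and small magnitude, for 16-bit signed a."""
    v = ((1 << 26) + Q // 2) // Q       # round(2^26 / q) = 20159
    t = (v * a + (1 << 25)) >> 26       # approximately round(a / q)
    return a - t * Q


def fqmul(a: int, b: int) -> int:
    """Modular multiply; b is expected in the Montgomery domain (b * 2^16 mod q)."""
    return montgomery_reduce(a * b)


def ntt_butterfly(a: int, b: int, zeta_mont: int) -> tuple[int, int]:
    """Cooley-Tukey NTT butterfly: (a + zeta*b, a - zeta*b) mod q, left unreduced."""
    t = fqmul(b, zeta_mont)
    return a + t, a - t


zeta = 17                                # root of unity used by Kyber's NTT
zeta_mont = (zeta << R_BITS) % Q
hi, lo = ntt_butterfly(100, 200, zeta_mont)
print(hi % Q == (100 + zeta * 200) % Q, lo % Q == (100 - zeta * 200) % Q)  # True True
```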

Tmoby

AI accelerator for MobileNet Convolutional Neural Network, with the Trv32p3 RISC-V processor (RV32IM ISA) as scalar base

Features on top of Trv32p3:

  • Additional 4th pipeline stage, used only for vector memory loads
  • 64-lane SIMD vector processing unit (64x8-bit data path)
    • Including 64-lane vector MAC unit (8x8->32-bit)
  • Many distributed register files, including:
    • 4x64-bit vector register file
    • 3x512-bit matrix register file
    • 3x2048-bit vector accumulator register file
  • Separate vector memories:
    • Vector feature memory with vector addressing
    • Vector weight memory with scalar addressing
  • 90-bit instruction width
  • 4-way ILP:
    • (scalar || vector arithmetic || vector feature memory || vector weight memory)
  • Zero-overhead loop support:
    • 3-level do-loop
    • 1-level zloop
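
The 8x8->32-bit vector MAC is the core operation of MobileNet's pointwise (1x1) convolutions. The sketch below shows one possible mapping, assumed for illustration: the 64 lanes hold 64 output channels, each input-channel value is broadcast to all lanes and multiplied with a 64-entry weight vector, and the int8 products are summed in 32-bit accumulators, which leave ample headroom for long channel sums. The function names are hypothetical; Tmoby's actual data layout across its feature and weight memories is not described here.

```python
LANES = 64  # 64 x 8-bit SIMD lanes


def wrap32(x: int) -> int:
    """Wrap to a signed 32-bit accumulator value (no saturation modeled)."""
    return ((x + (1 << 31)) & 0xFFFFFFFF) - (1 << 31)


def vmac_broadcast(acc: list[int], x: int, w: list[int]) -> list[int]:
    """acc[i] += x * w[i]: one int8 scalar times a 64-lane int8 vector, int32 accumulate."""
    if not -128 <= x <= 127 or any(not -128 <= v <= 127 for v in w):
        raise ValueError("operands must be int8")
    return [wrap32(a + x * v) for a, v in zip(acc, w)]


def pointwise_conv_pixel(x: list[int], weights: list[list[int]]) -> list[int]:
    """1x1 convolution for one pixel: x[c] are the input channels, weights[c]
    holds the 64 weights of input channel c, one per output channel (lane)."""
    acc = [0] * LANES
    for xc, w_row in zip(x, weights):
        acc = vmac_broadcast(acc, xc, w_row)
    return acc


out = pointwise_conv_pixel([1, -2, 127], [list(range(64)), [1] * 64, [-128] * 64])
print(out[0], out[63])  # -16258 -16195
```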

smarT

Accelerator for medium-throughput CNN applications, based on a RISC-V scalar core (Trv32p5x)

Features on top of Trv32p5x:

  • Dual convolutional unit with 16-lane SIMD 8-bit multipliers per unit
  • 4x32-bit vector data path including:
    • Vector shift-round-sat unit
  • Additional registers:
    • Quad-access (4x32-bit) to existing central register file
    • Two accumulator register files (4x32-bit)
    • Vector address register file (4x32-bit)
  • 128-bit memory interface with 4 banks of 32 bits each and vector addressing
  • Small local memory
  • Low-overhead DMA
  • Proof-of-concept support for TensorFlow Lite for Microcontrollers (TFLM)
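
A shift-round-saturate step is what turns wide 32-bit accumulator results back into 8-bit activations in quantized CNN inference. The sketch below models it per lane for a 4 x 32-bit vector. The round-half-up rounding mode, the 8-bit output width, and the function names are assumptions, and TFLM's full requantization (which also multiplies by a fixed-point scale factor) is not modeled.

```python
def shift_round_sat(acc: int, shift: int, out_bits: int = 8) -> int:
    """Arithmetic right shift with round-half-up, then saturate to a signed
    out_bits-wide result (assumed rounding mode)."""
    if shift > 0:
        acc = (acc + (1 << (shift - 1))) >> shift
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, acc))


def vshift_round_sat(vec: list[int], shift: int, out_bits: int = 8) -> list[int]:
    """Apply shift-round-saturate to every 32-bit lane of a vector."""
    return [shift_round_sat(v, shift, out_bits) for v in vec]


print(vshift_round_sat([1000, -1000, 40000, -40000], 4))  # [63, -62, 127, -128]
```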
