Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 12/19/2022
Public

11.1.2.5. Avalon Streaming Interface

The offline compiler expects the RTL module to support an Avalon® streaming interface with readyLatency = 0 at both its input and its output.

As shown in Integration of an RTL Module into an Intel FPGA SDK for OpenCL Pipeline, the RTL module must have four ports:
  • ivalid and iready as the input Avalon® streaming interface
  • ovalid and oready as the output Avalon® streaming interface
For more information about Avalon® streaming interfaces, refer to the "Avalon Streaming Interfaces" section in the Avalon Interface Specifications. The following figure illustrates the timing diagram for input data transfer with back pressure.
Figure 31. Timing Diagram for Input Data Transfer with Back Pressure
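The handshake rule behind this timing diagram can be sketched as a small behavioral model. This is an illustration only, not part of the SDK: the function name transfers and the waveform values are hypothetical, and the generic names valid and ready stand in for whichever port pair forms the interface. It assumes the readyLatency = 0 rule that a word is transferred only in a cycle where valid and ready are both high, so a source that sees ready low keeps its data and valid asserted until the word is accepted.

```python
def transfers(valid, ready, data):
    """Return the words accepted, one per cycle where valid and ready are both high."""
    accepted = []
    for v, r, d in zip(valid, ready, data):
        if v and r:  # readyLatency = 0: transfer happens in this same cycle
            accepted.append(d)
    return accepted


# Hypothetical waveform: word "B" is back-pressured for two cycles (ready low),
# so the source holds it until ready returns high.
valid = [0, 1, 1, 1, 1, 0]
ready = [1, 1, 0, 0, 1, 1]
data = [None, "A", "B", "B", "B", None]

print(transfers(valid, ready, data))  # ['A', 'B']
```

Although "B" is presented for three cycles, it is counted once, because only the final cycle has both signals high.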

For an RTL module with a fixed latency, the output signals (ovalid and oready) can have constant high values, and the input ready signal (iready) can be ignored.

A stall-free RTL module might receive invalid input (ivalid is low). In this case, the module ignores the input and produces invalid data on the output. For a stall-free RTL module without an internal state, it might be easier to propagate the invalid input through the module. However, for an RTL module with an internal state, you must handle an ivalid = 0 input carefully so that the invalid input does not modify that state.
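As a rough illustration of why internal state needs care, the following behavioral sketch models a hypothetical running-sum module. The class name RunningSum, its zero-cycle latency, and the choice of 0 as the filler value on invalid cycles are all assumptions made for brevity; a real RTL module would register its outputs. The point is only that the state update is gated by ivalid.

```python
class RunningSum:
    """Behavioral model of a hypothetical stall-free module with internal state."""

    def __init__(self):
        self.total = 0  # internal state that must survive invalid cycles

    def clock(self, ivalid, datain):
        """Advance one cycle and return (ovalid, dataout)."""
        if ivalid:
            self.total += datain
            return 1, self.total
        # ivalid is low: leave the state alone and flag the output as invalid.
        return 0, 0


acc = RunningSum()
# The middle cycle carries garbage (99) with ivalid low and must be ignored.
outputs = [acc.clock(v, d) for v, d in [(1, 5), (0, 99), (1, 2)]]
print(outputs)  # [(1, 5), (0, 0), (1, 7)]
```

Had the update not been gated, the garbage value would have been added to the sum and every later valid result would be wrong.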

Example Timing Diagram of a Stall-free RTL Component

Consider the following example timing diagram of a stall-free RTL component:

Figure 32. Timing Diagram of a Stall-free RTL Component
where,
  • IS_STALL_FREE value = "yes"
  • IS_FIXED_LATENCY value = "yes"
  • EXPECTED_LATENCY value = "2"
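A stall-free component with a fixed latency of 2 can be approximated by a two-stage shift register that carries the valid bit alongside the data. The sketch below is an assumption-laden model, not a reproduction of Figure 32: the class name FixedLatencyPipe and the "add one" operation are hypothetical, and latency is counted here as "a word presented in cycle n appears at the output in cycle n + 2".

```python
from collections import deque


class FixedLatencyPipe:
    """Stall-free pipeline model: the valid bit travels with the data."""

    def __init__(self, latency=2):
        # Each stage holds (valid, data); the pipeline starts out empty.
        self.stages = deque([(0, 0)] * latency, maxlen=latency)

    def clock(self, ivalid, datain):
        """Advance one cycle and return (ovalid, dataout)."""
        ovalid, dataout = self.stages[0]       # oldest stage drives the output
        result = datain + 1 if ivalid else 0   # hypothetical operation
        self.stages.append((1 if ivalid else 0, result))  # shifts out the oldest
        return ovalid, dataout


pipe = FixedLatencyPipe(latency=2)
stimulus = [(1, 10), (0, 0), (1, 20), (0, 0), (0, 0)]
trace = [pipe.clock(v, d) for v, d in stimulus]
print(trace)  # [(0, 0), (0, 0), (1, 11), (0, 0), (1, 21)]
```

Because the module never stalls, it needs no ready signals in this model; the invalid cycle simply propagates through as a bubble.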

Example Timing Diagram of a Non-stall-free RTL Component

Consider the following example timing diagram of a stallable (non-stall-free) RTL component:

Figure 33. Timing Diagram of a Stallable RTL Component
where,
  • IS_STALL_FREE value = "no"
  • IS_FIXED_LATENCY value = "no"
  • EXPECTED_LATENCY value = "4"
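For contrast with the stall-free case, the following sketch models the simplest stallable element: a single storage slot with back pressure on both sides. It does not reproduce Figure 33 or its expected latency of 4. The class name OneSlotStage is hypothetical, and the sketch assumes that oready is the module's back pressure toward its producer and iready is the consumer's back pressure into the module.

```python
class OneSlotStage:
    """Behavioral model of a one-entry stallable stage (readyLatency = 0)."""

    def __init__(self):
        self.full = False
        self.data = None

    def clock(self, ivalid, datain, iready):
        """Advance one cycle and return (ovalid, dataout, oready)."""
        ovalid = self.full
        dataout = self.data if self.full else None
        # Accept a new word if the slot is empty or is being drained this cycle.
        oready = (not self.full) or bool(iready)
        if ovalid and iready:
            self.full = False                      # consumer took the word
        if ivalid and oready:
            self.full, self.data = True, datain    # producer's word is stored
        return ovalid, dataout, oready


stage = OneSlotStage()
# (ivalid, datain, iready): the consumer stalls in the second cycle.
stimulus = [(1, "A", 1), (1, "B", 0), (1, "B", 1), (0, None, 1), (0, None, 1)]
sent, received = [], []
for ivalid, datain, iready in stimulus:
    ovalid, dataout, oready = stage.clock(ivalid, datain, iready)
    if ivalid and oready:
        sent.append(datain)
    if ovalid and iready:
        received.append(dataout)

print(sent, received)  # ['A', 'B'] ['A', 'B']
```

When the consumer stalls, oready drops and the producer must hold "B"; no word is lost or duplicated.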

Performing Advanced Compiler Optimizations

Both the ALLOW_MERGING and HAS_SIDE_EFFECTS parameters allow the offline compiler to perform advanced optimizations. Consider the following combinations to understand their impact:

Note: The combination you select depends on your design architecture. Allow the compiler to replicate the RTL block for multiple calls or vectorized code.
Table 12. ALLOW_MERGING and HAS_SIDE_EFFECTS Parameters Combination and Their Impact

Combination: ALLOW_MERGING value = "no", HAS_SIDE_EFFECTS value = "no"
Description:
  • Each call to an RTL library corresponds to one distinct instance in the hardware.
  • Calls might be optimized away by the compiler if deemed redundant or unnecessary. Calls might be vectorized, with multiple instances in the hardware created for a single RTL library call.

Combination: ALLOW_MERGING value = "no", HAS_SIDE_EFFECTS value = "yes"
Description:
  • Each call to an RTL library corresponds to one distinct instance in the hardware.
  • Calls are not optimized away by the compiler. The compiler errors out if the attribute num_simd_work_items is greater than 1 for the kernel calling the RTL library.

Combination: ALLOW_MERGING value = "yes", HAS_SIDE_EFFECTS value = "no"
Description:
  • Multiple calls to an RTL library might be merged into one call, and hence correspond to one instance in the hardware.
  • Calls might be optimized away by the compiler if deemed redundant or unnecessary. Calls might be vectorized, with multiple instances in the hardware created for a single RTL library call.

Combination: ALLOW_MERGING value = "yes", HAS_SIDE_EFFECTS value = "yes"
Description:
  • Multiple calls to an RTL library might be merged into one call, and hence correspond to one instance in the hardware.
  • Calls are not optimized away by the compiler. The compiler errors out if the attribute num_simd_work_items is greater than 1 for the kernel calling the RTL library.
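The four combinations in Table 12 reduce to two independent rules: ALLOW_MERGING controls whether calls may share one hardware instance, and HAS_SIDE_EFFECTS controls whether calls may be removed or vectorized. The lookup below restates the table in executable form as a reading aid. The function name rtl_call_behavior and its return keys are hypothetical and are not part of the offline compiler; the ValueError stands in for the compiler error described in the table.

```python
def rtl_call_behavior(allow_merging, has_side_effects, num_simd_work_items=1):
    """Summarize Table 12 for one parameter combination (booleans for "yes"/"no")."""
    if has_side_effects and num_simd_work_items > 1:
        # Mirrors the table: the compiler errors out in this case.
        raise ValueError(
            "num_simd_work_items > 1 is rejected when HAS_SIDE_EFFECTS is yes"
        )
    return {
        "calls_may_be_merged": allow_merging,
        "calls_may_be_optimized_away": not has_side_effects,
        "calls_may_be_vectorized": not has_side_effects,
    }


print(rtl_call_behavior(allow_merging=True, has_side_effects=False))
```

For example, a pure arithmetic block would typically fit HAS_SIDE_EFFECTS = "no", while a block that keeps state across calls would not; the right choice depends on the design, as the note above says.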