Intel® Arria® 10 Hard Processor System Technical Reference Manual

ID 683711
Date 8/28/2023
Public
Document Table of Contents

10.3.15. Accelerator Coherency Port

The ACP allows master peripherals—including FPGA-based master peripherals—to maintain data coherency with the Cortex*-A9 MPCore* processors and the SCU. Dedicated master peripherals in the HPS, and those built in FPGA logic, access the coherent memory through the ACP. Cacheable master accesses from the system interconnect are redirected to the ACP.

The ACP port allows one-way coherency. One-way coherency allows an external ACP master to see the coherent memory of the Cortex*-A9 processors but does not allow the Cortex*-A9 processors to see memory changes outside of the cache.

A master on the ACP port can read coherent memory directly from the L1 and L2 caches, but cannot write directly to the L1 cache. The possible ACP master read and write scenarios are as follows:
  • ACP master read with coherent data in the L1 cache: The ACP gets data directly from the L1 cache and the read is interleaved with a processor access to the L1 cache.
  • ACP master read with coherent data in L2 cache: The ACP request is queued in the SCU and the transaction is sent to the L2 cache.
  • ACP master read with coherent data not in L1 or L2 cache: Depending on the previous state of the data, the L1 or L2 cache may request the data from the L3 system interconnect. The ACP is stalled until data is available, however ACP support of multiple outstanding transactions minimizes bottlenecks.
  • ACP master write with coherent data in the L1 cache: The L1 cache data is marked as invalid in the Cortex*-A9 MPU and the line is evicted from the L1 cache and sent to L2 memory. The ACP write data is scheduled in the SCU and is eventually written to the L2 cache. Later, when the Cortex*-A9 processor accesses the same memory location, a cache miss in the L1 occurs.
  • ACP master write, with coherent data in the L2 cache: The ACP write is scheduled in the SCU and then is written into the L2 cache.
  • ACP master write, with coherent data not in L1 or L2 cache: ACP write is scheduled.

An ACP transaction is generated when a master on the system interconnect has its AxCACHE[1] attribute signal set to 1, which indicates a coherent request. All transactions that are set as cacheable are routed to the ACP instead of the normal mapping and are treated as coherent by the cache controllers of the MPU subsystem. The ACP ID generation follows an allocation mechanism that ensures that requests with the same master initiator flow and sequence identifiers always allocate the same local sequence ID and that requests with different identifiers always have different IDs. In addition, if an allocator runs out of sequence IDs, the ACP stalls requests until the reception of responses makes new sequence IDs available. Therefore, the guaranteed number of pending transactions that the interconnect can have is up to four pending transactions, with the allowed AXI ID[2:0] values of 0x4 through 0x7. AxID[2:0] values of 0x0 and 0x1 are assigned to CPU0 and CPU1, respectively.

Note: The entire 4 GB address space can be accessed coherently through the ACP.
Note: Avoid needing to coherently access back the HPS through ACP from the fabric in order to complete an access coming from HPS, as this may result in a deadlock situation.

Certain coherent access scenarios can create deadlock through the ACP and the CPU. However, you can avoid this type of deadlock with a simple access strategy.

In the following example, the CPU creates deadlock by initiating an access to the HPS through the FPGA fabric:
  1. The CPU initiates a device memory access to the FPGA fabric. The CPU pipeline must stall until this type of access is complete.
  2. Before the FPGA fabric state machine can respond to the device memory access, it must access the HPS coherently. It initiates a coherent access, which requires the ACP.
  3. The ACP must perform a cache maintenance operation before it can complete the access. However, the CPU’s pipeline stall prevents it from performing the cache maintenance operation. The system deadlocks.

You can implement the desired access without deadlock, by breaking it into smaller pieces. For example, you can initiate the operation with one access, then determine the operation status with a second access.

Note: Refer to the Arria 10 SoC device errata for details on Arm* errata that directly affect the ACP.