In the current design, QSPI reads from external flash are not continuous.
This is because SPI_clk (which drives the shift register) and ocp_clk are asynchronous to each other. After each word, the design must wait for “word done” and “next command issue” before it can communicate with the external flash again.
This wait typically costs 7-8 external spi_clk cycles before the next “word” is received from the external flash.
Because of this wait time, QSPI throughput drops to about 82% of the throughput that the external device can actually support.
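The relationship between the per-word wait and the resulting efficiency can be illustrated with a small calculation. This is only a sketch: the word length and the number of data lines are assumptions (a 128-bit word read over 4 data lines, giving 32 data cycles per word) and are not stated above; the function name is hypothetical. Under those assumptions, a 7-cycle wait gives a figure close to the 82% quoted.

```python
def qspi_read_efficiency(data_cycles: int, wait_cycles: int) -> float:
    """Return the fraction of spi_clk cycles that actually carry data.

    data_cycles: spi_clk cycles spent shifting in one word.
    wait_cycles: idle spi_clk cycles before the next word starts.
    """
    if data_cycles <= 0 or wait_cycles < 0:
        raise ValueError("data_cycles must be > 0 and wait_cycles >= 0")
    return data_cycles / (data_cycles + wait_cycles)


# Assumptions (not from the text above): 128-bit word, quad (4-line) read.
WORD_BITS = 128
DATA_LINES = 4
DATA_CYCLES = WORD_BITS // DATA_LINES  # 32 spi_clk cycles per word

for wait in (7, 8):  # typical wait of 7-8 spi_clk cycles
    eff = qspi_read_efficiency(DATA_CYCLES, wait)
    print(f"wait = {wait} cycles: efficiency = {eff:.1%}")
```

With these assumed values, the 7-cycle case works out to 32/39, roughly 82%, and the 8-cycle case to 32/40, or 80%. A shorter word length would make the same wait proportionally more expensive.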
When operating at 64 MHz, QSPI XIP performance is roughly 60% of DDR performance.
A concurrent EDMA copy from QSPI has a significant impact on CPU QSPI XIP performance. To keep CPU traffic from being starved, application developers should use ASYNC EDMA transfers with a lower ACNT, or a BW limiter, to balance the share of QSPI bandwidth between CPU and DMA traffic. The BW limit or ACNT can be chosen based on the priority of the application image load versus the IPU CPU performance.
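The ACNT trade-off can be made concrete with a rough sketch. It assumes that ASYNC refers to EDMA A-synchronized transfers, in which each sync event moves one array of ACNT bytes, and that ACNT is a 16-bit field. The image size and the function name are hypothetical and chosen only for illustration.

```python
ACNT_MAX = 0xFFFF  # assumption: ACNT is a 16-bit field in the EDMA PaRAM entry


def a_sync_events(total_bytes: int, acnt: int) -> int:
    """Number of sync events needed to move total_bytes, ACNT bytes per event."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if not 0 < acnt <= ACNT_MAX:
        raise ValueError("acnt must be in 1..65535")
    return -(-total_bytes // acnt)  # integer ceiling division


IMAGE_BYTES = 1 << 20  # hypothetical 1 MiB application image

for acnt in (64, 256, 1024, 4096):
    events = a_sync_events(IMAGE_BYTES, acnt)
    print(f"ACNT = {acnt:5d} bytes per event -> {events:6d} events")
```

A lower ACNT means each event occupies the QSPI interface for a shorter burst, which gives CPU XIP fetches more chances to be serviced in between, at the cost of more events and a longer image load. Where the right balance lies depends on the application, and should be confirmed by measurement on the target.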