SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1
VCOP has 8 lanes of 40 bits each. When mapped to 32-bit lanes on C7x, there are 16 lanes available, potentially doubling the throughput of a kernel.
Many kernels are written to be independent of the SIMD width, using the macro VCOP_SIMD_WIDTH to abstract the number of lanes. Some of these kernels can be built successfully in host emulation mode for wider (or narrower) machines simply by changing the value of the macro. In host emulation mode, VCOP_SIMD_WIDTH must now be defined on the command line or before vcop_host_emulation.h is included.
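The following plain-C sketch models what "independent of the SIMD width" means in practice. It is not VCOP Kernel-C, and the function array_add_u8 is hypothetical. The loop trip count is derived from VCOP_SIMD_WIDTH, so changing the macro changes the lane count without touching the loop body. The macro is defined before the (commented-out) include of vcop_host_emulation.h, which is omitted so that the sketch builds standalone. The same value could instead be passed as -DVCOP_SIMD_WIDTH=16 on the command line.

```c
/* Define the SIMD width before vcop_host_emulation.h would be included,
 * or pass -DVCOP_SIMD_WIDTH=16 on the compiler command line instead. */
#ifndef VCOP_SIMD_WIDTH
#define VCOP_SIMD_WIDTH 16
#endif
/* #include "vcop_host_emulation.h"  -- omitted so this sketch builds standalone */

#include <assert.h>
#include <stdint.h>

/* Hypothetical width-independent kernel model: adds two uint8 arrays one
 * "vector" of VCOP_SIMD_WIDTH lanes at a time (with 8-bit wraparound).
 * Returns the number of vector iterations performed. */
int array_add_u8(const uint8_t *a, const uint8_t *b, uint8_t *out, int width)
{
    /* The data layout must be a whole number of vectors. */
    assert(width % VCOP_SIMD_WIDTH == 0);

    int iterations = width / VCOP_SIMD_WIDTH;
    for (int i = 0; i < iterations; i++) {
        /* Inner loop stands in for one SIMD operation across all lanes. */
        for (int lane = 0; lane < VCOP_SIMD_WIDTH; lane++) {
            int idx = i * VCOP_SIMD_WIDTH + lane;
            out[idx] = (uint8_t)(a[idx] + b[idx]);
        }
    }
    return iterations;
}
```

With the default of 16 lanes, a 32-element row takes 2 vector iterations. Building the same source with -DVCOP_SIMD_WIDTH=8 takes 4, which mirrors the potential doubling of throughput when moving from 8 VCOP lanes to 16 C7x lanes.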
The SIMD width used by VCC is controlled by the --vcop_simd option. (Kernels that qualify for SIMD 16 are NOT automatically detected or transformed.) For a SIMD width of 16, use --vcop_simd=16. This option determines the sequence of VM calls that the translation generates. To support this option, the generated C source file also defines VCOP_SIMD_WIDTH.
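Because the generated C source defines VCOP_SIMD_WIDTH, hand-written code compiled against it can check at compile time that it matches the width it was written for. The sketch below is a hypothetical guard, not part of the generated output. The #define here is only a stand-in for the definition that a file translated with --vcop_simd=16 would provide.

```c
/* Stand-in for the definition supplied by the VCC-generated C source
 * when the kernel is translated with --vcop_simd=16. */
#ifndef VCOP_SIMD_WIDTH
#define VCOP_SIMD_WIDTH 16
#endif

#include <assert.h> /* provides the C11 static_assert macro */

/* Hypothetical guard: code tuned for 16 lanes refuses to compile
 * against a kernel translated for a different SIMD width. */
static_assert(VCOP_SIMD_WIDTH == 16,
              "this code expects a kernel translated with --vcop_simd=16");

/* Number of lanes the translated kernel processes per vector operation. */
int translated_simd_width(void)
{
    return VCOP_SIMD_WIDTH;
}
```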
Some kernels depend on a specific SIMD width and will not work correctly if extended to 16-way SIMD. Increasing the SIMD factor can also impose requirements on the data layout in memory. For example, image widths may need to be multiples of 16 instead of 8. The migration tool cannot detect these cases automatically.
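Because the tool does not flag layout problems, a caller can check them explicitly. The following minimal sketch uses two hypothetical helpers that are not part of any TI library. One tests whether an image width is a multiple of the SIMD width. The other rounds a width up to the next multiple, for example to choose a padded line pitch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if `width` is a whole number of SIMD vectors. */
bool width_fits_simd(int width, int simd_width)
{
    assert(simd_width > 0);
    return width % simd_width == 0;
}

/* Hypothetical helper: round `width` up to the next multiple of
 * `simd_width`, e.g. to pick a padded line pitch for a 16-lane kernel. */
int pad_to_simd(int width, int simd_width)
{
    assert(simd_width > 0);
    return ((width + simd_width - 1) / simd_width) * simd_width;
}
```

For example, a 40-pixel-wide image satisfies the 8-lane requirement but not the 16-lane one. Padding its pitch to 48 would make it a whole number of 16-lane vectors.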
The following are examples of VCOP operations that cannot be trivially extended to 16-way SIMD.
src1[0]) is assumed to be 8 bits