SPRUIG3C January 2018 – August 2019 TDA4VM , TDA4VM-Q1
VCC may recognize transpose sequences that use OFFSET_NP1. The general pattern is to store into a scratch buffer using OFFSET_NP1 and then read the scratch buffer back using NPT loads. There is no direct translation for the OFFSET_NP1 store, but the streaming engine does support a transposed read mode. If transpose recognition is enabled, the migration tool may transform the sequence to use non-transposed stores in place of the OFFSET_NP1 stores, and transposed streaming engine loads in place of the normal vector loads. The transpose operation thus shifts from the store to the subsequent load. Because this changes the layout of the data in the scratch buffer relative to its VCOP layout, the transformation only works when the scratch buffer is used exclusively for the transpose operation.
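The shift of the transpose from the store to the load can be illustrated with a small scalar model in plain C. This is only a sketch under stated assumptions, not VCOP Kernel-C and not C7000 streaming engine code: it assumes 8 lanes per vector, and it assumes that an OFFSET_NP1 store places lane i at a word offset of i*(N+1) from the store address. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N 8  /* lanes per vector in this model (assumption) */

/* VCOP-style model: the transpose happens at the store.
 * Lane i of row r lands at word offset r + i*(N+1) (assumed OFFSET_NP1
 * behavior), so the scratch buffer needs N*(N+1) words. */
void transpose_at_store(uint32_t in[N][N], uint32_t out[N][N],
                        uint32_t scratch[N * (N + 1)])
{
    for (int r = 0; r < N; r++) {
        for (int lane = 0; lane < N; lane++) {
            int idx = r + lane * (N + 1);          /* strided store */
            assert(idx < N * (N + 1));
            scratch[idx] = in[r][lane];
        }
    }
    for (int c = 0; c < N; c++)
        for (int lane = 0; lane < N; lane++)
            out[c][lane] = scratch[c * (N + 1) + lane];  /* linear load */
}

/* Migrated-style model: the transpose happens at the load.
 * Rows are stored linearly, so the scratch buffer holds N*N words
 * in a different layout than above. */
void transpose_at_load(uint32_t in[N][N], uint32_t out[N][N],
                       uint32_t scratch[N * N])
{
    for (int r = 0; r < N; r++)
        for (int lane = 0; lane < N; lane++)
            scratch[r * N + lane] = in[r][lane];   /* linear store */
    for (int c = 0; c < N; c++)
        for (int lane = 0; lane < N; lane++)
            out[c][lane] = scratch[lane * N + c];  /* transposed read */
}

/* Returns 1 if both models produce the true transpose of a test matrix. */
int transpose_models_agree(void)
{
    uint32_t in[N][N], a[N][N], b[N][N];
    uint32_t s1[N * (N + 1)] = {0}, s2[N * N] = {0};
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            in[r][c] = (uint32_t)(r * N + c);
    transpose_at_store(in, a, s1);
    transpose_at_load(in, b, s2);
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            if (a[r][c] != in[c][r] || b[r][c] != in[c][r])
                return 0;
    return 1;
}
```

Both models yield the same output, but the contents of the two scratch buffers differ. Any other code that reads the scratch buffer directly would therefore see different data after the transformation.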
Transpose detection and transformation may be enabled by:
• Using the --transpose command line option. This option enables automatic detection of transpose sequences. This will apply transpose at every possible point in the file. If a kernel that should use transpose is in the same file as one that shouldn’t, they should be separated into two files.
• Adding the __tscratch keyword to a parameter (for example, __tscratch __vptr_uint32 scratch_buffer). This method of enabling transpose will take effect even if --transpose is not specified.
The transpose transformation may be performed under the following conditions:
The transpose transformation correctly handles unrolled reads or unrolled writes and transforms them as a set. It also correctly handles a transpose scratch buffer that has been split so that one portion is used separately from another. However, it does not correctly handle a combination of unrolled reads/writes and a split transpose scratch buffer, because VCC cannot disambiguate the offset for the unroll from the offset for the split.
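The ambiguity can be pictured with a toy model in C. It is an illustration only and does not describe how VCC actually analyzes addresses. It assumes a scratch access is seen as a base plus a single constant word offset, that each unrolled copy adds a multiple of a hypothetical unroll step, and that each split portion adds a multiple of a hypothetical split step. The function name and the step sizes are made up for the example.

```c
#include <assert.h>

/* Counts the ways a constant word offset can be written as
 * u * unroll_step + s * split_step with integers u, s >= 0.
 * More than one way means the offset alone does not reveal which
 * unrolled copy and which buffer portion the access belongs to. */
int count_offset_decompositions(int offset, int unroll_step, int split_step)
{
    assert(offset >= 0 && unroll_step > 0 && split_step > 0);
    int count = 0;
    for (int s = 0; s * split_step <= offset; s++) {
        int rest = offset - s * split_step;   /* part left for the unroll */
        if (rest % unroll_step == 0)
            count++;
    }
    return count;
}
```

With a hypothetical unroll step of 8 words and a split step of 32 words, count_offset_decompositions(40, 8, 32) returns 2: the offset could be unrolled copy 5 of the first portion or unrolled copy 1 of the second portion. When only unrolling or only splitting is present, a single step size is involved and the offset has one interpretation.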