SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1
The copy-in operation is responsible for copying table data from its “permanent” location in L2 into L1D so that an LHT operation can be performed. The source table in L2 is in VCOP layout; the destination table in L1D is in C7x layout.
This operation is performed by the LHT_copy_in::copy_table_in() method of the virtual machine. A pointer to the table in L2 and its size in bytes are passed as parameters. The size is rounded up to a multiple of 128 bytes (1024 bits), which is the line size of banked tables on C7x.
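The rounding step can be illustrated with a minimal sketch. The names kBankedLineBytes and round_up_to_line are hypothetical and chosen only for illustration; they are not taken from the virtual machine source, and the actual implementation may compute the rounded size differently.

```cpp
#include <cstddef>

// Line size of banked tables on C7x, as stated in the text: 128 bytes (1024 bits).
// The constant name is an assumption made for this sketch.
constexpr std::size_t kBankedLineBytes = 128;

// Hypothetical helper: round a table size in bytes up to the next
// multiple of the 128-byte line size. Sizes that are already a multiple
// of 128 are returned unchanged.
constexpr std::size_t round_up_to_line(std::size_t size_bytes) {
    return (size_bytes + kBankedLineBytes - 1) / kBankedLineBytes * kBankedLineBytes;
}

// Compile-time checks of the rounding behavior.
static_assert(round_up_to_line(1) == 128, "partial line rounds up to one line");
static_assert(round_up_to_line(128) == 128, "exact multiple is unchanged");
static_assert(round_up_to_line(129) == 256, "one byte over rounds up to two lines");
```

Under this sketch, a 1000-byte table would be copied as 1024 bytes, that is, eight full 1024-bit lines.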
The table is read and written one 1024-bit line at a time. Each line is read from L2 as a pair of 512-bit vectors using the Streaming Engine (SE). The pair of vectors, which together contain four VCOP lines, is rearranged using two VPERM instructions. Each rearranged vector is then written into L1D using a LUTINIT instruction.
Thus, each 1024-bit line requires two SE-based vector loads, two VPERMs, and two LUTINITs. The resulting loop pipelines at an initiation interval (ii) of 2, so one 1024-bit line completes every two cycles and the throughput is 512 bits per cycle.
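The throughput figure follows directly from the line size and the initiation interval. The sketch below works through that arithmetic and gives a rough steady-state cycle estimate for a table of a given size. All names are hypothetical, and the estimate assumes ideal pipelining: it ignores loop prologue and epilogue, SE setup, and any memory stalls.

```cpp
#include <cstddef>

// Figures taken from the text; the constant names are assumptions for this sketch.
constexpr std::size_t kCopyLineBits  = 1024;               // one banked-table line
constexpr std::size_t kCopyLineBytes = kCopyLineBits / 8;  // 128 bytes
constexpr std::size_t kLoopII        = 2;                  // cycles per line in steady state

// Throughput in bits per cycle: 1024 bits every 2 cycles.
constexpr std::size_t kBitsPerCycle = kCopyLineBits / kLoopII;
static_assert(kBitsPerCycle == 512, "matches the throughput stated in the text");

// Hypothetical helper: number of 1024-bit lines needed for a table,
// with the size rounded up to a whole number of lines.
constexpr std::size_t copy_in_lines(std::size_t size_bytes) {
    return (size_bytes + kCopyLineBytes - 1) / kCopyLineBytes;
}

// Hypothetical helper: idealized steady-state cycle count for the copy-in loop.
constexpr std::size_t copy_in_steady_state_cycles(std::size_t size_bytes) {
    return copy_in_lines(size_bytes) * kLoopII;
}
```

For example, under these assumptions a 4096-byte table occupies 32 lines and the loop body accounts for roughly 64 cycles.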
To use LUTINIT to populate a table in L1D, the table must be configured as one parallel table, which allows the lanes of the payload vectors to be written into the table in linear fashion. This requirement holds regardless of how the table is configured for the LHT operation itself, so there is an independent LTCR configuration that applies only to the copy-in operation. This configuration is computed during the init() function by a call to the copy_in_config() method, and stored in the tvals structure.
Similarly, the SE configuration used for the copy-in operation is computed during init() by the copy_in_SE_config() method and stored in the tvals structure.