SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1
Most of VCOP’s arithmetic operations translate directly to either one C7x instruction or a short sequence of instructions.
VCOP arithmetic operations generally operate on vectors of signed 40-bit elements, although some VCOP operations ignore the upper 8 bits (the guard bits) and operate on 32-bit elements. C7x has no direct support for 40-bit arithmetic. Each lane could be modeled as a 64-bit element, but in practice most kernels rely on only 32 bits of precision, so the migration tool models VCOP vectors as having 32-bit elements and translates them accordingly (see Section 1.5).
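To see what the 32-bit model gives up, the host-side sketch below emulates a single VCOP lane. The hypothetical helper `wrap40` keeps 40 bits and sign-extends, as a lane with 8 guard bits would, while `wrap32` keeps only 32 bits, as a migrated C7x lane does. The helper names and the accumulation loop are illustrative assumptions, not part of the migration tool, and the casts assume two's-complement conversion (guaranteed since C++20).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep the low 40 bits of v and sign-extend,
// modeling a VCOP lane with 8 guard bits above 32 bits of data.
int64_t wrap40(int64_t v) {
    const uint64_t mask40 = (uint64_t{1} << 40) - 1;
    uint64_t low = static_cast<uint64_t>(v) & mask40;
    // Bit 39 is the sign bit of a 40-bit value.
    if (low & (uint64_t{1} << 39)) {
        low |= ~mask40;
    }
    return static_cast<int64_t>(low);
}

// Hypothetical helper: keep only 32 bits (two's-complement wraparound),
// modeling a C7x lane that holds a 32-bit element.
int64_t wrap32(int64_t v) {
    return static_cast<int32_t>(static_cast<uint32_t>(v));
}

// Add x into an accumulator n times, wrapping after each step like a 40-bit lane.
int64_t accumulate40(int32_t x, int n) {
    int64_t acc = 0;
    for (int i = 0; i < n; ++i) acc = wrap40(acc + x);
    return acc;
}

// The same accumulation, wrapping after each step like a 32-bit lane.
int64_t accumulate32(int32_t x, int n) {
    int64_t acc = 0;
    for (int i = 0; i < n; ++i) acc = wrap32(acc + x);
    return acc;
}
```

For small sums both models agree; only when an intermediate result needs the guard bits does the 32-bit model diverge, which is why kernels that stay within 32 bits migrate cleanly.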
Most arithmetic operations are sign-agnostic, but in some cases treating elements as unsigned rather than signed can correct translation errors that arise from the loss of the guard bits (see Section 1.5). Therefore, some translations have unsigned forms in addition to the default signed forms.
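One illustrative case (an assumed example, not taken from the migration tool) is halving the sum of two large positive values. On a 40-bit lane the carry lands in the guard bits and the result is exact. With 32-bit elements the sum spills into the sign bit, so a signed arithmetic shift produces a negative value, whereas treating the same 32 bits as unsigned and shifting logically recovers the 40-bit answer, provided the true sum fits in 32 unsigned bits. The function names are hypothetical, and the signed shift relies on C++20 arithmetic-shift semantics.

```cpp
#include <cassert>
#include <cstdint>

// Reference: a 64-bit intermediate stands in for a 40-bit lane,
// which has room for the carry out of bit 31.
int64_t average_40bit(int32_t a, int32_t b) {
    return (static_cast<int64_t>(a) + b) >> 1;
}

// Signed 32-bit model: the sum wraps into the sign bit, and the
// arithmetic right shift propagates that (wrong) sign.
int32_t average_signed32(int32_t a, int32_t b) {
    int32_t sum = static_cast<int32_t>(static_cast<uint32_t>(a) +
                                       static_cast<uint32_t>(b));
    return sum >> 1;
}

// Unsigned 32-bit model: the same bits, shifted logically.
uint32_t average_unsigned32(uint32_t a, uint32_t b) {
    return (a + b) >> 1;
}
```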
In cases where an operation is represented by a C operator, the migration tool generates the expression using that operator, and the C7x compiler generates the appropriate vector instruction. For example, consider this VCOP Kernel-C statement:
Vdst = Vsrc1 + Vsrc2;
The migration tool copies the expression verbatim, and the compiler generates a VADDW instruction.
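On a host, the effect of copying the expression verbatim can be approximated with a small vector type whose `operator+` adds lane by lane with 32-bit wraparound, which is what VADDW does on 32-bit lanes. The type name `Vec32x16` and the 16-lane width are assumptions for illustration; the migrated code uses the C7x compiler's own vector types instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host model of a vector of 16 signed 32-bit lanes.
struct Vec32x16 {
    std::array<int32_t, 16> lane{};
};

// Lane-wise add with 32-bit wraparound. The addition is done in
// unsigned arithmetic to avoid signed-overflow undefined behavior.
Vec32x16 operator+(const Vec32x16& a, const Vec32x16& b) {
    Vec32x16 r;
    for (std::size_t i = 0; i < r.lane.size(); ++i) {
        r.lane[i] = static_cast<int32_t>(static_cast<uint32_t>(a.lane[i]) +
                                         static_cast<uint32_t>(b.lane[i]));
    }
    return r;
}
```

With this type in scope, the statement `Vdst = Vsrc1 + Vsrc2;` compiles unchanged, which is the property a verbatim copy depends on.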
In other cases where an operation translates directly to a C7x instruction, the migration tool generates a call to the C7x intrinsic for that instruction. For example:
Vdst = min(Vsrc1, Vsrc2);
translates to:
Vdst = __min(Vsrc1, Vsrc2);
which the compiler turns directly into a VMINW instruction.
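As a host-side reference for the result VMINW produces, the sketch below takes the signed minimum of each pair of 32-bit lanes. `host_min` is a hypothetical stand-in for the `__min` intrinsic, which is available only when compiling for C7x, and the 16-lane `std::array` is an assumed model of the vector type.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host model: 16 signed 32-bit lanes.
using Lanes = std::array<int32_t, 16>;

// Hypothetical stand-in for __min on 32-bit lanes: each result lane
// is the signed minimum of the corresponding source lanes.
Lanes host_min(const Lanes& a, const Lanes& b) {
    Lanes r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = std::min(a[i], b[i]);
    }
    return r;
}
```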
If there is no direct translation to a single instruction, the migration tool relies on the virtual machine to provide the translation. Each such operation is implemented by a class in the virtual machine. Like the load and store classes, these are template classes, so variations can be specified at compile time by template parameters. For example, this VCOP Kernel-C statement:
Vdst = unpack(Vsrc1, Vsrc2);
translates to:
bit_unpack<int>::apply(Vdst, Vsrc1, Vsrc2);
The element type is specified by the template parameter int. All arithmetic classes have a single method, apply(), that implements the translation. In this case, the apply() method invokes a sequence of four C7x intrinsics.
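The shape of these arithmetic classes can be sketched as follows. The operation `abs_diff`, its steps, and the `Vec` alias are hypothetical. The sketch mirrors only the pattern described above: a class template whose single static `apply()` method writes the destination through a short sequence of lane-wise operations. It is not the actual `bit_unpack` implementation. Instantiating it with a signed or an unsigned element type shows how a template parameter selects a variant at compile time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 16-lane vector whose element type is a template parameter.
template <typename T>
using Vec = std::array<T, 16>;

// Hypothetical arithmetic class in the style described above: the
// element type T selects the signed or unsigned variant, and a single
// static apply() method carries out a short sequence of steps.
// For signed T, assumes the difference of each lane pair fits in T.
template <typename T>
struct abs_diff {
    static void apply(Vec<T>& dst, const Vec<T>& src1, const Vec<T>& src2) {
        for (std::size_t i = 0; i < dst.size(); ++i) {
            T hi = src1[i] > src2[i] ? src1[i] : src2[i];  // step 1: max
            T lo = src1[i] > src2[i] ? src2[i] : src1[i];  // step 2: min
            dst[i] = hi - lo;                              // step 3: subtract
        }
    }
};
```

The same bit patterns give different results depending on the element type: -1 and 1 differ by 2 as signed values, but 0xFFFFFFFF and 1 differ by 0xFFFFFFFE as unsigned values.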