SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1
Like rounding, a VCOP store that includes saturation is translated as if it were two operations: an explicit saturation operation, followed by the store. The saturation operation operates on 32-bit elements regardless of the data type of the store. VCOP supports several forms of saturation: SYMM, ASYMM, 4PARAM, and so on. Fundamentally, all of the forms operate as follows:
saturate(min, minset, max, maxset) = (x < min) ? minset
: (x > max) ? maxset : x;
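As a reading aid, the definition above can be written as a scalar C function applied to one 32-bit element. This is only a reference model of the formula; the name saturate4 is hypothetical and is not a VCOP or C7x API.

```c
#include <stdint.h>

/* Scalar reference model of the general saturation form, applied to one
 * 32-bit element. saturate4 is a hypothetical name used for illustration. */
static int32_t saturate4(int32_t x, int32_t min, int32_t minset,
                         int32_t max, int32_t maxset)
{
    if (x < min) {
        return minset;   /* below the lower bound: substitute minset */
    }
    if (x > max) {
        return maxset;   /* above the upper bound: substitute maxset */
    }
    return x;            /* inside the bounds: value passes through */
}
```

For example, saturate4(-5, 0, 0, 255, 255) yields 0, while saturate4(-5, 0, -1, 255, 256) yields -1 because the set-to value differs from the bound.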
In the typical case when the saturation bounds are the same as the min/max set-to values, that is min == minset and max == maxset, the C7x translation is an efficient two-instruction sequence:
VMINW Vsrc,Vmax,Vdst
VMAXW Vdst,Vmin,Vdst
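The effect of the min-then-max pair can be sketched per element in scalar C. The helper names below are hypothetical, and the sketch assumes min <= max, which is what makes the two steps equivalent to the general form when the set-to values equal the bounds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-ins for the per-lane min and max operations. */
static int32_t min_w(int32_t a, int32_t b) { return a < b ? a : b; }
static int32_t max_w(int32_t a, int32_t b) { return a > b ? a : b; }

/* Models the two-step clamp for one 32-bit element. */
static int32_t clamp_minmax(int32_t x, int32_t min, int32_t max)
{
    assert(min <= max);            /* assumption of this sketch */
    int32_t t = min_w(x, max);     /* step 1: cap the value at max */
    return max_w(t, min);          /* step 2: raise the value to at least min */
}
```

With bounds 0 and 255, an input of 300 becomes 255 after the first step and is unchanged by the second; an input of -5 passes the first step unchanged and becomes 0 in the second.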
If the saturation bounds fall on power-of-2 boundaries, such as 0 to 255, a single saturation instruction is used:
VGSATUW Vsrc,Cwidth,Vdst
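A scalar sketch of this kind of saturation follows. It assumes that Cwidth gives the bit width of the unsigned result range (8 for the range 0 to 255); the text above does not spell out the operand semantics, so treat that reading, and the function name, as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Saturates a signed 32-bit element to the unsigned range [0, 2^width - 1].
 * Hypothetical model; the meaning of the width operand is an assumption. */
static int32_t sat_unsigned_width(int32_t x, unsigned width)
{
    assert(width >= 1 && width <= 31);   /* keeps the bound inside int32_t */
    int32_t max = (int32_t)((UINT32_C(1) << width) - 1);
    if (x < 0) {
        return 0;
    }
    if (x > max) {
        return max;
    }
    return x;
}
```

Under this reading, a width of 8 reproduces the 0 to 255 case from the text.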
If the saturation bounds differ from the min/max set-to values, a less efficient four-instruction sequence is required:
VCMPGTW Vmin,Vsrc,Pred0
VSEL Pred0,Vminset,Vsrc
VCMPGTW Vmax,Vsrc,Pred1
VSEL Pred1,Vsrc,Vmaxset
For unsigned vectors (see Section 1.5), unsigned forms of the compares are used.
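The compare-and-select logic can be modelled per element in scalar C as below. This is a sketch of the logic only: the helper names are hypothetical, it uses signed compares, and it does not claim to reproduce the operand order or encoding of VCMPGTW and VSEL. It assumes min <= max, in which case at most one predicate is true and the result matches the general definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a per-lane select on a predicate. */
static int32_t select_w(int pred, int32_t if_true, int32_t if_false)
{
    return pred ? if_true : if_false;
}

/* Two compares and two selects, modelling the case where the set-to
 * values differ from the bounds. */
static int32_t saturate_cmp_sel(int32_t x, int32_t min, int32_t minset,
                                int32_t max, int32_t maxset)
{
    assert(min <= max);                      /* assumption of this sketch */
    int pred0 = min > x;                     /* element is below the lower bound */
    int32_t t = select_w(pred0, minset, x);  /* substitute minset where pred0 holds */
    int pred1 = x > max;                     /* element is above the upper bound */
    return select_w(pred1, maxset, t);       /* substitute maxset where pred1 holds */
}
```

For unsigned vectors, the same structure would apply with uint32_t operands and unsigned comparisons.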
C7x has a dedicated instruction for saturating to the range of a signed 16-bit value: VSATWH. Therefore the following statement:
__vptr_s16 out;
out[Agen] = Vsrc.saturate(); // saturates to (-32768, 32767)
translates to a single instruction.
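For reference, the effect of saturating a 32-bit element to the signed 16-bit range can be modelled in scalar C as follows. The function name is hypothetical; only the range (-32768 to 32767) comes from the example above.

```c
#include <stdint.h>

/* Saturates one signed 32-bit element to the range of a signed 16-bit
 * value before it is narrowed for the store. Hypothetical helper name. */
static int16_t sat_w_to_h(int32_t x)
{
    if (x < INT16_MIN) {
        return INT16_MIN;   /* -32768 */
    }
    if (x > INT16_MAX) {
        return INT16_MAX;   /* 32767 */
    }
    return (int16_t)x;      /* already representable in 16 bits */
}
```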
VCC removes saturations that it determines to have no effect.