SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1

6.1 Overview

The collective design goal of the C7x ISA and the migration tool is to achieve cycle parity with VCOP on Kernel-C kernels when translated to C7x. The actual performance of a given kernel depends on a variety of factors:

VCOP features used by the kernel
how well the C7x ISA specifically covers those features
the efficiency of the translation generated by the migration tool and virtual machine
the ability of the C7x C compiler to generate efficient code from the translated code (including the virtual machine)
outer loop code limiting the use of NLC

Of these, only the ISA itself is relatively constant.

The most significant performance issues arise from three sources: LHT (lookup and histogram) operations, because of the overhead of copying the table into and out of L1D; OFFSET_NP1 and PDDA parallel scattering stores; and collating stores, which are not well supported on C7x.

Having said that, for many kernels the goal of cycle parity is already achieved with current tools. We have established the following general expectations:

On average, kernels will execute on C7x with a cycles-to-cycles efficiency of about 0.7x vs VCOP.
For kernels that can use 16-way SIMD, this doubles to about 1.4x.
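
As a rough illustration of how these expectations might be applied, the sketch below turns a measured VCOP cycle count into an estimated C7x cycle count. It assumes that "efficiency" means VCOP cycles divided by C7x cycles, which is one reading of the figures above and is not stated explicitly in this guide. The helper name and the macros are hypothetical and are not part of the migration tool or the C7x compiler.

```c
#include <assert.h>

/* Assumption: efficiency = VCOP cycles / C7x cycles.
 * The two values are the general expectations quoted above. */
#define AVG_EFFICIENCY    0.7  /* average kernel */
#define SIMD16_EFFICIENCY 1.4  /* kernel that can use 16-way SIMD */

/* Hypothetical helper: estimate C7x cycles from a VCOP cycle count.
 * Under the assumed definition, a higher efficiency means fewer C7x cycles. */
static double estimate_c7x_cycles(double vcop_cycles, double efficiency)
{
    assert(efficiency > 0.0);  /* guard against division by zero */
    return vcop_cycles / efficiency;
}
```

Under this reading, a kernel that takes 700 cycles on VCOP would be estimated at roughly 700 / 0.7 = 1000 cycles on C7x on average, and a kernel that takes 1400 cycles on VCOP and can use 16-way SIMD at roughly 1400 / 1.4 = 1000 cycles. Treat such numbers as planning estimates only; actual performance depends on the factors listed at the start of this section.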