SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1
To facilitate evaluation of the performance of translated kernels running on C7x compared to native execution on VCOP, an automatic profiling mechanism has been added to both the Kernel-C compiler (vcc-arp32) and the migration tool (vcc7x). It measures the execution cycles of both the init() function and the vloops() function on both targets so they can be compared.
The profiling mechanism relies on the built-in TSC cycle counter available on both EVE and C7x. The TSC is accessed through API calls in the RTS.
On EVE, the kernel executes asynchronously. That is, the vloops() function dispatches the kernel loops to VCOP and returns without waiting for them to complete. A subsequent call to _vcop_vloop_done() synchronizes ARP32 execution with the completion of the kernel. EVE programmers can manage this synchronization themselves by calling the init() function, the vloops() function, and _vcop_vloop_done() in the client code, possibly interspersed with other operations like memory transfers.
Alternatively, programmers can use the higher-level kernel() function call, which wraps these other calls. The kernel() function waits for the VCOP loop to complete before returning. Automatic profiling is supported only when the kernel is invoked through the higher-level kernel() API. However, users who use the lower-level calls can still use the profiling mechanism manually by inserting calls to the timer functions themselves.
The mechanism works as follows:
- Profiling is enabled with the --profile switch on the VCC command line.
- VCC generates counter variables that accumulate the cycle counts of the init() and vloops() functions. Given a kernel named mykernel, the variables are named mykernel_init_cycles and mykernel_vloops_cycles.
- In the kernel() function, VCC wraps calls to the init() and vloops() functions with cycle counting via _tsc_gettime(). In the EVE case, the vloops cycle count is taken after _vcop_vloop_done(), so that the elapsed time includes complete execution of the vloop command on VCOP. It stores the cycle counts in the counter variables.
- In the kernel() function, VCC also inserts a call to a new API function, __vcop_log_kernel_profile(), that records the accumulated cycle counts along with the kernel’s name. This function is defined in the compiler’s runtime support library.
- When the program exits (that is, when main() returns), the values are automatically printed.
For example:
kernel profiling results:
vcop_fir_2D_short: init cycles=207 vloops cycles=64
If the kernel is not invoked through the high-level kernel() API, users may still insert cycle-counting code manually and call __vcop_log_kernel_profile() to register the accumulated cycle counts.
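The manual approach can be sketched roughly as follows. This is a host-side illustration under stated assumptions, not code produced by VCC: the stub_* functions and the fake_tsc counter are hypothetical stand-ins for _tsc_gettime(), the kernel's init() and vloops() functions, and _vcop_vloop_done(), and their simulated cycle costs are arbitrary. The counter variables follow the naming described above for a kernel named mykernel; their 64-bit type is an assumption. The call to __vcop_log_kernel_profile() appears only as a comment because its parameter list is not given in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-ins so the sketch compiles off-target.
 * On EVE the corresponding calls would be _tsc_gettime(), the kernel's
 * init() and vloops() functions, and _vcop_vloop_done(). */
static uint64_t fake_tsc = 0;                            /* simulated cycle counter */
static uint64_t stub_tsc_gettime(void) { return fake_tsc; }
static void stub_init(void)       { fake_tsc += 100; }   /* arbitrary cost */
static void stub_vloops(void)     { fake_tsc += 5; }     /* dispatch only */
static void stub_vloop_done(void) { fake_tsc += 45; }    /* wait for completion */

/* Counter names follow the scheme described for a kernel named mykernel;
 * the uint64_t type is an assumption. */
uint64_t mykernel_init_cycles = 0;
uint64_t mykernel_vloops_cycles = 0;

/* Manually instrumented invocation using the lower-level calls. */
void profiled_mykernel(void)
{
    uint64_t t0 = stub_tsc_gettime();
    stub_init();
    uint64_t t1 = stub_tsc_gettime();

    stub_vloops();
    stub_vloop_done();   /* sample the counter only after the loops complete */
    uint64_t t2 = stub_tsc_gettime();

    assert(t1 >= t0 && t2 >= t1);   /* the counter must not run backwards */

    /* Accumulate, so repeated invocations add up. */
    mykernel_init_cycles   += t1 - t0;
    mykernel_vloops_cycles += t2 - t1;

    /* On target, a call to __vcop_log_kernel_profile() would go here to
     * register the accumulated counts (signature not shown in this section). */
}
```

Sampling the counter after the synchronization call rather than right after the dispatch is what makes the vloops figure cover the full execution of the loops instead of only the dispatch overhead.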
On EVE, VCOP is clocked at twice the rate of ARP32. The reported cycles for EVE are ARP32 cycles, not VCOP cycles. Therefore, to compare C7x cycles to VCOP cycles, the EVE cycle counts should be doubled.
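As a small worked illustration of that conversion: a count of 64 cycles reported on EVE would correspond to 128 VCOP cycles. The helper names below are hypothetical and are not part of the runtime support library; this is only a sketch of the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* VCOP runs at twice the ARP32 clock, so each reported ARP32 cycle
 * spans two VCOP cycles. (Hypothetical helper name.) */
uint64_t arp32_to_vcop_cycles(uint64_t arp32_cycles)
{
    return 2u * arp32_cycles;
}

/* Ratio of C7x cycles to VCOP cycles for the same kernel.
 * A value above 1.0 means the C7x run took more cycles than VCOP did.
 * (Hypothetical helper name.) */
double c7x_to_vcop_cycle_ratio(uint64_t c7x_cycles, uint64_t eve_reported_cycles)
{
    assert(eve_reported_cycles > 0);   /* avoid dividing by zero */
    return (double)c7x_cycles /
           (double)arp32_to_vcop_cycles(eve_reported_cycles);
}
```

Note that this compares cycle counts only; converting to elapsed time would additionally require the clock frequencies of the two devices, which are not covered here.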