SPRADD8 November   2024 F29H850TU , F29H859TU-Q1

 


Discontinuity Management

Traditionally, branch, call, and return operations incur overhead because of the instruction pipeline. The CPU must fetch and decode an instruction before it can determine that a branch, call, or return needs to occur, and that determination is made in the Decode-2 phase of the pipeline. By this time, the pipeline already holds the instructions that follow, and these must be flushed before the instruction at the discontinuity destination is fetched. The flushed instructions represent wasted cycles, which is the discontinuity overhead.

The C29 CPU has a 9-stage pipeline, with the discontinuity decision made in the Decode-2 (D2) phase. Therefore, the three instructions following a discontinuity instruction are already in the pipeline (in the Fetch-1, Fetch-2, and Decode-1 phases). In addition to regular branch, call, and return instructions, the C29 ISA supports delayed branch, call, and return instructions, identified by a trailing D in the mnemonic (for example, CALLD and RETD). When a delayed discontinuity instruction is used, the three instructions immediately following it are always executed, regardless of whether the discontinuity is taken (which matters for a conditional branch). These three instructions are referred to as delay slots. When the C29 compiler uses the delayed form of these instructions, it fills the delay slots with useful instructions, reducing the discontinuity overhead from three cycles to effectively zero cycles.
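The accounting described above can be sketched as a small Python model. This is only an illustration, not a description of the hardware: it assumes one instruction packet enters the pipeline per cycle with no stalls, that the discontinuity is taken, and that any delay slot the compiler cannot fill costs one idle cycle. The function names `delay_slot_count` and `overhead_cycles` are hypothetical and chosen for this sketch.

```python
# Front-end stages named in the text, in the order an instruction passes
# through them. Later stages of the 9-stage pipeline are not modeled.
FRONT_END = ["F1", "F2", "D1", "D2"]
DECISION_STAGE = "D2"  # stage in which a discontinuity is resolved


def delay_slot_count(stages=FRONT_END, decision_stage=DECISION_STAGE):
    """Number of younger instructions already in flight at decision time.

    Every stage ahead of the decision stage holds one younger instruction
    (assumption: one packet issued per cycle, no stalls).
    """
    return stages.index(decision_stage)


def overhead_cycles(delayed, useful_slots=0):
    """Cycles lost at one taken discontinuity.

    delayed      -- True for the delayed form (for example CALLD or RETD)
    useful_slots -- delay slots filled with useful work (delayed form only)
    """
    slots = delay_slot_count()
    if not delayed:
        # Regular form: all in-flight instructions are flushed.
        return slots
    if not 0 <= useful_slots <= slots:
        raise ValueError("useful_slots must be between 0 and %d" % slots)
    # Assumption: an unfilled delay slot costs one idle cycle.
    return slots - useful_slots


if __name__ == "__main__":
    print(delay_slot_count())                              # 3
    print(overhead_cycles(delayed=False))                  # 3
    print(overhead_cycles(delayed=True, useful_slots=3))   # 0
```

Under these assumptions, the benefit of the delayed form depends entirely on how many of the three slots can be filled with useful work, which is why the overhead is described as "effectively" zero rather than unconditionally zero.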

Two examples of how the compiler uses delay slots are shown below.

  • A function call in which six function arguments are set up in the call packet and its three delay slots.
@CALLD  funcA         ; Call funcA
||LD.32 A4,@pointer1  ; Load A4 with pointer1 value from memory
LD.32   A5,@pointer2  ; Load A5 with pointer2 value from memory
||SUB.U16 A6,SP,#34   ; A6 points to value on stack offset -34
MV      A7,#ArrayB    ; Load A7 with address of ArrayB
||LD.32 D0,@variable1 ; Load D0 with Variable1 from memory
LD.32   D1,@variable2 ; Load D1 with Variable2 from memory
; Total Cycles = 4
  • A return in which the saved registers are restored and the stack is deallocated in the three delay slots.
funcA: ADD.U16 SP,SP,#24      ; Allocate local stack space
       ST.64   *(SP-#24),XM2  ; Save XM2, XM4, XM6 registers on stack
       ST.64   *(SP-#16),XM4
       ST.64   *(SP-#8),XM6
       ... user code...
       RETD    *(SP-#32)     ; packet 1: Return and restore RPC from stack
       ||MV    M0,M3         ; Place return value in register M0
       LD.64   XM6,*(SP-#8)  ; packet 2: Restore XM6 from stack
       LD.64   XM4,*(SP-#16) ; packet 3: Restore XM4 from stack
       LD.64   XM2,*(SP-#24) ; packet 4: Restore XM2 from stack
       ||SUB.U16 SP,SP,#32   ; Deallocate local + return stack space
; Total Cycles = 4
Attention:

The examples above are illustrative models of how the C29 compiler uses delay slots. In practice, delay slots are not limited to argument passing, register restoration, and stack deallocation; they often contain instructions that implement the actual functionality of the user code.
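The four-cycle totals quoted for the two examples can be checked by counting instruction packets. The Python sketch below does this under two assumptions that are not stated as rules in this document: a line beginning with `||` issues in the same packet as the line before it (consistent with the "packet 1" to "packet 4" comments in the return example), and each packet takes one cycle with no stalls. The listings are copied from the examples above with comments removed; `split_packets` is a hypothetical helper written for this illustration.

```python
CALL_EXAMPLE = """\
@CALLD  funcA
||LD.32 A4,@pointer1
LD.32   A5,@pointer2
||SUB.U16 A6,SP,#34
MV      A7,#ArrayB
||LD.32 D0,@variable1
LD.32   D1,@variable2
"""

RETURN_EXAMPLE = """\
RETD    *(SP-#32)
||MV    M0,M3
LD.64   XM6,*(SP-#8)
LD.64   XM4,*(SP-#16)
LD.64   XM2,*(SP-#24)
||SUB.U16 SP,SP,#32
"""


def split_packets(listing):
    """Group listing lines into packets; '||' joins the previous packet."""
    packets = []
    for raw in listing.splitlines():
        text = raw.split(";", 1)[0].strip()  # drop any trailing comment
        if not text:
            continue
        if text.startswith("||"):
            if not packets:
                raise ValueError("'||' line has no packet to join")
            packets[-1].append(text[2:].strip())
        else:
            packets.append([text])
    return packets


if __name__ == "__main__":
    for name, listing in (("call", CALL_EXAMPLE), ("return", RETURN_EXAMPLE)):
        packets = split_packets(listing)
        # First packet holds the delayed instruction; the rest are delay slots.
        print(name, "packets:", len(packets), "delay slots:", len(packets) - 1)
```

Both listings split into four packets: the packet containing the delayed instruction plus three delay-slot packets. In the call example, the six instructions other than the call itself each set up one argument register (A4 to A7, D0, D1). If the same packets were scheduled around a regular call or return instead, the flush described earlier would be expected to add three cycles, although that comparison is an inference from the text rather than a measured result.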