SLAU723A October 2017 – October 2018 MSP432E401Y, MSP432E411Y
Figure 9-1 shows the AES block diagram. A single-core dual-interface architecture is used.
AES is an efficient implementation of the Rijndael cipher (the AES algorithm) and a 128-bit polynomial multiplication (referred to here as GHASH, as per the AES-GCM specification). Rijndael is a block cipher in which each data block is 128 bits. The polynomial multiplication multiplies two 128-bit vectors using the smallest 128-bit irreducible polynomial, represented by the following 128-bit string: {0^120}||10000111. The two implementations are combined into the AES wide-bus engine.
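The polynomial multiplication described above can be illustrated in software. The following is a minimal sketch, not the hardware implementation: it assumes that the reduction string is read as 120 zero bits followed by 10000111 (that is, x^7 + x^2 + x + 1 below the implicit x^128 term) and that bit i of an integer holds the coefficient of x^i. The name gf128_mul is hypothetical. The AES-GCM specification defines GHASH with a reflected bit order, so this sketch shows the field arithmetic only and is not a drop-in GHASH.

```python
MASK128 = (1 << 128) - 1
REDUCTION = 0x87  # assumed reading of {0^120}||10000111: x^7 + x^2 + x + 1


def gf128_mul(a: int, b: int) -> int:
    """Multiply two 128-bit vectors as polynomials over GF(2), reduced
    modulo x^128 + x^7 + x^2 + x + 1 (bit i = coefficient of x^i).
    Both inputs are assumed to fit in 128 bits."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        b >>= 1
        carry = a >> 127         # coefficient that would become x^128
        a = (a << 1) & MASK128   # multiply a by x
        if carry:
            a ^= REDUCTION       # replace x^128 by x^7 + x^2 + x + 1
    return result


# (x + 1) * (x + 1) = x^2 + 1 over GF(2)
print(hex(gf128_mul(0b11, 0b11)))    # 0x5
# x^127 * x wraps around to the reduction polynomial
print(hex(gf128_mul(1 << 127, 0b10)))  # 0x87
```

The shift-and-XOR loop processes one bit of the second operand per iteration, which keeps the example short; it makes no claim about how the 32-cycle hardware multiplier is organized.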
Depending on the availability of context and data, the AES wide-bus engine is automatically triggered to process the data. The AES wide-bus engine is directly connected to the context and data registers so that it can immediately start processing when all data is available. The AES wide-bus engine also interfaces to the I/O control FSM/µDMA request interface.
AES comprises the following major functional blocks:
The AES wide-bus engine, which is the major top-level component, comprises the following functional blocks:
AES encryption requires a specific number of rounds, depending on the key length. The supported key lengths are 128, 192, and 256 bits, which require 10, 12, and 14 rounds, respectively. Because {number of clock cycles} = 2 + 3 × {number of rounds}, these key lengths correspond to 32, 38, and 44 clock cycles per block, respectively.
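The relationship between key length, rounds, and clock cycles can be checked with a short calculation. This sketch only restates the formula given above; the names ROUNDS_BY_KEY_BITS and aes_block_cycles are hypothetical and do not correspond to any driver API.

```python
# Rounds per supported key length, as listed in the text above.
ROUNDS_BY_KEY_BITS = {128: 10, 192: 12, 256: 14}


def aes_block_cycles(key_bits: int) -> int:
    """Clock cycles per 128-bit block: 2 + 3 * (number of rounds)."""
    rounds = ROUNDS_BY_KEY_BITS[key_bits]  # KeyError for unsupported lengths
    return 2 + 3 * rounds


for key_bits in (128, 192, 256):
    print(key_bits, aes_block_cycles(key_bits))
# prints:
# 128 32
# 192 38
# 256 44
```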
The larger key lengths provide greater encryption strength at the expense of additional rounds, and therefore reduced throughput. The throughput of the polynomial multiplication is matched to the overall cryptographic performance. The AES module contains one electronic codebook (ECB) core and a dedicated 32-cycle polynomial multiplication module for performing GHASH operations. Polynomial multiplication operates in parallel with the AES core when data is available for both modules.
Depending on the key size (128, 192, or 256 bits), this core requires 32, 38, or 44 clock cycles to process one 128-bit data block. While one data block is being processed, the next block can be preloaded immediately. Once a block is preloaded, the previous block must finish before additional data can be loaded. Therefore, once the pipeline is full, sequential data blocks can be passed every 32, 38, or 44 clock cycles.
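The per-block cycle counts above translate into an upper bound on steady-state throughput. The following sketch assumes that the pipeline stays full (the next block is always preloaded in time) and ignores register access and µDMA overhead. The 100-MHz clock used in the example is a hypothetical value chosen for round numbers, not a device specification, and the function name is likewise hypothetical.

```python
BLOCK_BITS = 128
# Cycles per block once the pipeline is full, from the text above.
CYCLES_PER_BLOCK = {128: 32, 192: 38, 256: 44}


def steady_state_throughput_bps(key_bits: int, clock_hz: int) -> float:
    """Upper-bound throughput in bits per second with a full pipeline,
    ignoring data load, readout, and DMA overhead (an assumption)."""
    return BLOCK_BITS * clock_hz / CYCLES_PER_BLOCK[key_bits]


ASSUMED_CLOCK_HZ = 100_000_000  # hypothetical clock, for illustration only

for key_bits in (128, 192, 256):
    mbps = steady_state_throughput_bps(key_bits, ASSUMED_CLOCK_HZ) / 1e6
    print(f"{key_bits}-bit key: {mbps:.1f} Mbit/s")
```

With these assumptions, a 128-bit key processes 4 bits per clock cycle (128 bits every 32 cycles), and the longer keys scale down in proportion to their cycle counts.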