Understanding the CW305 aes128 example project hardware interface

✍️ → Written on 2021-04-05 in 2028 words. Part of cs IT-security

Motivation

In a current research project, I am evaluating the side-channel security of a hardware module. We use the CW305 board by NewAE Technology (specifically, the NAE-CW305-04-7A100-0.10-X). But how do you communicate with the on-board Artix-7 FPGA? You cannot program the FPGA from Vivado directly via USB, since the on-board Atmel ATSAM3U4E microcontroller acts as the USB interface. The same applies to communication. NewAE provides examples for their software and hardware stack. IMHO their work is quite nice and thorough. However, their stack does not work for me: I want to use a PicoScope 5203 as oscilloscope (for improved accuracy), and I am not looking at a symmetric cipher like AES, but at asymmetric post-quantum cryptography KEMs. How can we get there?

For reference, I get into the technical details of the interface between the Atmel microcontroller and the FPGA. If you just want to run the example, the NAE whitepaper should be your starting point.

Setup

Physical side

CW305 setup with USB connector for power and data, JTAG connector for programming and two probes connected with pins
Figure 1. CW305 board with connections
  • In the shown configuration (based on the LEDs) the FPGA is unconfigured

  • USB connector is used for power and communication with the microcontroller

  • JTAG with the programmer at the bottom is used to program the FPGA (I didn’t try SPI flash)

  • PicoScope probes don’t fit into the 20-pin connector slots. Use conducting material to connect them. Office equipment (e.g. paper clips) fits this purpose

  • I placed two probes at the right to fetch signals from pins TRIG and A12. The latter was added for educational purposes

Hardware side (Vivado)

CW305 example project aes128 module instantiations namely cw305_top cw305_usb_reg_fe cw305_reg_aes cdc_pulse clocks aes_core aes_ks and aes_sbox
Figure 2. Verilog module instantiations in the aes128 project
  • The top module instantiates 4 modules. The cdc_pulse module is very generic: it transfers the pulse provided in input signal src_pulse within the clock domain src_clk to the clock domain dst_clk. The output is the dst_pulse signal. The reset_i signal simply shuts down the communication between src and dst. clocks.v just selects which clock shall be used (the PLL or the CW-internal one); this is easiest to spot in line 61. The aes_core was written by Google Vault.

  • You can find the xpr file for Vivado in the repository

  • Sadly, I don’t have the Integrated Logic Analyzers and thus had to undefine ILA_REG and ILA_CRYPTO. For debugging, I have 2 channels on the PicoScope and 3 user LEDs (LED5, LED6, LED7) available.

  • I don’t use the USB block interface (i.e. USE_BLOCK_INTERFACE is undefined)

  • Once you have programmed the FPGA, LED5 flashes approximately every 800 ms (1.25 Hz), as it is attached to bit 22 of crypt_clk_heartbeat. This serves as a sanity check
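
The blink rate can be sanity-checked with a little arithmetic. The sketch below assumes that crypt_clk_heartbeat is a free-running counter incremented once per crypto clock cycle and that the crypto clock runs at 10 MHz (the PLL setting used later in this post). I did not verify either assumption against the Verilog source here. The function name heartbeat_period is made up for illustration.

```python
def heartbeat_period(clock_hz: float, bit: int) -> float:
    """Seconds for one full on/off cycle of counter bit `bit`.

    Assumes a free-running binary counter incremented on every clock cycle:
    bit k toggles every 2**k cycles, so a full period takes 2**(k+1) cycles.
    """
    return 2 ** (bit + 1) / clock_hz


# Assumed 10 MHz crypto clock, LED attached to counter bit 22
period = heartbeat_period(10e6, 22)
print(f"period = {period * 1e3:.0f} ms, frequency = {1 / period:.2f} Hz")
```

Under these assumptions, the period comes out at roughly 839 ms (about 1.19 Hz), which is consistent with the approximately 800 ms I observed.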

Software side (Python)

API of the CW305 module
CW305.usb_trigger_toggle() -> None
CW305.is_done() -> bool
CW305.vccint_get() -> float
CW305.get_fpga_buildtime() -> str

CW305.fpga_write(addr: int, data: list) -> Any
CW305.fpga_read(addr: int, readlen: int) -> list
CW305.loadEncryptionKey(key: list) -> None
CW305.loadInput(inputtext: list) -> None
CW305.readOutput() -> list
CW305.simpleserial_read(cmd: str) -> list
CW305.simpleserial_write(cmd: str, data: list) -> None
CW305.set_key(key: list, ack=False, timeout=250) -> None
  • The API comes from the Python library included in the chipwhisperer repository (software folder). A stable release of the library can be installed via PyPI (i.e. pip3 install chipwhisperer).

  • The first section of this non-exhaustive API list gives metadata about the FPGA/bitstream

  • As you can see, the high-level API is inherently built for symmetric crypto, but it is mixed with lower-level primitives like fpga_write

  • Apparently the interface is meant to write and read at certain addresses (parameter addr in the interface)

Interface between Atmel microcontroller and Artix-7 FPGA

On the Verilog side (cw305_top), we have the following interface [excluding the block interface]:

module cw305_top #(
  parameter pBYTECNT_SIZE = 7,
  parameter pADDR_WIDTH = 21,
  parameter pPT_WIDTH = 128,
  parameter pCT_WIDTH = 128,
  parameter pKEY_WIDTH = 128
)(
  // USB Interface
  input wire                   usb_clk,      // Clock
  inout wire [7:0]             usb_data,     // Data for write/read
  input wire [pADDR_WIDTH-1:0] usb_addr,     // Address
  input wire                   usb_rdn,      // !RD, low when addr valid for read
  input wire                   usb_wrn,      // !WR, low when data+addr valid for write
  input wire                   usb_cen,      // !CE, active low chip enable
  input wire                   usb_trigger,  // High when trigger requested

  // Buttons/LEDs on Board
  input wire                   j16_sel,      // DIP switch J16
  input wire                   k16_sel,      // DIP switch K16
  input wire                   k15_sel,      // DIP switch K15
  input wire                   l14_sel,      // DIP Switch L14
  input wire                   pushbutton,   // Pushbutton SW4, connected to R1, used here as reset
  output wire                  led1,         // red LED
  output wire                  led2,         // green LED
  output wire                  led3,         // blue LED

  // PLL
  input wire                   pll_clk1,     // PLL Clock Channel #1
  //input wire                 pll_clk2,     // PLL Clock Channel #2 (unused in this example)

  // 20-Pin Connector Stuff
  output wire                  tio_trigger,
  output wire                  tio_clkout,
  input  wire                  tio_clkin
);

Now, the question is: how does this Verilog interface correspond to the Python interface?

Adding another pin to the interface

I briefly want to add the following modification: We mentioned pin A12 above. We want to add this additional pin to the interface.

Lines for the XDC file to add pin A12
set_property DRIVE 8 [get_ports extra_out]
set_property PACKAGE_PIN A12 [get_ports extra_out]

This enables A12 to appear as extra_out in the Verilog interface. Thus, we can define the additional cw305_top output signal extra_out:

output wire extra_out,

Investigation

The USB trigger signal

Looking at the interfaces, it should be trivial to spot the connection between the API method usb_trigger_toggle and the hardware signal usb_trigger. So we write a small program to toggle the signal in an infinite loop. Be aware that I sadly have to run this program to accept the USB peripheral. Otherwise, an OSError: Unable to communicate with found ChipWhisperer is thrown.

#!/usr/bin/env python3

import time

import chipwhisperer as cw

# https://github.com/newaetech/chipwhisperer/blob/155a7e24acec8556c2b560d5df33eb537aebb44f/software/chipwhisperer/capture/targets/CW305.py#L61
target = cw.target(None, cw.targets.CW305, fpga_id='100t', force=False)
target.vccint_set(1.0) # default

# we only need PLL1:
target.pll.pll_enable_set(True)
target.pll.pll_outenable_set(False, 0)
target.pll.pll_outenable_set(True, 1)
target.pll.pll_outenable_set(False, 2)

# run at 10 MHz:
target.pll.pll_outfreq_set(10E6, 1)

# 1ms is plenty of idling time
target.clkusbautooff = True
target.clksleeptime = 1

while True:
    target.usb_trigger_toggle()
    time.sleep(0.1)

Apparently, usb_trigger is set back to LOW right away.

USB trigger signal toggling every 0.1 second visualized in the PicoScope GUI
Figure 3. USB trigger signal triggering every 0.1 second visualized in the PicoScope GUI

Intercepting writes and reads

Interpreted languages like Python make it easier to debug code at runtime than compiled languages. I modified the installed source code of the chipwhisperer library.
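
As an alternative to editing the installed library (this is not what I did, just a sketch of another option), the read and write methods can be wrapped at runtime so that every call is recorded. The helper log_calls and the class FakeTarget are hypothetical names introduced here. FakeTarget is a stand-in with a simple byte-addressed memory so that the example runs without hardware, and the address 0x100 is arbitrary. With the real board, you would pass the object returned by cw.target(...) instead.

```python
import functools


def log_calls(obj, method_name, log):
    """Replace obj.method_name with a wrapper that records every call."""
    original = getattr(obj, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)
        log.append((method_name, args, kwargs, result))
        return result

    # Instance attribute shadows the class method for this object only
    setattr(obj, method_name, wrapper)


class FakeTarget:
    """Hypothetical stand-in for the CW305 target (no hardware required)."""

    def __init__(self):
        self.memory = {}

    def fpga_write(self, addr, data):
        # Assumption for this sketch: one byte per address
        for offset, byte in enumerate(data):
            self.memory[addr + offset] = byte

    def fpga_read(self, addr, readlen):
        return [self.memory.get(addr + i, 0) for i in range(readlen)]


calls = []
target = FakeTarget()
for name in ("fpga_write", "fpga_read"):
    log_calls(target, name, calls)

target.fpga_write(0x100, [0xDE, 0xAD])
target.fpga_read(0x100, 2)

for name, args, kwargs, result in calls:
    print(name, args, kwargs, result)
```

Because the wrapper is attached to the instance, the installed library stays untouched and the logging disappears when the script ends.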

Understanding addresses

  • aes_clk gives one posedge per 100ns, thus we have a frequency of 10 MHz in my default configuration of the board
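
The port widths hint at how an address is structured: cw305_top receives a usb_addr of pADDR_WIDTH = 21 bits, while the register module interface listed further below takes a reg_address of pADDR_WIDTH - pBYTECNT_SIZE = 14 bits and a reg_bytecnt of pBYTECNT_SIZE = 7 bits. A plausible reading is that the upper bits select a register and the lower 7 bits index a byte within it. This is an assumption on my part, not something I verified in cw305_usb_reg_fe. The function names below are made up for illustration.

```python
from typing import Tuple

P_ADDR_WIDTH = 21   # pADDR_WIDTH in cw305_top
P_BYTECNT_SIZE = 7  # pBYTECNT_SIZE in cw305_top


def split_usb_addr(usb_addr: int) -> Tuple[int, int]:
    """Split a USB address into (reg_address, reg_bytecnt).

    Assumption: the upper bits carry the register address and the
    lower pBYTECNT_SIZE bits carry the byte counter.
    """
    if not 0 <= usb_addr < (1 << P_ADDR_WIDTH):
        raise ValueError("usb_addr does not fit into pADDR_WIDTH bits")
    reg_address = usb_addr >> P_BYTECNT_SIZE
    reg_bytecnt = usb_addr & ((1 << P_BYTECNT_SIZE) - 1)
    return reg_address, reg_bytecnt


def join_usb_addr(reg_address: int, reg_bytecnt: int) -> int:
    """Inverse of split_usb_addr under the same assumption."""
    return (reg_address << P_BYTECNT_SIZE) | reg_bytecnt


print(split_usb_addr(0x283))  # 0x283 = 5 * 128 + 3
```

With this layout, a 7-bit byte counter allows up to 128 bytes per register, which would comfortably hold the 128-bit (16-byte) key, plaintext and ciphertext values of the AES example.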

usb_trigger

What is cw305_usb_reg_fe? What is cw305_reg_aes?

// Interface to cw305_usb_reg_fe:
   input  wire                                  usb_clk,         // clock used by the USB interface (i.e. Atmel microcontroller)
   input  wire                                  crypto_clk,      // clock used by the crypto core
   input  wire                                  reset_i,         // is true if the push button is pressed
   input  wire [pADDR_WIDTH-pBYTECNT_SIZE-1:0]  reg_address,     // Address of register
   input  wire [pBYTECNT_SIZE-1:0]              reg_bytecnt,     // Current byte count
   output reg  [7:0]                            read_data,       // data read from the address specified by reg_address
   input  wire [7:0]                            write_data,      // data to be written
   input  wire                                  reg_read,        // Read flag. One clock cycle AFTER this flag is high
                                                                 // valid data must be present on the read_data bus
   input  wire                                  reg_write,       // Write flag. When high on rising edge valid data is
                                                                 // present on write_data
   input  wire                                  reg_addrvalid,   // Address valid flag, must be HIGH to enable reg_address to be read

// from top:
   input  wire                                  exttrigger_in,

// register inputs:
   input  wire [pPT_WIDTH-1:0]                  I_textout,
   input  wire [pCT_WIDTH-1:0]                  I_cipherout,
   input  wire                                  I_ready,  /* Crypto core ready. Tie to '1' if not used. */
   input  wire                                  I_done,   /* Crypto done. Can be high for one crypto_clk cycle or longer. */
   input  wire                                  I_busy,   /* Crypto busy. */

// register outputs:
   output reg  [4:0]                            O_clksettings,
   output reg                                   O_user_led,
   output wire [pKEY_WIDTH-1:0]                 O_key,
   output wire [pPT_WIDTH-1:0]                  O_textin,
   output wire [pCT_WIDTH-1:0]                  O_cipherin,
   output wire                                  O_start   /* High for one crypto_clk cycle, indicates text ready. */
module cw305_top #(
    parameter pBYTECNT_SIZE = 7,
    parameter pADDR_WIDTH = 21,
    parameter pPT_WIDTH = 128,
    parameter pCT_WIDTH = 128,
    parameter pKEY_WIDTH = 128
)(
    // USB Interface
    input wire                          usb_clk,        // Clock
    inout wire [7:0]                    usb_data,       // Data for write/read
    input wire [pADDR_WIDTH-1:0]        usb_addr,       // Address
    input wire                          usb_rdn,        // !RD, low when addr valid for read
    input wire                          usb_wrn,        // !WR, low when data+addr valid for write
    input wire                          usb_cen,        // !CE, active low chip enable
    input wire                          usb_trigger,    // High when trigger requested

    // Buttons/LEDs on Board
    input wire                          j16_sel,        // DIP switch J16
    input wire                          k16_sel,        // DIP switch K16
    input wire                          k15_sel,        // DIP switch K15
    input wire                          l14_sel,        // DIP Switch L14
    input wire                          pushbutton,     // Pushbutton SW4, connected to R1, used here as reset
    output wire                         led1,           // red LED
    output wire                         led2,           // green LED
    output wire                         led3,           // blue LED

    // PLL
    input wire                          pll_clk1,       //PLL Clock Channel #1
    //input wire                        pll_clk2,       //PLL Clock Channel #2 (unused in this example)

    // 20-Pin Connector Stuff
    output wire                         tio_trigger,
    output wire                         tio_clkout,
    input  wire                         tio_clkin

    // Block Interface to Crypto Core
`ifdef USE_BLOCK_INTERFACE
   ,output wire                         crypto_clk,
    output wire                         crypto_rst,
    output wire [pPT_WIDTH-1:0]         crypto_textout,
    output wire [pKEY_WIDTH-1:0]        crypto_keyout,
    input  wire [pCT_WIDTH-1:0]         crypto_cipherin,
    output wire                         crypto_start,
    input wire                          crypto_ready,
    input wire                          crypto_done,
    input wire                          crypto_busy,
    input wire                          crypto_idle
`endif
    );

Adding an additional pin

######## 20-Pin Connector
set_property DRIVE 8 [get_ports extra_out]
set_property PACKAGE_PIN A12 [get_ports extra_out]
core saber_core (
    .clk             (core_clk),
    .load_i          (core_load),
    .pubkey_i        ({core_pubkey, 128'h0}),
    .seckey_i        ({core_seckey, 128'h0}),
    .data_i          (core_pt),
    .size_i          (2'd0), //AES128
    .dec_i           (1'b0),//enc mode
    .data_o          (core_ct),
    .busy_o          (core_busy)
);