Breaking the Flash Encryption Feature of Espressif’s Parts

Posted on Mon 08 January 2024 in Projects

Introduction

I recently read the Unlimited Results: Breaking Firmware Encryption of ESP32-V3 (Abdellatif et al., 2023) paper.

This paper is about breaking the firmware encryption feature of the ESP32 SoC using a Side-Channel attack.

This was an interesting read, and soon, I wanted to try to reproduce these results with the following constraints:

To understand everything about this attack, I wanted to start from scratch, even if it meant sometimes reinventing the wheel.
I wanted to keep things low-cost. This means no five-figure digital oscilloscope could be used, as is sometimes the case for such attacks.

A few weeks later, not only have I been able to reproduce the paper’s results regarding the ESP32, but I have also:

Mounted a similar attack against the ESP32-C3 and ESP32-C6.
Mounted a secure boot bypass attack based on voltage glitches for both the ESP32-C3 and ESP32-C6.

This first article details the side-channel attacks, starting from the basics. A second one will focus on Secure Boot bypass techniques.

Side-Channel Attack Basics

This section will hopefully provide enough theoretical knowledge about side-channel attacks for the rest of the article to make sense. Of course, this topic is extremely broad, so I’ll only scratch the surface of many concepts. Anyone already familiar with the topic should feel free to skip this introduction.

General Idea

A side-channel attack exploits unintentional information leakage from a system to uncover secret values, typically encryption keys. The side channel can take various forms, including timing variations or the power consumption of a device. All the attacks presented in this article will focus on power consumption.

The general idea is to allow the target device to perform an operation involving the secret data we want to recover, along with attacker-supplied data. During this operation, the current consumed by the device is measured. A simple shunt resistor can be used to perform the measurement.

Simplified side-channel measurement setup

If the target device happens to be “leaky”, the measured current is influenced by the value of the manipulated secret to some extent.

To take advantage of this leakage, the following steps are necessary:

Assume a power consumption model: a mathematical representation connecting a specific secret value to a variation in power consumption.
Use a mathematical tool to assess how closely a particular guess of the secret value aligns with the selected power consumption model.

The next sections of this article will introduce commonly used power consumption models as well as a mathematical tool, the Correlation Power Analysis method.

Power Consumption Models

Hamming Weight Model

Internally, our targets mostly operate on synchronous logic. This means that for each system clock cycle, internal registers are updated.

For each register, it is assumed that differences between the original and new values of each individual bit contribute to the total power consumption.

A common assumption is that changing the value of a bit consumes more power than holding it. Hence:

$0 \rightarrow 1$ increases the power consumption.
$1 \rightarrow 0$ increases the power consumption as well.
However, $0 \rightarrow 0$ and $1 \rightarrow 1$ don’t.

Using a more formal approach, it’s possible to express that the power consumption $H$ of a given register updating from a value $R'$ to a value $R$ is:

$H = HW(R' \oplus R)$

Where $HW$ is the Hamming Weight function, which simply counts the number of ones in the binary representation of a number.
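As a minimal sketch, the model above fits in a few lines of Python. The function names are mine and purely illustrative.

```python
def hamming_weight(value: int) -> int:
    """Count the bits set to one in a non-negative integer."""
    return bin(value).count("1")


def register_update_leakage(previous: int, new: int) -> int:
    """Modelled consumption H = HW(R' xor R) of a register going from R' to R."""
    return hamming_weight(previous ^ new)


# 0b1010 -> 0b0110: the two upper bits flip, the two lower bits do not.
print(register_update_leakage(0b1010, 0b0110))  # prints 2
```

Only the number of flipped bits matters in this model, not their position or direction.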

Example: AES First Round Encryption

Now, how could this be applied to, for instance, a leaky hardware cryptographic encryption core?

Let’s consider the following situation:

An attacker can input arbitrary values into the encryption core.
The encryption core encrypts these values with a secret key. The algorithm used is assumed to be AES-128.

The following diagram summarizes the AES-128 algorithm.

AES-128 Encryption

The details of each operation can be found on the dedicated Wikipedia AES page. For this section, at least the operations $AddRoundKey$ and $SubBytes$ need to be understood.

If we assume a register to be connected just after the SubBytes operation, the value $R$ of this register can be defined as:

$\begin{aligned} R &= SubBytes(AddRoundKey(P)) \\ &= Sbox(P \oplus K_E) \end{aligned}$

Where:

$P$ is the attacker-controlled plaintext.
$K_E$ is the first round encryption key, corresponding to the AES-128 secret encryption key.
$Sbox$ : the standard Rijndael S-box lookup table.

Note that while $R$ is 128-bit long, it’s also possible to consider 8-bit slices and define:

$R_{i=0, \dots, 15} = Sbox(P[i] \oplus K_E[i])$

During this register update, the power consumption can be modeled as:

$H_{i=0, \dots, 15} = HW(R_i' \oplus R_i)$

Assuming the initial values $R_{i=0, \dots, 15}'$ to be zeros, we simply obtain:

$\begin{aligned} H_{i=0, \dots, 15} &= HW(R_i) \\ &= HW(Sbox(P[i] \oplus K_E[i])) \end{aligned}$

Dividing the state register $R$ into the slices $R_{i=0, \dots, 15}$ enables a divide-and-conquer approach: each key byte can be attacked independently of the others.

For the attacker, the goal will be, for each byte of the secret key $K_E$ , to find the best matching $H_{candidate, i}$ defined as:

$H_{candidate, i} = HW(Sbox(P[i] \oplus guess))$

With $guess=0, \dots, 255$ .
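To make this concrete, here is a small Python sketch that computes the 256 candidate values for one plaintext byte. The S-box is generated from its textbook definition (inverse in $GF(2^8)$ followed by the affine transform) rather than pasted as a table. All function names are illustrative assumptions and do not come from the project's code.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Generate the Rijndael S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inverse = 0
        if x:
            inverse = 1
            for _ in range(254):  # x^254 is the inverse of x in GF(2^8)
                inverse = gf_mul(inverse, x)
        y = inverse
        for shift in range(1, 5):
            y ^= ((inverse << shift) | (inverse >> (8 - shift))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()


def candidate_leakage(plaintext_byte, guess):
    """H_candidate = HW(Sbox(P[i] xor guess)) for a single key-byte guess."""
    return bin(SBOX[plaintext_byte ^ guess]).count("1")


def all_candidates(plaintext_byte):
    """Candidate leakage for every possible guess (0..255)."""
    return [candidate_leakage(plaintext_byte, guess) for guess in range(256)]


print(all_candidates(0x53)[:4])
```

Each key byte only has 256 candidates, so the whole 128-bit key requires 16 x 256 model evaluations per trace instead of an exhaustive search.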

Note

Choosing to attack the output of the Rijndael S-box lookup table is done on purpose. The input and output of these tables are not linearly related, making the difference between valid and “almost valid” guesses very noticeable.

What’s now missing is a mathematical tool that can quantify how well a given guess matches the sampled power traces.

Correlation Power Analysis

Let’s consider that $n$ power consumption traces $T_i(t)$ have been sampled while processing the corresponding plaintext values $P_i$, with $i=1, \dots, n$. Moreover, a power consumption model $H_{candidate}$ has been chosen.

We unfortunately don’t precisely know when (i.e., for which $t$ value) the leakage occurs. Hence, for all possible $t$ and $guess$ values, we need to quantify the correlations between the arrays $X$ and $Y$ defined as:

$X_{i=1, \dots, n} = T_i(t)$
$Y_{i=1, \dots, n} = H_{candidate}(guess, P_i)$

The commonly used mathematical tool for the task is the Pearson correlation coefficient. Coefficients are computed for all $t$ and $guess$ values.

The largest of these coefficients will give:

The $t$ value at which the leakage occurs.
The valid guess.

The following animation summarizes the Correlation Power Analysis method.
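The method can also be sketched in pure Python on simulated traces. Everything here is an assumption made for illustration: the traces are synthetic (a Hamming weight leak at one sample, plus Gaussian noise), the secret byte and trace count are arbitrary, and the helper names are mine. A real analysis would run on measured traces and would be vectorized for speed.

```python
import math
import random


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Generate the Rijndael S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inverse = 0
        if x:
            inverse = 1
            for _ in range(254):
                inverse = gf_mul(inverse, x)
        y = inverse
        for shift in range(1, 5):
            y ^= ((inverse << shift) | (inverse >> (8 - shift))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()


def hw(value):
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cpa_byte(plaintexts, traces):
    """Return (correlation, guess, t) for the largest coefficient found."""
    columns = [[trace[t] for trace in traces] for t in range(len(traces[0]))]
    best = (-2.0, None, None)
    for guess in range(256):
        model = [hw(SBOX[p ^ guess]) for p in plaintexts]
        for t, column in enumerate(columns):
            r = pearson(column, model)
            if r > best[0]:
                best = (r, guess, t)
    return best


# Synthetic measurement campaign: 300 traces of 4 samples, leaking at sample 2.
rng = random.Random(2024)
SECRET_BYTE, LEAK_SAMPLE = 0x3C, 2
plaintexts = [rng.randrange(256) for _ in range(300)]
traces = []
for p in plaintexts:
    trace = [rng.gauss(0.0, 1.0) for _ in range(4)]
    trace[LEAK_SAMPLE] += hw(SBOX[p ^ SECRET_BYTE])
    traces.append(trace)

correlation, guess, t = cpa_byte(plaintexts, traces)
print(f"best guess: {guess:#04x} at sample {t} (r = {correlation:.2f})")
```

The search returns both the guess and the sample index, matching the two pieces of information listed above.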

Espressif’s Flash Encryption Feature

Overview

At the highest level, the external flash encryption feature implemented for the ESP32, ESP32-C3 and ESP32-C6 is essentially the same.

The external flash contains the entire firmware executed by the chip. It’s accessed via a QSPI bus.

Note

The components I targeted are all using external QSPI flash components. This means it’s reasonably easy for an attacker to directly tamper with the QSPI bus.

Espressif’s portfolio also contains SoCs with internal flash components. This could make the methods presented here impractical. Note, however, that some of these components still expose the internal flash’s pinout to the outside, so there may still be a way to make things work. Nevertheless, that’s outside the scope of this article.

This feature can be enabled by configuring some of the device’s eFuses. A secret encryption key is burned into them.

Once activated, the content of the external flash is transparently and automatically decrypted as soon as it is accessed. The decryption is performed by a dedicated hardware block.

For all devices, the AES cipher is used. However, the cipher mode of operation differs and will be detailed for each chip later in this article.

High-Level Attack Plan

Attacking the ESP32, ESP32-C3 and ESP32-C6 is a textbook side-channel attack where:

The secret to be recovered is obviously related to the AES key burned into the fuses.
The attacker can provide arbitrary inputs by tampering with the data read by the target chip over its QSPI bus.

Side-channel measurement setup, applied to Espressif's devices

Note that a physical flash component cannot reasonably be used here. Such a memory has:

A slow erase and write speed that would slow down the attack.
A limited number of erase cycles that would make the attack impractical.

Instead, some kind of hardware should be used to emulate a flash component over the QSPI bus.

Hardware Platform

Schematics and other hardware-related documents are available on the dedicated hardware files page.

Goals

To perform a side-channel attack against the ESP32 family of devices, the following is needed:

The target needs to be supplied with arbitrary flash content through its QSPI bus.
The power consumption of the target needs to be monitored.
Timing and synchronization are critical. Full control over the target’s clock is needed.

Furthermore, as mentioned earlier, cost should be kept low.

Overview

Based on the previous goals and constraints, the following hardware platform, called the ESP CPA Board, has been designed.

The ESP CPA Board

This is a simple 4-layer board that can be assembled manually. Additionally, even for small order quantities, the total BOM cost barely exceeds $100.

Note

In the picture above, you may notice that this revision of the board has been reworked due to a minor layout mistake. However, the hardware files I’m sharing already have this problem fixed.

Small daughter-boards, called “cartridges”, can be plugged into the ESP CPA Board.

A cartridge inserted into the board

As you may have guessed already, I’ve designed a “cartridge” for each of our targets, namely the ESP32, ESP32-C3 and ESP32-C6.

The cartridges

Block Diagram

The main functional blocks of the ESP CPA Board are detailed below.

Block Diagram of the ESP CPA Board

Key Features

Fake External Flash

The flash content provided to the DUT (device under test) can either be fed from a physical flash component placed on the cartridge, or faked by the FPGA. This is achieved by letting the FPGA control the Chip Select line of the “real” flash.

External Flash Wiring

Such a feature is needed to easily and quickly supply arbitrary data to be decrypted by the DUT hardware crypto-core.

Current Measurement

The current consumed by the DUT can be sampled thanks to an ADC. A simple current shunt along with a variable gain amplifier and some antialiasing filtering is used.

To keep costs reasonable, the sampling rate of this acquisition chain is only $12MHz$ . This may sound too low, but that’s actually fine, considering it’s possible to under-clock Espressif’s parts.

Clock Control

The clock of the DUT is provided by the FPGA.

While Espressif’s components typically expect a crystal oscillator, I’ve found feeding a digital clock works as well.

Note

It seems the clock can be fed directly at LVCMOS levels for both the ESP32 and ESP32-C3. However, for the ESP32-C6, capacitively coupling this clock signal seems to be needed for everything to work as expected.

This allows for total control over this clock, and down-clocking the DUT becomes possible. As stated above, this is very useful given the low sampling rate of the current measurement acquisition chain. However, reducing the clock speed introduces the trade-off of extending the duration of our attacks. As I’ll explain later, certain targets required up to five days to be compromised.

Temperature Control

During my tests, I’ve found some targets, especially the ESP32, to be rather sensitive to temperature variation.

The above diagram, captured from an ESP32, demonstrates the phenomenon. The temperature is displayed, alongside samples from various measurements, captured at the same time offset.

This variability could degrade the performance of the side-channel attack. That’s why the back of each cartridge carries:

A low-value resistor that acts as a heating element.
A temperature sensor placed just below the DUT.

Temperature control components

These two components are used to precisely control the temperature of the DUT, keeping it stable, slightly above ambient. Thanks to a simple PID controller, a value of $35^\circ C$ was maintained for all the results presented here.

This technique is inspired by crystal ovens.
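For readers unfamiliar with PID control, the loop can be sketched as follows. This is not the board's firmware: the gains, the first-order thermal model and its constants (`tau`, `max_rise`) are hypothetical values chosen so that the simulation settles, and only the $35^\circ C$ setpoint comes from the text above.

```python
class PID:
    """Textbook PID controller with a clamped output and basic anti-windup."""

    def __init__(self, kp, ki, kd, setpoint, out_min=0.0, out_max=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.previous_measurement = None

    def update(self, measurement, dt):
        error = self.setpoint - measurement
        derivative = 0.0
        if self.previous_measurement is not None:
            # Derivative taken on the measurement to avoid setpoint kicks.
            derivative = -(measurement - self.previous_measurement) / dt
        self.previous_measurement = measurement
        candidate_integral = self.integral + error * dt
        output = (self.kp * error
                  + self.ki * candidate_integral
                  + self.kd * derivative)
        if self.out_min <= output <= self.out_max:
            # Anti-windup: only accumulate while the heater is not saturated.
            self.integral = candidate_integral
        return min(max(output, self.out_min), self.out_max)


def simulate(seconds=300.0, dt=0.1, ambient=25.0):
    """Regulate a hypothetical first-order thermal model toward 35 degrees C."""
    pid = PID(kp=0.5, ki=0.1, kd=0.0, setpoint=35.0)
    tau = 10.0       # thermal time constant in seconds (assumed)
    max_rise = 30.0  # temperature rise at full heater power (assumed)
    temperature = ambient
    for _ in range(int(seconds / dt)):
        duty = pid.update(temperature, dt)  # heater duty cycle in [0, 1]
        temperature += dt * (ambient + max_rise * duty - temperature) / tau
    return temperature


print(f"final temperature: {simulate():.2f} C")
```

Regulating slightly above ambient means the heater alone is enough to hold the setpoint, with no active cooling needed.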

Software Support

All the software components of this project are available from this repository.

Firmware & Gateware

Two programmable components are mounted on the ESP CPA Board.

The first one, a microcontroller, the Cypress EZ-USB FX2 (CY7C68013A-100AXC) is used to:

Sample and stream the data from the ADC.
Configure and communicate with the FPGA.

The firmware of this component is written in C and based on libfx2.

The second one is an FPGA, a small ICE5LP1K. This component is well-supported by an open-source toolchain, making it enjoyable to use.

Its gateware is entirely described in Python with Amaranth HDL.

Host Software

The host software is composed of several tools that can be used to:

Configure and control the ESP CPA Board.
Store the data sampled by the ESP CPA Board.
Analyze the sampled data with Correlation Power Analysis techniques.
Display the results of the analysis on a graph.

Most of these tools are written in Python, but some computationally intensive analysis sections are offloaded to a piece of Rust code.

Furthermore, the hardware can be configured to be compatible with Espressif’s development tools. It makes flashing a firmware on the DUT, or burning its security-related fuses, convenient.

Attacking the `ESP32`

Here, the results of Unlimited Results: Breaking Firmware Encryption of ESP32-V3 (Abdellatif et al., 2023) are successfully replicated thanks to the ESP CPA Board. As this is not something new, certain details may be glossed over, and I encourage you to read the paper for additional insights.

Encryption Method Overview

The encryption method used by the ESP32 is thoroughly explained in the official documentation.

What’s important to grasp is that:

The ESP32 flash encryption feature is built upon the AES-256 cipher.
While the flash content is semantically decrypted, the hardware itself performs AES encryption. According to Espressif, this is done for performance reasons.
The main encryption key is stored in the fuses.
Each 32-byte block of flash data is encrypted with a distinct key, derived from these fuses. This key is said to be tweaked with the flash address.

In the context of our side-channel attack, this tweak mechanism doesn’t add much value. For instance, recovering the tweaked key of the first 32-byte block is sufficient to compute the main key.

On the other hand, the AES-256 cipher requires attacking two AES rounds to recover the entire key. 128 bits per round can be recovered.

Experimental Results

Traces Capture

I started by burning the fuses of my target ESP32, mounted on a cartridge board, to enable the flash encryption feature with a known key. This can be done by following the official documentation.

Next, I performed the actual measurement campaign.

It means the DUT has been fed arbitrary data thanks to the Fake Flash feature of the ESP CPA Board, while its power consumption has been sampled.

Each measurement cycle is performed as follows:

The FPGA Fake Flash is configured with a given 128-bit long random ciphertext. These bits will be read by the DUT at boot. More precisely, the Boot ROM firmware of the DUT will pull this data from the flash.
The DUT reset line is released.
As soon as the FPGA can detect the arbitrary 128 bits have been read, the current measurement ADC is sampled for a couple of DUT clock cycles.

Hence, the current traces are captured at around the time the hardware crypto-core of the DUT is processing the provided data.

For better vertical resolution, the procedure is repeated 8 times for a given ciphertext. Samples are next averaged.

Traces corresponding to 600k unique ciphertexts have been recorded.

Traces Analysis

Some basic pre-processing methods are first applied to the measured traces. For instance, a Butterworth digital band-pass filter is used to denoise the signal.

Next, I used the same CPA attack described in the Unlimited Results: Breaking Firmware Encryption of ESP32-V3 (Abdellatif et al., 2023) paper.

I won’t go into many details as this is not something new, just note that the leakage model used for the first round is:

$H = HW(Sbox(P[i] \oplus guess))$

This is similar to the model I introduced in the first section of this article. Moving forward, for the second round, the leakage is defined as follows:

$H = HW(Sbox(state1[i] \oplus guess) \oplus Sbox(P[i] \oplus K_0[i]))$

With:

$HW$ : the Hamming Weight function.
$Sbox$ : the standard Rijndael S-box lookup table.
$i$ : the index of the key byte being attacked.
$P$ : the arbitrary ciphertext provided by the Fake Flash component of the ESP CPA Board.
$state1$ : the output of the first AES round (an encryption round in hardware, as noted above). This can be computed given the key recovered by attacking the first round.
$K_0$ : the guessed first round key.

Performing the attack results in both the first and second round keys being successfully recovered.

The following graphs, which will be widely used in the remaining sections of this article, display the ranking evolution for each guess.

Arranging the correlation values in descending order for each key byte enables the ranking of all guesses. A rank of zero for a known-good guess (i.e., the correct guess has the highest correlation value) means the attack converged toward a correct result.
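A minimal sketch of this ranking, assuming `correlations` holds one value per guess (for instance the peak coefficient over all time samples). The names and numbers are illustrative only.

```python
def rank_of(correlations, correct_guess):
    """Rank of the known-good guess: 0 means it has the highest correlation."""
    ordering = sorted(range(len(correlations)),
                      key=lambda guess: correlations[guess],
                      reverse=True)
    return ordering.index(correct_guess)


# Made-up correlation values for one key byte.
correlations = [0.01] * 256
correlations[0x2A] = 0.35
correlations[0x91] = 0.20
print(rank_of(correlations, 0x2A))  # prints 0
print(rank_of(correlations, 0x91))  # prints 1
```

Computing this rank after every batch of traces gives the ranking evolution shown in the graphs.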

Of course, such a graph can only be obtained when the key is known beforehand. It’s a tool that can only be used for evaluating a side-channel attack.

Alternatively, it’s also possible to take a look at the values of the correlation coefficients. The coefficients of the correct key bytes are rapidly increasing, demonstrating that the attack is a success.

Detailed results for this attack, as well as supplementary information, are available on the ESP32 Detailed Results page.

Conclusion

Thanks to the ESP CPA Board, I was able to reproduce results from the Unlimited Results: Breaking Firmware Encryption of ESP32-V3 (Abdellatif et al., 2023) paper at a significantly lower cost.

These initial experiments did not yield any new discoveries, but they did validate both the ESP CPA Board hardware and the related analysis software.

Attacking the `ESP32-C3`

Next target, the ESP32-C3.

After the publication of the Unlimited Results: Breaking Firmware Encryption of ESP32-V3 (Abdellatif et al., 2023) paper that inspired my research, Espressif published a security advisory: Security Advisory Concerning Breaking the Hardware AES Core and Firmware Encryption of ESP32 Chip Revision v3.0.

The document states:

For hardware Flash encryption of ESP32-S2, ESP32-S3, ESP32-C3, ESP32-C2, the encryption algorithm has been upgraded to a more complex XTS-AES scheme; it increases the difficulty and cost of mounting an SCA, and hence, reduces security risks.

Hence, I knew what to expect. Attacking the device was likely possible, but could be more complicated because of the use of the XTS mode.

Encryption Method Overview

Find below a block diagram for the decryption path of the XTS mode, as used by the ESP32-C3.

XTS-AES Decryption

Two AES encryption blocks are employed, each with a distinct key:

The first key, $K_T$ , will also be called the tweak key.
The second one, $K_E$ .

On the ESP32-C3, both $K_T$ and $K_E$ are 128-bit long and read from the fuses.

The $i$ value does depend on the flash address the data is pulled from. It is incremented every 0x80 bytes. $i$ is encrypted with $K_T$ to obtain a value, $T_0$ , that we’ll call the initial tweak value.

For each of the eight 128-bit blocks composing the 128 bytes where $i$ is constant, $T_n$ can be computed from $T_{n-1}$ based on the following relationship.

$T_n = T_{n-1} \otimes \alpha$

Here, the $\otimes$ operator corresponds to a Galois field multiplication and $\alpha$ to the number $2$ in such a field. I’m not a cryptographer and won’t attempt to explain why this is useful and the theoretical details behind it. For our purpose, what’s important to understand is that this operation can be reversed.

The tweak values are xor’ed at the input and output of the encryption block that uses the $K_E$ key.

Attack Plan

As expected, attacking such a cipher mode poses challenges. The most obvious one is that we don’t have a single key to guess, but two independent keys.

Recovering $K_T$ seems particularly challenging, given the limited control we have over the value of $i$ .

However, the approach I am about to present draws inspiration (albeit loosely, as I won’t be following the exact same procedure) from the paper Side-channel power analysis of XTS-AES (Luo et al., 2017) and can help us extract enough information to recover:

The value of $K_E$ .
The values of all $T_{n=0, \dots, 7}$ corresponding to a given $i$ .

This is sufficient to decrypt the 128-byte block associated with a specific $i$ .

Of course, this also implies that decrypting the entire flash content might require a successful CPA attack for every 128-byte block. My next article details how to overcome this limitation.

“Tweaked” Key Recovery

First, let’s define $K_{TE}$ as the tweaked key of the first decryption round of the $K_E$ block.

$K_{TE} = K_{R0} \oplus T_j$

Where:

$K_{R0}$ corresponds to the decryption key of the first AES decryption round. Hence, it’s the last key of the AES key schedule that starts with $K_E$ .
$T_j$ corresponds to the tweak value at the flash offset the CPA attack will be performed at.

Assuming a successful CPA attack, the value of $K_{TE}$ can be recovered.

However, this value alone isn’t enough to decrypt the flash content. Going further is needed.

Second Round Key Recovery

With $K_{TE}$ known, it’s possible to attack, once more with CPA methods, the second round of the decryption block.

Obtaining the key of this next round directly isn’t possible. However, what’s possible is to recover $K_{R1}'$ defined as:

$K_{R1} = MixColumns(K_{R1}')$

Where:

$K_{R1}$ is the decryption key for the second decryption round.
$MixColumns$ is the standard AES column mixing operation.

See this article to understand why such a transformation is useful.
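In short, and assuming the textbook ordering of an AES decryption round (the round key is added just before $InvMixColumns$), the reasoning goes as follows. Here $S$ is a name I introduce for the state entering that key addition. Since $InvMixColumns$ is linear with respect to XOR:

$\begin{aligned} InvMixColumns(S \oplus K_{R1}) &= InvMixColumns(S) \oplus InvMixColumns(K_{R1}) \\ &= InvMixColumns(S) \oplus K_{R1}' \end{aligned}$

The value entering the second round is therefore a computable quantity XOR-ed with $K_{R1}'$, so the CPA guesses target bytes of $K_{R1}'$ rather than bytes of $K_{R1}$.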

$K_E$ Recovery

Given $K_{R1}'$ , $K_{R1}$ can be obtained by applying the $MixColumns$ operation.

Next, reversing the AES key schedule provides access to all $K_{Rn}$ round keys. As an AES-128 block is used, the first key of the schedule, $K_{R10}$ , is equal to $K_E$ .
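Both steps can be sketched in Python. The sketch assumes the standard AES-128 key schedule and column-major state layout from FIPS-197. Function names are mine, and round numbers follow the encryption schedule, so the article's $K_{R1}$ is encryption round key 9 and $K_{R0}$ is round key 10.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Generate the Rijndael S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inverse = 0
        if x:
            inverse = 1
            for _ in range(254):
                inverse = gf_mul(inverse, x)
        y = inverse
        for shift in range(1, 5):
            y ^= ((inverse << shift) | (inverse >> (8 - shift))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def mix_columns(block):
    """Apply MixColumns to a 16-byte block stored column by column."""
    out = bytearray(16)
    for c in range(4):
        a = block[4 * c:4 * c + 4]
        for r in range(4):
            out[4 * c + r] = (gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3)
                              ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return bytes(out)


def previous_round_key(round_key, round_number):
    """From encryption round key `round_number` (1..10), compute the one before."""
    w = [list(round_key[4 * i:4 * i + 4]) for i in range(4)]
    prev = [None] * 4
    for i in (3, 2, 1):
        prev[i] = [a ^ b for a, b in zip(w[i], w[i - 1])]
    # g(): rotate the last word, substitute it, add the round constant.
    rotated = prev[3][1:] + prev[3][:1]
    g = [SBOX[b] for b in rotated]
    g[0] ^= RCON[round_number - 1]
    prev[0] = [a ^ b for a, b in zip(w[0], g)]
    return bytes(b for word in prev for b in word)


def recover_master_key(round_key, round_number):
    """Walk the AES-128 key schedule backwards down to round key 0."""
    key = bytes(round_key)
    for r in range(round_number, 0, -1):
        key = previous_round_key(key, r)
    return key


# FIPS-197 example: last round key of 2b7e1516 28aed2a6 abf71588 09cf4f3c.
last_round_key = bytes.fromhex("d014f9a8c9ee2589e13f0cc8b6630ca6")
print(recover_master_key(last_round_key, 10).hex())
```

With the article's notation, $K_E$ would be obtained as `recover_master_key(mix_columns(k_r1_prime), 9)`, where `k_r1_prime` holds the 16 bytes recovered by the second-round CPA.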

$T_n$ Recovery

Given:

The relation $K_{TE} = K_{R0} \oplus T_j$ .
The known value $K_{TE}$ .
The known value $K_{R0}$ (all the $K_{Rn}$ round keys have been recovered just above).

Using the following formula provides access to $T_j$ .

$T_j = K_{TE} \oplus K_{R0}$

Now, how to obtain all tweak values, $T_{n=0, \dots, 7}$ given $T_j$ ? Well, remembering the following relationship:

$T_n = T_{n-1} \otimes \alpha$

And the fact $\otimes$ can be reversed, computing all values becomes possible.
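Here is a hedged sketch of that computation. It assumes the ESP32-C3 follows the standard XTS convention from IEEE 1619 (tweak read as a little-endian 128-bit integer, reduction polynomial $x^{128} + x^7 + x^2 + x + 1$), which I did not verify against the hardware. The function names are illustrative.

```python
MASK_128 = (1 << 128) - 1


def tweak_next(tweak: bytes) -> bytes:
    """T_n = T_{n-1} (x) alpha: multiply by 2 in GF(2^128)."""
    t = int.from_bytes(tweak, "little") << 1
    if t >> 128:  # a bit overflowed: reduce with the field polynomial
        t = (t & MASK_128) ^ 0x87
    return t.to_bytes(16, "little")


def tweak_previous(tweak: bytes) -> bytes:
    """Inverse operation: recover T_{n-1} from T_n."""
    t = int.from_bytes(tweak, "little")
    if t & 1:  # the low bit is only set when a reduction happened
        t = (t ^ 0x87) | (1 << 128)
    return (t >> 1).to_bytes(16, "little")


def all_tweaks(t_j: bytes, j: int, count: int = 8):
    """Compute T_0 .. T_{count-1} from the tweak T_j recovered at index j."""
    t = t_j
    for _ in range(j):
        t = tweak_previous(t)
    tweaks = [t]
    for _ in range(count - 1):
        tweaks.append(tweak_next(tweaks[-1]))
    return tweaks


# Round trip on an arbitrary starting tweak.
t0 = bytes(range(0xF0, 0x100))
chain = [t0]
for _ in range(7):
    chain.append(tweak_next(chain[-1]))
print(all_tweaks(chain[5], 5) == chain)  # prints True
```

The reduction can be undone because the low bit of the result reveals whether an overflow occurred.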

Leakage Model

First Decryption Round

Now that we understand the type of key-related information to retrieve, mounting a CPA attack could look a lot like what has been done against the ESP32.

However, there’s an important difference to consider: the leakage model. The ESP32-C3 is actually using decryption, contrary to the ESP32 where, even though the flash was semantically decrypted, encryption blocks were used.

At first, one could speculate that employing a model such as:

$H = HW(InvSbox(P[i] \oplus guess))$

instead of:

$H = HW(Sbox(P[i] \oplus guess))$

could work. It actually doesn’t.

This is probably due to the fact that the hardware implementation of the ESP32-C3 is reasonably optimized, and the output of the inverted $Sbox$ table doesn’t directly appear in a register.

Instead, I hypothesize the hardware implementation is based on T-Tables, an optimized AES computation method. See this great article for more details.

After some trial and error and based on this assumption, I came up with the following first decryption round power leakage model.

$\begin{align} v &= InvSbox(P[i] \oplus guess) \nonumber \\ H &= HW(v \otimes \gamma) + HW(v \otimes \delta) + HW(v \otimes \epsilon) + HW(v \otimes \zeta) \nonumber \end{align}$

Where:

$P$ is the supplied ciphertext.
$i$ is the index of the byte being guessed ( $i=0, \dots, 15$ ).
$guess$ are guesses at byte index $i$ for the previously defined $K_{TE}$ .
$\gamma$ is the number 9 in $GF(2^8)$ .
$\delta$ is the number 11 in $GF(2^8)$ .
$\epsilon$ is the number 13 in $GF(2^8)$ .
$\zeta$ is the number 14 in $GF(2^8)$ .

This may sound complex, but it simply reflects the use of a pre-computed look-up table, as described in the previously linked article.

This model does work, but a simple “trick” does make it perform even better. The idea, borrowed from Tian et al. (2014), is to introduce some kind of non-linearity to the model.

Hence, let’s define:

$H'=H^\beta$

Where $\beta$ is a value that can be determined empirically. In the case of the ESP32-C3, $\beta=0.19$ gave me satisfying results.
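The first-round model can be written down as a short Python sketch. The inverse S-box is derived from the generated S-box, the constants 9, 11, 13 and 14 are the ones listed above, and the function names are mine.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_inv_sbox():
    """Generate the Rijndael S-box, then invert the table."""
    inv_sbox = [0] * 256
    for x in range(256):
        inverse = 0
        if x:
            inverse = 1
            for _ in range(254):
                inverse = gf_mul(inverse, x)
        y = inverse
        for shift in range(1, 5):
            y ^= ((inverse << shift) | (inverse >> (8 - shift))) & 0xFF
        inv_sbox[y ^ 0x63] = x
    return inv_sbox


INV_SBOX = build_inv_sbox()


def hw(value):
    return bin(value).count("1")


def ttable_leakage(ciphertext_byte, guess, beta=0.19):
    """H' = (HW(9v) + HW(11v) + HW(13v) + HW(14v)) ** beta, v = InvSbox(P ^ guess)."""
    v = INV_SBOX[ciphertext_byte ^ guess]
    h = sum(hw(gf_mul(v, constant)) for constant in (9, 11, 13, 14))
    return h ** beta


# With beta = 1 the raw model H is returned.
print(ttable_leakage(0x7C, 0x00, beta=1.0))
```

Since $0 < \beta < 1$, the exponent compresses large $H$ values more than small ones.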

Second Decryption Round

This model is similar to the one used for the first round. Of course, the results of this first round must be computed, so knowledge of $K_{TE}$ is assumed.

$\begin{align} state &= InvMixColumns(InvSbox(InvShiftRows(P \oplus K_{TE}))) \nonumber \\ v &= InvSbox(state[i] \oplus guess) \nonumber \\ H &= HW(v \otimes \gamma) + HW(v \otimes \delta) + HW(v \otimes \epsilon) + HW(v \otimes \zeta) \nonumber \\ H' &= H^\beta \nonumber \end{align}$

Where:

$P$ is the supplied ciphertext.
$i$ is the index of the byte being guessed ( $i=0, \dots, 15$ ).
$guess$ are guesses at byte index $i$ for the previously defined $K_{R1}'$ key ( $K_{R1} = MixColumns(K_{R1}')$ ).
$\gamma$ is the number 9 in $GF(2^8)$ .
$\delta$ is the number 11 in $GF(2^8)$ .
$\epsilon$ is the number 13 in $GF(2^8)$ .
$\zeta$ is the number 14 in $GF(2^8)$ .
$\beta$ is used to introduce some amount of non-linearity to the model. Here again, $\beta=0.19$ yields good results.

Experimental Results

Traces Capture

Capturing power consumption traces for the ESP32-C3 doesn’t differ much from doing it for the ESP32.

After configuring the target ESP32-C3 to enable flash encryption, more than 800k measurement cycles have been performed.

For some reason, I’ve found the ESP32-C3 to boot slower than the ESP32. Gathering these samples took around five days with the ESP CPA Board.

Traces Analysis

The analysis process doesn’t change much either. As explained above, the power consumption models need to be updated. However, the pre-processing and CPA methods are roughly identical.

The key ranks obtained for the first round are provided below.

All bytes except 0, 4, and 6 can be successfully recovered.

This considerably reduces the search space already, and the remaining three bytes, which still achieved a low ranking, could be brute-forced.

Also, it’s reasonably clear from examining the evolution of all correlations which bytes have been correctly guessed, and which ones might require a bit of trial and error.

For instance, while the result is obvious for byte 8, we have two potential candidates for byte 0.

On the other hand, for the second round, all 16 bytes of the key can directly be recovered.

Given these results, the successful recovery of $K_E$ and all $T_{n=0, \dots, 7}$ associated with the attacked data block seems possible.

Detailed results for this attack, as well as supplementary information, are available on the ESP32-C3 Detailed Results page.

Conclusion

As implied by Espressif’s advisory, the ESP32-C3 is also vulnerable to side-channel attacks. However, the use of the XTS encryption mode does indeed make things more difficult. It seems a new attack and its associated measurements are required to crack each 128-byte block of the flash data.

That being said, my next article will demonstrate that having control over the first 128 bytes of the flash data is sufficient to execute arbitrary code, even when the Secure Boot feature of the system is enabled. This code could be used to extract the rest of the content.

Attacking the `ESP32-C6`

In its advisory, Espressif indicates:

We will incorporate hardware countermeasures in our future chips to address these vulnerabilities.

The ESP32-C6 is one of these “future” chips, and its datasheet indicates the following feature:

configurable Anti-DPA

Where DPA is a side-channel attack similar to the CPA method we have used so far.

I was curious to test my attacks against this new component, expecting to fail miserably.

Encryption Method Overview

The ESP32-C6 uses the same encryption method as the ESP32-C3. Nothing has changed: XTS-AES is still used.

However, according to the technical reference manual, several countermeasures against side-channel attacks have been put into place.

ESP32-C6 has a dual protection mechanism against Differential Power Analysis (DPA) attacks at the hardware level.
First, a mask mechanism is introduced in the symmetric encryption operation process, which interferes with the power consumption trajectory by masking the real data in the operation process. This security mechanism cannot be turned off.
Second, the clock selected for the operation will change dynamically in real time, blurring the power consumption trajectory during the operation. For this security mechanism, ESP32-C6 provides 4 security levels for users to choose to adapt to different applications.

The first countermeasure is called the masking countermeasure. The idea is to avoid directly manipulating “sensitive” data. Therefore, the values we relied upon in our previous attacks (such as the output of the $Sbox$ lookup table) probably aren’t directly handled by the chip. Instead, they could be XOR-ed with random values, called masks. Carrying out side-channel attacks against such countermeasures, while possible, seems significantly more complicated.

The second, optional method detailed by Espressif is a hiding countermeasure. The idea is to randomize the clock of the crypto-core of the ESP32-C6. The following table is shared in the manual.

Security Level Name	Configuration Fuse Value	Crypto clock frequency
`SEC_DPA_OFF`	0	$40MHz$
`SEC_DPA_LOW`	1	$(20MHz, 40MHz]$
`SEC_DPA_MIDDLE`	2	$(33.3MHz, 40MHz]$
`SEC_DPA_HIGH`	3	$(10MHz, 40MHz]$

Hence, at a given timestamp, power traces won’t necessarily reflect the same operation. This can significantly diminish the effectiveness of a CPA attack, unless a method to re-synchronize traces can be developed.

To assess the effectiveness of these countermeasures, several tests have been performed, starting from the security level SEC_DPA_OFF and going all the way up to SEC_DPA_HIGH.

Experimental Results

Masking Countermeasure (`SEC_DPA_OFF`)

First, the target ESP32-C6 was configured with the flash encryption feature enabled, but with its security level set to SEC_DPA_OFF.

In this configuration:

The hiding feature that randomizes the crypto-core clock isn’t enabled.
However, the masking countermeasure is enabled. According to the manual, it can’t be deactivated anyway.

My goal was to mount a similar attack to the one I used against the ESP32-C3, and recover enough traces to decrypt a 128-byte block of data (i.e., I wanted to obtain the $K_E$ key along with the associated $T_{n=0, \dots, 7}$ tweaks).

This might be a bit disappointing, but I didn’t have to do much to bypass the new masking countermeasure.

In a somewhat naive manner, I initially used the same power consumption models as for the ESP32-C3, and surprisingly obtained decent results!

Therefore, the models are once again, for the first round:

$\begin{align} v &= InvSbox(P[i] \oplus guess) \nonumber \\ H &= HW(v \otimes \gamma) + HW(v \otimes \delta) + HW(v \otimes \epsilon) + HW(v \otimes \zeta) \nonumber \\ H' &= H^\beta \nonumber \end{align}$

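And for the second round, the model is likewise the one introduced for the ESP32-C3 (see the Second Decryption Round section).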

I managed to improve the results by tweaking the value of the $\beta$ constant. While $\beta=0.19$ worked well for the ESP32-C3, $\beta=0.02$ suited the ESP32-C6 better.
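The first-round model can be sketched in a few lines of Python. Two points are assumptions on my part for the sake of a runnable example: $\otimes$ is taken to be multiplication in $GF(2^8)$, and the four constants default to the AES InvMixColumns coefficients as placeholders. Refer to the ESP32-C3 section for the actual definitions; the function names are hypothetical as well.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254 (0 maps to 0)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_inv_sbox() -> list:
    """Build the AES inverse S-box by inverting the forward S-box."""
    inv_sbox = [0] * 256
    for x in range(256):
        b = gf_inv(x)
        s = b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63
        inv_sbox[s] = x
    return inv_sbox


INV_SBOX = build_inv_sbox()

# Placeholder values for gamma, delta, epsilon and zeta (assumption).
COEFFS = (0x0E, 0x0B, 0x0D, 0x09)


def hw(x: int) -> int:
    """Hamming weight of x."""
    return bin(x).count("1")


def leakage(p_byte: int, guess: int, beta: float, coeffs=COEFFS) -> float:
    """First-round model: H' = (sum of HW(v (x) c) over the constants) ** beta."""
    v = INV_SBOX[p_byte ^ guess]
    h = sum(hw(gf_mul(v, c)) for c in coeffs)
    return h ** beta


# Same input byte and guess, evaluated with the two exponents quoted above.
h_c3 = leakage(0x7C, 0x00, beta=0.19)
h_c6 = leakage(0x7C, 0x00, beta=0.02)
```

With such a small exponent, $H^\beta$ is squeezed into a narrow band close to 1 for every non-zero $H$, so the C6 model mostly preserves the ordering of the predicted values rather than their spacing.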

As usual, the following ranking graphs for the first two rounds can be obtained.

Detailed results for this attack, as well as supplementary information, are available on the ESP32-C6 Detailed Results page.

I’m a bit puzzled by this initial success, and I’m not exactly sure why the countermeasure seems so inefficient. I suspect the random masks could be manipulated at the same time as the masked values. Such behavior is sometimes discussed in the literature (keyword: Zero-Offset 2DPA, see for instance Waddle and Wagner (2004)).
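As an illustration of why handling a mask and a masked value at the same time is risky in principle, the following sketch computes the statistics of the joint leakage $HW(v \oplus m) + HW(m)$ over all 8-bit masks $m$. This is a textbook-style toy model with names of my own choosing. It isn't a description of the ESP32-C6 internals, and it doesn't claim to explain the results above.

```python
def hw(x: int) -> int:
    """Hamming weight of x."""
    return bin(x).count("1")


def joint_leak_stats(value: int):
    """Mean and variance, over all 256 masks m, of HW(value ^ m) + HW(m)."""
    leaks = [hw(value ^ m) + hw(m) for m in range(256)]
    mean = sum(leaks) / 256
    variance = sum((leak - mean) ** 2 for leak in leaks) / 256
    return mean, variance


stats_00 = joint_leak_stats(0x00)  # mask and masked value are identical
stats_0f = joint_leak_stats(0x0F)
stats_ff = joint_leak_stats(0xFF)  # masked value is the complement of the mask
```

The mean is the same for every value, but the variance equals $8 - HW(v)$ in this model. The spread of the combined leakage therefore still depends on the secret, which is the kind of dependency zero-offset second-order analysis looks for.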

Hiding Countermeasure

Given the success I achieved in “bypassing” the masking countermeasure, I began studying the clock randomization.

The reference manual describes this mechanism as follows:

“Second, the clock selected for the operation will change dynamically in real time, blurring the power consumption trajectory during the operation. For this security mechanism, ESP32-C6 provides 4 security levels for users to choose to adapt to different applications.”

| Security Level Name | Configuration Fuse Value | Crypto clock frequency |
|---|---|---|
| `SEC_DPA_OFF` | 0 | $40MHz$ |
| `SEC_DPA_LOW` | 1 | $(20MHz, 40MHz]$ |
| `SEC_DPA_MIDDLE` | 2 | $(33.3MHz, 40MHz]$ |
| `SEC_DPA_HIGH` | 3 | $(10MHz, 40MHz]$ |

According to the reference manual, it appears that the countermeasure causes the clock to randomly vary within the specified frequency ranges. Achieving this behavior with digital logic alone seems complex to me, unless a PLL circuit is employed.

Instead, I hypothesize that the clock of the crypto-core is simply randomly gated. The probability of this clock being disabled likely depends on the security level; the higher the level, the more often the clock is disabled. The resulting average frequencies could fall within the indicated ranges.

Possible hardware implementation of the clock randomization feature
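The gating hypothesis is easy to simulate. The sketch below assumes a 40 MHz system clock and an independent gating probability per tick. The value `p_gate=0.25` is made up for the example, as I don't know the real probabilities or whether successive ticks are actually independent.

```python
import random

SYSTEM_CLOCK_MHZ = 40.0


def gated_clock(n_ticks: int, p_gate: float, rng: random.Random) -> list:
    """One flag per system tick: True when the crypto-core receives the clock."""
    return [rng.random() >= p_gate for _ in range(n_ticks)]


def average_frequency_mhz(enables: list) -> float:
    """Effective crypto-core frequency given the per-tick enable flags."""
    return SYSTEM_CLOCK_MHZ * sum(enables) / len(enables)


# Hypothetical gating probability of 25%: about three ticks out of four go through.
rng = random.Random(0)
enables = gated_clock(100_000, p_gate=0.25, rng=rng)
avg_mhz = average_frequency_mhz(enables)

# With gating disabled, the core simply runs at the system clock frequency.
ungated_mhz = average_frequency_mhz(gated_clock(1_000, 0.0, random.Random(1)))
```

Any gating probability below 50% yields an average frequency within $(20MHz, 40MHz]$, so a purely digital gate would be enough to match a range like the one documented for `SEC_DPA_LOW`.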

If, during a particular system clock tick, a difference between the crypto-core being clocked and the crypto-core being gated can be observed, an attacker could accurately predict when leakages are likely to occur.

The following graph depicts, at a given timestamp, the distributions of the measured samples under two configurations:

SEC_DPA_OFF, the clock randomization isn’t enabled.
SEC_DPA_LOW, the clock randomization is enabled.

A clear difference can be observed between the two distributions. It seems reasonable to assume that, for the SEC_DPA_LOW security level, all samples below 120 units of current (i.e., when the system consumes less current than this threshold) correspond to instances when the crypto-core clock was gated.

With this understanding, it becomes possible to classify the traces into distinct groups, each corresponding to a similar pattern of the crypto-core clock.

Conducting a CPA attack for each of these groups is likely to yield strong results, as all traces within a group are synchronized.

The correlation values obtained from each attack can be combined (multiplied) to derive the final correlation results.
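The grouping strategy can be sketched as follows, in pure Python and on synthetic, noise-free traces. The 120-unit threshold comes from the graph discussed above. Everything else (function names, the `tick_samples` parameter, taking the absolute value of each correlation before multiplying, the toy Hamming weight model) is an assumption made for this example, not the code used for the actual attack.

```python
from collections import defaultdict
from math import sqrt


def hw(x: int) -> int:
    return bin(x).count("1")


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient; 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def clock_pattern(trace, tick_samples, threshold=120):
    """Per-tick flags: True when the sample suggests the crypto-core was clocked."""
    return tuple(trace[s] >= threshold for s in tick_samples)


def group_traces(traces, tick_samples, threshold=120):
    """Map each observed clock pattern to the indices of the traces showing it."""
    groups = defaultdict(list)
    for idx, trace in enumerate(traces):
        groups[clock_pattern(trace, tick_samples, threshold)].append(idx)
    return groups


def combined_scores(traces, inputs, model, tick_samples, threshold=120, min_size=8):
    """Run one CPA per group and multiply the per-group peak correlations."""
    scores = [1.0] * 256
    for members in group_traces(traces, tick_samples, threshold).values():
        if len(members) < min_size:
            continue  # too few traces for a meaningful correlation
        columns = list(zip(*(traces[i] for i in members)))
        for guess in range(256):
            predicted = [model(inputs[i], guess) for i in members]
            scores[guess] *= max(abs(pearson(predicted, col)) for col in columns)
    return scores


# Synthetic demo: sample 0 tells whether the core was clocked on the first tick,
# and the leakage lands on sample 1 (clocked) or sample 2 (gated, hence delayed).
KEY = 0x3C
inputs = list(range(256))
traces = [[150, hw(p ^ KEY), 0] if p % 2 else [100, 0, hw(p ^ KEY)] for p in inputs]

groups = group_traces(traces, tick_samples=[0])
scores = combined_scores(traces, inputs, lambda p, g: hw(p ^ g), tick_samples=[0])
```

Within each group the leakage sits at a fixed sample, so each per-group CPA behaves as if the countermeasure were disabled. Note that a plain Hamming weight model without an S-box, as used in this toy demo, can't tell a key byte from its complement once absolute values are taken.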

`SEC_DPA_LOW` Results

The target has been configured with the `SEC_DPA_LOW` security level, and once again, around 600k measurement cycles have been performed. For both of the targeted rounds, all 16 bytes can be successfully recovered.

Detailed results for this attack, as well as supplementary information, are available on the ESP32-C6 (SEC_DPA_LOW) Detailed Results page.

`SEC_DPA_HIGH` Results

Configuring the system with SEC_DPA_HIGH yields very similar results, and attacking the first two rounds enables the recovery of all bytes.

Detailed results for this attack, as well as supplementary information, are available on the ESP32-C6 (SEC_DPA_HIGH) Detailed Results page.

Conclusion

The countermeasures implemented to protect the ESP32-C6 against side-channel attacks don’t appear to be effective. The masking countermeasure doesn’t seem to have much impact, while the hiding countermeasure can be undermined by guessing the behavior of the crypto-clock.

Considering the above, I would say the ESP32-C6 does not look much more secure than the ESP32-C3.

With that said, the XTS mode of encryption once again forces the attacker to perform a new attack for every 128-byte block.

However, my next article will demonstrate that having control over a limited amount of flash data is sufficient to execute arbitrary code and extract the remaining content, even when the Secure Boot feature of the system is enabled.

Conclusion

Here, the publicly known side-channel attack against the ESP32 has been replicated using only inexpensive and custom hardware.

As suspected, the ESP32-C3 is also affected by a similar side-channel attack. However, the use of the XTS mode of encryption makes things more difficult for the attacker, and a given attack can only be employed to decrypt a specific 128-byte block at a time.

The ESP32-C6 supposedly embeds side-channel countermeasures in the form of:

A masking countermeasure.
A hiding countermeasure (clock randomization).

These countermeasures don’t appear to be effective, and the ESP32-C6 can be exploited similarly to the ESP32-C3, while still benefiting from the added security of the XTS mode of encryption.

However, the next article in this series will demonstrate that decrypting only a few of the XTS 128-byte blocks can be sufficient to extract the cleartext content of the entire external flash.

Disclosure Timeline

Espressif has been contacted to disclose the attack detailed in this article, along with the method disclosed in the next one.

The process outlined in the Espressif Security Incident Response Document has been followed.

07-Aug-2023: A first email was sent to Espressif.
28-Aug-2023: After several follow-up emails, Espressif started analyzing the issue.
15-Sep-2023 ~ 17-Nov-2023: Several technical exchanges took place. An ESP CPA Board unit and associated cartridges were shipped to Espressif’s R&D center in Shanghai.
20-Nov-2023: The bug bounty team at Espressif deemed this research eligible for a reward, granting $2,229.
13-Dec-2023: Espressif shared a draft revision of a security advisory.
08-Jan-2024: Espressif published the final advisory.

Appendix

Leakage Assessment

Following a discussion with Espressif, a more thorough assessment of the leakages originating from the ESP32-C3 and ESP32-C6 has been performed.

The goal was to understand the actual impact of the masking countermeasure implemented in the ESP32-C6.

The results are available below:

Bibliography

Karim M. Abdellatif, Olivier Hériveaux, and Adrian Thillard. Unlimited Results: Breaking Firmware Encryption of ESP32-V3. Cryptology ePrint Archive, Paper 2023/090, 2023. URL: https://eprint.iacr.org/2023/090.

Chao Luo, Yunsi Fei, and A. Adam Ding. Side-Channel Power Analysis of XTS-AES. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, 1330–1335. 2017. doi:10.23919/DATE.2017.7927199.

Qizhi Tian, Maire O'Neill, and Neil Hanley. Can Leakage Models Be More Efficient? Non-Linear Models in Side Channel Attacks. In 2014 IEEE International Workshop on Information Forensics and Security (WIFS), 215–220. 2014. doi:10.1109/WIFS.2014.7084330.

Jason Waddle and David Wagner. Towards Efficient Second-Order Power Analysis. In Cryptographic Hardware and Embedded Systems - CHES 2004: 6th International Workshop Cambridge, MA, USA, August 11-13, 2004. Proceedings, volume 3156 of Lecture Notes in Computer Science, 1–15. Springer, 2004. URL: https://iacr.org/archive/ches2004/31560001/31560001.pdf, doi:10.1007/978-3-540-28632-5_1.
