Info

This document is part of an original submission for the RP2350 Hacking Challenge.

For more detailed and up-to-date content, refer to “Laser Fault Injection on a Budget: RP2350 Edition”.

Boot ROM Fault Injection

Overview

This page describes, at the Boot ROM level, how the RP2350 was faulted to bypass the secure boot feature.

Even though faults can be injected without triggering the Glitch Detectors, the Boot ROM implements many countermeasures.

As a result, the only approach that worked for me was to combine fault injection with the “Dual QSPI Flash” feature offered by the Custom I/O Board.

General Idea

The fault has been injected in the s_varm_crit_ram_trash_verify_block function.

A very stripped-down version of the function is available below.

// Start computing a hash
sb_sha256_init(&sha);

// Load the data from the flash to the RAM
s_varm_crit_mem_copy_by_words((uint32_t *)to_storage_addr,
                              resolve_ram_or_absolute_flash_addr(
                                  from_storage_addr),
                              (size + 3u) & ~3u);

// Feed the hash engine with the data, obtained from RAM
uint32_t *src = resolve_ram_or_absolute_flash_addr(to_storage_addr);
sb_sha256_update_32(&sha, src, size & ~3u);

// Include block data (containing source address, dest address and size)
// to the SHA-256 hash
sb_sha256_update_32(&sha, block_data, parsed_block->hash_def_block_words_included * 4);

// Finalize the hash
sb_sha256_finish(&sha, signature_workspace->hash.bytes);

// Do the complicated & well protected signature validation
sig_matches_image_keyx = s_arm8_verify_signature_secp256k1(signature_workspace->sig_context_buffer,
                                                           &public_key[0],
                                                           &signature_workspace->hash,
                                                           &signature[0]);

The idea is to inject a fault after the call to s_varm_crit_mem_copy_by_words and before the first call to sb_sha256_update_32. This can cause the SHA-256 hash to be computed over data other than what was previously loaded to RAM.

For instance, the hash can be computed over the data of an unaltered firmware, while arbitrary data has been loaded to RAM.
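The effect can be modeled in a few lines of C. This is a minimal sketch under stated assumptions, not Boot ROM code: `toy_digest` is a hypothetical stand-in for SHA-256 (a 32-bit FNV-1a loop), `verify_model` and its `faulted` flag are illustrative names, and the two buffers stand for the content of the two flashes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SHA-256 (32-bit FNV-1a), for illustration only. */
static uint32_t toy_digest(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Model of the load-then-hash sequence.
 * flash_a: image visible while the data is copied to RAM.
 * flash_b: image visible once a fault has redirected the hash source.
 * Returns 1 when the computed digest matches the signed one, 0 otherwise.
 */
int verify_model(uint8_t *ram, const uint8_t *flash_a, const uint8_t *flash_b,
                 size_t size, uint32_t signed_digest, int faulted)
{
    assert(ram != NULL && flash_a != NULL && flash_b != NULL);

    /* Load step: RAM always receives the content of the first image. */
    memcpy(ram, flash_a, size);

    /* Hash step: normally reads RAM; after a fault it reads flash instead. */
    const uint8_t *hash_src = faulted ? flash_b : ram;
    return toy_digest(hash_src, size) == signed_digest;
}
```

With a patched image in `flash_a` and the unaltered image in `flash_b`, the model rejects the image when `faulted` is 0 and accepts it when `faulted` is 1, while RAM holds the patched data in both cases.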

Fault Injection Paths

Path #1 - Single Instruction Skip

The code listed above translates to the following assembly pieces.

s_varm_crit_mem_copy_by_words is first called.

00001d62 01 f0 82 fb        bl            varm_to_s_native_crit_mem_copy_by_words_impl
00001d66 0d 9b              ldr           r3,[sp,#local_24]
00001d68 43 ec 70 07        mcrr          p7,0x7,r0,r3,cr0
00001d6c ac e7              b             LAB_00001cc8

The implementation of the function post-increments R1 for every word copied, so when the loop exits, R1 points just past the end of the source data.

0000346c 19 fe 3a c7        mrc2          p7,0x0,r12,cr9,cr10,0x1
00003470 10 b5              push          {r4,lr}
00003472 03 46              mov           r3,r0
00003474 22 b1              cbz           r2,LAB_00003480
00003476 1a 44              add           r2,r3
    LAB_00003478
00003478 10 c9              ldmia         r1!,{r4}
0000347a 10 c0              stmia         r0!,{r4}
0000347c 90 42              cmp           r0,r2
0000347e fb d3              bcc           LAB_00003478
    LAB_00003480
00003480 c0 1a              subs          r0,r0,r3
00003482 09 fe 3a c7        mcr2          p7,0x0,r12,cr9,cr10,0x1
00003486 10 bd              pop           {r4,pc}

Next, R1 is kept untouched until the call to sb_sha256_update_32.

00001d02 32 00              movs          r2,r6; size
00001d04 21 00              movs          r1,r4; src
00001d06 0f a8              add           r0,sp,#0x3c; sha context
00001d08 02 f0 49 fb        bl            sb_sha256_update_32

If the instruction at 0x00001d04 can be skipped, the following will in effect be executed:

r1 = resolve_ram_or_absolute_flash_addr(from_storage_addr) + size;
sb_sha256_update_32(&sha, r1, size);

The data pointed to by R1 will be pulled from the QSPI bus. This data is under the control of an attacker.

With the help of the dual flash system, arbitrary values can easily be provided, including values that don’t match the content of the other flash memory.
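To make the leftover value of R1 concrete, here is a small C model of the copy loop. It is a sketch only: `copy_by_words_model` and `path1_hash_offset` are hypothetical names, the size is assumed to be a multiple of 4 (the Boot ROM rounds it up before the call), and the comments map each statement to the disassembly above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * C model of the copy loop, assuming size_bytes is a multiple of 4.
 * Returns the final value of the source pointer, i.e. what R1 holds on exit.
 */
const uint32_t *copy_by_words_model(uint32_t *dst, const uint32_t *src,
                                    size_t size_bytes)
{
    assert(size_bytes % 4 == 0);

    uint32_t *end = dst + size_bytes / 4;  /* add   r2,r3                     */
    while (dst < end) {                    /* cmp   r0,r2 / bcc               */
        *dst++ = *src++;                   /* ldmia r1!,{r4} / stmia r0!,{r4} */
    }
    return src;                            /* R1 now points past the source   */
}

/*
 * Hypothetical helper: flash offset that gets hashed when "movs r1,r4" is
 * skipped, given the offset the copy started from and the unrounded size.
 */
uint32_t path1_hash_offset(uint32_t from_offset, uint32_t size_bytes)
{
    return from_offset + ((size_bytes + 3u) & ~3u);
}
```

The returned pointer is always `src` plus the number of words copied, which is why the second flash needs a copy of the unmodified firmware at that offset.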

Path #2 - Register Value Corruption

During my tests, I observed that injecting faults between s_varm_crit_mem_copy_by_words and sb_sha256_update_32 often resulted in data being pulled from the flash starting from offset 0.

I don’t have a clear explanation for this behavior, such as a simple instruction skip.

It could, for instance, mean that when the following is executed:

00001d02 32 00              movs          r2,r6; size
00001d04 21 00              movs          r1,r4; src
00001d06 0f a8              add           r0,sp,#0x3c; sha context
00001d08 02 f0 49 fb        bl            sb_sha256_update_32

the value of R4 corresponds to XIP_BASE (or a variant, such as XIP_NOCACHE_NOALLOC_NOTRANSLATE_BASE).

Either way, the call to sb_sha256_update_32 can once again use data under the control of an attacker instead of the data already loaded into RAM.

Path #3 - Instruction Mutation

The instructions corresponding to s_varm_crit_mem_copy_by_words are shown again below.

0000346c 19 fe 3a c7        mrc2          p7,0x0,r12,cr9,cr10,0x1
00003470 10 b5              push          {r4,lr}
00003472 03 46              mov           r3,r0
00003474 22 b1              cbz           r2,LAB_00003480
00003476 1a 44              add           r2,r3
    LAB_00003478
00003478 10 c9              ldmia         r1!,{r4}
0000347a 10 c0              stmia         r0!,{r4}
0000347c 90 42              cmp           r0,r2
0000347e fb d3              bcc           LAB_00003478
    LAB_00003480
00003480 c0 1a              subs          r0,r0,r3
00003482 09 fe 3a c7        mcr2          p7,0x0,r12,cr9,cr10,0x1
00003486 10 bd              pop           {r4,pc}

If a fault can be injected at 0x00003486, it’s possible for the pop {r4,pc} instruction to be altered, possibly resulting in the value of R4 not being restored.

Before pop {r4,pc}, R4 corresponds to the last word copied from flash to RAM, a value under the control of the attacker. This value could, for instance, be set to XIP_NOCACHE_NOALLOC_NOTRANSLATE_BASE.

R4 is subsequently used when calling sb_sha256_update_32.

00001d02 32 00              movs          r2,r6; size
00001d04 21 00              movs          r1,r4; src
00001d06 0f a8              add           r0,sp,#0x3c; sha context
00001d08 02 f0 49 fb        bl            sb_sha256_update_32

Once again, sb_sha256_update_32 could be forced to run against data pulled from the QSPI bus.

Fault Injection Synchronization

To synchronize the laser pulse with the execution of the Boot ROM, QSPI bus activity is monitored by the gateware of the FPGA Board.

The last QSPI read corresponding to the call of s_varm_crit_mem_copy_by_words can be reliably detected. From this point, before calling sb_sha256_update_32, the following instructions are executed.

00003478 10 c9              ldmia         r1!,{r4} // Last load
0000347a 10 c0              stmia         r0!,{r4}
0000347c 90 42              cmp           r0,r2
0000347e fb d3              bcc           LAB_00003478
00003480 c0 1a              subs          r0,r0,r3
00003482 09 fe 3a c7        mcr2          p7,0x0,r12,cr9,cr10,0x1
00003486 10 bd              pop           {r4,pc}

00001d66 0d 9b              ldr           r3,[sp,#local_24]
00001d68 43 ec 70 07        mcrr          p7,0x7,r0,r3,cr0
00001d6c ac e7              b             LAB_00001cc8

00001cc8 0b 9b              ldr           r3,[sp,#local_2c]
00001cca 00 2b              cmp           r3,#0x0
00001ccc b3 d0              beq           LAB_00001c36
00001cce 03 9b              ldr           r3,[sp,#local_4c]
00001cd0 73 b1              cbz           r3,LAB_00001cf0
00001cf0 23 0e              lsrs          r3,r4,#0x18
00001cf2 10 2b              cmp           r3,#0x10
00001cf4 02 d8              bhi           LAB_00001cfc

00001cfc 03 23              movs          r3,#0x3
00001cfe 3e 00              movs          r6,r7
00001d00 9e 43              bics          r6,r3
00001d02 32 00              movs          r2,r6
00001d04 21 00              movs          r1,r4
00001d06 0f a8              add           r0,sp,#0x3c
00001d08 02 f0 49 fb        bl            sb_sha256_update_32

A redundancy co-processor instruction with a random delay of up to 127 ROSC cycles is executed at 0x00001d68. As a result, a pulse cannot be reliably timed to always hit a specific instruction.

However, this instruction is the only one adding a random delay. The software designed for the attack allows for about 20 attempts per second. That’s enough to inject interesting faults in a reasonable amount of time.
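A back-of-the-envelope estimate shows why the random delay is tolerable. The sketch below rests on assumptions that are not measurements: the delay is taken to be uniformly distributed over a given number of values (128 if the delay ranges from 0 to 127 cycles), only one of those values lines the pulse up with the target instruction, and a correctly timed pulse succeeds with a probability passed in by the caller. The function name `expected_seconds_per_hit` is hypothetical.

```c
#include <assert.h>

/*
 * Expected time, in seconds, until one successful fault.
 * Each attempt succeeds with probability p = fault_success_prob / delay_values,
 * so the expected number of attempts is 1 / p (geometric distribution).
 */
double expected_seconds_per_hit(unsigned delay_values,
                                double attempts_per_second,
                                double fault_success_prob)
{
    assert(delay_values > 0 && attempts_per_second > 0.0);
    assert(fault_success_prob > 0.0 && fault_success_prob <= 1.0);

    double p = fault_success_prob / (double)delay_values;
    return 1.0 / (p * attempts_per_second);
}
```

Under the most optimistic assumption, `expected_seconds_per_hit(128, 20.0, 1.0)` gives 6.4 seconds between correctly timed pulses. A lower per-pulse success probability scales that figure up proportionally.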

Attack Process

Flash Memory Organization

Before starting the attack, the two QSPI flashes of the I/O Board are organized as follows.

The first flash, whose content will be loaded to RAM, contains a copy of the valid, signed firmware, including a signature block, patched as follows:

- A small shellcode is written at the reset handler offset. This shellcode jumps to an arbitrary firmware located further in the flash memory, at address 0x10010000.
- The last word of the data loaded to RAM is set to XIP_NOCACHE_NOALLOC_NOTRANSLATE_BASE + 0x7000.

The second flash, whose content is expected to be read after a fault has been injected, contains the following:

- An unmodified firmware at offset 0x0. If fault injection path #2 is taken, sb_sha256_update_32 runs against this image and computes a valid hash.
- An unmodified firmware at the offset corresponding to the size of the data copied to RAM. If fault injection path #1 is taken, sb_sha256_update_32 runs against this image and computes a valid hash.
- An unmodified firmware at offset 0x7000. If fault injection path #3 is taken, sb_sha256_update_32 runs against this image and computes a valid hash.
- An arbitrary firmware at offset 0x10000. This firmware runs if the shellcode is executed.
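The layout of the second flash can be sketched as a small image builder. This is an illustrative sketch, not the tooling used for the attack: `build_second_flash` and the macro names are hypothetical, the data copied to RAM is assumed to start at flash offset 0, and the overlap checks are a sanity measure added here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PATH3_OFFSET   0x7000u   /* target of the patched last word (path #3) */
#define PAYLOAD_OFFSET 0x10000u  /* arbitrary firmware reached by the shellcode */

/* Round a size up to a whole number of 32-bit words, as the copy does. */
static size_t round_up_words(size_t size)
{
    return (size + 3u) & ~(size_t)3u;
}

/* Returns 0 on success, -1 if the pieces would overlap or not fit. */
int build_second_flash(uint8_t *flash, size_t flash_len,
                       const uint8_t *signed_fw, size_t fw_len,
                       const uint8_t *payload, size_t payload_len)
{
    assert(flash != NULL && signed_fw != NULL && payload != NULL);

    /* Path #1 reads from where R1 stopped: right after the copied data. */
    size_t path1_offset = round_up_words(fw_len);

    if (path1_offset + fw_len > PATH3_OFFSET) return -1;
    if (PATH3_OFFSET + fw_len > PAYLOAD_OFFSET) return -1;
    if (PAYLOAD_OFFSET + payload_len > flash_len) return -1;

    memset(flash, 0xFF, flash_len);                        /* erased state */
    memcpy(flash, signed_fw, fw_len);                      /* path #2 */
    memcpy(flash + path1_offset, signed_fw, fw_len);       /* path #1 */
    memcpy(flash + PATH3_OFFSET, signed_fw, fw_len);       /* path #3 */
    memcpy(flash + PAYLOAD_OFFSET, payload, payload_len);  /* payload */
    return 0;
}
```

Placing the same unmodified image at all three offsets means that whichever path the fault takes, the hash is computed over valid data.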

Attack Loop

During the attack, the attack software repeats the following steps in a loop:

1. The first flash is selected.
2. The RP2350 is powered on.
3. QSPI bus reads are monitored until the last byte corresponding to the call to s_varm_crit_mem_copy_by_words has been read.
4. The second flash is selected.
5. After an arbitrary delay, the laser is pulsed.
6. QSPI bus reads are monitored again to check whether the data pulled from the flash corresponds to a successful bypass.
7. On failure, the RP2350 is powered off and a new attempt is made.

flowchart TD
    A[Select First Flash] --> B[Power On RP2350]
    B --> C[Monitor QSPI Reads]
    C --> D[Select Second Flash]
    D --> E[Arbitrary Delay]
    E --> F[Pulse Laser]
    F --> G[Monitor QSPI Reads]
    G -->|Failure| H[Power Off RP2350]
    H --> A
    G -->|Success| I[Arbitrary Firmware Execution]

Results

When targeting the correct die area, injecting faults that result in data being pulled from the QSPI bus when sb_sha256_update_32 runs is quite common.

However, a major issue was that the faults affected the system too broadly, and the value of the size parameter passed to sb_sha256_update_32 was corrupted as well.

This reduced the success rate of the attack significantly. I saw the first bypass only after more than 16 hours of tweaking the laser pulse parameters and position. After this fine-tuning, however, I only had to wait about 20 minutes for a new success to appear.

Possible Improvements

Several elements could be improved to increase the success rate of the attack:

- The laser pulse duration could be fine-tuned. Due to the simple design of the Laser Driver Board, adjusting the pulse duration requires modifying the values of physical components. This process can be cumbersome, so I haven’t thoroughly explored the impact of pulse duration.
- The laser spot size resulting from the Optical Subsystem is relatively large compared to what is typically used in similar attacks. A smaller spot size would likely lead to better-controlled and more repeatable faults.
- The attack software could be enhanced. Instead of using two flashes, the FPGA could be utilized to dynamically supply the correct data, regardless of the flash address requested after the fault.

Last update: November 24, 2024