Info
This document is part of an original submission for the RP2350 Hacking Challenge.
For more detailed and up-to-date content, refer to “Laser Fault Injection on a Budget: RP2350 Edition”.
Boot ROM Fault Injection
Overview
This page describes, at the Boot ROM level, how the RP2350 was faulted to bypass its secure boot feature.
Even though faults can be injected without triggering the Glitch Detectors, the Boot ROM implements many countermeasures.
Hence, the only approach I managed to make work was to combine fault injection with the “Dual QSPI Flash” feature offered by the Custom I/O Board.
General Idea
The fault has been injected in the s_varm_crit_ram_trash_verify_block function.
A very stripped-down version of the function is available below.
// Start computing a hash
sb_sha256_init(&sha);
// Load the data from the flash to the RAM
s_varm_crit_mem_copy_by_words((uint32_t *)to_storage_addr,
                              resolve_ram_or_absolute_flash_addr(
                                  from_storage_addr),
                              (size + 3u) & ~3u);
// Feed the hash engine with the data, obtained from RAM
uint32_t *src = resolve_ram_or_absolute_flash_addr(to_storage_addr);
sb_sha256_update_32(&sha, src, size & ~3u);
// Include block data (containing source address, dest address and size)
// to the SHA-256 hash
sb_sha256_update_32(&sha, block_data, parsed_block->hash_def_block_words_included * 4);
// Finalize the hash
sb_sha256_finish(&sha, signature_workspace->hash.bytes);
// Do the complicated & well protected signature validation
sig_matches_image_keyx = s_arm8_verify_signature_secp256k1(signature_workspace->sig_context_buffer,
                                                           &public_key[0],
                                                           &signature_workspace->hash,
                                                           &signature[0]);
The idea is to inject a fault after the call to s_varm_crit_mem_copy_by_words and before the call to sb_sha256_update_32. This could cause the SHA-256 hash to be computed over data other than what was previously loaded into RAM.
For instance, the hash could be computed over the data of an unaltered firmware, while arbitrary data has been loaded into RAM.
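As an illustration only, the mismatch between "what is loaded" and "what is measured" can be modelled on a host machine. In this sketch, `toy_digest` is a stand-in (FNV-1a) for SHA-256, and `load_and_digest` and its `hash_src` parameter are hypothetical names that do not exist in the Boot ROM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for SHA-256 (32-bit FNV-1a), used only to keep the sketch short. */
static uint32_t toy_digest(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Hypothetical model of the load-then-hash sequence.
 * Normal flow:  hash_src == ram, so the digest covers what was loaded.
 * Faulted flow: hash_src points somewhere else (for example, back at flash).
 */
static uint32_t load_and_digest(uint8_t *ram, const uint8_t *flash,
                                const uint8_t *hash_src, size_t len)
{
    assert(len > 0);
    memcpy(ram, flash, len);          /* data that will later be executed */
    return toy_digest(hash_src, len); /* data that is actually measured */
}
```

If `flash` holds a patched image and `hash_src` points at an unaltered image, the function returns the digest of the unaltered image even though `ram` ends up holding the patched bytes.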
Fault Injection Paths
Path #1 - Single Instruction Skip
The code listed above translates to the following assembly pieces.
s_varm_crit_mem_copy_by_words is first called.
00001d62 01 f0 82 fb bl varm_to_s_native_crit_mem_copy_by_words_impl
00001d66 0d 9b ldr r3,[sp,#local_24]
00001d68 43 ec 70 07 mcrr p7,0x7,r0,r3,cr0
00001d6c ac e7 b LAB_00001cc8
The implementation of the function increments R1 until the target size has been reached.
0000346c 19 fe 3a c7 mrc2 p7,0x0,r12,cr9,cr10,0x1
00003470 10 b5 push {r4,lr}
00003472 03 46 mov r3,r0
00003474 22 b1 cbz r2,LAB_00003480
00003476 1a 44 add r2,r3
LAB_00003478
00003478 10 c9 ldmia r1!,{r4}
0000347a 10 c0 stmia r0!,{r4}
0000347c 90 42 cmp r0,r2
0000347e fb d3 bcc LAB_00003478
LAB_00003480
00003480 c0 1a subs r0,r0,r3
00003482 09 fe 3a c7 mcr2 p7,0x0,r12,cr9,cr10,0x1
00003486 10 bd pop {r4,pc}
Next, R1 is kept untouched until the call to sb_sha256_update_32.
00001d02 32 00 movs r2,r6; size
00001d04 21 00 movs r1,r4; src
00001d06 0f a8 add r0,sp,#0x3c; sha context
00001d08 02 f0 49 fb bl sb_sha256_update_32
If the instruction at 0x00001d04 can be skipped, the following will in effect be executed:
r1 = resolve_ram_or_absolute_flash_addr(from_storage_addr) + size;
sb_sha256_update_32(&sha, r1, size);
The data pointed to by R1 will be pulled from the QSPI bus. This data is under the control of an attacker.
With the help of the dual flash system, arbitrary values can easily be provided, including values that don’t match the content of the other flash memory.
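The pointer arithmetic behind this path can be checked with a host-side model of the copy loop. This is a sketch under assumptions: `copy_by_words_model` is a hypothetical name, and the model only mirrors the `ldmia`/`stmia` pair from the listing, not the co-processor instructions around it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Host-side model of the copy loop (ldmia r1!,{r4} / stmia r0!,{r4}).
 * Returns the final value of the source pointer, i.e. what R1 holds when
 * the function returns. `size` is in bytes and must be a multiple of 4.
 */
static const uint32_t *copy_by_words_model(uint32_t *dst, const uint32_t *src,
                                           size_t size)
{
    assert(size % sizeof(uint32_t) == 0);
    const uint32_t *r1 = src;
    uint32_t *r0 = dst;
    uint32_t *end = dst + size / sizeof(uint32_t);
    while (r0 < end) {
        uint32_t r4 = *r1++; /* ldmia r1!,{r4} */
        *r0++ = r4;          /* stmia r0!,{r4} */
    }
    return r1;
}
```

After a 16-byte copy, the returned pointer is the source plus four words. If `movs r1,r4` is skipped, hashing therefore starts at the first word that follows the copied region.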
Path #2 - Register Value Corruption
During my tests, I observed that injecting faults between s_varm_crit_mem_copy_by_words and sb_sha256_update_32 often resulted in data being pulled from the flash starting from offset 0.
I don’t have a clear explanation for why this happens, such as a simple instruction skip.
It could, for instance, mean that when the following is executed:
00001d02 32 00 movs r2,r6; size
00001d04 21 00 movs r1,r4; src
00001d06 0f a8 add r0,sp,#0x3c; sha context
00001d08 02 f0 49 fb bl sb_sha256_update_32
The value of R4 corresponds to XIP_BASE (or variants, such as XIP_NOCACHE_NOALLOC_NOTRANSLATE_BASE).
Either way, this means that, once again, the call to sb_sha256_update_32 can use data under the control of an attacker instead of the data already loaded into RAM.
Path #3 - Instruction Mutation
The instructions corresponding to s_varm_crit_mem_copy_by_words are available below.
0000346c 19 fe 3a c7 mrc2 p7,0x0,r12,cr9,cr10,0x1
00003470 10 b5 push {r4,lr}
00003472 03 46 mov r3,r0
00003474 22 b1 cbz r2,LAB_00003480
00003476 1a 44 add r2,r3
LAB_00003478
00003478 10 c9 ldmia r1!,{r4}
0000347a 10 c0 stmia r0!,{r4}
0000347c 90 42 cmp r0,r2
0000347e fb d3 bcc LAB_00003478
LAB_00003480
00003480 c0 1a subs r0,r0,r3
00003482 09 fe 3a c7 mcr2 p7,0x0,r12,cr9,cr10,0x1
00003486 10 bd pop {r4,pc}
If a fault can be injected at 0x00003486, it’s possible for the pop {r4,pc} instruction to be altered, possibly resulting in the value of R4 not being restored.
Before pop {r4,pc}, R4 corresponds to the last word copied from flash to RAM, a value under the control of the attacker. This value could, for instance, be set to XIP_NOCACHE_NOALLOC_NOTRANSLATE_BASE.
R4 is subsequently used when calling sb_sha256_update_32.
00001d02 32 00 movs r2,r6; size
00001d04 21 00 movs r1,r4; src
00001d06 0f a8 add r0,sp,#0x3c; sha context
00001d08 02 f0 49 fb bl sb_sha256_update_32
Once again, sb_sha256_update_32 could be forced to run against data pulled from the QSPI bus.
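The role of R4 in this path can be sketched as a small register model. The function name `r4_after_copy` and the `restore_r4` flag are hypothetical. The model does not explain how a mutated instruction would still return correctly, it only shows which value ends up in R4. The value `0x1c007000` assumes that XIP_NOCACHE_NOALLOC_NOTRANSLATE_BASE is `0x1c000000`, as published in the Pico SDK address map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of how R4 is used around the copy. The caller keeps the hash source
 * in R4; the callee saves it, uses R4 as scratch, then restores it.
 * restore_r4 == false models an epilogue that fails to restore R4.
 */
static uint32_t r4_after_copy(uint32_t caller_r4, const uint32_t *words,
                              size_t count, bool restore_r4)
{
    assert(count > 0);
    uint32_t saved = caller_r4; /* push {r4,lr} */
    uint32_t r4 = caller_r4;
    for (size_t i = 0; i < count; i++)
        r4 = words[i];          /* ldmia r1!,{r4} */
    if (restore_r4)
        r4 = saved;             /* pop {r4,pc} */
    return r4;                  /* later copied to R1 by movs r1,r4 */
}
```

When the restore does not happen, the hash source becomes the last word of the copied data, which the attacker chooses.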
Fault Injection Synchronization
To synchronize the laser pulse with the execution of the Boot ROM, the activity of the QSPI is monitored by the gateware of the FPGA Board.
The last QSPI read corresponding to the call to s_varm_crit_mem_copy_by_words can be reliably detected. From this point until the call to sb_sha256_update_32, the following instructions are executed.
00003478 10 c9 ldmia r1!,{r4} // Last load
0000347a 10 c0 stmia r0!,{r4}
0000347c 90 42 cmp r0,r2
0000347e fb d3 bcc LAB_00003478
00003480 c0 1a subs r0,r0,r3
00003482 09 fe 3a c7 mcr2 p7,0x0,r12,cr9,cr10,0x1
00003486 10 bd pop {r4,pc}
00001d66 0d 9b ldr r3,[sp,#local_24]
00001d68 43 ec 70 07 mcrr p7,0x7,r0,r3,cr0
00001d6c ac e7 b LAB_00001cc8
00001cc8 0b 9b ldr r3,[sp,#local_2c]
00001cca 00 2b cmp r3,#0x0
00001ccc b3 d0 beq LAB_00001c36
00001cce 03 9b ldr r3,[sp,#local_4c]
00001cd0 73 b1 cbz r3,LAB_00001cf0
00001cf0 23 0e lsrs r3,r4,#0x18
00001cf2 10 2b cmp r3,#0x10
00001cf4 02 d8 bhi LAB_00001cfc
00001cfc 03 23 movs r3,#0x3
00001cfe 3e 00 movs r6,r7
00001d00 9e 43 bics r6,r3
00001d02 32 00 movs r2,r6
00001d04 21 00 movs r1,r4
00001d06 0f a8 add r0,sp,#0x3c
00001d08 02 f0 49 fb bl sb_sha256_update_32
A redundancy co-processor instruction with a random delay of up to 127 ROSC cycles is executed at 0x00001d68. This means reliably timing a pulse to always target a specific instruction isn’t possible.
However, this instruction is the only one adding a random delay. The software designed for the attack allows for about 20 attempts per second. That’s enough to inject interesting faults in a reasonable amount of time.
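A rough feel for the cost of this jitter can be obtained with a back-of-the-envelope model. It assumes, without confirmation from the source, that the delay is uniformly distributed over 0 to 127 cycles and that a fixed pulse delay lines up with the target instruction for exactly one delay value. Both function names are hypothetical.

```c
#include <assert.h>

/* Probability that a fixed pulse delay lines up with one specific value of
 * a random delay drawn uniformly from 0..max_delay cycles (assumption). */
static double aligned_probability(unsigned max_delay)
{
    return 1.0 / (double)(max_delay + 1u);
}

/* Mean time, in seconds, between aligned attempts at a given attempt rate. */
static double seconds_per_aligned_attempt(unsigned max_delay,
                                          double attempts_per_second)
{
    assert(attempts_per_second > 0.0);
    return 1.0 / (aligned_probability(max_delay) * attempts_per_second);
}
```

Under these assumptions, with a maximum delay of 127 cycles and 20 attempts per second, one attempt in 128 is aligned, or roughly one every 6.4 seconds. Alignment is necessary but not sufficient: the pulse must also produce a useful fault.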
Attack Process
Flash Memory Organization
Before starting the attack, the two QSPI flashes of the I/O Board are organized as follows:
The first flash, whose content will be loaded to RAM, contains a copy of the valid, signed firmware, including a signature block, patched as follows:
- A small shellcode is written at the reset handler offset. This shellcode is designed to jump to an arbitrary firmware located further in the flash memory at address 0x10010000.
- The last word of the data loaded to RAM is set to XIP_NOCACHE_NOALLOC_NOTRANSLATE_BASE + 0x7000.
The second flash, whose content is expected to be read after a fault has been injected, contains the following:
- An unmodified firmware is copied at offset 0x0. In case the fault injection path #2 is executed, sb_sha256_update_32 will be executed against this image, resulting in a valid hash being computed.
- An unmodified firmware is copied at the offset corresponding to the size of the data copied to RAM. In case the fault injection path #1 is executed, sb_sha256_update_32 will be executed against this image, resulting in a valid hash being computed.
- An unmodified firmware is copied at offset 0x7000. In case the fault injection path #3 is executed, sb_sha256_update_32 will be executed against this image, resulting in a valid hash being computed.
- An arbitrary firmware is copied at offset 0x10000. This firmware will run if the shellcode is executed.
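The correspondence between the fault paths and the offsets in the second flash can be written down as a small consistency check. This sketch makes several assumptions:

- The base addresses are those published in the Pico SDK address map for the RP2350.
- Each XIP alias window is treated as a 64 MiB range, and address translation is ignored.
- The data copied to RAM is taken to start at flash offset 0.
- `XIP_WINDOW_MASK`, `flash_offset` and `hash_start_offset` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values (RP2350 address map as published in the Pico SDK). */
#define XIP_BASE                             0x10000000u
#define XIP_NOCACHE_NOALLOC_NOTRANSLATE_BASE 0x1c000000u
#define XIP_WINDOW_MASK                      0x03ffffffu /* hypothetical name */

/* Flash offset addressed through any of the XIP alias windows. */
static uint32_t flash_offset(uint32_t xip_addr)
{
    assert(xip_addr >= XIP_BASE && xip_addr < 0x20000000u);
    return xip_addr & XIP_WINDOW_MASK;
}

/* Offset in the second flash where hashing starts for each fault path.
 * copy_size is the word-aligned number of bytes copied to RAM. */
static uint32_t hash_start_offset(int path, uint32_t copy_size)
{
    switch (path) {
    case 1: /* R1 left just past the copied data */
        return flash_offset(XIP_BASE) + copy_size;
    case 2: /* reads observed starting from offset 0 */
        return flash_offset(XIP_BASE);
    case 3: /* R4 holds the last word copied to RAM */
        return flash_offset(XIP_NOCACHE_NOALLOC_NOTRANSLATE_BASE + 0x7000u);
    default:
        return UINT32_MAX;
    }
}
```

With these assumptions, the shellcode target 0x10010000 maps to flash offset 0x10000, and the three paths start hashing at the copy size, 0x0 and 0x7000 respectively, which matches the layout listed above.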
Attack Loop
During the attack, the following is repeated in a loop by the attack software:
- The first flash is selected.
- The RP2350 is powered on.
- QSPI bus reads are monitored until the last byte corresponding to the call to s_varm_crit_mem_copy_by_words has been read.
- The second flash is selected.
- After an arbitrary delay, the laser is pulsed.
- QSPI bus reads are monitored again to verify if the data pulled from the flash corresponds to a successful bypass.
- In case of failure, the RP2350 is powered off, and a new attempt will be made.
flowchart TD
    A[Select First Flash] --> B[Power On RP2350]
    B --> C[Monitor QSPI Reads]
    C --> D[Select Second Flash]
    D --> E[Arbitrary Delay]
    E --> F[Pulse Laser]
    F --> G[Monitor QSPI Reads]
    G -->|Failure| H[Power Off RP2350]
    H --> A
    G -->|Success| I[Arbitrary Firmware Execution]
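The loop can be sketched in C as follows. This is not the actual attack software. The `struct rig` hooks, `run_attack` and the `fake_*` helpers are hypothetical names, and the fake rig is a software stand-in that lets the control flow be exercised without hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hardware hooks; none of these names come from the real tool. */
struct rig {
    void (*select_flash)(void *ctx, int index);
    void (*set_power)(void *ctx, bool on);
    void (*wait_last_copy_read)(void *ctx);
    void (*pulse_laser_after)(void *ctx, unsigned delay);
    bool (*bypass_observed)(void *ctx);
    void *ctx;
};

/* Repeats the attack until a bypass is seen or max_attempts is reached.
 * Returns the 1-based number of the successful attempt, or 0 if none. */
static unsigned run_attack(const struct rig *rig, unsigned delay,
                           unsigned max_attempts)
{
    assert(rig != NULL);
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        rig->select_flash(rig->ctx, 0);          /* first flash */
        rig->set_power(rig->ctx, true);
        rig->wait_last_copy_read(rig->ctx);
        rig->select_flash(rig->ctx, 1);          /* second flash */
        rig->pulse_laser_after(rig->ctx, delay);
        if (rig->bypass_observed(rig->ctx))
            return attempt;
        rig->set_power(rig->ctx, false);         /* failure: power cycle */
    }
    return 0;
}

/* Software stand-in for the rig: reports a bypass on the Nth pulse. */
struct fake_rig { unsigned pulses; unsigned succeed_on; };
static void fake_select(void *c, int i) { (void)c; (void)i; }
static void fake_power(void *c, bool on) { (void)c; (void)on; }
static void fake_wait(void *c) { (void)c; }
static void fake_pulse(void *c, unsigned d)
{
    (void)d;
    ((struct fake_rig *)c)->pulses++;
}
static bool fake_seen(void *c)
{
    const struct fake_rig *f = c;
    return f->pulses == f->succeed_on;
}
```

Passing the hooks as function pointers keeps the loop independent of how the flash selection, power switching and laser trigger are actually implemented.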
Results
When targeting the correct die area, faults that result in data being pulled from the QSPI bus while sb_sha256_update_32 is executing are quite common.
However, a major issue was that faults were impacting the system a bit too hard, and the value corresponding to the size parameter of sb_sha256_update_32 was also corrupted.
This reduced the success rate of the attack significantly. The first bypass appeared after more than 16 hours of tweaking the laser pulse parameters and position. After this fine-tuning, however, I only had to wait about 20 minutes for a new success to appear.
Possible Improvements
Several elements could be improved to increase the success rate of the attack:
- The laser pulse duration could be fine-tuned. Due to the simple design of the Laser Driver Board, adjusting the pulse duration requires modifying the values of physical components. This process can be cumbersome, so I haven’t thoroughly explored the impact of pulse duration.
- The laser spot size resulting from the Optical Subsystem is relatively large compared to what is typically used in similar attacks. A smaller spot size would likely lead to better-controlled and more repeatable faults.
- The attack software could be enhanced. Instead of using two flashes, the FPGA could be utilized to dynamically supply the correct data, regardless of the flash address requested after the fault.
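One possible reading of the last idea is a responder that treats the first address requested after the pulse as the start of the unmodified image, whatever that address is. The sketch below models this in C on a host. `struct responder` and `respond` are hypothetical names, and a real implementation would live in the FPGA gateware rather than in software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical responder state: the first address read after the pulse
 * becomes the base, whatever it is. */
struct responder {
    const uint8_t *clean; /* unmodified firmware image */
    size_t clean_len;
    bool have_base;
    uint32_t base;
};

/* Byte to return for a post-fault read at addr. */
static uint8_t respond(struct responder *r, uint32_t addr)
{
    assert(r != NULL && r->clean_len > 0);
    if (!r->have_base) {
        r->base = addr;
        r->have_base = true;
    }
    uint32_t rel = addr - r->base; /* wraps modulo 2^32 if addr < base */
    return rel < r->clean_len ? r->clean[rel] : 0xffu;
}
```

With such a scheme, a single copy of the unmodified firmware would serve all three fault paths, instead of one copy per expected offset.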
Last update: November 24, 2024