July 18, 2022

FirmwareBleed: The industry fails to adopt Return Stack Buffer mitigations in SMM

Binarly Team

Speculative execution mitigations have been discussed for some time, but most of the focus has been at the operating system level in order to adopt them in software stacks. What is happening at the firmware level? When it comes to applying these mitigations, how does the industry take advantage of them, and who coordinates their adoption specifically into the firmware? These are all good questions, but unfortunately no positive news can be shared.

The microarchitectural conditions complicate attack surfaces making them hard to mitigate in just one place. The different layers of the computer stack don’t have knowledge about active mitigations. As an example, the operating system doesn’t obtain information about active speculative execution mitigations like the branch target injection mitigation (retpoline) in System Management Mode (SMM) of UEFI firmware.

The problem is that the main CPU cores may use branch predictors other than the Return Stack Buffer (RSB) when the RSB underflows. The RSB predicts the targets of RET instructions, but when it underflows, some CPU cores fall back to using other branch predictors. This may impact software using the retpoline mitigation strategy on such processors. The best practice introduced by Intel in 2018 to address RSB behavior in SMM runtime was to execute CALL instructions before returning from SMM context, so that SMM does not interfere with non-SMM usage of the retpoline mitigation. Regardless of whether Windows and Linux operating systems use retpoline or not, if the SMM code (SMI handlers) in system firmware has no mitigations in place, an attack window is available by design.
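The underflow behavior described above can be illustrated with a toy model. The sketch below is not how any real predictor is implemented: the class ToyRsb, the function stuff_rsb, the 16-entry depth, and the "spec_trap" label are all assumptions made for illustration. It only shows why filling the buffer with CALLs before leaving SMM keeps the following RETs predicted from the RSB instead of from a fallback predictor.

```python
from collections import deque

RSB_DEPTH = 16  # assumed depth for this toy model; real depth depends on the CPU


class ToyRsb:
    """Toy Return Stack Buffer: a bounded stack of predicted return targets."""

    def __init__(self, depth=RSB_DEPTH):
        # A bounded deque drops the oldest entry when more CALLs arrive than fit.
        self.entries = deque(maxlen=depth)

    def call(self, return_target):
        """Model a CALL: push the predicted return target."""
        self.entries.append(return_target)

    def ret(self):
        """Model a RET: return (predictor_used, predicted_target)."""
        if self.entries:
            return ("rsb", self.entries.pop())
        # Underflow: this model treats it as a fallback to another predictor.
        return ("fallback", None)


def stuff_rsb(rsb, depth=RSB_DEPTH, trap="spec_trap"):
    """Fill every RSB entry with a harmless speculation trap target."""
    for _ in range(depth):
        rsb.call(trap)


# Without stuffing, the first RET after an empty RSB uses the fallback predictor.
unstuffed = ToyRsb()
print(unstuffed.ret())

# With stuffing, the next RSB_DEPTH RETs are all predicted to land in the trap.
stuffed = ToyRsb()
stuff_rsb(stuffed)
print({stuffed.ret() for _ in range(RSB_DEPTH)})
```

In this model the stuffed buffer only starts falling back after RSB_DEPTH returns, which is the window the mitigation is meant to cover.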

Such attacks aim to disclose the contents of privileged memory (including memory protected by virtualization technologies) in order to obtain sensitive data from processes running on the same processor (CPU). The impact can be greater in cloud environments, where a physical server can be shared by multiple users or legal entities.

Our research shows that most enterprise vendors are affected because they do not correctly apply the Return Stack Buffer (RSB) stuffing mitigation logic before returning from SMM. Based on our data set, most enterprise vendors misuse RSB mitigations or apply them only partially, which leaves the same attack surface exposed.

The following table contains the results of the analysis showing the vendor platforms without the Intel EDKII patch after the 2018-08-15 release date.

Vendor       Platform        Count
AAEON        Client          1
Acer         Client          1
Compulab     IoT             2
Dell         Client/Server   59
Forme Life   Client          1
Fujitsu      Client          2
HP           Client          32
Intel        NUCs            4
Intel        Server Boards   10
Lenovo       Client          248
StarLabs     Client          4

Table 1: Number of firmware per vendor without EDKII patch (Stuff RSB before RSM)

The following table contains the results of the analysis showing the vendor platforms with EDK II RSB patch but with at least one unprotected RSM present.

Vendor       Platform        Count
Dell         Client/Server   7
HP           Client          53
Intel        NUCs            10
Lenovo       Client          133
StarLabs     Client          15

Table 2: Number of firmware per vendor with EDKII RSB patch but unprotected RSM is present.

Presenting the same data for HP, Lenovo, Intel, Dell as pie charts:

Figure 1: These pie charts show the ratio between HP, Lenovo, Intel and Dell firmware images without patch against the firmware images where EDKII patch is present but unstuffed RSB is present in some of the cases.
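The proportions behind these charts can be recomputed from Tables 1 and 2. The following is a minimal sketch: the counts are copied from the tables, while the helper name shares and the choice to sum both Intel rows of Table 1 (NUCs and Server Boards) into one Intel figure are assumptions, since the post does not state how the charts group them.

```python
# Counts copied from Table 1 (no EDKII patch).
# Assumption: Intel NUCs (4) and Intel Server Boards (10) are summed.
no_patch = {"HP": 32, "Lenovo": 248, "Intel": 4 + 10, "Dell": 59}

# Counts copied from Table 2 (patch present, at least one unprotected RSM).
unprotected_rsm = {"HP": 53, "Lenovo": 133, "Intel": 10, "Dell": 7}


def shares(vendor):
    """Return (percent without patch, percent patched but unprotected)."""
    total = no_patch[vendor] + unprotected_rsm[vendor]
    return (100 * no_patch[vendor] / total,
            100 * unprotected_rsm[vendor] / total)


for vendor in no_patch:
    missing, partial = shares(vendor)
    print(f"{vendor}: {missing:.1f}% without patch, "
          f"{partial:.1f}% patched but with an unprotected RSM")
```

Under that grouping assumption, the split is roughly 37.6% / 62.4% for HP, 65.1% / 34.9% for Lenovo, 58.3% / 41.7% for Intel, and 89.4% / 10.6% for Dell. These percentages cover only the affected images listed in the two tables, not each vendor's full firmware catalog.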

Our FirmwareBleed GitHub repository contains the complete results for the processed dataset of public firmware releases from LVFS, along with additional data for validation and further research.

The inconsistency in applying mitigations indicates a failure in the firmware supply chain when reference code from Intel and AMD contains mitigations but device vendors have not adopted them as intended. It is difficult to detect such supply chain failures without a deep code inspection at the binary level. In order to benefit our customers, Binarly's team invested a lot of time and effort to develop a comprehensive binary analysis infrastructure.

An in-depth look at RSB failures

Intel pushed the original patch with the mitigation code to the EDKII GitHub repository in August 2018, in the UefiCpuPkg/PiSmmCpuDxeSmm module. It mitigates CVE-2017-5715 by adding Return Stack Buffer stuffing logic before returning from SMM. After the stuffing, RSB entries should contain a trap like:

@Spectrap:
  pause
  lfence
  jmp     @Spectrap

The same mitigation logic, and the same implementation mistakes, apply to both AMD and Intel platforms whose system firmware is based on EDKII.

As an example, most of the analyzed UEFI firmware images (most recent versions) contain multiple cases where the Resume from SMM (RSM) instruction in SmiEntry is not mitigated. The code of PiSmmCpuDxeSmm contains three RSM instructions but only two RSB stuffing mitigations: the third RSM has none. Additionally, the first two (where RSB stuffing is present) are not used during exit from SMM. The following figure shows a correctly applied RSB stuffing mitigation (the StuffRsb EDKII macro):

Figure 2: Code snippet with StuffRsb mitigation code applied before RSM execution

On the other hand, during our research we found places where RSB mitigations are missing before the RSM instruction is executed in the SmiEntry routine (the entry point to SMI handler calls).

Figure 3: Code snippet without StuffRsb mitigation code before RSM execution in SmiEntry

On Intel-powered platforms, we also found alternate code patterns. The following figure shows code similar to SmiEntry from EDKII, again without RSB stuffing:

Figure 4: Alternate code snippet without StuffRsb mitigation before RSM execution in SmiEntry

Applying the RSB stuffing mitigation incorrectly can lead to the following security issues:

  • UEFI system firmware does not have any RSB mitigations (in updates released in 2022 (!)).
  • UEFI system firmware has RSB mitigations, but the code flow exposed to the operating system is not protected (many of these devices use a recent hardware generation (!)).

Detecting RSB failures at scale using the Binarly Platform

With the release of Retbleed, we have seen increased interest in bypasses of speculative execution mitigations. We decided to use Binarly's internal deep code analysis infrastructure to apply new security checks to firmware for mitigations such as RSB stuffing. Our main motivation was to test the hypothesis that system firmware doesn't introduce any inconsistency in Spectre-like (RSB) mitigations.

In our experimental environment, we aimed to achieve a few goals:

  • With the FwHunt rules we can effectively detect the presence of the mitigation, but have no additional information about the flows leading into all RSM instructions.
  • With Binarly deep code analysis infrastructure, we can analyze all code flows into every RSM context and check for the presence of RSB mitigations. This allows us to identify places where the EDKII patch is present while detecting the RSM instructions left unprotected in the analyzed module.

Figure 5: Results of the analysis from Binarly internal code analysis infrastructure

The figure shows output from Binarly's code analysis infrastructure: every code location in the analyzed SMM modules where an RSM instruction is used, and the locations where RSB stuffing mitigations are applied.
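A much cruder approximation of this per-RSM check can be sketched as a linear byte scan. This is not Binarly's analysis and it does no code-flow analysis at all: the function name classify_rsm is hypothetical, and the assumption is that a protected RSM is immediately preceded by the tail of the stuffing sequence used in the FwHunt rule published in this post (jnz rel8; add rsp, imm32). The two bytes 0F AA can also appear inside data or immediates, so false positives are possible.

```python
import re

RSM = b"\x0f\xaa"  # opcode bytes of the RSM instruction

# Assumed tail of the stuffing sequence: jnz rel8; add rsp, imm32; rsm
STUFFED_TAIL = re.compile(rb"\x75.\x48\x81\xc4.{4}\x0f\xaa", re.DOTALL)
TAIL_PREFIX_LEN = 9  # bytes of the tail that come before the RSM opcode


def classify_rsm(data):
    """Return a list of (offset, protected) for each RSM opcode in data."""
    results = []
    pos = data.find(RSM)
    while pos != -1:
        # Look only at the bytes directly in front of this RSM.
        window = data[max(pos - TAIL_PREFIX_LEN, 0):pos + len(RSM)]
        results.append((pos, STUFFED_TAIL.fullmatch(window) is not None))
        pos = data.find(RSM, pos + 1)
    return results


# Hypothetical module bytes: one stuffed RSM, then "wrmsr; rsm" with no stuffing.
sample = bytes.fromhex("75f74881c400010000" "0faa" "90" "0f30" "0faa")
for offset, protected in classify_rsm(sample):
    print(hex(offset), "protected" if protected else "UNPROTECTED")
```

A scan like this can count RSM instructions and flag obvious gaps, but only an analysis of the flows leading into each RSM can tell whether a stuffed path is actually the one taken on exit from SMM.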

During our analysis we detected two common patterns for unprotected RSM instructions:

  • This code pattern is shared between Intel and Dell based firmware images:

48 B8 00 00 00 00 00 00 00 00 8A 00 3C 00 74
?? B9 ?? ?? 00 00 89 D8 31 D2 0F 30 0F AA

48 B8 00 00 00 00 00 00 00 00    mov     rax, 0
8A 00                            mov     al, [rax]
3C 00                            cmp     al, 0
74 ??                            jz      short loc_??
B9 ?? ?? 00 00                   mov     ecx, ????h
89 D8                            mov     eax, ebx
31 D2                            xor     edx, edx
0F 30                            wrmsr
0F AA                            rsm

  • This code pattern is shared between HP and some Dell based firmware images:

8A 00 3C 00 74 ?? 5A F7 C2 04 00 00 00 74 ??
B9 ?? ?? 00 00 0F 32 66 83 CA 04 0F 30 0F AA

8A 00                            mov     al, [rax]
3C 00                            cmp     al, 0
74 ??                            jz      short loc_??
5A                               pop     rdx
F7 C2 04 00 00 00                test    edx, 4
74 ??                            jz      short loc_??
B9 ?? ?? 00 00                   mov     ecx, ????h
0F 32                            rdmsr
66 83 CA 04                      or      dx, 4
0F 30                            wrmsr
0F AA                            rsm

Based on these discoveries, we developed the FwHunt rule to effectively detect the different variants of the problem for multiple vendors.

FwHunt vs RSB failures

A FwHunt rule was developed during our research to detect two types of problems:

  • The original Intel patch from EDKII was not found inside the PiSmmCpuDxeSmm module.
  • The EDKII patch may be present, but without the RSB stuffing mitigation (EDKII macros not applied) before the RSM instruction in SmiEntry.

All detection cases previously mentioned are covered by the FwHunt rule:

RsbStuffingCheck:
  meta:
    author: Binarly (https://github.com/binarly-io/FwHunt)
    license: CC0-1.0
    name: RsbStuffingCheck
    namespace: MitigationFailures
    description: Check if StuffRsb used before RSM
    url: https://binarly.io/posts/FirmwareBleed_The_industry_fails_to_adopt_Return_Stack_Buffer_mitigations_in_SMM
    volume guids:
      - a3ff0ef5-0c28-42f5-b544-8c7de1e80014
  variants:
    The patch from EDK2 is missing:
      hex_strings:
        not-any:
          - f3900faee8eb..48ffc875..4881c4........0faa
    RSB Stuffing before RSM skipped in SMI Entry code:
      hex_strings:
        or:
          - 48b800000000000000008a003c0074..b9....000089d831d20f300faa
          - 8a003c0074..5af7c20400000074..b9....00000f326683ca040f300faa
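To make the rule's logic concrete, here is a minimal sketch of how its two variants could be evaluated against a raw module image. It is not the fwhunt-scan implementation: the function names are hypothetical, the reading of "not-any" as "none of the listed patterns occurs" and of ".." as "any single byte" is an assumption, and the volume GUID filter is ignored.

```python
import re

# Hex strings copied from the rule above.
STUFFING_PATTERNS = ["f3900faee8eb..48ffc875..4881c4........0faa"]
UNPROTECTED_RSM_PATTERNS = [
    "48b800000000000000008a003c0074..b9....000089d831d20f300faa",
    "8a003c0074..5af7c20400000074..b9....00000f326683ca040f300faa",
]


def hex_pattern_to_regex(pattern):
    """Compile a hex string in which '..' is assumed to match any one byte."""
    parts = []
    for i in range(0, len(pattern), 2):
        pair = pattern[i:i + 2]
        parts.append(b"." if pair == ".." else re.escape(bytes.fromhex(pair)))
    return re.compile(b"".join(parts), re.DOTALL)


def any_match(patterns, data):
    """True if at least one pattern occurs anywhere in data."""
    return any(hex_pattern_to_regex(p).search(data) for p in patterns)


def check_module(data):
    """Return the names of the rule variants that fire for this module."""
    findings = []
    if not any_match(STUFFING_PATTERNS, data):        # "not-any"
        findings.append("The patch from EDK2 is missing")
    if any_match(UNPROTECTED_RSM_PATTERNS, data):     # "or"
        findings.append("RSB Stuffing before RSM skipped in SMI Entry code")
    return findings


# Hypothetical bytes that contain the stuffing sequence and nothing else.
patched = bytes.fromhex("f3900faee8eb00" "48ffc875f7" "4881c400010000" "0faa")
print(check_module(patched))
```

Note that the two variants are independent: a module can contain the stuffing sequence and still trigger the second variant if one of its RSM instructions follows an unprotected pattern.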

The result of applying the FwHunt rule with the open-source fwhunt-scan (the community version of our scanner):

Figure 6: Results of the analysis from fwhunt-scan

The result of applying the FwHunt rule with FwHunt.RUN (the community FwHunt service):

Figure 7: Results of the analysis from FwHunt.RUN

Firmware supply chain ecosystems are quite complex and often contain repeatable failures when it comes to applying new industry-wide mitigations or fixing reference code vulnerabilities.

We proved in our research that even if a mitigation is present in the firmware, it doesn't mean it is applied correctly without creating security holes. It has been demonstrated again by FirmwareBleed that source code static analysis tools are constantly failing to detect misuse of runtime protections and mitigations. The research also demonstrates the power of binary-level analysis and the effectiveness of such an approach to detect FirmwareBleed at scale.

There will always be security gaps if additional validation is not enforced at the binary level, which can put even recently released hardware at risk. Developing secure firmware requires trust and confidence, but security cannot be guaranteed without validation.