Firmware threats such as bootkits and implants have become increasingly prevalent because they persist longer and evade detection more easily than traditional OS-level malware. Attackers favor these threats because they can remain undetected even when conventional security measures are in place, especially if UEFI Secure Boot is disabled. Detecting unknown bootkits under these circumstances is a critical challenge in cybersecurity. Most publicly known UEFI implants and bootkits were detected only after successful deployment, which points to the limitations of existing security solutions.
In this blog post, the Binarly REsearch team introduces a novel methodology for detecting UEFI bootkits by analyzing their unique code behaviors. Starting from an in-depth analysis of known bootkits, we identify features that can be used to detect bootkits generically and build rules that we then use to hunt for new, unknown bootkits. We then show how these rules can be improved even further by leveraging advanced static analysis techniques, semantic detection and ML-based clustering.
A bootkit is a type of rootkit that runs during the boot process, before the operating system starts up. Once installed, a bootkit is generally harder to detect than OS-level malware and can bypass OS security mechanisms like PatchGuard and Driver Signature Enforcement (DSE), allowing it to patch the OS kernel, run arbitrary kernel shellcode, or install malicious drivers.
Bootkits have been around for decades at this point, and have long expanded from targeting legacy BIOS to modern UEFI firmware. This evolution also followed the adoption of firmware security features, like Intel Boot Guard and BIOS Guard, which forced bootkits to move from infecting SPI flash memory to targeting the ESP.
UEFI Secure Boot also played an important role in this evolution. This security feature is designed to ensure that only trusted software is executed during the boot process, helping to protect against malware and unauthorized code. However, the risk of bootkit infection still exists, as attackers can disable UEFI Secure Boot through physical access, exploits, or supply-chain attacks. This was demonstrated by BlackLotus, which used a vulnerability known as Baton Drop (CVE-2022-21894) to bypass Secure Boot.
Additionally, while historically Windows has been the main target for bootkit attacks, Linux-targeting bootkits such as Bootkitty (Ubuntu) and Pacific Rim (modified Linux kernel?) have been recently discovered. This is one of the reasons that drove us to research generic bootkit hunting methods: bootkits remain a prevalent threat and give powerful options to attackers, so it is highly likely that new bootkit families will continue to emerge in the future.
To develop the methodology for generic detection techniques, we started with an in-depth analysis of all publicly known bootkits, including Lojax, MosaicRegressor, MoonBounce, CosmicStrand, ESPecter, and BlackLotus. This allowed us to find shared features and differences among various bootkits.
The table above shows some basic information about the analyzed bootkits. Except for MoonBounce, all bootkits are either DXE drivers or UEFI applications. More interestingly, most bootkits reuse large portions of open-source or leaked implementations of bootkits. For example, most of MosaicRegressor’s code is based on HackingTeam’s leaked Vector-EDK, while BlackLotus borrows some code from umap and EfiGuard.
As mentioned earlier, bootkits run during the boot process with the intent of compromising an operating system. Because firmware operates with high privileges, bootkit authors have several ways to achieve this goal. In the following sections we will discuss three key features that are shared amongst bootkits: the hook chain, additional components and features, and OS-persistence techniques.
In particular, we will discuss how each of these features can be leveraged (or not leveraged) to build generic bootkit detections.
As we will see in the next sections, the key takeaway from this analysis is that the hook chain and additional components do not offer strong detection features, and using them would lead to imprecise results with many false positives. On the other hand, the OS persistence techniques are shared by modern bootkits and can be effectively modeled and leveraged for reliable detection.
Using the BootService table as a hooking point is very common among bootkits. Lojax and MosaicRegressor register their malicious callbacks using the legitimate BootService function CreateEventEx(EFI_EVENT_GROUP_READY_TO_BOOT), ensuring their callbacks are executed right before the boot manager loads and executes a boot option. MoonBounce and CosmicStrand instead use a more direct hooking strategy and replace the function pointers stored in the global BootService table. In terms of detection, neither of these behaviors is very reliable. The first pattern is very common in UEFI firmware and would lead to many false positives, while the second would only work for detecting CosmicStrand, as MoonBounce is a DxeCore module that contains an already hooked BootService table.
Another hooking strategy, which is instead shared by many samples, is the more traditional inline code hooking, where code is directly patched in memory. This technique is used by MoonBounce, CosmicStrand, ESPecter and BlackLotus, and is usually implemented in two steps:
From a detection perspective, this inline hooking technique looked promising, since four bootkits patch the same target function (OslArchTransferToKernel). However, this strategy quickly turned into a dead end: the memory scanning algorithms used by the bootkits were all distinct, and the patched instructions varied across bootkits too, showing no common patterns.
For example, both MoonBounce and CosmicStrand use 4-byte signatures to search for OslArchTransferToKernel, but the signatures differ (0xCB485541 and 0x41106A56).
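To make the signature search concrete, here is a minimal Python sketch of a 4-byte memory scan. The two 32-bit values are the ones quoted above; reading them as little-endian dwords, the helper name `find_signature`, and the synthetic buffer are assumptions made for illustration, not code taken from either bootkit.

```python
import struct

# Signatures quoted in the text. Treating them as little-endian dwords is an assumption.
SIGNATURES = {
    "MoonBounce": 0xCB485541,
    "CosmicStrand": 0x41106A56,
}

def find_signature(image: bytes, signature: int) -> list[int]:
    """Return every offset at which the 4-byte signature occurs in image."""
    needle = struct.pack("<I", signature)
    offsets = []
    pos = image.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(needle, pos + 1)
    return offsets

# Synthetic buffer (not a real bootloader): NOP padding around
# push rsi; push 0x10; push r13; retfq
demo = b"\x90" * 8 + bytes.fromhex("566a10415548cb") + b"\x90" * 4

for family, signature in SIGNATURES.items():
    print(family, find_signature(demo, signature))
```

Under the little-endian assumption, both values land on the same short instruction sequence at different offsets, which illustrates why a byte signature derived from one bootkit does not transfer to another.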
Some bootkits include additional components and features to avoid detection, and these can also provide an avenue for detection. For instance, Lojax and MosaicRegressor run a UEFI module using the LoadImage and StartImage functions, but these are once again very common functions and thus don’t provide a reliable indicator. On the other hand, ESPecter and BlackLotus use inline hooks to disable security features, such as DSE and Windows Defender. Since these features are disabled via inline code patching, generic detection is hard for the same reasons described before. VBS can be disabled by setting the NVRAM variable VbsPolicyDisabled, but the variable name can be obfuscated (for example, BlackLotus encrypted the string).
Overall, creating a generic detection rule based on additional components and features found in bootkits would not yield good results.
To achieve OS persistence, Lojax and MosaicRegressor drop an executable in the NTFS filesystem by using the BootService functions HandleProtocol and OpenProtocol and the Write function exported from the EFI_FILE_PROTOCOL.
These functions are very common in file system drivers, and detecting these persistence techniques through the filesystem paths used would not be very effective either, as these paths are obfuscated.
In any case, this technique is very noisy and can be easily detected by security solutions, making it uncommon in the latest bootkits.
On the other hand, as shown in the table above, the remaining four bootkits use two OS-persistence techniques that can be leveraged for creating a generic detection: clearing bits in control registers and shellcode-like PE parsing.
Clearing bits in control registers
Bootkits often clear the Write Protect (WP) bit in the CR0 register to remove write protection on read-only memory pages, with the goal of hooking code or modifying PE header values, such as the entry point or the section permissions. This behaviour is relatively uncommon in UEFI applications, so it provides a good avenue for building a detection rule.
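As a rough illustration of what such a detection looks for, the following Python sketch scans raw code bytes for a read of CR0, an AND with a mask that has bit 16 (WP) cleared, and a write back to CR0 within a short window. The function name `clears_wp`, the 32-byte window, and the restriction to `mov rax, cr0` / `and eax|rax, imm32` / `mov cr0, rax` encodings are assumptions for this sketch; it is a naive byte scan, not the rule described in this post, and real samples may use other registers or instructions.

```python
import re

CR0_WP = 1 << 16  # Write Protect is bit 16 of CR0

READ_CR0 = bytes.fromhex("0f20c0")   # mov rax, cr0
WRITE_CR0 = bytes.fromhex("0f22c0")  # mov cr0, rax
# and eax, imm32 (opcode 0x25), optionally with a REX.W prefix (0x48) for rax
AND_IMM32 = re.compile(rb"\x48?\x25(.{4})", re.DOTALL)

def clears_wp(code: bytes, window: int = 32) -> bool:
    """Return True if a CR0 read/AND/write sequence clears the WP bit."""
    start = code.find(READ_CR0)
    while start != -1:
        body_start = start + len(READ_CR0)
        # Look for the write back to CR0 within a short window after the read.
        end = code.find(WRITE_CR0, body_start, body_start + window)
        if end != -1:
            for match in AND_IMM32.finditer(code, body_start, end):
                mask = int.from_bytes(match.group(1), "little")
                if not mask & CR0_WP:  # the mask drops bit 16
                    return True
        start = code.find(READ_CR0, start + 1)
    return False

# mov rax, cr0; and rax, 0xFFFFFFFFFFFEFFFF; mov cr0, rax
suspicious = bytes.fromhex("0f20c0" "4825fffffeff" "0f22c0")
# mov rax, cr0; or rax, 0x10000; mov cr0, rax (this sets WP instead)
benign = bytes.fromhex("0f20c0" "480d00000100" "0f22c0")
print(clears_wp(suspicious), clears_wp(benign))
```

Because it matches raw bytes without disassembling, a scan like this can both miss variants and hit unrelated data, which is consistent with the need to exclude false positives discussed later.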
Shellcode-like PE parsing
Several bootkits parse PE format structures to execute kernel shellcode or to load kernel drivers, a behavior that is very rare in benign UEFI modules and applications. In particular, we found that multiple bootkits parse the IMAGE_EXPORT_DIRECTORY structure in the PE header to find kernel API addresses by string hashes, and also the IMAGE_BASE_RELOCATION structure to resolve code relocations.
For this reason, we decided to use these OS-persistence techniques as a base for building our generic hunting and detection methods.
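The export-directory lookup described above can be sketched in Python as follows. The field offsets (NumberOfNames at 0x18, AddressOfFunctions at 0x1C, AddressOfNames at 0x20, AddressOfNameOrdinals at 0x24) come from the PE format; the rotate-right-13 hash, the helper names, and the synthetic image in which RVAs equal buffer offsets are assumptions for illustration and are not taken from any analyzed sample.

```python
import struct

def ror13_hash(name: bytes) -> int:
    """Hypothetical string hash (rotate right by 13, then add each byte)."""
    h = 0
    for byte in name:
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF
        h = (h + byte) & 0xFFFFFFFF
    return h

def read_cstring(image: bytes, rva: int) -> bytes:
    return image[rva:image.index(b"\0", rva)]

def resolve_by_hash(image: bytes, export_dir_rva: int, wanted: int):
    """Walk IMAGE_EXPORT_DIRECTORY; return the RVA of the export whose name hashes to wanted."""
    number_of_names, addr_funcs, addr_names, addr_ords = struct.unpack_from(
        "<4I", image, export_dir_rva + 0x18)
    for i in range(number_of_names):
        (name_rva,) = struct.unpack_from("<I", image, addr_names + 4 * i)
        if ror13_hash(read_cstring(image, name_rva)) == wanted:
            (ordinal,) = struct.unpack_from("<H", image, addr_ords + 2 * i)
            (func_rva,) = struct.unpack_from("<I", image, addr_funcs + 4 * ordinal)
            return func_rva
    return None

def build_demo_image():
    """Build a tiny synthetic image; RVAs are plain buffer offsets here."""
    names = [b"ExAllocatePool", b"PsCreateSystemThread"]
    func_rvas = [0x1000, 0x2000]
    image = bytearray(0x200)
    export_dir, funcs_at, names_at, ords_at, cursor = 0x40, 0x80, 0x90, 0xA0, 0xB0
    struct.pack_into("<4I", image, export_dir + 0x18,
                     len(names), funcs_at, names_at, ords_at)
    for i, (name, rva) in enumerate(zip(names, func_rvas)):
        struct.pack_into("<I", image, funcs_at + 4 * i, rva)
        struct.pack_into("<I", image, names_at + 4 * i, cursor)
        struct.pack_into("<H", image, ords_at + 2 * i, i)
        image[cursor:cursor + len(name)] = name
        cursor += len(name) + 1  # keep the NUL terminator
    return bytes(image), export_dir

image, export_dir = build_demo_image()
print(hex(resolve_by_hash(image, export_dir, ror13_hash(b"PsCreateSystemThread"))))
```

The point of the hash is that the API name never appears as a string in the bootkit, so the structure accesses themselves become the more stable feature to match on.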
Based on the two OS-persistence techniques discussed above, we developed detection rules in YARA, which can be used for hunting on VirusTotal, and in the FwHunt format, which is compatible with our Binarly Risk Hunt scanner. We followed an iterative approach to develop them: suspicious samples were identified using the YARA and FwHunt rules, and were then statically triaged to refine the rules. In the following sections, we present the hunting results related to the YARA rules and to VirusTotal, as those also cover the results from the FwHunt rules and Binarly Risk Hunt. FwHunt itself will be discussed in the next section, when we explore more advanced detection methods.
The first rule detects the clearing of the WP bit in CR0, and we created it based on the behavior found in MoonBounce, CosmicStrand and ESPecter.
As we can see in the figure above, the code sequence to read/write CR0 is rather simple ($clear_wp_in_cr0), which resulted in a few false positives in some edk2 modules (e.g. OvmfPkg, UefiCpuPkg and EmulatorPkg) and commercial bootloaders that we had to exclude.
Using this rule to hunt for unknown bootkits, we detected two Bootlicker variants with VT detection rates of 1/71 and 2/68. Bootlicker is an open-source bootkit based on DmaBackdoorBoot, but the two samples were only detected as Win/malicious_confidence_70% and MALICIOUS, not as bootkits.
The hook chain used in these variants fully matches the one implemented in Bootlicker: ExitBootServices → OslArchTransferToKernel → ACPI.sys .rsrc shellcode → PsSetCreateThreadNotifyRoutine → shellcode in .text slack space → KeInsertQueueApc → APC callback → KeInsertQueueApc → user-mode shellcode. One of the samples has no user-mode payload (null function), while the other downloads shellcode from a local IP (192.168.1.44). Therefore, we suspect that the developers submitted their PoCs to check the detection rate.
EfiGuard, an open-source bootkit implementation, also clears the WP bit in CR0. However, the previous rule was not effective because EfiGuard calls AsmWriteCr0 and passes a bit mask value to clear WP as an argument.
For this reason, to detect this bootkit, we created another rule matching the clearing of the CET (Control-Flow Enforcement Technology) bit in CR4, a functionality which is implemented in EfiGuard. Since we directly defined the code sequence bytes of the assembly-written function AsmDisableCet, this rule is specific to EfiGuard. However, we think it’s worth creating it because EfiGuard has been actively abused in the wild and used as a starting point for malicious bootkits.
Hunting with this rule on VirusTotal led to the discovery of two unknown bootkit samples named "Vixen.efi", which were found without detections (0/75, 0/73). We compared the samples with the original EfiGuardDxe binary and found only trivial differences (print() message, pdb path, etc.).
The code behavior of these two Vixen samples was the same as EfiGuardDxe: disabling both PatchGuard and DSE. Since the number of EfiGuard detections is usually in the range 10-20 (e.g., the latest release of EfiGuard has 13 positive detections), we were surprised to see that the Vixen samples were not detected at all.
We also tried to identify the purpose of the binary, but we could not find any concrete evidence. We found a related loader for Vixen.efi, but its code didn’t present any notable difference from the Loader.efi of EfiGuard. We also searched for the bundled files reported on VirusTotal using OSINT engines, but no other related sample was found, so the purpose of this sample remains unknown to us.
The second set of rules captures how bootkits parse PE headers in memory to extract information, such as addresses, that enables them to spread in the OS.
As shown in the following figure, MoonBounce, CosmicStrand and ESPecter access multiple offsets of the OS kernel export directory (IMAGE_DATA_DIRECTORY) and IMAGE_EXPORT_DIRECTORY structures to resolve OS kernel API addresses.
We translated the structure offset accesses into one code sequence using YARA jumps.
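To illustrate how a jump-based byte pattern behaves, the sketch below converts a YARA-like hex string with wildcards (`??`) and bounded jumps (`[0-16]`) into a Python regular expression. The converter `hex_pattern_to_regex` and the example pattern (a load from offset 0x88, then loads from offsets 0x20 and 0x1C) are hypothetical; they mimic the idea of chaining structure-offset accesses and are not the rule used in this research.

```python
import re

def hex_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate hex bytes, '??' wildcards and '[low-high]' jumps into a bytes regex."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")  # any single byte
        elif token.startswith("["):
            low, high = token.strip("[]").split("-")
            parts.append(b".{%d,%d}?" % (int(low), int(high)))  # bounded jump
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    return re.compile(b"".join(parts), re.DOTALL)

# Hypothetical pattern: mov eax, [rax+0x88] ... mov r32, [r+0x20] ... mov r32, [r+0x1C]
rule = hex_pattern_to_regex("8B 80 88 00 00 00 [0-16] 8B ?? 20 [0-16] 8B ?? 1C")

# The accesses sit close together, separated by a few unrelated bytes.
near = bytes.fromhex("8b8088000000" "4801d8" "8b4820" "90" "8b501c")
# The same accesses, but the first gap is longer than the jump allows.
far = bytes.fromhex("8b8088000000") + b"\x90" * 20 + bytes.fromhex("8b4820908b501c")

print(bool(rule.search(near)), bool(rule.search(far)))
```

The jump bounds are the main tuning knob: wider jumps tolerate more compiler variation between the offset accesses but also raise the chance of matching unrelated code.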
Hunting with this rule identified two unknown bootkit samples (1/72, 0/72), which we named “Valkyrie”, since both output an ASCII art showing the text “Valkyrie” in debug mode. These samples are based on umap, another open source bootkit that allows for manual mapping of kernel drivers, but with a better engineered implementation: a “go.cfg” file is used, and BlImgAllocateImageBuffer is called only once, to store the loader binary in the slack space of the legitimate binary, whereas umap calls it twice to map both the legitimate binary and the loader separately.
The remaining behavior, including the hook chain, is the same as umap: ImgArchStartBootApplication → BlImgAllocateImageBuffer → OslFwpKernelSetupPhase1 → ExitBootServices → acpiex.sys entrypoint → injected “loader” entrypoint.
Based on the VirusTotal relation information, we identified the kernel driver sample loaded by the bootkit (“loader”). The driver uses the same string hash algorithm for resolving kernel API addresses. Additionally, the data and strings were highly obfuscated with SSE instructions. One of the decoded strings was an IP address whose hostname was resolved as valkyrie[.]cx.
This last discovery is what led us to the real purpose of this bootkit: according to the website, the bootkit is part of game cheat software.
Unlike the kernel API address resolution matched by the previous rule, bootkits resolve code relocations differently, preventing us from creating a single rule. We observed two distinct patterns in the instruction bytes when accessing the relocation directory (IMAGE_DATA_DIRECTORY) and IMAGE_BASE_RELOCATION. This rule was created from the MoonBounce and BlackLotus samples.
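For readers unfamiliar with the structure being parsed, here is a minimal Python sketch of how base relocations are applied. The block layout (VirtualAddress, SizeOfBlock, then 16-bit entries whose top 4 bits are the type and low 12 bits the page offset) follows the PE format; the function name, the synthetic image, and the simplification that RVAs equal buffer offsets are assumptions, and only the DIR64 type is handled.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, skipped
IMAGE_REL_BASED_DIR64 = 10     # 64-bit address fixup

def apply_relocations(image: bytearray, reloc_rva: int, reloc_size: int, delta: int) -> int:
    """Apply DIR64 fixups from the relocation directory; return the number of patched slots."""
    patched = 0
    cursor = reloc_rva
    end = reloc_rva + reloc_size
    while cursor < end:
        page_rva, block_size = struct.unpack_from("<II", image, cursor)
        if block_size < 8:
            break  # malformed or terminating block
        for i in range((block_size - 8) // 2):
            (entry,) = struct.unpack_from("<H", image, cursor + 8 + 2 * i)
            kind, offset = entry >> 12, entry & 0xFFF
            if kind == IMAGE_REL_BASED_DIR64:
                target = page_rva + offset
                (value,) = struct.unpack_from("<Q", image, target)
                struct.pack_into("<Q", image, target, (value + delta) & 0xFFFFFFFFFFFFFFFF)
                patched += 1
        cursor += block_size
    return patched

# Synthetic image: one absolute pointer at 0x108, one relocation block at 0x180.
image = bytearray(0x200)
struct.pack_into("<Q", image, 0x108, 0x140001000)
struct.pack_into("<IIHH", image, 0x180, 0x100, 12,
                 (IMAGE_REL_BASED_DIR64 << 12) | 0x008, IMAGE_REL_BASED_ABSOLUTE)

count = apply_relocations(image, 0x180, 12, delta=0x10000)
print(count, hex(struct.unpack_from("<Q", image, 0x108)[0]))
```

The shift by 12 and the mask with 0xFFF are the kind of constants that show up in the instruction bytes of such loops, although, as noted above, compilers and authors arrange them in more than one way.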
Using this rule we found four additional umap variants. The first one had zero detections (0/71), despite its code being the same as the binary downloaded from the GitHub release page. The other three samples (mp.efi/winboot.efi) instead have a different hook chain from umap: ExitBootServices → CreateEvent callback with EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE (an event notified when SetVirtualAddressMap() is performed) → IoInitSystem in OS kernel. While these findings looked promising at first glance, we concluded that they were probably another game cheat after checking their compressed parent files from VirusTotal.
With this rule, we also discovered an unknown bootkit sample named BOOTKIT.efi (4/71), which interestingly saw its detection number drop from 6 to 2 last month.
The sample was a bootkit that disables PatchGuard and DSE like EfiGuard, but the code was not similar (it just reused part of the signatures from EfiGuard). The hook chain was also unique: OpenProtocol → BlImgLoadPEImageEx → several functions in the OS kernel (KiSwInterrupt, KiMcaDeferredRecoveryService, SeCodeIntegrityQueryInformation, SeValidateImageData, etc.).
Our additional VirusTotal RetroHunt revealed another variant of this bootkit, called SandboxBootkit.efi (3/71). Analysis of its related files (exe/sys in the parent compressed file) showed it was another sample used for game cheat software.
Finally, we decided to build a detection rule for PeiBackdoor, a bootkit running during the PEI phase. Unlike the later-stage bootkits that we have investigated so far, it does not include OS-persistence code, but it contains similar code for resolving code relocations of the infected backdoor image.
This rule detected another backdoor by the same author, but no additional samples were discovered.
The summary results of our hunting on VirusTotal with the presented rules are shown in the following table. In total, we identified 11 new unknown bootkit samples and 1 old version of umap. Of these samples, 10 were not detected as bootkits and 4 had zero detections on VirusTotal.
All samples except for BOOTKIT.efi reused a large portion of open source bootkit code: bootlicker (DmaBackdoorBoot), EfiGuard, and umap. The DmaBackdoorBoot and EfiGuard binaries distributed on GitHub are detected on VirusTotal, with 36 and 13 detections, respectively. On the other hand, umap is not detected even though the code is the same. We believe umap should be detected because the code is reused by several bootkits, including BlackLotus.
The samples identified during our hunting were mostly related to game cheating software. For the remaining 4 samples (bootmgr.exe and Vixen.efi), we were unable to determine their true purpose or whether the samples (or any enhanced variants) are currently being used in the wild by threat actors. However, we hope that this research brings clarity to the bootkit threat landscape, enabling AV vendors and security teams to integrate our findings into their telemetry systems and improve their ability to detect bootkits.
The YARA rules introduced in the previous sections were refined over a dataset of malicious and non-malicious firmware. In this section, we explore how their detection accuracy can be improved by leveraging more advanced detection capabilities. As shown in Table 7, these improvements come from different perspectives, such as using code analysis techniques that are more advanced than the byte-matching capabilities offered by YARA.
As mentioned before, YARA is not effective at detecting code that clears / restores the WP bit in CR0 when the bitmask value is passed as an argument to AsmWriteCr0. However, by using static analysis, this technique and the related code patch can easily be detected with the following algorithm:
- Identify the AsmWriteCr0 function.
- Find memcpy-like calls and check whether the copied size matches the size of the decoded instructions pointed to by the source argument.

For example, the inline hook by Bootkitty is detected as follows.
We also scanned over 500 samples containing the AsmWriteCr0 function, finding no false positives.
Our FwHunt format and the related community scanner, fwhunt-scan, allow using semantic information such as UEFI GUIDs, protocols, PPIs, and NVRAM variables for detection, which also provides an improvement over YARA rules. For example, when writing the YARA rule to detect code that clears the WP bit in CR0, we had to define some code sequences to exclude potential false positives. However, as shown in the following picture, this can be easily specified in FwHunt, making the rule simpler to understand and easier to maintain.
Classifying large sets of samples using only YARA requires writing individual rules for each bootkit family. However, UEFI semantic information can provide features for accurate sample clustering without the need to develop rules, similar to using IAT information for Windows malware classification.
To create clusters based on UEFI semantic information, we take the following steps:
This clustering technique can be used for quickly triaging new suspicious samples. For example, during our research, we identified one unknown sample using VirusTotal Livehunt (MD5: ee7fd78bde28fe707b6847034e0a59fe). This sample had zero detections, so we wanted to analyze it and see whether it matched any of the bootkits we had encountered so far. By using this clustering technique, we were able to quickly check similarities with previously analyzed samples.
The clustering result visualized with Multidimensional Scaling (MDS) is shown in Figure 16. As we can see, the target sample belongs to Cluster 7 (which represents umap variants) and the closest sample is a Valkyrie sample.
We also went a step further, to understand how clustering using semantic information compares with clustering based on raw data.
We evaluated the results using the Adjusted Rand Index (ARI). Below are the results with different ratios applied during the TLSH distance calculation in Step 3.
The leftmost result (ARI=0.302) demonstrates that fuzzy hashes of raw binary data are too sensitive for effective clustering, even within the same bootkit family. On the other hand, classification based on only semantic information (second from the left, ARI=0.78) puts samples with little semantic information in the same cluster. Through experimentation, the best ratio of binary:GUID:protocol:NVRAM variable (0.2:0.4:0.3:0.1) yielded the highest ARI score (0.898). Fine-tuning DBSCAN's eps parameter to 110 further improved the ARI to 0.922.
Thus, by taking semantic information into account, we significantly improved the UEFI sample clustering results.
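The following Python sketch shows one way the pieces above could fit together: a weighted combination of per-feature distances using the best-performing ratio reported above, and an ARI computation for scoring a clustering against known family labels. Combining the distances as a weighted sum is an assumption about how the ratio is applied; the function names and the toy inputs are hypothetical, and the TLSH distances and the DBSCAN step are taken as given rather than implemented here.

```python
from collections import Counter
from math import comb

# Ratio reported in the text: binary : GUID : protocol : NVRAM variable
WEIGHTS = {"binary": 0.2, "guid": 0.4, "protocol": 0.3, "nvram": 0.1}

def combined_distance(distances: dict) -> float:
    """Weighted sum of per-feature distances between two samples (assumed combination)."""
    return sum(WEIGHTS[name] * distances[name] for name in WEIGHTS)

def adjusted_rand_index(truth: list, predicted: list) -> float:
    """Adjusted Rand Index between two labelings of the same samples (needs >= 2 samples)."""
    n = len(truth)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_cells - expected) / (maximum - expected)

# Toy inputs: per-feature distances for one pair of samples, and two labelings.
pair = {"binary": 100, "guid": 50, "protocol": 0, "nvram": 10}
print(combined_distance(pair))
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same grouping, renamed labels
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # grouping disagrees
```

ARI ignores how clusters are numbered and corrects for chance agreement, which is why it is a reasonable score for comparing weight ratios against known bootkit families.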
Semantic-based detection works well in most cases, but detailed analysis may be needed for infection-type bootkits. For example, MoonBounce and CosmicStrand infect UEFI modules, while ESPecter and bootlicker infect bootloaders. Their code is rather small and has no additional semantic information. But what if a new and unknown infection-style bootkit emerges that doesn’t use any of the code patterns found in this research?
To resolve this issue, we propose anomaly detection based on differential firmware analysis: by comparing different versions of the same firmware over time, we can detect unknown threats that can be present in the supply-chain attack space.
We detect several kinds of differences in module information (added/changed/removed modules and module dependency expressions) and in semantic information. Additionally, modules with identical names and GUIDs are compared using function representations that allow us to find near-duplicates using Weisfeiler-Lehman LSH. We then calculate the pairwise module similarity based on the percentage of matched duplicates. We also diff the capabilities of the modules and measure their similarity.
If the differential analysis finds any changes, we additionally run generic malicious code checkers. For instance, we check UEFI service table function hooks, as well as embedded executables that are later scanned using capa rules.
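A minimal Python sketch of the module-level part of such a diff is shown below. It assumes each firmware version is available as a mapping from module GUID to module bytes, uses exact hash comparison to flag changed modules, and scores function overlap as matched fingerprints over the union. All names and inputs are hypothetical, and exact fingerprint matching stands in for the Weisfeiler-Lehman LSH near-duplicate search, which is not implemented here.

```python
import hashlib

def diff_modules(old: dict, new: dict) -> dict:
    """Compare two firmware versions given as {module GUID: module bytes}."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(
        guid for guid in old.keys() & new.keys()
        if hashlib.sha256(old[guid]).digest() != hashlib.sha256(new[guid]).digest()
    )
    return {"added": added, "removed": removed, "changed": changed}

def function_similarity(old_funcs: set, new_funcs: set) -> float:
    """Percentage of function fingerprints shared by both module versions.

    Exact set matching is a stand-in for near-duplicate matching.
    """
    union = old_funcs | new_funcs
    if not union:
        return 100.0
    return 100.0 * len(old_funcs & new_funcs) / len(union)

# Placeholder module identifiers and contents, not real firmware.
old_fw = {"guid-a": b"core v1", "guid-b": b"driver v1", "guid-c": b"legacy"}
new_fw = {"guid-a": b"core v1", "guid-b": b"driver v1 + extra code", "guid-d": b"new"}

report = diff_modules(old_fw, new_fw)
print(report)
print(function_similarity({"f1", "f2", "f3"}, {"f1", "f2", "f4"}))
```

Only the modules listed under "changed" need the more expensive function-level comparison, which keeps the analysis focused on the small part of the image that actually differs between versions.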
For example, the figure below demonstrates the detection of a MoonBounce infection. The process consists of three steps: the changed module is detected based on file hashing, then the module function similarity is calculated, and finally the malicious kernel driver embedded in the bootkit is identified. Through differential analysis, our anomaly detection enables the identification of subtle yet malicious changes.
The research underscores that traditional bootkit detection technologies are struggling to keep pace with increasingly sophisticated firmware threats. By homing in on OS-persistence techniques, our approach has effectively unearthed previously undetected bootkits, revealing the inadequacies of legacy systems like YARA. Leveraging advanced static analysis, semantic detection, and ML-based clustering, our methodology not only mitigates the risk of false positives but also sets a new standard in proactive firmware security – transforming how we monitor, triage, and counteract these stealthy attacks.
Looking ahead, the work invites a broader rethinking of bootkit detection, pointing to untapped areas such as ARM-based firmware threats and alternative generic detection paradigms. This research lays the groundwork for evolving our defenses against bootkits by integrating diverse detection techniques that capture both known and emergent threats. It is a call to action for the cybersecurity community: to move beyond conventional tools, embrace innovative detection frameworks, and stay ahead in the relentless battle against firmware exploitation.
For this research, we appreciate the help of the following researchers.
They shared their telemetry analysis results or more sample information. The Binarly REsearch team will continue to collaborate with external researchers to fight against common threats.