The Hidden Danger of Probabilistic Scoring: Introducing Exploitation Maturity Score (EMS)

Authors: Binarly REsearch

In the ongoing race to secure modern infrastructure, organizations are turning to scoring models like the Exploit Prediction Scoring System (EPSS) to prioritize which vulnerabilities to patch first. At first glance, it’s an elegant approach: assign a probability to each CVE based on historical exploitation trends, asset metadata, and statistical modeling. Focus on the highest risks, save time, and win the patching game, right? Not quite.

While EPSS is a useful signal for estimating exploitation likelihood, it’s inherently probabilistic and blind to what’s actually happening right now. It doesn’t tell you whether a proof-of-concept (PoC) has been weaponized, whether ransomware gangs are actively abusing the flaw, or whether it’s landed on the CISA KEV list. It can’t distinguish between a theoretical risk and a live grenade.

This generalization is the hidden danger. Security teams often treat EPSS as gospel, building automation and prioritization workflows around it. But in practice, this can result in a false sense of security or, worse, missed signs of active exploitation. In high-stakes environments like healthcare, OT, or embedded firmware, a low EPSS score may delay the response to a vulnerability that is already weaponized and exploited in the wild by months, and sometimes even years.

That’s where Exploitation Maturity Score (EMS) comes in. The Binarly REsearch team designed EMS not to predict the future, but to measure the present by using real-world signals such as public PoCs, exploit reliability, ransomware activity, and public and private threat intelligence telemetry. It closes the visibility gap left by statistical models, offering a clearer signal on which vulnerabilities are truly dangerous right now.

EPSS is built for scale, making it ideal for developers and asset owners who need a fast, probabilistic way to sort through thousands of CVEs. It’s efficient, abstract, and designed to reduce patching noise. EMS, on the other hand, is grounded in real-world exploitation signals and is purpose-built for security teams who need to act on what’s being exploited now. Where EPSS predicts, EMS confirms. One is about likelihood; the other is about real-world evidence that sets the priority.

The Motivation Behind Exploitation Maturity Scoring (EMS)

The motivation behind creating EMS was simple: traditional scoring systems weren’t telling the whole story. While CVSS measures technical severity and EPSS estimates statistical likelihood, neither captures the actual state of exploitation in the wild. Security teams were forced to rely on predictions or theoretical models while attackers were already leveraging public PoCs, weaponized exploits, and vulnerabilities listed in the CISA KEV catalog. EMS was built to fill that gap and to shift the focus from what might happen to what is happening now. By scoring vulnerabilities based on real-world exploitation signals, EMS gives security teams a grounded, evidence-based view of what truly demands urgent attention.

We chose a Weighted Linear Scoring Model for EMS because it offers the right balance of transparency, flexibility, and real-world applicability. Each exploitation signal carries a different level of threat, and a weighted model lets us reflect that nuance while keeping the formula simple, transparent and defensible. Unlike black-box proprietary scores, this approach ensures that security teams can understand, trust, and even tune the scoring based on their own threat model.

EMS builds directly on the principles of a Weighted Linear Scoring Model, assigning each exploitation signal a specific weight based on its threat significance. While the model is mathematically simple, EMS is carefully tuned to reflect the realities of the modern threat landscape: each input is weighted according to how strongly it correlates with active exploitation. Below is the specific EMS formula that brings this weighted logic to life.

Specific EMS formula that brings this weighted logic to life.
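
Written out in symbols, with the weights taken from the Python example later in this post, the formula can be sketched as follows. The symbol names are shorthand introduced here for illustration and are not notation from the original figure: $E_{\mathrm{total}}$ and $E_{\mathrm{verified}}$ are the total and verified exploit counts, $P_{\mathrm{public}}$ is the number of public PoCs, $F$ and $W$ are repository forks and watchers, $K$ and $R$ are 0/1 flags for KEV listing and ransomware use, and $E_{\mathrm{wild}}$ and $P_{\mathrm{wild}}$ are exploits and PoCs detected in the wild.

```latex
\mathrm{EMS} = \min\Bigl(10,\;
    2\,E_{\mathrm{total}} + 2\,E_{\mathrm{verified}} + P_{\mathrm{public}}
    + 0.05\,F + 0.05\,W
    + 4\,K + 2\,R
    + 2\,E_{\mathrm{wild}} + P_{\mathrm{wild}}
\Bigr), \qquad K, R \in \{0, 1\}
```

The outer minimum is what keeps the score on a 0 to 10 scale: once the weighted sum reaches 10, additional signals no longer change the result.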

This EMS scoring model transforms raw threat intelligence into a practical signal for defenders. By aligning technical indicators with real-world attacker behavior, EMS ensures that security teams focus not just on what could be exploited, but on what is being exploited. It turns external noise into actionable insight: prioritization that’s grounded in reality, not just prediction.

BTP v3 representation of EMS in TI monitoring

EMS Was Born from Reality, Not Theory

The Exploitation Maturity Score (EMS) wasn’t designed in a vacuum or derived from abstract threat models; it was forged in response to real-world failures of traditional scoring systems. Security teams repeatedly encountered vulnerabilities labeled as “low risk” by predictive models, only to discover active exploits already circulating in GitHub repos, dark web forums, or ransomware payloads. EMS was created to close that gap by reflecting the actual exploitation status of a vulnerability. Below is an example of EMS calculation using Python code.

def calculate_ems(
    total_exploits=0,
    verified_exploits=0,
    public_pocs=0,
    forks=0,
    watchers=0,
    is_kev=False,
    ransomware_use=False,
    in_the_wild_detected_exploits=0,
    in_the_wild_detected_pocs=0
):
    """
    Calculate Exploitation Maturity Score (EMS) 
    using weighted linear scoring model.
    Returns a value between 0 and 10.
    """
    score = (
        total_exploits * 2 +
        verified_exploits * 2 +
        public_pocs * 1 +
        forks * 0.05 +
        watchers * 0.05 +
        (4 if is_kev else 0) +
        (2 if ransomware_use else 0) +
        in_the_wild_detected_exploits * 2 +
        in_the_wild_detected_pocs * 1
    )
    return min(score, 10.0)
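
Because the model is a plain weighted sum, the weights can be pulled out into a table and adjusted to a team's own threat model. The sketch below is one possible way to do that, not the Binarly implementation: the default weights are copied from `calculate_ems` above, while the names `DEFAULT_WEIGHTS` and `calculate_ems_weighted` and the "ransomware-heavy" profile are hypothetical.

```python
# Default weights copied from calculate_ems; the dict layout is an assumption.
DEFAULT_WEIGHTS = {
    "total_exploits": 2.0,
    "verified_exploits": 2.0,
    "public_pocs": 1.0,
    "forks": 0.05,
    "watchers": 0.05,
    "is_kev": 4.0,
    "ransomware_use": 2.0,
    "in_the_wild_detected_exploits": 2.0,
    "in_the_wild_detected_pocs": 1.0,
}


def calculate_ems_weighted(signals, weights=None, cap=10.0):
    """Weighted linear score over named signals, capped at `cap`.

    Booleans count as 0 or 1; signals missing from `signals` count as 0.
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    raw = sum(w * float(signals.get(name, 0)) for name, w in weights.items())
    return min(raw, cap)


# Hypothetical tuning: a team that treats ransomware use as a decisive signal.
ransomware_heavy = {**DEFAULT_WEIGHTS, "ransomware_use": 6.0}

signals = {"public_pocs": 1, "ransomware_use": True}
default_score = calculate_ems_weighted(signals)                  # 1*1 + 2*1
tuned_score = calculate_ems_weighted(signals, ransomware_heavy)  # 1*1 + 6*1
print(default_score, tuned_score)
```

Keeping the weights in data rather than in the formula makes each tuning decision visible and easy to review.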

To illustrate how EMS works in practice, let’s look at one of the most exploited vulnerabilities in recent memory: Log4Shell (CVE-2021-44228).

# Log4Shell (CVE-2021-44228) inputs
ems_score = calculate_ems(
    total_exploits=1,
    verified_exploits=1,
    public_pocs=3,
    forks=100,
    watchers=20,
    is_kev=True,
    ransomware_use=True,
    in_the_wild_detected_exploits=3,
    in_the_wild_detected_pocs=2
)
print(f"EMS Score for CVE-2021-44228 (Log4Shell): {ems_score:.2f}")

# EMS Score for CVE-2021-44228 (Log4Shell): 10.00
# EPSS Score for CVE-2021-44228 (Log4Shell): 0.94

The Exploit Prediction Scoring System (EPSS) score for CVE-2021-44228 (Log4Shell) is approximately 0.94381, indicating a high likelihood of exploitation. In this case, both EMS and EPSS indicate a similar urgency level.
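
To see how the Log4Shell score reaches the top of the scale, it helps to spell out the arithmetic term by term. The snippet below is a sketch that pairs the inputs from the example above with the weights from `calculate_ems`; the `terms` list and variable names are illustrative.

```python
# (signal, value from the Log4Shell example, weight from calculate_ems)
terms = [
    ("total_exploits", 1, 2),
    ("verified_exploits", 1, 2),
    ("public_pocs", 3, 1),
    ("forks", 100, 0.05),
    ("watchers", 20, 0.05),
    ("is_kev", 1, 4),            # True counts as 1
    ("ransomware_use", 1, 2),    # True counts as 1
    ("in_the_wild_detected_exploits", 3, 2),
    ("in_the_wild_detected_pocs", 2, 1),
]

raw_score = sum(value * weight for _, value, weight in terms)
capped_score = min(raw_score, 10.0)

for name, value, weight in terms:
    print(f"{name:<32} {value:>4} x {weight:<5} = {value * weight:.2f}")
print(f"raw sum = {raw_score:.2f}, EMS = {capped_score:.2f}")
```

The uncapped sum comes to 27, well past the ceiling of 10. The KEV listing (4) and the three in-the-wild exploit detections (6) are already enough on their own to saturate the score.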

The CVE-2022-22963 example is where EMS proves its value by capturing the real-time exploitation maturity that EPSS fails to reflect. Despite being a remote code execution vulnerability affecting Spring Cloud Function (a widely used component in cloud-native environments), CVE-2022-22963 initially got a very low EPSS score, hovering around 1.2%. From an EPSS perspective, this vulnerability looked benign and wasn’t prioritized by most automation pipelines.

However, real-world events told a different story: within 24 hours of disclosure, public PoCs appeared on GitHub, followed by a working Metasploit module and widespread scanning activity. It was soon added to the CISA KEV catalog, confirming active exploitation in the wild. Let’s walk through the EMS score for this CVE based on those observable signals.

# CVE-2022-22963 real-world EMS signals
ems_score = calculate_ems(
    total_exploits=1,
    verified_exploits=1,
    public_pocs=2,
    forks=40,
    watchers=5,
    is_kev=True,
    ransomware_use=False,
    in_the_wild_detected_exploits=1,
    in_the_wild_detected_pocs=1
)
print(f"EMS Score for CVE-2022-22963: {ems_score:.2f}")

# EMS Score for CVE-2022-22963:  10.00
# EPSS Score for CVE-2022-22963: 0.94

Today, the EPSS score for CVE-2022-22963 is 0.94, which aligns well with EMS prioritization and supporting data points. While EPSS provides valuable insight into the likelihood of exploitation, its score for a given CVE can evolve over time due to the dynamic nature of threat intelligence. However, it’s important to note that EPSS updates may lag in the early days following the disclosure of a new vulnerability, making it less reliable during that initial time window.

Example of escalating EMS and EPSS change over time, as represented in BTP v3
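
The escalation can also be sketched in code by recomputing the score as each group of signals becomes observable. In the snippet below, the final set of inputs matches the CVE-2022-22963 example above, but the split into stages is a hypothetical reading of the narrative (PoCs first, then a Metasploit module, then KEV listing and in-the-wild detections), not measured telemetry. The helper `ems_from_signals` is an illustrative stand-in that uses the same weights as `calculate_ems`.

```python
WEIGHTS = {
    "total_exploits": 2, "verified_exploits": 2, "public_pocs": 1,
    "forks": 0.05, "watchers": 0.05, "is_kev": 4, "ransomware_use": 2,
    "in_the_wild_detected_exploits": 2, "in_the_wild_detected_pocs": 1,
}


def ems_from_signals(signals):
    """Weighted sum of the observed signals, capped at 10 (same weights as calculate_ems)."""
    raw = sum((WEIGHTS[name] * float(value) for name, value in signals.items()), 0.0)
    return min(raw, 10.0)


# Hypothetical staging of the CVE-2022-22963 inputs; only the end state is from the example.
stages = [
    ("disclosure", {}),
    ("public PoCs on GitHub", {"public_pocs": 2, "forks": 40, "watchers": 5}),
    ("Metasploit module", {"total_exploits": 1, "verified_exploits": 1}),
    ("KEV listing and in-the-wild detections",
     {"is_kev": True, "in_the_wild_detected_exploits": 1, "in_the_wild_detected_pocs": 1}),
]

observed = {}
timeline = []
for label, new_signals in stages:
    observed.update(new_signals)          # signals accumulate over time
    timeline.append((label, ems_from_signals(observed)))

for label, score in timeline:
    print(f"{label:<40} EMS = {score:.2f}")
```

Under these assumptions the score climbs from 0 to 4.25, then 8.25, and reaches the cap of 10 once the KEV listing and in-the-wild detections are counted.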

EMS Visual Explainability

The Binarly REsearch team remains deeply committed to evolving EMS into a real-time threat signal that continuously adapts to the shifting landscape of exploitation. As software supply chain attacks become more frequent and sophisticated, accurate prioritization is no longer optional, and we will continue expanding EMS data representation to integrate richer risk-centric context often overlooked by traditional scanners. By surfacing real-world exploitation signals earlier, EMS helps enterprises prioritize more effectively and focus on what matters most at any given time.

Utilizing the heatmap concept for priority setting transforms EMS from a raw score into an intuitive, visual decision-making tool. Instead of sifting through complex metrics or lengthy reports, security teams can quickly interpret the criticality of a vulnerability based on color intensity across key exploitation signals coming from the real-time threat intelligence. The heatmap reveals which factors are contributing most to the EMS, helping to understand why a vulnerability demands urgent attention. This visual clarity accelerates triage, improves team alignment, and supports more defensible prioritization decisions in high-pressure environments where time and accuracy are critical.
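
One way to derive the per-signal intensities behind such a heatmap is to express each signal's weighted contribution as a share of the raw, uncapped sum. The post does not specify how the heatmap colors are computed, so this normalization is an assumption; the inputs are the two CVEs scored earlier, and the names `WEIGHTS`, `CVES`, and `contribution_shares` are illustrative.

```python
# Weights as used in the calculate_ems example earlier in this post.
WEIGHTS = {
    "total_exploits": 2, "verified_exploits": 2, "public_pocs": 1,
    "forks": 0.05, "watchers": 0.05, "is_kev": 4, "ransomware_use": 2,
    "in_the_wild_detected_exploits": 2, "in_the_wild_detected_pocs": 1,
}

# Inputs for the two CVEs scored earlier in this post.
CVES = {
    "CVE-2021-44228": {
        "total_exploits": 1, "verified_exploits": 1, "public_pocs": 3,
        "forks": 100, "watchers": 20, "is_kev": True, "ransomware_use": True,
        "in_the_wild_detected_exploits": 3, "in_the_wild_detected_pocs": 2,
    },
    "CVE-2022-22963": {
        "total_exploits": 1, "verified_exploits": 1, "public_pocs": 2,
        "forks": 40, "watchers": 5, "is_kev": True, "ransomware_use": False,
        "in_the_wild_detected_exploits": 1, "in_the_wild_detected_pocs": 1,
    },
}


def contribution_shares(signals):
    """Each signal's share of the raw (uncapped) weighted sum, as a value in [0, 1]."""
    contributions = {
        name: WEIGHTS[name] * float(signals.get(name, 0)) for name in WEIGHTS
    }
    total = sum(contributions.values())
    if total == 0:
        return {name: 0.0 for name in WEIGHTS}
    return {name: value / total for name, value in contributions.items()}


# Rows are CVEs, columns are signals; each cell is a candidate color intensity.
heatmap = {cve: contribution_shares(signals) for cve, signals in CVES.items()}

for cve, row in heatmap.items():
    top = max(row, key=row.get)
    print(f"{cve}: strongest signal = {top}")
```

With these inputs, in-the-wild exploit detections dominate the Log4Shell row, while the KEV listing is the strongest single contributor for CVE-2022-22963.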

The EMS vs EPSS Percentile Scatter Plot offers a powerful visual intersection of exploitation evidence and statistical likelihood, helping security teams identify vulnerabilities that are both actively dangerous and likely to be targeted next. By plotting EMS scores against EPSS percentiles, organizations can clearly see which CVEs fall into high-risk quadrants, such as those with confirmed exploitation and high predictive probability. It also surfaces hidden risks: CVEs with low EPSS but high EMS that might otherwise be overlooked. This dual-axis view provides a more nuanced, action-ready understanding of risk, enabling smarter prioritization and faster response.

This plot maps vulnerabilities across two dimensions:

X-axis: EMS Score - real-world exploitation maturity (what is being exploited)
Y-axis: EPSS Percentile - predictive likelihood of exploitation (what might be exploited soon)

Quadrant I: “Critical Now” (High EMS, High EPSS).

Action: Top priority, patch immediately.

Quadrant II: “Likely Exploited Soon” (Low EMS, High EPSS).

Action: Preemptively patch or harden, especially if the CVE applies to exposed systems.

Quadrant III: “Monitor” (Low EMS, Low EPSS).

Action: Monitor or defer based on asset exposure.

Quadrant IV: “Niche Active Threat” (High EMS, Low EPSS).

Action: Investigate; exploitation may target specific vendors, verticals, or firmware.
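
The quadrant assignment above reduces to two threshold comparisons. The following is a minimal sketch under stated assumptions: the post does not say where "high" begins on either axis, so the cut-offs of 5.0 for EMS and 0.5 for the EPSS percentile (on a 0 to 1 scale) are hypothetical midpoints, and the function name `classify_quadrant` and the sample points are illustrative.

```python
def classify_quadrant(ems, epss_percentile, ems_threshold=5.0, epss_threshold=0.5):
    """Return the quadrant ("I" to "IV") of a CVE on the EMS (x) vs EPSS percentile (y) plot.

    The thresholds are hypothetical midpoints, not values from the post.
    """
    high_ems = ems >= ems_threshold
    high_epss = epss_percentile >= epss_threshold
    if high_ems and high_epss:
        return "I"      # high EMS, high EPSS
    if high_epss:
        return "II"     # low EMS, high EPSS
    if high_ems:
        return "IV"     # high EMS, low EPSS
    return "III"        # low EMS, low EPSS


# Hypothetical sample points: (EMS score, EPSS percentile).
samples = [(10.0, 0.99), (2.0, 0.90), (1.0, 0.10), (9.0, 0.20)]
quadrants = [classify_quadrant(ems, pct) for ems, pct in samples]
print(quadrants)
```

In practice the thresholds would be chosen per environment, for example by lowering the EMS cut-off for internet-exposed assets.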

In a threat landscape where attackers move faster than ever, relying solely on predictive models like EPSS is not enough. The Exploitation Maturity Score (EMS) brings critical real-world context to vulnerability prioritization by surfacing what’s actively being exploited right now. By combining EMS with EPSS and visual tools like heatmaps and scatter plots, organizations gain a clearer, more actionable understanding of their exposure. This layered approach empowers security teams to respond with greater speed and precision, focusing resources on what truly matters. In the end, EMS isn’t just a score; it’s a reality check that helps turn threat data into defensible decisions.

In our recent release of the Binarly Transparency Platform, we integrated EMS to bring real-time clarity and actionability to vulnerability prioritization. As the threat landscape evolves rapidly, EMS helps security teams stay ahead by highlighting actively exploited vulnerabilities based on live threat intelligence signals and not just theoretical risk.
