Security Scoring

How security scores (0-100) are computed, which factors they weigh, how reproducibility is guaranteed, and how scores map to certification levels.

MCP Hub computes a deterministic security score for every certified version of an MCP server. Scores range from 0 to 100 and are designed to be reproducible, transparent, and version-tracked. They serve as the foundation for certification levels, governance policies, and catalog ranking.

Scoring Dimensions

The scoring model evaluates MCP servers across three dimensions (Security, Supply Chain, and Maturity), which are combined into a composite Global Score:

Global Score (0-100)

The Global Score is the primary metric shown in catalog listings, badges, and version summaries. It is a weighted composite of the three sub-scores:

Global Score = (Security Score * 0.50) +
               (Supply Chain Score * 0.30) +
               (Maturity Score * 0.20)

The weights (50%, 30%, 20%) are configurable per scoring version and may be adjusted as the platform evolves.
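As a minimal sketch of the composite calculation, using the documented 50/30/20 weights (the function name, rounding, and default-weight structure below are illustrative, not the platform's implementation):

DEFAULT_WEIGHTS = {"security": 0.50, "supply_chain": 0.30, "maturity": 0.20}

def global_score(security: float, supply_chain: float, maturity: float,
                 weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted composite of the three sub-scores (each 0-100)."""
    return round(security * weights["security"]
                 + supply_chain * weights["supply_chain"]
                 + maturity * weights["maturity"], 1)

# Example: global_score(90, 80, 70) -> 83.0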

Security Score (0-100)

Evaluates the security posture of the MCP server’s source code. Factors include:

  • Vulnerabilities: Known CVEs in the code or its dependencies.
  • Exposed Secrets: API keys, credentials, or tokens found in the source.
  • Insecure Patterns: Dangerous code patterns such as unrestricted filesystem access, network access without validation, or command injection vectors.
  • Code Quality: Security-relevant code quality indicators.

A fixed penalty is applied for each finding based on its severity:

Severity   Penalty
Critical   -40 points
High       -20 points
Medium     -10 points
Low        -5 points

The minimum score is 0 – scores cannot go negative. A perfect Security Score of 100 indicates no findings of any severity were detected.
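A minimal sketch of how severity penalties accumulate into a dimension score, using the documented penalty values and the floor of 0 (function and data-structure names are illustrative):

SEVERITY_PENALTIES = {"critical": 40, "high": 20, "medium": 10, "low": 5}

def dimension_score(finding_severities: list[str]) -> int:
    """Start at 100, subtract a penalty per finding, never go below 0."""
    total_penalty = sum(SEVERITY_PENALTIES[s] for s in finding_severities)
    return max(0, 100 - total_penalty)

# Example: two High findings and one Medium -> 100 - (20 + 20 + 10) = 50
print(dimension_score(["high", "high", "medium"]))  # 50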

Supply Chain Score (0-100)

Evaluates the health and integrity of the MCP server’s dependency chain:

  • Vulnerable Dependencies: Known vulnerabilities in direct and transitive dependencies.
  • License Compliance: Whether all dependency licenses are recognized and approved.
  • Lockfile Presence: Whether a package lockfile exists to pin dependency versions.
  • Dependency Pinning: Whether dependencies are pinned to specific versions rather than ranges.
  • Provenance: Evidence of where dependencies come from.

Transitive dependencies receive an attenuated penalty (50% of the direct dependency impact) to reflect the reduced control publishers have over indirect dependencies.
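A sketch of the attenuation, assuming it is applied while accumulating penalties (the finding structure and the "transitive" flag are illustrative):

SEVERITY_PENALTIES = {"critical": 40, "high": 20, "medium": 10, "low": 5}

def supply_chain_penalty(findings: list[dict]) -> float:
    total = 0.0
    for finding in findings:
        penalty = SEVERITY_PENALTIES[finding["severity"]]
        if finding.get("transitive", False):
            penalty *= 0.5  # attenuated: less publisher control over indirect deps
        total += penalty
    return total

# Example: a direct High (-20) plus a transitive High (-10) -> score of 70
findings = [{"severity": "high", "transitive": False},
            {"severity": "high", "transitive": True}]
print(100 - supply_chain_penalty(findings))  # 70.0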

Maturity Score (0-100)

Evaluates indicators of project maturity and maintainability:

  • Documentation: Presence of README, API docs, or usage guides.
  • Tests: Existence of a test suite or test configuration.
  • CI/CD: Presence of CI/CD configuration files (GitHub Actions, GitLab CI, etc.).
  • Semantic Versioning: Whether the project tags releases with semantic version numbers.
  • Changelogs: Presence of a CHANGELOG or release notes.

The Maturity Score is based on the presence or absence of these artifacts, not on their quality or coverage depth.
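Because the score is presence-based, it can be sketched as a set of boolean checks. The per-artifact penalty values below are assumptions for illustration; the platform's actual values are not documented here.

MATURITY_PENALTIES = {   # assumed values, equal weighting for illustration
    "readme": 20,
    "tests": 20,
    "ci_config": 20,
    "version_tags": 20,
    "changelog": 20,
}

def maturity_score(present: dict[str, bool]) -> int:
    """Deduct an assumed penalty for each missing maturity artifact."""
    missing = sum(penalty for artifact, penalty in MATURITY_PENALTIES.items()
                  if not present.get(artifact, False))
    return max(0, 100 - missing)

# Example: README, tests, and CI present, but no version tags or changelog -> 60
print(maturity_score({"readme": True, "tests": True, "ci_config": True}))  # 60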

How Scores Are Computed

The Scoring Pipeline

1. Raw Findings (from analysis tools)
       |
       v
2. Normalized Findings (standard format)
       |
       v
3. Controls Mapping (tool-agnostic identifiers)
       |
       v
4. Penalty Calculation (severity-based deductions)
       |
       v
5. Score Computation (per dimension + weighted global)

Step-by-Step Process

  1. Collect Findings: The analysis toolchain produces raw findings from MCP-Scan (and optionally Trivy for SBOM).

  2. Normalize: Findings are converted to a standard format with consistent severity levels, affected locations, and descriptions.

  3. Map to Controls: Each finding is mapped to one or more controls from the platform’s controls catalog. Controls are stable identifiers like SECRET_EXPOSED or KNOWN_VULNERABLE_DEPENDENCY_CRITICAL that do not depend on which tool detected the issue.

  4. Calculate Penalties: For each dimension, penalties are accumulated based on the severity of failed controls. Controls are categorized into their respective dimensions (security, supply chain, maturity).

  5. Compute Scores: Each dimension starts at 100 and subtracts accumulated penalties, with a floor of 0. The Global Score is the weighted average.
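Taken together, steps 2-5 could look roughly like the sketch below. The data shapes, the mapping table, and the sort key are assumptions for illustration; SECRET_EXPOSED and KNOWN_VULNERABLE_DEPENDENCY_CRITICAL are the documented control identifiers.

SEVERITY_PENALTIES = {"critical": 40, "high": 20, "medium": 10, "low": 5}

# Step 3: tool-agnostic controls, each assigned to a scoring dimension.
CONTROL_DIMENSIONS = {
    "SECRET_EXPOSED": "security",
    "KNOWN_VULNERABLE_DEPENDENCY_CRITICAL": "supply_chain",
}

def compute_scores(normalized_findings: list[dict]) -> dict:
    # Step 4: accumulate penalties per dimension, in canonical order.
    penalties = {"security": 0, "supply_chain": 0, "maturity": 0}
    for finding in sorted(normalized_findings,
                          key=lambda f: (f["control"], f["location"])):
        dimension = CONTROL_DIMENSIONS[finding["control"]]
        penalties[dimension] += SEVERITY_PENALTIES[finding["severity"]]

    # Step 5: each dimension starts at 100 with a floor of 0, then weight.
    scores = {dim: max(0, 100 - p) for dim, p in penalties.items()}
    scores["global"] = round(scores["security"] * 0.50
                             + scores["supply_chain"] * 0.30
                             + scores["maturity"] * 0.20, 1)
    return scores

# Example: one exposed secret (High) and one Critical vulnerable dependency
findings = [
    {"control": "SECRET_EXPOSED", "severity": "high",
     "location": "src/config.py"},
    {"control": "KNOWN_VULNERABLE_DEPENDENCY_CRITICAL", "severity": "critical",
     "location": "package-lock.json"},
]
print(compute_scores(findings))
# {'security': 80, 'supply_chain': 60, 'maturity': 100, 'global': 78.0}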

Score Reproducibility

MCP Hub guarantees that the same inputs will always produce the same score. This is achieved through four mechanisms:

Toolchain Versioning

Every snapshot records the exact versions of all analysis tools used:

{
  "toolchain_version": "mcp-scan:1.0.0,trivy:0.50.1"
}

Analysis tools run in containers with pinned versions. Vulnerability feeds are snapshotted with timestamps so that the same analysis can be reproduced.

Scoring Version

The scoring algorithm itself is versioned:

{
  "scoring_version": "v2.1"
}

Changes to the scoring weights, penalty values, or computation logic result in a new scoring version. Existing snapshots are never recalculated with a different scoring version – re-evaluation creates new snapshots.

Controls Catalog Version

The mapping from findings to controls is maintained as a versioned catalog:

{
  "controls_catalog_version": "v1.4"
}

When new controls are added or mappings are refined, the catalog version increments. Scores produced with different catalog versions are distinguishable.

No Randomness

The scoring algorithm does not use randomness or variable processing order. Findings are sorted canonically before calculation. Given identical inputs and identical tool/scoring/catalog versions, the output is always identical.

Mapping to Certification Levels

The Global Score maps directly to a certification level:

Certification Level   Name                 Minimum Score   Requirements
0                     Integrity Verified   0               Digest validation + schema check
1                     Static Verified      60              Score >= 60, basic static analysis
2                     Security Certified   80              Score >= 80, full analysis with evidence
3                     Runtime Certified    90              Score >= 90, dynamic analysis (future)

See Certification Levels for detailed requirements of each level.
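A minimal sketch of the score-to-level mapping from the table above; the level names and minimum scores are documented, while the additional per-level analysis requirements are outside this sketch:

CERTIFICATION_LEVELS = [   # (minimum score, level, name)
    (90, 3, "Runtime Certified"),
    (80, 2, "Security Certified"),
    (60, 1, "Static Verified"),
    (0,  0, "Integrity Verified"),
]

def certification_level(global_score: float) -> tuple[int, str]:
    """Return the highest level whose minimum score is met (score criterion only)."""
    for minimum, level, name in CERTIFICATION_LEVELS:
        if global_score >= minimum:
            return level, name
    return 0, "Integrity Verified"

print(certification_level(83))  # (2, 'Security Certified')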

Score Display

Scores are displayed throughout the platform:

  • Catalog listings: Global Score with a letter grade (A through F).
  • Version detail page: All four scores with breakdown.
  • Snapshot history: Score trends across snapshots.
  • API responses: Full score details including weights and toolchain versions.

Letter Grades

Grade   Score Range
A       90-100
B       80-89
C       70-79
D       60-69
F       0-59
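The grade bands translate into a simple lookup (illustrative sketch):

def letter_grade(score: float) -> str:
    """Map a 0-100 score to the documented letter-grade bands."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

print(letter_grade(78))  # 'C'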

Scores and Governance Policies

Enterprise organizations can define governance policies based on scores:

  • Minimum score thresholds: Deny downloads of MCPs below a specified Global Score.
  • Per-dimension thresholds: Require minimum Security Score or Supply Chain Score.
  • CI/CD gating: The /api/v1/ci/evaluate endpoint evaluates MCPs against score-based policies and returns pass/fail results for pipeline integration.

Example policy: “Deny any MCP with a Global Score below 70” would be configured as:

{
  "action": "DENY",
  "target_type": "SCORE",
  "target_value": "min:70"
}
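As an illustration of CI/CD gating, a pipeline step could call the /api/v1/ci/evaluate endpoint and fail the build when the policy evaluation does not pass. The endpoint path comes from this page; the request and response fields below (mcp_name, version, allowed) and the base URL are hypothetical, so check the API reference for the actual contract.

# Hypothetical CI gate: request/response fields are assumptions for illustration.
import sys
import requests

response = requests.post(
    "https://mcp-hub.example.com/api/v1/ci/evaluate",
    headers={"Authorization": "Bearer <token>"},
    json={"mcp_name": "example-mcp", "version": "1.2.3"},  # assumed fields
    timeout=30,
)
result = response.json()

if not result.get("allowed", False):  # assumed response field
    print("Policy evaluation failed:", result)
    sys.exit(1)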

When Scores Change

Scores for a specific snapshot never change – snapshots are immutable. However, the effective score for an MCP version can appear to change when:

  • Re-evaluation: A new snapshot is created with an updated toolchain or scoring version. The new snapshot may produce different scores.
  • New vulnerability data: If new CVEs are discovered that affect the MCP’s dependencies, a re-evaluation will reflect the updated findings.

In both cases, the original snapshot with its original scores is preserved for auditability.