Certification Pipeline

How the certification pipeline works end-to-end, from commit detection and ingestion through AMQP-based analysis to scoring, snapshot creation, and publishing.

The certification pipeline is the core workflow that transforms raw MCP server source code into a certified, scored, and published artifact. It operates as an asynchronous, event-driven system using AMQP (LavinMQ) for job distribution between services.

Pipeline Overview

The pipeline consists of five phases, each handled by a dedicated job type:

1. Detection         Scheduler detects new commit via git ls-remote
       |
       v
2. Ingestion         Worker clones repo, creates tarball, uploads to S3
       |
       v
3. Analysis          Scan-worker runs security analysis (MCP-Scan + optional Trivy)
       |
       v
4. Scoring           Hub-worker maps findings to controls, computes scores, creates snapshot
       |
       v
5. Publishing        Hub-worker publishes certified artifact to mcp-registry

Phase 1: Detection

The scheduler service runs continuously and polls registered MCP repositories for new commits. It executes git ls-remote on each active MCP’s configured branch and compares the result against the last known commit hash in the database.

  • Polling interval: Daily by default; configurable per MCP for Enterprise organizations.
  • Jitter: A small random delay is added to each poll to avoid thundering herd problems.
  • Backoff: If a poll fails (e.g., repository is temporarily unavailable), the scheduler applies exponential backoff before retrying.

When a new commit is detected:

  1. A new MCPVersion record is created with status PENDING_ANALYSIS.
  2. An INGEST_COMMIT message is published to the AMQP exchange.

Alternative triggers include Git webhooks (for real-time detection) and direct uploads via the CLI.
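
A minimal sketch of this detection step in Python. The git ls-remote comparison and the PENDING_ANALYSIS status come from the description above; the db and publish helpers are hypothetical stand-ins for the scheduler's real internals.

import subprocess

def latest_commit(repo_url: str, branch: str) -> str:
    """Resolve the tip of `branch` without cloning, via `git ls-remote`."""
    out = subprocess.run(
        ["git", "ls-remote", repo_url, f"refs/heads/{branch}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()[0]  # output format: "<hash>\trefs/heads/<branch>"

def poll(mcp, db, publish):  # db/publish are hypothetical helpers
    head = latest_commit(mcp.repo_url, mcp.branch)
    if head == mcp.last_known_commit:
        return  # nothing new on this branch
    version_id = db.create_version(mcp.id, head, status="PENDING_ANALYSIS")
    publish("ingest_commit", {"mcp_id": mcp.id, "version_id": version_id,
                              "payload": {"commit_hash": head}})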

Phase 2: Ingestion (INGEST_COMMIT)

The hub-worker consumes the INGEST_COMMIT message and performs the following steps:

  1. Clone: Clone the Git repository at the specific commit hash.
  2. Verify: Confirm the commit hash matches the expected value.
  3. Package: Create a compressed tarball (.tar.gz) of the source code.
  4. Upload: Upload the tarball to S3 storage with a deterministic key: sources/mcp-{uuid}/{commit_hash}.tar.gz.
  5. Update: Set the version status to INGESTED and record the S3 key and file size.
  6. Dispatch: Publish an ANALYZE_VERSION message to AMQP for the next phase.

Size limits and timeouts are enforced to protect against zip bombs and excessively large repositories. The ingestion timeout ranges from 10 to 30 minutes, depending on repository size.
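
A condensed sketch of these six steps, assuming boto3 for the S3 upload; the size-limit and timeout enforcement is elided, and the status update and dispatch are shown as comments because those helpers are hypothetical.

import os
import subprocess
import tarfile
import tempfile
import boto3

def ingest(mcp_id: str, version_id: str, repo_url: str, commit: str, bucket: str):
    s3 = boto3.client("s3")
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "src")
        # 1. Clone, then pin the working tree to the exact commit.
        subprocess.run(["git", "clone", repo_url, src], check=True)
        subprocess.run(["git", "-C", src, "checkout", commit], check=True)
        # 2. Verify HEAD matches the expected hash.
        head = subprocess.run(["git", "-C", src, "rev-parse", "HEAD"],
                              capture_output=True, text=True, check=True).stdout.strip()
        assert head == commit, "commit hash mismatch"
        # 3. Package the source into a compressed tarball.
        tarball = os.path.join(work, "source.tar.gz")
        with tarfile.open(tarball, "w:gz") as tar:
            tar.add(src, arcname=".")
        # 4. Upload under the deterministic key described above.
        key = f"sources/mcp-{mcp_id}/{commit}.tar.gz"
        s3.upload_file(tarball, bucket, key)
        # 5. db.update_version(version_id, status="INGESTED", s3_key=key,
        #                      size=os.path.getsize(tarball))   # hypothetical
        # 6. publish("analyze_version", {...})                  # hypothetical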

Phase 3: Analysis (ANALYZE_VERSION)

Analysis is delegated to the scan-worker via AMQP. The hub-worker publishes an ANALYZE_VERSION job, and the scan-worker processes it independently:

  1. The hub-worker publishes an ANALYZE_VERSION message referencing the source tarball already uploaded to S3 during ingestion.
  2. The scan-worker (running mcp-scan) downloads the tarball, extracts it, and runs the security analysis toolchain.
  3. Analysis results are uploaded to S3 as structured JSON.
  4. The scan-worker publishes an ANALYZE_COMPLETE message back to AMQP.

The analysis toolchain includes:

  • MCP-Scan: Specialized static analysis engine that detects 14 vulnerability classes (A through N) using pattern matching, taint analysis, and optional AI-assisted detection. Supports Python, TypeScript, JavaScript, and Go.
  • Trivy (optional): For SBOM (Software Bill of Materials) generation and dependency vulnerability scanning.

The scan-worker is a separate container with all analysis tools pre-installed. The hub-worker itself does not execute any analysis tools locally.
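
A sketch of the scan-worker's consume loop, assuming pika; the queue name follows the AMQP Job Flow section below, and run_toolchain/publish_complete are hypothetical stand-ins for the actual analysis and result dispatch.

import json
import pika

def run_toolchain(payload):
    """Hypothetical: download tarball from S3, run MCP-Scan (and optionally
    Trivy), upload results JSON back to S3."""
    ...

def publish_complete(ch, job):
    """Hypothetical: publish the ANALYZE_COMPLETE message back to AMQP."""
    ...

def on_analyze(ch, method, properties, body):
    job = json.loads(body)
    run_toolchain(job["payload"])
    publish_complete(ch, job)
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after success

conn = pika.BlockingConnection(pika.ConnectionParameters("lavinmq"))
ch = conn.channel()
ch.basic_qos(prefetch_count=1)  # at most one unacknowledged job per worker
ch.basic_consume(queue="mcp.jobs.analyze_version", on_message_callback=on_analyze)
ch.start_consuming()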

Phase 4: Scoring and Snapshot Creation

When the hub-worker receives the ANALYZE_COMPLETE message, it downloads the analysis results from S3 and performs post-analysis processing:

Findings Normalization

Raw tool outputs are normalized into a standard findings format with consistent fields for severity, location, description, and remediation guidance.

Controls Mapping

Normalized findings are mapped to the platform’s controls catalog – a set of stable, tool-agnostic security control identifiers:

Raw Findings (tool-specific)
    --> Normalized Findings
        --> Controls (tool-agnostic)
            --> Scores (deterministic)

Examples of controls: SECRET_EXPOSED, KNOWN_VULNERABLE_DEPENDENCY_CRITICAL, NO_LOCKFILE, UNPINNED_DEPENDENCIES.
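
A sketch of the mapping step. The control IDs come from the examples above; the tool rule IDs in the lookup table are purely illustrative, and the real catalog is richer than a flat dictionary.

# Illustrative rule table: tool-specific rule IDs -> stable control IDs.
CONTROL_MAP = {
    ("mcp-scan", "hardcoded-secret"): "SECRET_EXPOSED",           # rule IDs are
    ("trivy", "critical-cve"): "KNOWN_VULNERABLE_DEPENDENCY_CRITICAL",  # hypothetical
}

def map_to_controls(normalized_findings):
    controls = set()
    for finding in normalized_findings:
        control = CONTROL_MAP.get((finding["tool"], finding["rule_id"]))
        if control:
            controls.add(control)
    return controls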

Score Computation

Scores are computed deterministically across four dimensions:

Dimension            Weight   What It Measures
Security Score       50%      Vulnerabilities, secrets, insecure patterns
Supply Chain Score   30%      Dependencies, licenses, lockfiles
Maturity Score       20%      Documentation, tests, CI/CD configuration
Global Score         -        Weighted composite of the above three

Penalty values are applied based on finding severity: Critical (-40), High (-20), Medium (-10), Low (-5). The minimum score is 0.
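
Putting the weights and penalties together, and assuming each dimension starts from a base of 100 (an assumption; the text only fixes the penalties and the floor of 0), the computation can be sketched as:

PENALTY = {"CRITICAL": 40, "HIGH": 20, "MEDIUM": 10, "LOW": 5}
WEIGHTS = {"security": 0.5, "supply_chain": 0.3, "maturity": 0.2}

def dimension_score(findings):
    """Start from 100, subtract per-finding penalties, floor at 0."""
    return max(0, 100 - sum(PENALTY[f["severity"]] for f in findings))

def global_score(findings_by_dimension):
    return round(sum(WEIGHTS[dim] * dimension_score(findings)
                     for dim, findings in findings_by_dimension.items()))

# Example: one HIGH security finding, one MEDIUM supply-chain finding,
# clean maturity checks:
#   security = 80, supply_chain = 90, maturity = 100
#   global   = 0.5*80 + 0.3*90 + 0.2*100 = 87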

Snapshot Creation

An immutable security snapshot is created containing all scores, control results, findings counts, and versioning metadata (toolchain version, scoring version, controls catalog version). The version status is updated to CERTIFIED.
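
For illustration, the snapshot's shape might look like the following dataclass. The field names are hypothetical; only the categories (scores, control results, findings counts, versioning metadata) come from the description above.

from dataclasses import dataclass

@dataclass(frozen=True)  # frozen mirrors the snapshot's immutability
class SecuritySnapshot:
    version_id: str
    security_score: int
    supply_chain_score: int
    maturity_score: int
    global_score: int
    control_results: dict           # control ID -> result
    findings_counts: dict           # severity -> count
    toolchain_version: str
    scoring_version: str
    controls_catalog_version: str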

Phase 5: Publishing (PUBLISH_TO_REGISTRY)

After certification, the hub-worker publishes the certified artifact to mcp-registry:

  1. The global score is mapped to a certification level (0-3).
  2. A manifest is generated from the snapshot data.
  3. The hub-worker calls the registry’s publish API with a service token.
  4. The version record is updated with the registry URL and tarball SHA-256 hash.

The certified MCP is now available for download from the registry by end users via mcp-client.
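
A sketch of the publish step using requests. The level thresholds and the registry endpoint below are illustrative assumptions, not the platform's actual values; only the 0-3 level range, the manifest, and the service token come from the steps above.

import requests

def certification_level(global_score: int) -> int:
    # Hypothetical thresholds, for illustration only.
    for level, floor in ((3, 90), (2, 75), (1, 50)):
        if global_score >= floor:
            return level
    return 0

def publish_to_registry(snapshot, manifest, service_token, registry_url):
    resp = requests.post(
        f"{registry_url}/publish",  # hypothetical endpoint
        json={"manifest": manifest,
              "level": certification_level(snapshot["global_score"])},
        headers={"Authorization": f"Bearer {service_token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. registry URL and tarball SHA-256 for the version record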

AMQP Job Flow

The pipeline uses a direct exchange (mcp.jobs) with dedicated queues per job type:

Queue                          Routing Key            Purpose
mcp.jobs.ingest_commit         ingest_commit          Clone and upload source
mcp.jobs.analyze_version       analyze_version        Run security analysis
mcp.jobs.publish_to_registry   publish_to_registry    Publish to registry
mcp.jobs.generate_report       generate_report        Generate PDF report
mcp.jobs.reevaluate_version    reevaluate_version     Re-analyze existing version
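
This topology could be declared with pika roughly as follows; the exchange, queue, and routing-key names match the table above.

import pika

JOB_TYPES = ["ingest_commit", "analyze_version", "publish_to_registry",
             "generate_report", "reevaluate_version"]

conn = pika.BlockingConnection(pika.ConnectionParameters("lavinmq"))
ch = conn.channel()
ch.exchange_declare(exchange="mcp.jobs", exchange_type="direct", durable=True)
for job in JOB_TYPES:
    queue = f"mcp.jobs.{job}"
    ch.queue_declare(queue=queue, durable=True)
    ch.queue_bind(queue=queue, exchange="mcp.jobs", routing_key=job)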

Retry and Error Handling

Failed jobs are retried with exponential backoff using dedicated retry queues:

Attempt   Retry Queue           Delay
1         mcp.jobs.retry.1m     1 minute
2         mcp.jobs.retry.2m     2 minutes
3         mcp.jobs.retry.5m     5 minutes
4         mcp.jobs.retry.15m    15 minutes
5         mcp.jobs.retry.1h     1 hour

After all retry attempts are exhausted, the job is moved to a dead letter queue (mcp.jobs.dead_letter) for manual inspection.
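
One common way to implement such delay queues on LavinMQ/RabbitMQ is a per-queue message TTL plus a dead-letter exchange that routes expired messages back to the work exchange; whether the platform uses exactly this pattern is an assumption. A sketch:

import pika

RETRY_DELAYS_MS = {"1m": 60_000, "2m": 120_000, "5m": 300_000,
                   "15m": 900_000, "1h": 3_600_000}

conn = pika.BlockingConnection(pika.ConnectionParameters("lavinmq"))
ch = conn.channel()
for suffix, ttl_ms in RETRY_DELAYS_MS.items():
    ch.queue_declare(
        queue=f"mcp.jobs.retry.{suffix}",
        durable=True,
        arguments={
            "x-message-ttl": ttl_ms,               # hold the message for the delay
            "x-dead-letter-exchange": "mcp.jobs",  # then route it back to work queues
        },
    )
# Workers publish here explicitly once attempts >= max_attempts.
ch.queue_declare(queue="mcp.jobs.dead_letter", durable=True)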

Message Format

Each AMQP message contains:

{
  "job_id": "uuid-v4",
  "type": "INGEST_COMMIT",
  "idempotency_key": "mcp-uuid:commit-hash",
  "mcp_id": "uuid-v4",
  "version_id": "uuid-v4",
  "org_id": "public",
  "payload": { ... },
  "attempts": 0,
  "max_attempts": 5,
  "created_at": "2026-01-22T10:00:00Z"
}

Messages are persistent (delivery_mode: 2) and use idempotency keys to prevent duplicate processing.
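
A sketch of publishing one of these messages persistently with pika; the fields mirror the example above, and the commit_hash key inside the payload is an assumption based on the idempotency key format.

import json
import uuid
from datetime import datetime, timezone
import pika

def publish_job(ch, job_type: str, mcp_id: str, version_id: str, payload: dict):
    message = {
        "job_id": str(uuid.uuid4()),
        "type": job_type.upper(),
        "idempotency_key": f"{mcp_id}:{payload['commit_hash']}",
        "mcp_id": mcp_id,
        "version_id": version_id,
        "org_id": "public",
        "payload": payload,
        "attempts": 0,
        "max_attempts": 5,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    ch.basic_publish(
        exchange="mcp.jobs",
        routing_key=job_type,
        body=json.dumps(message),
        properties=pika.BasicProperties(delivery_mode=2),  # persistent
    )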

Delivery Guarantees

  • At-least-once delivery: If a worker fails to acknowledge a message, it is redelivered.
  • No concurrent duplicates: Each message is delivered to only one worker at a time; consumer prefetch caps the number of unacknowledged messages a worker holds.
  • Horizontal scaling: Multiple workers can consume from the same queues; AMQP distributes messages via round-robin.

Idempotency

The pipeline is designed for at-least-once processing, so all operations must be idempotent:

  • Version uniqueness: UNIQUE (mcp_id, commit_hash) prevents duplicate versions.
  • Job deduplication: UNIQUE (type, idempotency_key) prevents duplicate jobs.
  • S3 key determinism: Storage keys are derived deterministically from version and snapshot identifiers.
  • Snapshot uniqueness: UNIQUE (version_id, toolchain_version, scoring_version, controls_catalog_version) prevents duplicate snapshots.
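
The UNIQUE constraints translate naturally into insert-if-absent writes. A sketch of job deduplication using psycopg2, with hypothetical table and column names:

import psycopg2

def enqueue_once(conn, job_type, idempotency_key, message_json):
    """Insert the job only if no job with (type, idempotency_key) exists yet."""
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO jobs (type, idempotency_key, message)
            VALUES (%s, %s, %s)
            ON CONFLICT (type, idempotency_key) DO NOTHING
            """,
            (job_type, idempotency_key, message_json),
        )
        created = cur.rowcount == 1  # 0 means a duplicate was suppressed
    conn.commit()
    return created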