
MCP server

KDBL Context Lake (K-Lake) ships a Model Context Protocol (MCP) server so AI clients — Claude Desktop, IDE assistants, custom agents — can search and retrieve your crawled and extracted content directly, without a bespoke integration. It speaks MCP revision 2025-11-25 over the Streamable HTTP transport on a single endpoint (/mcp).

What it exposes: read-only search over extracted file text and retrieval of that text plus file metadata. It never serves original file bytes — K-Lake doesn't store those; only extracted text and metadata are returned.

The security posture has three parts, and all three are always on when the server is enabled:

  • OAuth 2.1 Resource Server auth: every call carries a bearer token validated against an MCP-specific audience. A request with no token, a token for the wrong audience, or a disallowed Origin is rejected before any query runs.
  • Per-file trimming — every tool opens a row-level-security-scoped transaction, so a caller sees only the files they are authorized to see. See per-file security trimming.
  • Audit — every operation (and every auth denial) is recorded with who/what/when/where, both to a queryable table and to the structured log stream.

There is currently no UI for MCP configuration or for browsing the audit log. You enable the server with environment variables, smoke-test and review audit entries with the kdbl-control CLI, and query the audit log over the API.

Enabling the server

The MCP server ships dark. It is mounted only when KDBL_MCP_ENABLED is true, and the API process fails fast at boot if you enable it without setting a resource URI — there is no audience to validate tokens against otherwise.

Set these on the kdbl-api deployment:

| Env var | Default | Purpose |
|---|---|---|
| KDBL_MCP_ENABLED | false | Master switch. When false, neither /mcp nor the metadata routes are mounted. |
| KDBL_MCP_RESOURCE_URI | (none) | Canonical MCP resource URI, e.g. https://kdbl.example.com/mcp. Advertised in the metadata document as resource, and used as the aud every OIDC token must target (RFC 8707). Required when enabled; boot fails if it is empty. |
| KDBL_MCP_SCOPES | kdbl.search,kdbl.read | Comma-separated scopes advertised in the metadata document (scopes_supported). |
| KDBL_MCP_ALLOWED_ORIGINS | empty | Comma-separated Origin allowlist for browser clients. When empty, any request that carries an Origin header is rejected (DNS-rebinding protection). Server-to-server clients send no Origin and pass. |
| KDBL_MCP_ALLOW_API_AUDIENCE | false | Escape hatch: when true, the MCP audience check may fall back to the tenant's existing API audience for IdPs that can't mint resource-scoped tokens. This weakens the no-token-passthrough guarantee, so leave it off unless you need it. |
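The fail-fast rule is easier to see as code. The sketch below loads the five documented variables and refuses to start when the server is enabled without a resource URI. The loader function, the dataclass, and the parsing rules (which strings count as "true", whitespace trimming) are hypothetical illustrations. They are not kdbl-api's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical loader: variable names and defaults follow the table above;
# the accepted truthy spellings are an assumption.
TRUTHY = {"1", "true", "yes", "on"}


@dataclass
class McpConfig:
    enabled: bool
    resource_uri: str
    scopes: list[str]
    allowed_origins: list[str]
    allow_api_audience: bool


def _csv(value: str) -> list[str]:
    # Comma-separated list; blank entries are dropped.
    return [item.strip() for item in value.split(",") if item.strip()]


def _flag(value: str) -> bool:
    return value.strip().lower() in TRUTHY


def load_mcp_config(env: dict[str, str]) -> McpConfig:
    enabled = _flag(env.get("KDBL_MCP_ENABLED", "false"))
    resource_uri = env.get("KDBL_MCP_RESOURCE_URI", "").strip()
    if enabled and not resource_uri:
        # Fail fast: without a resource URI there is no audience to validate tokens against.
        raise ValueError("KDBL_MCP_ENABLED is true but KDBL_MCP_RESOURCE_URI is empty")
    return McpConfig(
        enabled=enabled,
        resource_uri=resource_uri,
        scopes=_csv(env.get("KDBL_MCP_SCOPES", "kdbl.search,kdbl.read")),
        allowed_origins=_csv(env.get("KDBL_MCP_ALLOWED_ORIGINS", "")),
        allow_api_audience=_flag(env.get("KDBL_MCP_ALLOW_API_AUDIENCE", "false")),
    )
```

In a real process you would call it as load_mcp_config(dict(os.environ)). An empty KDBL_MCP_ALLOWED_ORIGINS yields an empty allowlist, which is the reject-any-Origin behavior described above.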

Authentication

The MCP endpoint is an OAuth 2.1 Resource Server. The kdbl-api reuses its existing token-resolution logic, but validates against the MCP audience rather than the API audience.

Supported credential types

| Credential | How it's recognized | Notes |
|---|---|---|
| OIDC bearer token | A JWT whose issuer matches a configured tenant | The primary path. The token's aud must equal the configured MCP resource URI. Tenant-scoped. |
| Personal access token (PAT) | kdblpat_… prefix | First-party token minted from /api/tokens. Tenant-scoped. |
| Cluster-admin token | Matches the configured cluster-admin secret | Authenticates, but the tools are tenant-scoped. A cluster-admin token has no tenant to scope to, so tool calls return a "tenant-scoped" error. Useful for smoke-testing initialize and tools/list, not for retrieval. |
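The three recognition rules in the table can be sketched as a small router. This is a hypothetical illustration, and two of its choices are assumptions: the order in which the checks run, and the decision to return principal_kind-style labels. It only decides *which* validation path a token takes. It reads the JWT issuer without verifying the signature, so it is not an authentication check on its own.

```python
import base64
import hmac
import json

PAT_PREFIX = "kdblpat_"


def _unverified_jwt_issuer(token: str):
    """Read the iss claim from a JWT payload WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore base64url padding
    try:
        claims = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:  # bad base64, bad UTF-8, or bad JSON
        return None
    iss = claims.get("iss") if isinstance(claims, dict) else None
    return iss if isinstance(iss, str) else None


def classify_credential(token: str, tenant_issuers: set, cluster_admin_secret: str) -> str:
    """Hypothetical routing step; full validation (signature, aud, expiry) happens afterwards."""
    if token.startswith(PAT_PREFIX):
        return "pat"
    if cluster_admin_secret and hmac.compare_digest(token.encode(), cluster_admin_secret.encode()):
        return "cluster_admin"  # constant-time comparison against the configured secret
    if _unverified_jwt_issuer(token) in tenant_issuers:
        return "oidc"
    return "anonymous"  # unrecognized credential; would be denied
```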

Audience binding

For OIDC tokens, the expected audience is resolved per-tenant from oidc_config.mcp_audience on the tenant record. A token whose aud does not match is rejected with 401. If a tenant has no mcp_audience set and KDBL_MCP_ALLOW_API_AUDIENCE is off, that tenant is simply unreachable over MCP (its tokens 401) — the server never falls back to validating an MCP request against the API audience unless you explicitly opt in.

This is the RFC 8707 model: mint tokens whose audience is the MCP resource URI, so an MCP token can't be replayed against the main API and vice versa.
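The per-tenant audience rule can be expressed in a few lines. This sketch makes three assumptions. The API audience is read from an "audience" key on oidc_config. The opt-in fallback applies only when mcp_audience is unset. Both function names are hypothetical. The list-or-string handling of aud follows RFC 7519.

```python
def expected_mcp_audiences(oidc_config: dict, allow_api_audience: bool) -> set[str]:
    """Audiences an MCP token for this tenant may carry; an empty set means unreachable."""
    mcp_audience = oidc_config.get("mcp_audience")
    if mcp_audience:
        return {mcp_audience}
    # Assumption: the KDBL_MCP_ALLOW_API_AUDIENCE fallback applies only when mcp_audience is unset.
    if allow_api_audience and oidc_config.get("audience"):
        return {oidc_config["audience"]}
    return set()  # no fallback: every MCP token for this tenant gets a 401


def audience_matches(token_aud, expected: set[str]) -> bool:
    """A JWT aud claim may be a single string or a list of strings (RFC 7519)."""
    auds = [token_aud] if isinstance(token_aud, str) else list(token_aud or [])
    return any(aud in expected for aud in auds)
```

With the fallback off, an API-audience token never satisfies the MCP check. That is the replay protection described above.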

Protected Resource Metadata (discovery)

An MCP client that hits /mcp without a token gets a 401 carrying a WWW-Authenticate: Bearer challenge with a resource_metadata= pointer. Following that pointer, the client fetches the RFC 9728 Protected Resource Metadata document — served unauthenticated at two routes:

  • /.well-known/oauth-protected-resource
  • /.well-known/oauth-protected-resource/mcp

Both return the same JSON:

{
  "resource": "https://kdbl.example.com/mcp",
  "authorization_servers": ["https://login.example.com/tenant-a/v2.0"],
  "scopes_supported": ["kdbl.search", "kdbl.read"],
  "bearer_methods_supported": ["header"]
}

authorization_servers is the set of distinct OIDC issuers configured across all tenants — K-Lake is multi-tenant and multi-IdP, so a client picks the authorization server that issued it and obtains a token whose aud is the single canonical resource.
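As a rough sketch, the metadata document is assembled from the configured resource URI, the scopes, and the deduplicated tenant issuers. The function name is hypothetical. Sorting the issuers for a stable order is an assumption, not documented behavior.

```python
def protected_resource_metadata(resource_uri: str, tenant_issuers, scopes) -> dict:
    """Build an RFC 9728 document shaped like the example above (hypothetical helper)."""
    # Distinct, non-empty issuers across all tenants; sorted only for deterministic output.
    issuers = sorted({iss for iss in tenant_issuers if iss})
    return {
        "resource": resource_uri,
        "authorization_servers": issuers,
        "scopes_supported": list(scopes),
        "bearer_methods_supported": ["header"],
    }
```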

Identity providers

OIDC validation is shared with the main API, so any standards-compliant provider works: Microsoft Entra ID (Azure AD), Google, Okta, and Keycloak are all supported. Per-tenant settings (issuer, audience, groups claim) live on the tenant's oidc_config; the MCP path adds one field, mcp_audience. Group-to-identity correlation for per-file trimming is covered in directory sync.

Origin guard

When a request carries an Origin header, it must appear in KDBL_MCP_ALLOWED_ORIGINS or the request is rejected with 403 (origin not allowed). This protects against DNS-rebinding from browser contexts. Non-browser clients send no Origin and are unaffected.
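The guard is an exact-membership test that only applies when an Origin header is present. The sketch below is a hypothetical helper. It assumes exact string matching, with no normalization of case or trailing slashes.

```python
from typing import Mapping, Optional


def origin_guard(headers: Mapping[str, str], allowed_origins: list[str]) -> Optional[int]:
    """Return 403 if the request must be rejected, else None (hypothetical helper)."""
    # HTTP header names are case-insensitive.
    origin = next((v for k, v in headers.items() if k.lower() == "origin"), None)
    if origin is None:
        return None  # non-browser client: no Origin header, guard does not apply
    if origin in allowed_origins:
        return None
    return 403  # includes every Origin when the allowlist is empty
```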

Connecting a client

Point your MCP client at the Streamable HTTP endpoint:

https://<your-kdbl-host>/mcp

The endpoint accepts a single POST per JSON-RPC request and replies with application/json. There is no server-initiated SSE stream, so a GET /mcp returns 405. Configure the client for OAuth. It discovers the authorization server and resource through the metadata document, then obtains a token whose aud is the MCP resource URI.
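The first discovery step on the client side is to extract the metadata URL from the 401 challenge. This parser is a simplified, hypothetical sketch. It handles a single Bearer challenge with token or quoted-string parameters. It does not handle escaped quotes or multiple challenges in one header.

```python
import re

# auth-param pairs: name=token or name="quoted string" (escaped quotes not handled).
_AUTH_PARAM = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s,]+))')


def resource_metadata_url(www_authenticate: str):
    """Return the resource_metadata URL from a Bearer challenge, or None."""
    scheme, _, params = www_authenticate.strip().partition(" ")
    if scheme.lower() != "bearer":
        return None
    for name, quoted, bare in _AUTH_PARAM.findall(params):
        if name.lower() == "resource_metadata":
            return quoted or bare
    return None
```

The client then GETs that URL without credentials and picks its issuer from authorization_servers.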

Fetch the metadata yourself, unauthenticated, to confirm the server is up and advertising the right issuers and resource:

curl "$KDBL_URL/.well-known/oauth-protected-resource/mcp"

End-to-end smoke test (fetches metadata, then runs initialize + tools/list with your token):

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" mcp smoke

The report includes the metadata document, the initialize response (protocol version + server info), and the advertised tool list.
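The sketch below shows roughly what an initialize call looks like on the wire, built with Python's standard library. The clientInfo values are hypothetical. The Accept header listing both JSON and SSE follows the MCP Streamable HTTP transport's client requirements, even though this server only replies with JSON. The sketch builds the request object but does not send it.

```python
import json
import urllib.request

PROTOCOL_VERSION = "2025-11-25"


def build_initialize_request(base_url: str, token: str, request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC initialize POST to /mcp."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical
        },
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/mcp",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )
```

Send it with urllib.request.urlopen(req). Per the MCP spec, a client follows up with a notifications/initialized message and includes the MCP-Protocol-Version header on later requests such as tools/list.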

Tools

Six read-only tools are exposed through the JSON-RPC methods tools/list and tools/call. Every one runs inside an RLS-scoped transaction, so results are already trimmed to the caller's authorization. The tools are tenant-scoped, so a cluster-admin token cannot call them for retrieval.

| Tool | Arguments | Returns |
|---|---|---|
| search_content | query (required), source_id (optional), limit (1–100, default 20) | Ranked snippets over extracted text the caller can see. Each snippet carries a citation (source, key, page/char/timestamp locator) and a resource link to fetch the full text. |
| get_file_text | source_id, key (both required) | Extraction status, chunk count, and the ordered text chunks for one file, plus the reconstructed full text as an embedded resource (capped at 5000 chunks). |
| get_file_window | source_id, key (both required), start (chunk cursor, default 0), length (1–200, default 40) | A bounded, sliding window of a file's text by chunk position. Returns only the window (an indexed range scan, never the whole file), total_chunks for planning full coverage, a char_range, and a next_cursor to slide forward. Use it to read a large document in pieces, or to pull context around a search_content hit's seq. |
| get_file_metadata | source_id, key (both required) | Size, mtime, etag, storage class, owner, indexed-at, content type, and POSIX mode/uid/gid where captured. |
| list_sources | cursor (optional), limit (1–200, default 50) | Paginated list of sources the caller can access (id, protocol, enabled, last-indexed-at), with a nextCursor. |
| list_files | source_id (required), cursor (optional), limit (1–200, default 50) | Paginated list of files within one source (key, size, mtime), with a nextCursor. |
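Reading a large file in pieces with get_file_window is a simple cursor loop. The sketch below makes several assumptions. call_tool is a hypothetical hook that performs tools/call and returns the tool's structured result as a dict. The result keys "chunks" and "next_cursor" are assumed names. The sketch also assumes next_cursor is None once the window reaches the end of the file.

```python
from typing import Any, Callable, Iterator

# Hypothetical transport hook: (tool name, arguments) -> structured tool result.
CallTool = Callable[[str, dict], dict]


def read_file_in_windows(call_tool: CallTool, source_id: str, key: str, length: int = 200) -> Iterator[Any]:
    """Yield every chunk of one file by sliding get_file_window forward."""
    length = max(1, min(length, 200))  # documented bounds for length: 1-200
    cursor = 0
    while True:
        window = call_tool(
            "get_file_window",
            {"source_id": source_id, "key": key, "start": cursor, "length": length},
        )
        yield from window["chunks"]  # assumed field name for the window's chunks
        nxt = window.get("next_cursor")
        if nxt is None or nxt <= cursor:
            break  # end of file, or a cursor that fails to advance
        cursor = nxt
```

The loop only ever holds one window in memory, which matches the tool's purpose of never loading the whole file at once.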

Files are also addressable as MCP resources via a kdbl://file/<source_id>/<key> URI (percent-encoded components). A resources/read of that URI returns the same body as get_file_text.
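A sketch of building and parsing these URIs with the standard library follows. One choice here is an assumption: encoding each component with safe="", so that a "/" inside a key becomes %2F and the key stays a single path segment. The helper names are hypothetical.

```python
from urllib.parse import quote, unquote

PREFIX = "kdbl://file/"


def file_resource_uri(source_id: str, key: str) -> str:
    """Build kdbl://file/<source_id>/<key>, percent-encoding each component."""
    return f"{PREFIX}{quote(source_id, safe='')}/{quote(key, safe='')}"


def parse_file_resource_uri(uri: str) -> tuple[str, str]:
    """Inverse of file_resource_uri; raises ValueError on a malformed URI."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a kdbl file URI: {uri}")
    source_enc, sep, key_enc = uri[len(PREFIX):].partition("/")
    if not sep or not source_enc or not key_enc:
        raise ValueError(f"missing source_id or key: {uri}")
    return unquote(source_enc), unquote(key_enc)
```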

Audit

Every tool call, resource read, and auth denial is recorded twice: as a row in the mcp_audit table, and as a structured tracing event on target mcp.audit. Both are best-effort with respect to the request, meaning an audit-write failure is logged but never fails the MCP call. Tool arguments are sanitized before they're stored, and long strings are truncated. The bearer token travels in the Authorization header rather than in the arguments, so it never appears in the stored arguments.
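The best-effort and sanitization rules can be sketched as below. Several details are assumptions. The 256-character truncation limit is hypothetical; only the ≤20 keys_sample bound comes from the table that follows. The order in which the two sinks are written is assumed. The write_row and emit_event hooks are hypothetical stand-ins for the table insert and the tracing event.

```python
import logging

log = logging.getLogger("mcp.audit")

MAX_ARG_CHARS = 256   # hypothetical truncation limit
KEYS_SAMPLE_MAX = 20  # documented bound for keys_sample


def sanitize_arguments(value, max_chars=MAX_ARG_CHARS):
    """Recursively truncate long strings in a tool-argument structure."""
    if isinstance(value, str):
        return value if len(value) <= max_chars else value[:max_chars] + "..."
    if isinstance(value, dict):
        return {k: sanitize_arguments(v, max_chars) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_arguments(v, max_chars) for v in value]
    return value  # numbers, booleans, None pass through unchanged


def record_audit(entry: dict, write_row, emit_event) -> dict:
    """Write one audit entry to both sinks; a failing sink is logged, never raised."""
    entry = dict(entry, arguments=sanitize_arguments(entry.get("arguments", {})))
    entry["keys_sample"] = list(entry.get("keys_sample", []))[:KEYS_SAMPLE_MAX]
    for sink in (emit_event, write_row):
        try:
            sink(entry)
        except Exception:
            # The MCP response is unaffected; only the audit sink failure is logged.
            log.exception("audit sink %r failed", sink)
    return entry
```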

The mcp_audit table

| Field | Meaning |
|---|---|
| id | Record id. |
| tenant_id | Owning tenant. NULL for pre-identity denials. |
| user_id | Resolved user. NULL for anonymous denials. |
| principal | Email, oidc:iss:sub, or first-party identifier. The who. |
| principal_kind | One of oidc, pat, cluster_admin, anonymous. |
| tool | search_content, resources/read, (auth) for denials, etc. The what. |
| arguments | Sanitized argument summary (no secrets). |
| sources | Source ids touched or returned. The where (in the data). |
| keys_returned | Count of file keys returned. |
| keys_sample | Bounded sample (≤20) of returned keys for spot audits. |
| status | One of ok, denied, error, not_found. |
| error_code | e.g. invalid_token, not_found, bad_request. |
| row_count | Hits, chunks, or rows returned. |
| client_ip | Caller IP (proxy-forwarded when behind nginx). The where (on the network). |
| origin | Origin header. |
| user_agent | Client user agent. |
| session_id | MCP-Session-Id when present. |
| request_id | JSON-RPC id, for trace correlation. |
| protocol_ver | MCP-Protocol-Version. |
| created_at | Timestamp. The when. |

The table is RLS-scoped: a tenant sees only its own rows; a cluster admin sees all.

Reviewing the audit log

CLI — newest first, with optional filters:

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" mcp audit
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" mcp audit \
  --tool search_content --principal alice@example.com --limit 100

API — the CLI is a thin wrapper over this endpoint:

curl -H "Authorization: Bearer $KDBL_TOKEN" \
     "$KDBL_URL/api/mcp/audit?tool=search_content&limit=100"

| Method | Path | Description |
|---|---|---|
| GET | /api/mcp/audit | List MCP audit entries, newest first. Query params: tool, principal, limit (1–500, default 50). RLS-scoped to the caller's tenant. |
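For scripted reviews outside the CLI, the endpoint can be called directly. This standard-library sketch uses only the documented query parameters and limit range. Both helper names are hypothetical, and the response body is returned as parsed JSON without assuming its shape.

```python
import json
import urllib.request
from urllib.parse import urlencode


def audit_query_url(base_url: str, tool=None, principal=None, limit: int = 50) -> str:
    """Build a GET /api/mcp/audit URL from the documented query params."""
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    params = [("tool", tool), ("principal", principal), ("limit", limit)]
    query = urlencode([(k, v) for k, v in params if v is not None])
    return f"{base_url.rstrip('/')}/api/mcp/audit?{query}"


def fetch_audit(base_url: str, token: str, **filters):
    """GET the audit entries with a bearer token and return the parsed JSON body."""
    req = urllib.request.Request(
        audit_query_url(base_url, **filters),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```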

Streaming to your SIEM

The parallel mcp.audit tracing event carries the same fields as JSON on stdout, so you don't need DB access to feed a security service. Filter your log pipeline on target: "mcp.audit" to capture every MCP operation and denial. See Telemetry for the logging setup.
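As a minimal pipeline stage, the filter can run over JSON log lines. This sketch assumes two things: one JSON object per line, and the target at the top level under "target". Where the event fields sit (for example, nested under "fields") depends on your logging setup and is not assumed beyond the test data.

```python
import json
from typing import Iterable, Iterator


def mcp_audit_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield only JSON log events whose top-level target is "mcp.audit"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (e.g. startup banners)
        if isinstance(event, dict) and event.get("target") == "mcp.audit":
            yield event
```

For example, for event in mcp_audit_events(sys.stdin) forwards just the audit stream to a SIEM shipper.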