MCP server

KDBL Context Lake (K-Lake) ships a Model Context Protocol (MCP) server so AI clients — Claude Desktop, IDE assistants, custom agents — can search and retrieve your crawled and extracted content directly, without a bespoke integration. It speaks MCP revision 2025-11-25 over the Streamable HTTP transport on a single endpoint (/mcp).

What it exposes: read-only search over extracted file text and retrieval of that text plus file metadata. It never serves original file bytes — K-Lake doesn't store those; only extracted text and metadata are returned.

The security posture has three parts, and all three are always on when the server is enabled:

OAuth 2.1 Resource Server auth — every call carries a bearer token validated against an MCP-specific audience. A request with no token, the wrong audience, or a disallowed origin is rejected before any query runs.
Per-file trimming — every tool opens a row-level-security-scoped transaction, so a caller sees only the files they are authorized to see. See per-file security trimming.
Audit — every operation (and every auth denial) is recorded with who/what/when/where, both to a queryable table and to the structured log stream.

There is currently no UI for MCP configuration or for browsing the audit log. You enable the server with environment variables, smoke-test and review audit entries with the kdbl-control CLI, and query the audit log over the API.

Enabling the server

The MCP server ships dark. It is mounted only when KDBL_MCP_ENABLED is true, and the API process fails fast at boot if you enable it without setting a resource URI — there is no audience to validate tokens against otherwise.

Set these on the kdbl-api deployment:

Env var	Default	Purpose
`KDBL_MCP_ENABLED`	`false`	Master switch. When false, neither `/mcp` nor the metadata routes are mounted.
`KDBL_MCP_RESOURCE_URI`	—	Canonical MCP resource URI, e.g. `https://kdbl.example.com/mcp`. Advertised in metadata as `resource` and is the `aud` every OIDC token must target (RFC 8707). Required when enabled — boot fails if empty.
`KDBL_MCP_SCOPES`	`kdbl.search,kdbl.read`	Comma-separated scopes advertised in the metadata document (`scopes_supported`).
`KDBL_MCP_ALLOWED_ORIGINS`	empty	Comma-separated `Origin` allowlist for browser clients. Empty rejects any request that carries an `Origin` header (DNS-rebinding protection). Server-to-server clients send no `Origin` and pass.
`KDBL_MCP_ALLOW_API_AUDIENCE`	`false`	Escape hatch: when true, the MCP audience check may fall back to the tenant's existing API audience for IdPs that can't mint resource-scoped tokens. Weakens the no-token-passthrough guarantee — leave off unless you need it.
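The defaults in the table and the fail-fast rule can be expressed as a small loader. This is a minimal sketch, not the kdbl-api code: McpConfig and load_mcp_config are hypothetical names, the set of strings accepted as "true" is an assumption, and the sketch raises ValueError where the real process refuses to boot.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


def _csv(value: str) -> List[str]:
    # Split a comma-separated value, trimming whitespace and dropping empty items.
    return [item.strip() for item in value.split(",") if item.strip()]


def _flag(value: str) -> bool:
    # Assumption: these spellings count as true; the real parser may differ.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class McpConfig:  # hypothetical container, not a K-Lake type
    enabled: bool
    resource_uri: str
    scopes: List[str]
    allowed_origins: List[str]
    allow_api_audience: bool


def load_mcp_config(env: Mapping[str, str] = os.environ) -> McpConfig:
    enabled = _flag(env.get("KDBL_MCP_ENABLED", "false"))
    resource_uri = env.get("KDBL_MCP_RESOURCE_URI", "").strip()
    if enabled and not resource_uri:
        # kdbl-api fails fast at boot here; this sketch raises instead of exiting.
        raise ValueError("KDBL_MCP_ENABLED is true but KDBL_MCP_RESOURCE_URI is empty")
    return McpConfig(
        enabled=enabled,
        resource_uri=resource_uri,
        scopes=_csv(env.get("KDBL_MCP_SCOPES", "kdbl.search,kdbl.read")),
        # Empty allowlist: any request that carries an Origin header is rejected.
        allowed_origins=_csv(env.get("KDBL_MCP_ALLOWED_ORIGINS", "")),
        allow_api_audience=_flag(env.get("KDBL_MCP_ALLOW_API_AUDIENCE", "false")),
    )
```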

Authentication

The MCP endpoint is an OAuth 2.1 Resource Server. The kdbl-api reuses its existing token-resolution logic, but validates against the MCP audience rather than the API audience.

Supported credential types

Credential	How it's recognized	Notes
OIDC bearer token	A JWT whose issuer matches a configured tenant	The primary path. The token's `aud` must equal the configured MCP resource URI. Tenant-scoped.
Personal access token (PAT)	`kdblpat_…` prefix	First-party token minted from `/api/tokens`. Tenant-scoped.
Cluster-admin token	Matches the configured cluster-admin secret	Authenticates, but the tools are tenant-scoped — a cluster-admin token has no tenant to scope to, so tool calls return a "tenant-scoped" error. Useful for smoke-testing `initialize`/`tools/list`, not for retrieval.
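One way to picture how a bearer value is routed to the right validator is sketched below. The check order, the classify_credential helper, and treating an unrecognized token as a rejection are assumptions for illustration; only the kdblpat_ prefix, the cluster-admin secret comparison, and issuer matching come from the table. The JWT payload is read unverified purely to pick a tenant; real validation still checks the signature and the audience.

```python
import base64
import hmac
import json
from typing import Collection, Optional


def _unverified_issuer(token: str) -> Optional[str]:
    # Read `iss` from the JWT payload WITHOUT verifying the signature. This only
    # chooses which tenant's validator to run; it is never an auth decision.
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore base64url padding
    try:
        claims = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:  # covers binascii.Error, JSONDecodeError, UnicodeDecodeError
        return None
    iss = claims.get("iss") if isinstance(claims, dict) else None
    return iss if isinstance(iss, str) else None


def classify_credential(token: str, cluster_admin_secret: str,
                        tenant_issuers: Collection[str]) -> Optional[str]:
    """Hypothetical: return "pat", "cluster_admin", "oidc", or None (reject, 401)."""
    if token.startswith("kdblpat_"):
        return "pat"
    # Constant-time comparison so the secret cannot be probed by timing.
    if cluster_admin_secret and hmac.compare_digest(
            token.encode(), cluster_admin_secret.encode()):
        return "cluster_admin"
    if _unverified_issuer(token) in tenant_issuers:
        return "oidc"
    return None
```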

Audience binding

For OIDC tokens, the expected audience is resolved per tenant from oidc_config.mcp_audience on the tenant record. A token whose aud does not match is rejected with 401. If a tenant has no mcp_audience set and KDBL_MCP_ALLOW_API_AUDIENCE is off, that tenant is unreachable over MCP (every one of its tokens is rejected with 401): the server never validates an MCP request against the API audience unless you explicitly opt in.

This is the RFC 8707 model: mint tokens whose audience is the MCP resource URI, so an MCP token can't be replayed against the main API and vice versa.
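The audience rule reduces to a small decision. The sketch below assumes one reading of the escape hatch: the tenant's API audience is used only when mcp_audience is unset and KDBL_MCP_ALLOW_API_AUDIENCE is on. Both function names are hypothetical.

```python
from typing import Optional, Sequence, Union


def resolve_mcp_audience(mcp_audience: Optional[str], api_audience: Optional[str],
                         allow_api_audience: bool) -> Optional[str]:
    if mcp_audience:
        return mcp_audience  # the normal, RFC 8707 case
    if allow_api_audience and api_audience:
        return api_audience  # KDBL_MCP_ALLOW_API_AUDIENCE escape hatch (assumed semantics)
    return None  # tenant is unreachable over MCP: every token gets 401


def audience_ok(token_aud: Union[str, Sequence[str], None],
                expected: Optional[str]) -> bool:
    if expected is None:
        return False
    # RFC 7519 allows `aud` to be a single string or an array of strings.
    auds = [token_aud] if isinstance(token_aud, str) else list(token_aud or [])
    return expected in auds
```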

Protected Resource Metadata (discovery)

An MCP client that hits /mcp without a token gets a 401 carrying a WWW-Authenticate: Bearer challenge with a resource_metadata= pointer. Following that pointer, the client fetches the RFC 9728 Protected Resource Metadata document — served unauthenticated at two routes:

/.well-known/oauth-protected-resource
/.well-known/oauth-protected-resource/mcp

Both return the same JSON:

{
  "resource": "https://kdbl.example.com/mcp",
  "authorization_servers": ["https://login.example.com/tenant-a/v2.0"],
  "scopes_supported": ["kdbl.search", "kdbl.read"],
  "bearer_methods_supported": ["header"]
}

authorization_servers lists the distinct OIDC issuers configured across all tenants. Because K-Lake is multi-tenant and multi-IdP, a client picks the authorization server that serves its own tenant and obtains from it a token whose aud is the single canonical resource URI.
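On the client side, discovery means pulling the metadata URL out of the 401 challenge and sanity-checking the document before asking an IdP for a token. This sketch assumes the challenge carries the RFC 9728 resource_metadata parameter in quoted or unquoted form; resource_metadata_url and check_metadata are hypothetical helpers, and the issuer and resource values are the example ones from the JSON above.

```python
import re
from typing import Optional

# Matches resource_metadata="..." (quoted) or resource_metadata=... (token form).
_RESOURCE_METADATA = re.compile(r'resource_metadata="?([^",\s]+)')


def resource_metadata_url(www_authenticate: str) -> Optional[str]:
    """Extract the metadata URL from a `WWW-Authenticate: Bearer ...` challenge."""
    if not www_authenticate.lower().startswith("bearer"):
        return None
    match = _RESOURCE_METADATA.search(www_authenticate)
    return match.group(1) if match else None


def check_metadata(doc: dict, endpoint: str, my_issuer: str) -> str:
    """Validate the metadata document; return the issuer to request a token from."""
    if doc.get("resource") != endpoint:
        # RFC 9728 expects clients to reject metadata whose `resource` is not
        # the resource they are actually trying to reach.
        raise ValueError(f"metadata is for {doc.get('resource')!r}, not {endpoint!r}")
    if my_issuer not in doc.get("authorization_servers", []):
        raise ValueError(f"{my_issuer} is not an advertised authorization server")
    return my_issuer
```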

Identity providers

OIDC validation is shared with the main API, so any standards-compliant provider works: Microsoft Entra ID (Azure AD), Google, Okta, and Keycloak are all supported. Per-tenant settings (issuer, audience, groups claim) live on the tenant's oidc_config; the MCP path adds one field, mcp_audience. Group-to-identity correlation for per-file trimming is covered in directory sync.

Origin guard

When a request carries an Origin header, it must appear in KDBL_MCP_ALLOWED_ORIGINS or the request is rejected with 403 (origin not allowed). This protects against DNS-rebinding from browser contexts. Non-browser clients send no Origin and are unaffected.
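The guard's decision is a lookup against the allowlist. A minimal sketch, assuming exact string comparison of the Origin value (so scheme and port must match); origin_status is a hypothetical name that returns the HTTP status to reject with, or None to let the request through.

```python
from typing import List, Mapping, Optional


def origin_status(headers: Mapping[str, str], allowed_origins: List[str]) -> Optional[int]:
    """Return 403 if the request must be rejected, else None."""
    # HTTP header names are case-insensitive.
    origin = next((v for k, v in headers.items() if k.lower() == "origin"), None)
    if origin is None:
        return None  # non-browser client: no Origin header, the guard does not apply
    return None if origin in allowed_origins else 403
```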

Connecting a client

Point your MCP client at the Streamable HTTP endpoint:

https://<your-kdbl-host>/mcp

The endpoint accepts a single POST per JSON-RPC request and replies with application/json. There is no server-initiated SSE stream; a GET /mcp returns 405. Configure the client for OAuth: it discovers the authorization server and resource through the metadata document and obtains a token with the MCP aud.
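If you are writing your own client rather than configuring an existing one, each call is an ordinary HTTPS POST. The sketch below uses only the Python standard library; mcp_request and send are hypothetical helpers, and the header set (Accept listing both JSON and SSE, MCP-Protocol-Version after initialize, echoing MCP-Session-Id when the server issued one) follows the MCP Streamable HTTP transport in general rather than anything K-Lake-specific. It assumes you already hold a token whose aud is the MCP resource URI.

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Tuple

PROTOCOL_VERSION = "2025-11-25"


def mcp_request(endpoint: str, token: str, method: str,
                params: Optional[Dict[str, Any]] = None,
                req_id: Optional[int] = 1,
                session_id: Optional[str] = None) -> urllib.request.Request:
    """Build one JSON-RPC POST for /mcp. req_id=None builds a notification."""
    body: Dict[str, Any] = {"jsonrpc": "2.0", "method": method}
    if req_id is not None:
        body["id"] = req_id
    if params is not None:
        body["params"] = params
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Streamable HTTP clients advertise both; this server answers with JSON only.
        "Accept": "application/json, text/event-stream",
    }
    if method != "initialize":
        # Assumes initialize negotiated 2025-11-25; sent on every later request.
        headers["MCP-Protocol-Version"] = PROTOCOL_VERSION
    if session_id:
        headers["MCP-Session-Id"] = session_id
    return urllib.request.Request(endpoint, data=json.dumps(body).encode(),
                                  headers=headers, method="POST")


def send(req: urllib.request.Request) -> Tuple[Optional[str], Any]:
    """POST the request; return (MCP-Session-Id header, parsed JSON body or None)."""
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
        return resp.headers.get("MCP-Session-Id"), (json.loads(raw) if raw else None)
```

A typical session sends initialize (with protocolVersion, capabilities, and clientInfo params), keeps any MCP-Session-Id from the response headers, sends the notifications/initialized notification with req_id=None, and then issues tools/list or tools/call with increasing ids.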

Fetch the metadata yourself, unauthenticated, to confirm the server is up and advertising the right issuers and resource:

curl "$KDBL_URL/.well-known/oauth-protected-resource/mcp"

End-to-end smoke test (fetches metadata, then runs initialize + tools/list with your token):

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" mcp smoke

The report includes the metadata document, the initialize response (protocol version + server info), and the advertised tool list.

Tools

Six read-only tools are exposed through the JSON-RPC methods tools/list and tools/call. Every one runs inside an RLS-scoped transaction, so results are already trimmed to the caller's authorization. The tools are tenant-scoped: a cluster-admin token cannot call them for retrieval.

Tool	Arguments	Returns
`search_content`	`query` (required), `source_id` (optional), `limit` (1–100, default 20)	Ranked snippets over extracted text the caller can see, each with a citation (source, key, page/char/timestamp locator) and a resource link to fetch full text.
`get_file_text`	`source_id`, `key` (both required)	Extraction status, chunk count, and the ordered text chunks for one file, plus the reconstructed full text as an embedded resource (capped at 5000 chunks).
`get_file_window`	`source_id`, `key` (both required), `start` (chunk cursor, default 0), `length` (1–200, default 40)	A bounded, sliding window of a file's text by chunk position. Returns only the window (an indexed range scan, never the whole file), `total_chunks` for planning full coverage, a `char_range`, and a `next_cursor` to slide forward. Use it to read a large document in pieces, or to pull context around a `search_content` hit's `seq`.
`get_file_metadata`	`source_id`, `key` (both required)	Size, mtime, etag, storage class, owner, indexed-at, content type, and POSIX mode/uid/gid where captured.
`list_sources`	`cursor` (optional), `limit` (1–200, default 50)	Paginated list of sources the caller can access (id, protocol, enabled, last-indexed-at), with a `nextCursor`.
`list_files`	`source_id` (required), `cursor` (optional), `limit` (1–200, default 50)	Paginated list of files within one source (key, size, mtime), with a `nextCursor`.
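Reading a whole document through get_file_window is a cursor loop. This sketch assumes a hypothetical call_tool function that performs tools/call and returns the tool's structured result as a dict with a chunks list (each item carrying text) and a next_cursor that is absent or null once the end is reached; the exact result shape is not specified on this page.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical: (tool name, arguments) -> structured tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_file_text(call_tool: ToolCaller, source_id: str, key: str,
                   length: int = 40) -> Iterator[str]:
    """Yield a file's chunk texts in order, one get_file_window call per window."""
    if not 1 <= length <= 200:
        raise ValueError("length must be between 1 and 200")
    cursor = 0
    while cursor is not None:
        window = call_tool("get_file_window", {
            "source_id": source_id, "key": key, "start": cursor, "length": length,
        })
        # Assumed shape: {"chunks": [{"seq": ..., "text": ...}], "next_cursor": ...}
        for chunk in window.get("chunks", []):
            yield chunk["text"]
        next_cursor = window.get("next_cursor")
        if next_cursor is not None and next_cursor <= cursor:
            break  # defensive: never loop on a cursor that does not advance
        cursor = next_cursor
```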

Files are also addressable as MCP resources via a kdbl://file/<source_id>/<key> URI (percent-encoded components). A resources/read of that URI returns the same body as get_file_text.
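Because both components are percent-encoded, a key containing slashes still fits in one URI segment. A minimal sketch, assuming that every reserved character in source_id and key (including /) is encoded; the helper names are hypothetical. The resulting URI goes in the uri parameter of a resources/read request.

```python
from typing import Tuple
from urllib.parse import quote, unquote

_PREFIX = "kdbl://file/"


def file_resource_uri(source_id: str, key: str) -> str:
    # safe="" encodes "/" too, so "reports/2024/q1.pdf" stays a single segment.
    return f"{_PREFIX}{quote(source_id, safe='')}/{quote(key, safe='')}"


def parse_file_resource_uri(uri: str) -> Tuple[str, str]:
    """Inverse of file_resource_uri: return (source_id, key)."""
    if not uri.startswith(_PREFIX):
        raise ValueError(f"not a kdbl file URI: {uri!r}")
    source_part, sep, key_part = uri[len(_PREFIX):].partition("/")
    if not sep or not source_part or not key_part:
        raise ValueError(f"expected kdbl://file/<source_id>/<key>, got {uri!r}")
    return unquote(source_part), unquote(key_part)
```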

Audit

Every tool call, resource read, and auth denial is recorded twice: a row in the mcp_audit table, and a structured tracing event on target mcp.audit. Both are best-effort with respect to the request — an audit-write failure is logged but never fails the MCP call. Tool arguments are sanitized before they're stored (long strings truncated; the bearer token is a header and is never in arguments).
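The two guarantees in that paragraph, sanitized arguments and best-effort writes, can be sketched as follows. The truncation length, the "..." marker, and both function names are assumptions; this page does not document the real limit.

```python
import logging
from typing import Any, Callable, Dict

log = logging.getLogger("mcp.audit.example")

MAX_ARG_CHARS = 256  # hypothetical limit; the real value is not documented here


def sanitize_arguments(value: Any, max_len: int = MAX_ARG_CHARS) -> Any:
    """Recursively truncate long strings inside a tool-argument structure."""
    if isinstance(value, str):
        return value if len(value) <= max_len else value[:max_len] + "..."
    if isinstance(value, dict):
        return {k: sanitize_arguments(v, max_len) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_arguments(v, max_len) for v in value]
    return value  # numbers, booleans, None pass through unchanged


def record_audit(write_row: Callable[[Dict[str, Any]], None], entry: Dict[str, Any]) -> None:
    """Best-effort write: a failure is logged and swallowed, never raised to the caller."""
    try:
        write_row(entry)
    except Exception:
        log.exception("mcp_audit write failed")
```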

The `mcp_audit` table

Field	Meaning
`id`	Record id.
`tenant_id`	Owning tenant. `NULL` for pre-identity denials.
`user_id`	Resolved user. `NULL` for anonymous denials.
`principal`	Email / `oidc:iss:sub` / first-party identifier. The who.
`principal_kind`	One of `oidc`, `pat`, `cluster_admin`, or `anonymous`.
`tool`	`search_content`, `resources/read`, `(auth)` for denials, etc. The what.
`arguments`	Sanitized argument summary (no secrets).
`sources`	Source ids touched/returned. The where (in the data).
`keys_returned`	Count of file keys returned.
`keys_sample`	Bounded sample (≤20) of returned keys for spot audits.
`status`	One of `ok`, `denied`, `error`, or `not_found`.
`error_code`	e.g. `invalid_token`, `not_found`, `bad_request`.
`row_count`	Hits / chunks / rows returned.
`client_ip`	Caller IP (proxy-forwarded when behind nginx). The where (on the network).
`origin`	`Origin` header.
`user_agent`	Client user agent.
`session_id`	`MCP-Session-Id` when present.
`request_id`	JSON-RPC id, for trace correlation.
`protocol_ver`	`MCP-Protocol-Version`.
`created_at`	Timestamp. The when.

The table is RLS-scoped: a tenant sees only its own rows; a cluster admin sees all.

Reviewing the audit log

CLI — newest first, with optional filters:

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" mcp audit
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" mcp audit \
  --tool search_content --principal alice@example.com --limit 100

API — the CLI is a thin wrapper over this endpoint:

curl -H "Authorization: Bearer $KDBL_TOKEN" \
     "$KDBL_URL/api/mcp/audit?tool=search_content&limit=100"

Method	Path	Description
`GET`	`/api/mcp/audit`	List MCP audit entries, newest first. Query params: `tool`, `principal`, `limit` (1–500, default 50). RLS-scoped to the caller's tenant.
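A script can call the endpoint directly instead of going through kdbl-control. A standard-library sketch: audit_url and fetch_audit are hypothetical names, the parameter bounds come from the table, and the response body is returned as parsed JSON without assuming its shape.

```python
import json
import urllib.request
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def audit_url(base_url: str, tool: Optional[str] = None,
              principal: Optional[str] = None, limit: int = 50) -> str:
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    params: Dict[str, Any] = {}
    if tool:
        params["tool"] = tool
    if principal:
        params["principal"] = principal
    params["limit"] = limit
    return f"{base_url.rstrip('/')}/api/mcp/audit?{urlencode(params)}"


def fetch_audit(base_url: str, token: str, **filters: Any) -> Any:
    """GET the audit list; results are RLS-scoped to the token's tenant."""
    req = urllib.request.Request(audit_url(base_url, **filters),
                                 headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # response shape is not documented on this page
```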

Streaming to your SIEM

The parallel mcp.audit tracing event carries the same fields as JSON on stdout, so you don't need DB access to feed a security service. Filter your log pipeline on target: "mcp.audit" to capture every MCP operation and denial. See Telemetry for the logging setup.
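If your pipeline has no built-in filter on the target field, a small stream filter does the job. This sketch assumes the API emits one JSON object per line with a top-level target key; where the audit fields themselves sit (top level or under a nested object) depends on the logging setup, so the filter passes each matching event through untouched. mcp_audit_events is a hypothetical name.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def mcp_audit_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield only the JSON log events whose target is "mcp.audit"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON output mixed into the stream
        if isinstance(event, dict) and event.get("target") == "mcp.audit":
            yield event
```

For example, loop over sys.stdin and print json.dumps(event) for each result to forward only audit events from the kdbl-api log stream.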