KDBL Context Lake (K-Lake) MCP Server: Skills Reference
The K-Lake knowledge platform exposes its crawled-and-extracted content to LLM agents over the Model Context Protocol (MCP). This is the capability ("skills") reference for that server: the tools it offers, what each returns, the security model, and the usage patterns that get good answers.
Every call is security-trimmed per tenant, source, and file (enforced by the datastore's row-level security) and audited. The model only ever sees content the calling principal is authorised to see.
Connection
| Setting | Value |
|---|---|
| Endpoint | POST /mcp on kdbl-api (in-cluster: http://kdbl-api.kdbl.svc.cluster.local/mcp) |
| Transport | Streamable HTTP, JSON-RPC 2.0. POST-only: GET /mcp returns 405 (no SSE channel) |
| Protocol revision | 2025-11-25 |
| Auth | Authorization: Bearer <token>, either a tenant PAT (kdblpat_…, offline hash check, no IdP) or an OIDC access token. MCP tools are tenant-scoped, so a cluster-admin token is rejected. |
| Discovery | Protected-Resource Metadata at /.well-known/oauth-protected-resource |
| Enable | KDBL_MCP_ENABLED=true on kdbl-api; scopes: kdbl.search, kdbl.read |
| Smoke test | kdbl-control --api-url … --api-token <PAT> mcp smoke (initialize → tools/list → tools/call) |
Note: the ingress only proxies /api/, so /mcp is not reachable through the public front door; in-cluster MCP clients must call http://kdbl-api.kdbl.svc.cluster.local/mcp directly.
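A minimal sketch of the first JSON-RPC call over the Streamable HTTP transport, using only the Python standard library. It builds, but does not send, an initialize request carrying the bearer token and the Accept header that Streamable HTTP clients advertise. The client name and version are hypothetical placeholders, and sending MCP-Protocol-Version on every request (including initialize) is a simplification; the MCP specification requires it on requests after initialization.

```python
import json
import urllib.request

# In-cluster endpoint from the table above; the public ingress does not proxy /mcp.
MCP_URL = "http://kdbl-api.kdbl.svc.cluster.local/mcp"
PROTOCOL_VERSION = "2025-11-25"


def build_mcp_request(token: str, method: str, params: dict, request_id: int,
                      url: str = MCP_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 POST for the MCP endpoint."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",  # the server is POST-only; GET /mcp returns 405
        headers={
            "Authorization": f"Bearer {token}",  # tenant PAT or OIDC access token
            "Content-Type": "application/json",
            # Streamable HTTP clients advertise both response types.
            "Accept": "application/json, text/event-stream",
            # Simplification: sent on every request built here.
            "MCP-Protocol-Version": PROTOCOL_VERSION,
        },
    )


def initialize_request(token: str) -> urllib.request.Request:
    """The initialize handshake that opens an MCP session."""
    params = {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        # Hypothetical client identity; any name/version pair works.
        "clientInfo": {"name": "example-client", "version": "0.1"},
    }
    return build_mcp_request(token, "initialize", params, request_id=1)
```

To send it, pass the request to urllib.request.urlopen. Per the MCP specification the client then sends a notifications/initialized notification, and if the server returned an Mcp-Session-Id header it is echoed on later requests. This sketch does not parse a text/event-stream response body.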
Result shape
Each tools/call returns both a back-compat content array (a JSON text block,
plus any citations/resources) and a validated structuredContent object. Search
hits carry resource_link citations; file reads can carry an embedded
resource. Files are addressed by a stable kdbl://file/… URI.
These URIs are resolvable via MCP resources/read (returns the same body as
get_file_text), and resources/list enumerates them.
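Because every result carries the same data twice, a client can prefer structuredContent and fall back to the JSON text block for older consumers. A minimal sketch, assuming the standard MCP content-block shapes (type "text" with a text field, type "resource_link" with a uri field); the helper names are hypothetical.

```python
import json
from typing import Any


def tool_payload(result: dict[str, Any]) -> dict[str, Any]:
    """Return the structured payload of a tools/call result.

    Prefers the validated structuredContent object and falls back to the
    first JSON text block in the back-compat content array.
    """
    structured = result.get("structuredContent")
    if isinstance(structured, dict):
        return structured
    for block in result.get("content", []):
        if block.get("type") == "text":
            return json.loads(block["text"])
    raise ValueError("tools/call result has neither structuredContent nor a text block")


def resource_links(result: dict[str, Any]) -> list[str]:
    """Collect the URIs of resource_link citations from the content array."""
    return [b["uri"] for b in result.get("content", []) if b.get("type") == "resource_link"]
```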
Skills (tools)
Six tools: one for finding content, three for reading it, two for browsing the catalogue.
🔍 search_content: search (start here)
Search over extracted content the caller may see. Hybrid by default (semantic
+ keyword search, fused and reranked); set mode:"lexical" for keyword-only search.
Input: query (required), source_id (optional — restrict to one source),
limit (default 20, max 100), mode ("hybrid" default | "lexical"),
path / content_type / modified_after / modified_before (optional filters).
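The inputs above map onto the arguments object of a tools/call request. A minimal sketch that enforces the documented bounds on the client side; the helper names are hypothetical, and because the value formats for the modified_* filters aren't specified here, filter values are passed through unchanged.

```python
from typing import Any, Optional

MAX_LIMIT = 100                    # documented cap for search_content
MODES = {"hybrid", "lexical"}      # documented modes; hybrid is the default
FILTERS = {"path", "content_type", "modified_after", "modified_before"}


def search_arguments(query: str, *, source_id: Optional[str] = None, limit: int = 20,
                     mode: str = "hybrid", **filters: Any) -> dict[str, Any]:
    """Build the arguments object for a search_content tools/call."""
    if not query.strip():
        raise ValueError("query is required")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    unknown = set(filters) - FILTERS
    if unknown:
        raise ValueError(f"unknown filters: {sorted(unknown)}")
    # Clamp rather than reject an out-of-range limit.
    args: dict[str, Any] = {"query": query, "limit": max(1, min(limit, MAX_LIMIT)), "mode": mode}
    if source_id is not None:
        args["source_id"] = source_id
    # Only send filters the caller actually set.
    args.update({k: v for k, v in filters.items() if v is not None})
    return args


def tools_call(name: str, arguments: dict[str, Any], request_id: int) -> dict[str, Any]:
    """Wrap tool arguments in a JSON-RPC 2.0 tools/call envelope."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
            "params": {"name": name, "arguments": arguments}}
```

Clamping limit is a design choice; raising instead would surface caller bugs sooner.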
Output: { hits: [ … ] }, each hit:
- source_id, key, seq — the citation + chunk locator
- text — the matched chunk and its neighbouring chunks (a small window,
expanded server-side, bounded ~1800 chars). Answer from this.
- truncated — true if the window was longer than returned (read on with
get_file_window)
- snippet — a <mark>-highlighted fragment (a locator, not the full text)
- page_no / char_start / char_end / ts_start_ms / ts_end_ms — in-document
locators (documents set page/char; audio/video set ts)
- url — a clickable, short-lived signed link to the ORIGINAL file (when
downloads are enabled). Cite this so the user can open the source and verify
the grounding; null when the feature is off.
- rank — relevance score
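A hit already carries everything needed for a citation: the source_id/key locator, an in-document position, and (when downloads are enabled) the signed url. A minimal sketch of turning one hit into a human-readable citation; the "source_id/key, p. N" display format is an assumption of this example, not a K-Lake convention.

```python
from typing import Any


def cite(hit: dict[str, Any]) -> str:
    """Format a human-readable citation for one search_content hit."""
    where = f"{hit['source_id']}/{hit['key']}"  # hypothetical display format
    if hit.get("page_no") is not None:          # documents set page/char locators
        where += f", p. {hit['page_no']}"
    elif hit.get("ts_start_ms") is not None:    # audio/video set timestamps
        secs = hit["ts_start_ms"] // 1000
        where += f" @ {secs // 60}:{secs % 60:02d}"
    # url is a short-lived signed link to the original; null when downloads are off.
    return f"{where} <{hit['url']}>" if hit.get("url") else where
```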
Query semantics: the keyword arm is OR-over-terms weighted by term rarity, so a chunk holding the rare decisive term out-ranks one dense in a common word, with no manual phrasing tricks. Lead with the most distinctive terms (a proper noun, rare word, or specific figure), not a full sentence. There is no need to AND terms together or tune coverage: a single query scores every term at once, and a question phrased differently from the corpus still recalls chunks on the words it shares. (Snippet highlighting still marks any query term.)
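To see why rarity weighting favours the decisive term, here is a toy OR-over-terms scorer in the BM25 style (BM25 idf with saturating term frequency and no length normalisation). This is an illustration of the general technique only; K-Lake's actual ranking function isn't documented here.

```python
import math
from collections import Counter


def idf(term: str, docs: list[list[str]]) -> float:
    """BM25-style inverse document frequency: rarer terms weigh more."""
    df = sum(1 for d in docs if term in d)
    return math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))


def or_score(query: list[str], doc: list[str], docs: list[list[str]], k1: float = 1.2) -> float:
    """Sum the contributions of whichever query terms the doc contains (OR semantics)."""
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        if tf[term]:
            # Saturating tf: repeating a common word gives diminishing returns.
            score += idf(term, docs) * tf[term] * (k1 + 1) / (tf[term] + k1)
    return score


docs = [
    "revenue revenue revenue revenue grew".split(),  # dense in a common word
    "goodwill impairment charge recognised".split(),  # holds the rare term
    "revenue increased this year".split(),
    "revenue guidance for next year".split(),
    "revenue by segment".split(),
    "revenue recognition policy".split(),
]
query = "revenue goodwill".split()
ranked = sorted(range(len(docs)), key=lambda i: or_score(query, docs[i], docs), reverse=True)
```

With this corpus the chunk containing the single rare word "goodwill" ranks first, ahead of the chunk that repeats "revenue" four times.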
Modes. hybrid (default) adds a semantic (vector) arm — recall for
paraphrase / no-shared-lexeme queries — fused with the keyword arm and reranked.
lexical is keyword-only (no embeddings/rerank): faster, and the right choice when
embeddings are disabled for the tenant/source, or to drive a multi-query agentic
search (see below). Hybrid is a per-tenant / per-source toggle; if it's off (or no
embedder is deployed) a mode:"hybrid" request transparently degrades to lexical.
📄 get_file_text: whole document
Retrieve a file's full extracted text (as ordered chunks) plus its extraction
status. Input: source_id, key. Output: { chunks: [{seq, text, page_no,
…}], status } (+ embedded resource). The response is capped at 5000 chunks and
sets a truncated flag when the cap is hit; use get_file_window for very large files.
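A client that wants the whole document as one string can join the chunks in seq order and check whether the cap was hit. A minimal sketch; it assumes the truncated flag is a top-level boolean on the payload and that chunks don't overlap, neither of which is stated above.

```python
from typing import Any


def assemble_text(payload: dict[str, Any]) -> tuple[str, bool]:
    """Join get_file_text chunks in seq order; return (text, complete)."""
    chunks = sorted(payload.get("chunks", []), key=lambda c: c["seq"])
    text = "\n".join(c["text"] for c in chunks)
    # Assumption: a top-level truncated flag marks the 5000-chunk cap.
    complete = not payload.get("truncated", False)
    return text, complete
```

If complete is False, switch to get_file_window to read the remainder.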
🪟 get_file_window: bounded sliding read (the read-through tool)
Read a bounded window of a file's chunks — a cursor over the document. Input:
source_id, key, start (0-based chunk index, default 0), length (default 40,
max 200). Output: { chunks, total_chunks, next_cursor, window: {returned},
has_more, text }.
Use it to (a) read a large document in pieces, or (b) pull more context around
a search_content hit — set start ≈ the hit's seq minus a few, then slide
forward with next_cursor.
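The read-around pattern above can be sketched as a generator. The call_tool parameter is a hypothetical helper that sends one tools/call and returns its structured payload, and the sketch assumes next_cursor is the chunk index to pass back as start; max_windows bounds how far the read runs.

```python
from typing import Any, Callable, Iterator

# Hypothetical: (tool name, arguments) -> structured payload of the result.
CallTool = Callable[[str, dict[str, Any]], dict[str, Any]]


def read_around(call_tool: CallTool, source_id: str, key: str, hit_seq: int,
                back: int = 3, length: int = 40, max_windows: int = 5) -> Iterator[dict[str, Any]]:
    """Yield chunks starting a few before a search hit, sliding forward."""
    start = max(0, hit_seq - back)  # begin slightly before the matched chunk
    for _ in range(max_windows):
        page = call_tool("get_file_window", {"source_id": source_id, "key": key,
                                             "start": start, "length": min(length, 200)})
        yield from page["chunks"]
        if not page.get("has_more") or page.get("next_cursor") is None:
            return
        start = page["next_cursor"]  # assumption: next chunk index
```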
🏷️ get_file_metadata: file facts
Metadata for one file. Input: source_id, key. Output: size, mtime,
etag, storage class, owner, content type, and POSIX mode/uid/gid where captured.
📚 list_sources: browse sources
Sources the caller can access, paginated. Input: cursor, limit (default
50, max 200). Output: { sources: [{source_id, protocol, enabled,
last_indexed_at}], next_cursor? }.
🗂️ list_files: browse a source
Files within a source, paginated by key. Input: source_id (required),
cursor, limit (default 50, max 200). Output: { files: [...], next_cursor? }.
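Both browse tools share the same cursor protocol: pass next_cursor back as cursor until it is absent. A minimal sketch of one generic pager; call_tool is a hypothetical helper returning the structured payload, and the cursor is treated as opaque.

```python
from typing import Any, Callable, Iterator, Optional

# Hypothetical: (tool name, arguments) -> structured payload of the result.
CallTool = Callable[[str, dict[str, Any]], dict[str, Any]]


def paginate(call_tool: CallTool, tool: str, items_key: str,
             base_args: Optional[dict[str, Any]] = None, limit: int = 50,
             max_pages: int = 1000) -> Iterator[dict[str, Any]]:
    """Yield every item from list_sources / list_files across pages."""
    args = dict(base_args or {}, limit=min(limit, 200))  # documented max is 200
    for _ in range(max_pages):
        page = call_tool(tool, args)
        yield from page.get(items_key, [])
        cursor = page.get("next_cursor")  # absent on the last page
        if cursor is None:
            return
        args = dict(args, cursor=cursor)
```

For example, paginate(call_tool, "list_sources", "sources") walks all sources, and paginate(call_tool, "list_files", "files", {"source_id": sid}) walks one source.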
Recommended usage pattern (locate → read → answer)
- Locate with search_content. Lead with the most distinctive terms. Don't stop after one query: if the top hits don't contain the answer, reformulate with the document's own wording (exact line-item names, section/statement titles, proper nouns, a specific figure or date) and search again; narrow with path/content_type/modified_*; or page via has_more.
- Read from the hit's text (it already includes neighbouring chunks). If text is truncated, or a hit looks relevant but the answer continues beyond the window, call get_file_window around the hit's seq.
- Answer, grounded, citing the kdbl://file/… source. If nothing answers the question after reading, say so rather than falling back to prior knowledge.
This loop is what a chat client's system prompt should enforce; see the Air-gapped AI demo for a ready-made prompt.
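The same loop, expressed as client code rather than a prompt. A minimal sketch: call_tool is a hypothetical helper returning the structured payload, find_answer stands in for the model's judgement of whether a passage answers the question, and the window offsets (3 chunks back, 40 long) are arbitrary choices.

```python
from typing import Any, Callable, Iterable, Optional

# Hypothetical: (tool name, arguments) -> structured payload of the result.
CallTool = Callable[[str, dict[str, Any]], dict[str, Any]]


def locate_read_answer(call_tool: CallTool, queries: Iterable[str],
                       find_answer: Callable[[str], Optional[str]],
                       limit: int = 10) -> Optional[dict[str, Any]]:
    for query in queries:  # locate: each query is a reformulation
        hits = call_tool("search_content", {"query": query, "limit": limit})["hits"]
        for hit in hits:
            text = hit["text"]  # read: matched chunk plus neighbours
            if hit.get("truncated"):
                window = call_tool("get_file_window", {
                    "source_id": hit["source_id"], "key": hit["key"],
                    "start": max(0, hit["seq"] - 3), "length": 40})
                text = window["text"]
            answer = find_answer(text)
            if answer is not None:  # answer: grounded, with its locator
                return {"answer": answer, "source_id": hit["source_id"],
                        "key": hit["key"], "seq": hit["seq"]}
    return None  # nothing grounded: report that instead of guessing
```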
Agentic search beats single-shot, especially in lexical mode
The reformulate-and-retry loop is not a fallback; it is the way to get high
recall. On a public financial-filings benchmark, single-shot lexical retrieval
found the right document in the top-10 only ~40–46% of the time, vs ~60–75% for
hybrid. But an agent that issues a few lexical queries — reformulating with each
statement's printed wording — found the right document 75–95% of the time
(scaling with model quality), matching or beating single-shot hybrid with no
embeddings at all. The takeaway for client/system-prompt authors: instruct the
model to treat the first result set as a clue, not an answer, and to re-query in
the corpus's vocabulary before giving up. lexical mode is a first-class agentic
retrieval surface — cheaper than hybrid and fully air-gappable (no embedder, no
GPU).
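When an agent issues several reformulated lexical queries, it needs to merge the result sets. Ranks from different queries aren't directly comparable, so the sketch below uses a simple heuristic of its own (not a K-Lake feature): documents surfaced by more reformulations come first, then by their best position in any single result list. A document counts once per query even if several of its chunks match.

```python
from typing import Any


def merge_query_results(results: list[list[dict[str, Any]]]) -> list[tuple[str, str]]:
    """Merge per-query hit lists into one ranked list of (source_id, key)."""
    stats: dict[tuple[str, str], list[int]] = {}  # doc -> [queries_hit, best_pos]
    for hits in results:
        found_here: set[tuple[str, str]] = set()
        for pos, hit in enumerate(hits):
            doc = (hit["source_id"], hit["key"])
            entry = stats.setdefault(doc, [0, pos])
            entry[1] = min(entry[1], pos)
            if doc not in found_here:  # count each query once per document
                entry[0] += 1
                found_here.add(doc)
    return sorted(stats, key=lambda d: (-stats[d][0], stats[d][1]))
```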
Opening the original file (download links)
Each search_content hit can include a url — a clickable, short-lived
HS256-signed link to the original source file (the PDF/etc.), so a user can
open it and verify the grounding. K-Lake doesn't store originals, so the link
re-fetches the file from its source on demand.
The token (valid for ≈15 min) carries the principal it was minted for; the endpoint
re-checks RLS before streaming (404 if the principal can no longer see the
file), and the byte fetch is delegated to the component that holds the source
connectors and credentials (the API never does). Every download is audited
(tool = files/download). Enabled when KDBL_DOWNLOAD_SIGNING_SECRET,
KDBL_API_PUBLIC_URL, and KDBL_INTERNAL_FETCH_TOKEN are set; otherwise url
is null and the route 404s.
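The token format is only described as HS256-signed with a ≈15 minute lifetime. The sketch below assumes a JWT-style compact token (base64url header, claims, HMAC-SHA256 signature) to show what minting and verifying such a link token involves; the claim names are hypothetical. As the text says, a valid signature is not authorisation on its own: the endpoint still re-checks RLS for the embedded principal.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Optional

TTL_SECONDS = 15 * 60  # "≈15 min" per the docs


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    return _b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())


def mint(secret: bytes, principal: str, source_id: str, key: str,
         now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    # Hypothetical claim names; the real token layout is not documented here.
    claims = {"sub": principal, "source_id": source_id, "key": key,
              "exp": int(now) + TTL_SECONDS}
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_sign(secret, f'{header}.{payload}')}"


def verify(secret: bytes, token: str, now: Optional[float] = None) -> Optional[dict[str, Any]]:
    now = time.time() if now is None else now
    try:
        header, payload, sig = token.split(".")
    except ValueError:
        return None  # not a three-part token
    expected = _sign(secret, f"{header}.{payload}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):  # constant-time
        return None
    claims = json.loads(_b64url_decode(payload))
    if claims.get("exp", 0) <= now:
        return None  # expired link
    # Signature proves who the link was minted for; the server must still
    # re-check RLS for claims["sub"] before streaming any bytes.
    return claims
```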
Security & audit
- Per-file trimming: every query runs in a row-level-security-scoped transaction that cannot bypass the policies, so tenant, source-ACL, and per-file-grant visibility is enforced on top of the explicit tenant filter. No tool can return content the principal can't see.
- Audit: every call writes an audit row (principal, tool, arguments, sources, keys returned, row count, status, client IP, timestamp). Review with kdbl-control … mcp audit --tool search_content.
- Offline auth: PATs verify offline against a stored hash with no IdP/JWKS fetch, so the server is fully usable air-gapped.
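Offline PAT verification needs nothing but the stored hash. The hash scheme isn't specified here; the sketch assumes an unsalted SHA-256 digest, a common choice for high-entropy random tokens (unlike human passwords, they don't need a slow password hash). The prefix check and helper names are illustrative.

```python
import hashlib
import hmac

PAT_PREFIX = "kdblpat_"


def hash_pat(token: str) -> str:
    """Digest stored at issue time (assumed scheme: hex SHA-256)."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def verify_pat(token: str, stored_hash: str) -> bool:
    """Check a presented PAT with no network call (no IdP, no JWKS)."""
    if not token.startswith(PAT_PREFIX):
        return False
    # Constant-time comparison avoids leaking digest prefixes via timing.
    return hmac.compare_digest(hash_pat(token).encode("ascii"), stored_hash.encode("ascii"))
```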
Retrieval characteristics & limits
- Keyword search: ranking weights terms by rarity, so a rare decisive term out-ranks a common one — no AND/coverage tuning needed.
- Hybrid (default) adds semantics. The semantic arm is an approximate
nearest-neighbour search over chunk embeddings, fused with the keyword arm,
diversified, and reranked. This catches paraphrase / no-shared-lexeme queries
that pure keyword search misses. Hybrid is a per-tenant / per-source toggle
and requires a deployed embedder; without one, or when disabled, retrieval is
lexical-only and mode:"hybrid" degrades transparently. The cost/latency of each mode is reported by GET /capacity.
- Keyword mode matches where the words appear. In mode:"lexical", a question phrased in words that never occur in the answer text won't surface it on the first try, but reformulating with the corpus's own vocabulary across a few queries recovers it (see "Agentic search" above). Alternatively, use list_files + get_file_window to read a known document directly.
- Bounded by design. The keyword candidate scan returns a true global top-N (cap ≫ limit) before RLS/filters are applied; get_file_text caps at 5000 chunks; get_file_window caps at 200 chunks per call. All of these keep latency and payloads bounded at scale.
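The "global top-N before RLS/filters" ordering can be sketched as: score everything, keep the N best, then drop what the principal can't see and cut to limit. The cap factor of 10 is an assumption of this illustration (the docs say only cap ≫ limit), and the sketch is not K-Lake's query plan.

```python
import heapq
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def bounded_search(candidates: Iterable[T], score: Callable[[T], float],
                   visible: Callable[[T], bool], limit: int, cap_factor: int = 10) -> list[T]:
    """Global top-N by score, then visibility/filters, then the caller's limit."""
    cap = limit * cap_factor  # assumed cap; only "cap >> limit" is documented
    top = heapq.nlargest(cap, candidates, key=score)  # bounded candidate scan
    return [c for c in top if visible(c)][:limit]     # RLS / filters, then limit
```

In this sketch, if most of the top-N candidates are filtered out, fewer than limit results come back; the bound on work is what buys predictable latency.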
Documented clients
Cloud and local MCP clients are covered in Connecting AI clients; a fully air-gapped self-hosted-LLM + chat-interface setup driving these tools is in the Air-gapped AI demo.