
Chat-UI QuickStart — talk to KDBL Context Lake (K-Lake) over MCP

K-Lake ships an optional local chat surface: a self-hosted chat interface wired to a self-hosted LLM, with K-Lake's knowledge base exposed to it as tools over the Model Context Protocol (MCP). It is self-seeding — deploying it yields a fully-configured chat (models present, system prompt set, K-Lake tools bound, grounded answers that cite signed original-file links) with no manual UI clicks. Answers are grounded in your content and carry the same per-user security and auditing as the rest of K-Lake.

To connect a remote cloud AI (such as the Claude or ChatGPT connectors) to an on-prem K-Lake instead, see Connect your AI.

What gets deployed

The optional chat bundle stands up two things in your cluster:

  • A local chat interface — the chat surface the user types into. It is configured login-free for evaluation.
  • An MCP bridge that re-exposes K-Lake's MCP tools (search_content, get_file_text, and the rest) so the chat interface can call them.

The model backend is a self-hosted LLM running in your cluster (CPU or GPU).

Prerequisites

  • A self-hosted LLM running in your cluster. The served model name must match the model id the chat bundle expects.
  • A tenant PAT for the data the chat should retrieve. The PAT's permission scope is exactly what the chat can see.

Deploy

  1. Create one out-of-band secret holding the tenant PAT the chat uses to authenticate to K-Lake's MCP endpoint (kept out of source control).

  2. Apply the optional chat bundle included in your deployment package:

     kubectl apply -f <chat-bundle>.yaml

  3. Reach the chat interface: either point a hostname at any node IP via the bundled Ingress, or port-forward the chat service:

     kubectl -n kdbl port-forward svc/<chat-service> 8080:80

     With the port-forward running, the chat interface is reachable at http://localhost:8080.

  4. Open it, pick a model, and ask a grounded question. The answer should cite each fact with a clickable signed original-file link. On each click the file is re-fetched, the user's permissions are checked again, and the access is audited.
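
The first Deploy step does not name the secret or its key, so the following is only a sketch of one way to create it with kubectl. The function name, the secret name, and the key are hypothetical; use whatever names your chat bundle's manifest actually references. The kdbl namespace is the one used in the port-forward command above.

```shell
# Sketch only: secret name and key are hypothetical placeholders. Match them
# to the names referenced by the chat bundle in your deployment package.
create_chat_pat_secret() {
  namespace="$1"    # e.g. kdbl, as in the port-forward command
  secret_name="$2"  # hypothetical, e.g. klake-chat-pat
  secret_key="$3"   # hypothetical, e.g. token
  pat_file="$4"     # local file containing only the PAT; keep it out of git

  # --from-file reads the value from disk, so the PAT does not end up in
  # shell history or in the process list.
  kubectl -n "$namespace" create secret generic "$secret_name" \
    --from-file="$secret_key=$pat_file"
}

# Usage (placeholder values, not real names):
#   create_chat_pat_secret kdbl klake-chat-pat token ./pat.txt
```

kubectl stores the file contents verbatim, so write the PAT file without a trailing newline (for example with printf '%s') and delete it once the secret exists.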

How the seeding works

The chat bundle is self-configuring: on deploy, a one-shot bootstrap step seeds the model definitions, the retrieval system prompt, and the binding to K-Lake's MCP tools, then exits. A redeploy re-reconciles this configuration rather than duplicating it, so re-applying the bundle is safe. The system prompt is the source of truth for retrieval behaviour — edit it and re-apply to re-sync.
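
Because re-applying is described as safe, a re-sync can be as simple as running kubectl apply again. The sketch below adds one optional precaution: it previews the pending changes with kubectl diff first. The function name is hypothetical, and how the bootstrap step is triggered on re-apply is not covered here; this only wraps the apply command already shown in the Deploy steps.

```shell
# Sketch only: preview, then re-apply, the chat bundle.
# kubectl diff exits 0 when nothing would change, 1 when differences exist,
# and greater than 1 when the diff itself failed.
reapply_chat_bundle() {
  bundle_file="$1"   # path to your <chat-bundle>.yaml
  status=0
  kubectl diff -f "$bundle_file" || status=$?

  if [ "$status" -gt 1 ]; then
    echo "kubectl diff failed (exit $status); not applying" >&2
    return "$status"
  fi

  # Exit codes 0 and 1 are both fine: apply so the configuration is reconciled.
  kubectl apply -f "$bundle_file"
}

# Usage:
#   reapply_chat_bundle ./<chat-bundle>.yaml
```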

Caveats

  • Shared GPU. If more than one LLM is configured but they share a single GPU, only one is loaded at a time; the others simply contribute no models to the picker until scaled up.
  • Login-free mode. The chat interface is configured without authentication for ease of evaluation. Do not expose it to untrusted networks in this mode.
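
For the shared-GPU case, switching which LLM is loaded amounts to scaling one workload down and another up. The sketch below assumes each LLM runs as a Kubernetes Deployment, which this guide does not state; the function name and the deployment names are hypothetical.

```shell
# Sketch only: assumes each LLM is a Deployment in the given namespace.
# Deployment names are hypothetical placeholders.
switch_loaded_llm() {
  namespace="$1"   # e.g. kdbl
  unload="$2"      # deployment currently holding the GPU
  load="$3"        # deployment that should take the GPU next

  # Release the GPU first, then start the other model server.
  kubectl -n "$namespace" scale "deployment/$unload" --replicas=0 &&
    kubectl -n "$namespace" scale "deployment/$load" --replicas=1
}

# Usage (placeholder names):
#   switch_loaded_llm kdbl llm-a llm-b
```

After the new server is ready, its models should reappear in the picker; the one that was scaled down contributes none until it is scaled up again.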