Enablement runbook

A single "turn it on" guide for an operator standing up the major capabilities of KDBL Context Lake (K-Lake). Each section is a discrete operation with copy-pasteable steps; the end-to-end ordering below ties them together for the security-trimming path, where order is load-bearing.

This page consolidates enablement that lives in detail elsewhere. It does not duplicate the reference docs — it sequences them. Follow the links for the full surface of each feature:

Sources — adding, listing, enabling, removing sources
MCP server — the AI-client endpoint, auth model, audit log
MCP clients — connecting Claude and other clients
Per-file security trimming — return only files a caller's source-native permissions grant
Directory enrichment — correlate IdP identities to file-native principals

Conventions

Snippets are bash. Replace <placeholders> before running.
API examples assume KDBL_URL is your API base (e.g. https://kdbl.example.com) and KDBL_TOKEN is a bearer token. CLI examples assume kdbl-control is on PATH.
Some CLI commands run in API mode (--api-url/--api-token, RLS-enforced) and some in direct-DB mode (--postgres-url/KDBL_POSTGRES_URL, master-key-bearing). Each step says which.
Secrets are always passed via an environment variable or stdin, never a flag — so credentials stay out of argv, shell history, and logs. This is a hard rule in the CLI: there is no --client-secret, no --password, etc.
Deploy-side env vars live on the kdbl-api Deployment. The non-secret ones go in the kdbl-api-config ConfigMap; secrets go in the kdbl-api-secrets Secret. Apply, then roll the Deployment.
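A pre-flight guard can catch an unset variable before any command runs. The sketch below is illustrative only: `require_env` is a hypothetical helper, not part of kdbl-control, and it reports variable names without ever printing their values.

```bash
# Hypothetical helper (not part of kdbl-control): check that each named
# environment variable is set and non-empty. Prints names only, never values.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: API-mode steps need KDBL_URL and KDBL_TOKEN;
# direct-DB steps need KDBL_POSTGRES_URL instead.
# require_env KDBL_URL KDBL_TOKEN || exit 1
```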

End-to-end ordering

If you are enabling per-file security trimming end to end, order matters. Trimming is fail-closed: the moment a source flips to per_file, every file hides until (a) its grants are computed and (b) the caller is correlated to the file's principals. Do directory enrichment and ACL capture first, or per_file just hides everything.

Do these in this order:

1. Enable the MCP server (if AI clients are the consumer): deploy env vars + roll the API. See section 3.
2. Set the per-tenant IdP audience: oidc_config.mcp_audience on each tenant, or MCP tokens 401. See section 3, Per-tenant IdP audience.
3. Configure directory enrichment, in sub-order: Entra app permissions + admin consent → directory set-graph / set-ldap → sync-graph / sync-ldap to materialize the alias graph. See section 6.
4. Enable ACL capture + backfill: meta-caps ntfs_acl / nfs4_acl, then backfill-meta so already-crawled files get their ACL bytes (and grants) computed. See section 5.
5. Enable per-file trimming: source security-trim --mode per_file. Only now is it safe: grants exist and callers correlate to them. See section 4.
6. Connect a client: point Claude or another MCP client at /mcp and confirm it sees the files it should. See mcp-clients.

Steps 1–2 are independent of 3–5 if you are not using MCP (a plain API consumer skips them). Steps 3 and 4 are independent of each other and can run in parallel; both must complete before step 5.
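The trimming part of this ordering (steps 3 to 5) can be sketched as a script. This is a sketch under assumptions, not a supported tool: `api`, `db`, and `enable_trimming` are hypothetical wrappers, the LDAP variant of the sync is shown, and the directory config is assumed to be stored already. `DRY_RUN` defaults to 1, so the commands are printed rather than executed, and the bearer token never appears in the output.

```bash
# Hypothetical wrappers around the two CLI modes. With DRY_RUN=1 (the default)
# they print the subcommand instead of running it.
api() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "api: $*"
  else
    kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" "$@"
  fi
}
db() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "db: $*"
  else
    # sync-ldap also needs KDBL_MASTER_KEY exported in the environment.
    kdbl-control --postgres-url "$KDBL_POSTGRES_URL" "$@"
  fi
}

# enable_trimming SOURCE_ID TENANT CAP   (CAP is ntfs_acl or nfs4_acl)
enable_trimming() {
  local source_id="$1" tenant="$2" cap="$3"
  # Step 3: materialize the alias graph.
  db directory sync-ldap --tenant "$tenant" || return 1
  # Step 4: enable ACL capture, then backfill already-crawled files.
  api source meta-caps --source-id "$source_id" --caps "$cap" || return 1
  api source backfill-meta --source-id "$source_id" --caps "$cap" || return 1
  # A real run should pause here until `source meta-coverage` shows the
  # backfill has drained; that check is left out of this sketch.
  # Step 5: only now flip the source to per_file.
  api source security-trim --source-id "$source_id" --mode per_file
}
```

Running `enable_trimming '<source-id>' '<tenant-slug>' ntfs_acl` with the default `DRY_RUN` prints the four subcommands in order, which is a cheap way to review the sequence before executing anything.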

1. Add a source

A source is one location K-Lake indexes (an S3 bucket, SMB/SMBFS share, or NFS export). The full per-protocol command surface — required vs optional config, credential handling, the API equivalents — lives in sources. In brief:

Protocol	CLI subcommand	Credential
S3	`source add-s3`	access key via `--secret-access-key-stdin`, or ambient (IRSA)
SMB (userspace)	`source add-smb`	password via stdin
SMBFS (kernel mount)	`source add-smbfs`	password via stdin
NFS	`source add-nfs`	none (server-side `/etc/exports`)

Add a source, then trigger a first crawl:

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  crawl --source-id '<source-id>'

For NTFS ACL trimming, prefer SMBFS over userspace SMB: userspace smb has no ACL surface (see section 5, which also covers NFSv4 ACL capture on NFS sources).

2. Enable content extraction on a source

Extraction pulls file contents (not just metadata) into the catalog so they can be searched and retrieved. Enable it per source; it takes effect within ~30 s. All source extract subcommands are API mode only.

Enable extraction. --extensions and --max-bytes are preserved when omitted; pass an empty --extensions "" to mean "any extension".

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source extract enable --source-id '<source-id>' \
  --extensions pdf,docx --max-bytes 26214400

Optional narrowing: --modified-after / --modified-before (RFC3339), --include-path / --exclude-path (repeatable globs; exclude wins).
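The --max-bytes value in the example, 26214400, is 25 MiB expressed in bytes. A small helper makes that conversion explicit; `mib_to_bytes` is a hypothetical name used only for illustration, and the narrowing values in the commented usage are placeholders.

```bash
# Convert a size in MiB to bytes for --max-bytes (1 MiB = 1024 * 1024 bytes).
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 25   # prints 26214400

# Possible usage with the optional narrowing flags (timestamps are RFC3339):
# kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
#   source extract enable --source-id '<source-id>' \
#   --extensions pdf,docx --max-bytes "$(mib_to_bytes 25)" \
#   --modified-after '2024-01-01T00:00:00Z' \
#   --exclude-path '<glob>'
```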

(Already-crawled sources) Re-enqueue extraction for existing files with a backfill, which re-crawls the source and re-enqueues every eligible file:

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source extract backfill --source-id '<source-id>'

Watch progress and coverage:

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source extract show --source-id '<source-id>'
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source extract progress --source-id '<source-id>' --watch

The UI exposes this as the extraction panel on the source detail page; the API is POST /api/sources/<urlencoded-source-id>/extract. Disable (preserving the allowlist/cap) with source extract disable.
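Because the source id sits in the URL path, it has to be percent-encoded before calling the API. The following is a minimal pure-bash sketch of that encoding; `urlencode` is a hypothetical helper, the source id shown is invented for illustration, and only ASCII input is covered.

```bash
# Percent-encode a string for use as a URL path segment.
# Unreserved characters (RFC 3986: letters, digits, "-", ".", "_", "~") pass
# through; every other byte becomes %XX. Sketch only: ASCII input assumed.
urlencode() {
  local LC_ALL=C
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *) out+="$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical source id, for illustration only:
# source_id='smbfs://fileserver/team share'
# echo "$KDBL_URL/api/sources/$(urlencode "$source_id")/extract"
```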

3. Enable the MCP server

The MCP server lets AI clients (Claude and others) search and retrieve over the Model Context Protocol. It ships dark — mounted only when KDBL_MCP_ENABLED is true, and the API fails fast at boot if you enable it without a resource URI. Full detail in mcp; client setup in mcp-clients.

Deploy env vars

Set these on the kdbl-api Deployment (the non-secret ones in the kdbl-api-config ConfigMap):

Env var	Default	Meaning
`KDBL_MCP_ENABLED`	`false`	Master switch. False → neither `/mcp` nor the metadata routes are mounted.
`KDBL_MCP_RESOURCE_URI`	—	Canonical MCP resource URI, e.g. `https://kdbl.example.com/mcp`. Advertised as `resource` and is the `aud` every token must target (RFC 8707). Required when enabled — boot fails if empty.
`KDBL_MCP_SCOPES`	`kdbl.search,kdbl.read`	Comma-separated scopes advertised in Protected Resource Metadata.
`KDBL_MCP_ALLOWED_ORIGINS`	empty	Comma-separated `Origin` allowlist for browser clients. Empty rejects any request carrying an `Origin` header (DNS-rebinding protection). Server-to-server clients send no `Origin` and pass.
`KDBL_MCP_ALLOW_API_AUDIENCE`	`false`	Escape hatch: fall back to the tenant's API audience for IdPs that can't mint resource-scoped tokens. Weakens the no-passthrough guarantee — leave off unless needed.

Add the vars to the ConfigMap, then apply and roll the API:

kubectl -n kdbl apply -f kdbl-api.yaml
kubectl -n kdbl rollout restart deployment/kdbl-api

Confirm that boot logged "MCP endpoint enabled" and that the readiness probe is green.
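The fail-fast rule (enabled without a resource URI) can be checked locally before you apply anything. The sketch below mirrors that rule in a hypothetical `check_mcp_env` function; it assumes that only the literal string `true` enables the server, which may not match how the API parses the flag.

```bash
# Mirror of the boot rule described above: enabling MCP without a resource
# URI is an error. Assumption: only the literal "true" counts as enabled.
check_mcp_env() {
  local enabled="${KDBL_MCP_ENABLED:-false}" uri="${KDBL_MCP_RESOURCE_URI:-}"
  if [ "$enabled" = "true" ] && [ -z "$uri" ]; then
    echo "KDBL_MCP_RESOURCE_URI must be set when KDBL_MCP_ENABLED=true" >&2
    return 1
  fi
  return 0
}

# After the apply + restart, waiting for the rollout to finish:
# kubectl -n kdbl rollout status deployment/kdbl-api
```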

Per-tenant IdP audience

For OIDC tokens, the expected audience is resolved per tenant from oidc_config.mcp_audience. A tenant with no mcp_audience set (and KDBL_MCP_ALLOW_API_AUDIENCE off) is unreachable over MCP — its tokens 401. Mint tokens whose aud is the MCP resource URI so they can't be replayed against the main API.

There is no dedicated CLI flag for mcp_audience — it is a key inside the tenant's oidc_config. Set it at create time or by patching the tenant.

New tenant (CLI, direct-DB mode):

kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
  tenant create --slug '<tenant-slug>' --name '<Name>' \
  --oidc-config-json '{"issuer":"<issuer-url>","audience":"<api-aud>","mcp_audience":"https://kdbl.example.com/mcp"}'

Existing tenant (API; cluster-admin). Note PATCH /api/tenants/:slug replaces oidc_config wholesale — send the complete block, not just the new key:

curl -X PATCH -H "Authorization: Bearer $KDBL_TOKEN" \
     -H "Content-Type: application/json" \
     "$KDBL_URL/api/tenants/<tenant-slug>" \
     -d '{"oidc_config":{"issuer":"<issuer-url>","audience":"<api-aud>","mcp_audience":"https://kdbl.example.com/mcp"}}'
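When MCP tokens 401, it helps to confirm that the token's aud really is the value stored in mcp_audience. The sketch below decodes a JWT payload for inspection; `jwt_payload` and `token_has_audience` are hypothetical helpers, no signature is verified, and the audience check is a plain substring match rather than a JSON parse.

```bash
# Print the decoded payload (second segment) of a JWT.
# Inspection only: the signature is NOT verified.
jwt_payload() {
  local seg
  seg="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url omits padding; restore it so base64 -d accepts the input.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done
  printf '%s' "$seg" | base64 -d
}

# Succeed if the payload contains the expected audience as a quoted string.
# Simplification: this matches the value anywhere in the payload, so treat a
# pass as a hint and read the decoded payload when in doubt.
token_has_audience() {
  jwt_payload "$1" | grep -qF "\"$2\""
}

# Usage:
# token_has_audience "$MCP_TOKEN" 'https://kdbl.example.com/mcp' && echo "aud ok"
```

`MCP_TOKEN` in the usage line is an assumed variable holding a token minted for the MCP resource, not one the CLI defines.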

Smoke check

End-to-end check (fetches Protected Resource Metadata, then runs initialize + tools/list with your token):

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" mcp smoke

Then connect a client per mcp-clients, and review activity with kdbl-control ... mcp audit.

4. Enable per-file security trimming

Per-file trimming makes a caller see only the files their source-native owner/group/ACL already grant them. It is strictly opt-in and fail-closed.

CRITICAL ORDERING. Trimming is fail-closed: the moment a source flips to per_file, every file is hidden until both its grants are computed and the caller is correlated to the file's principals. If you enable per_file before directory enrichment and ACL capture + backfill are in place, the source trims down to nothing. Do those two first. See the end-to-end ordering.

The policy lives in a security_trim block on the source config (mode + fail_closed):

Field	Values	Default	Meaning
`mode`	`per_file` \| `source_only` \| `open`	`source_only`	`per_file` turns trimming on; `source_only` is today's source-level visibility; `open` marks the source fully public.
`fail_closed`	`true` \| `false`	`true`	With `per_file`: hide files whose grants aren't computed (`true`) or fall back to source-level visibility (`false`).
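The interaction between mode and fail_closed is easiest to see as a decision function. The following is a simplified model of the table above, not K-Lake's implementation: `file_visible` is a hypothetical name, and it assumes that in per_file mode a computed grant decides visibility on its own.

```bash
# Simplified model of the security_trim table (not the real implementation).
# file_visible MODE FAIL_CLOSED GRANTS_COMPUTED CALLER_GRANTED SOURCE_VISIBLE
# Each flag is the string "true" or "false"; prints "visible" or "hidden".
file_visible() {
  local mode="$1" fail_closed="$2" grants_computed="$3"
  local caller_granted="$4" source_visible="$5" decision="false"
  case "$mode" in
    open)        decision="true" ;;
    source_only) decision="$source_visible" ;;
    per_file)
      if [ "$grants_computed" = "true" ]; then
        decision="$caller_granted"      # grants exist: the caller must match
      elif [ "$fail_closed" = "true" ]; then
        decision="false"                # grants missing: hide the file
      else
        decision="$source_visible"      # grants missing: source-level fallback
      fi
      ;;
  esac
  if [ "$decision" = "true" ]; then echo visible; else echo hidden; fi
}

file_visible per_file true false false true   # prints "hidden"
```

The last line is the failure case this runbook warns about: per_file with fail_closed on and no grants computed hides the file even from a caller who can see the source.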

source security-trim is API mode only; the change applies within ~30 s.

UI: the source detail page's Per-file security trimming panel (mode dropdown + fail-closed toggle).
CLI (omit --fail-closed to leave the stored value untouched):

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source security-trim --source-id '<source-id>' \
  --mode per_file --fail-closed true

API:

curl -X POST -H "Authorization: Bearer $KDBL_TOKEN" \
     -H "Content-Type: application/json" \
     "$KDBL_URL/api/sources/<urlencoded-source-id>/security-trim" \
     -d '{ "mode": "per_file", "fail_closed": true }'

POSIX grants (mode/uid/gid) are derived automatically. ACL grants require ACL capture first. See security-trimming for the grant model, grants_state, and limitations.

5. Enable ACL capture (NTFS / NFSv4)

This is the meta-caps prerequisite for ACL-based grants. POSIX grants need no cap, but an ACL-controlled file has no grants until K-Lake captures its raw ACL bytes — without the cap it falls back to POSIX bits (commonly grants_state = 0 → hidden under fail-closed). Do this before flipping the source to per_file.

Protocol	Cap	Notes
SMBFS (kernel mount)	`ntfs_acl`	Userspace `smb` has no ACL surface; `ntfs_acl` is masked out for it at registry time — use `smbfs`.
NFS	`nfs4_acl`
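When scripting across several sources, the table above reduces to a small lookup. This is only a sketch: `acl_cap_for` is a hypothetical helper, and the protocol strings it accepts (`smbfs`, `nfs`) are assumed labels rather than values read from the API.

```bash
# Map a source protocol to its ACL cap, per the table above.
# Returns non-zero for protocols with no ACL cap (for example userspace smb).
acl_cap_for() {
  case "$1" in
    smbfs) echo ntfs_acl ;;
    nfs)   echo nfs4_acl ;;
    *)     return 1 ;;
  esac
}

# Usage:
# cap="$(acl_cap_for smbfs)" || { echo "no ACL cap for this protocol" >&2; exit 1; }
```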

source meta-caps is API mode only; applies within ~30 s.

Enable the cap (UI: the source detail page's enrichment controls toggle NTFS / NFSv4 ACL capture):

# SMBFS:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source meta-caps --source-id '<smbfs-source-id>' --caps ntfs_acl
# NFS:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source meta-caps --source-id '<nfs-source-id>' --caps nfs4_acl

API: POST /api/sources/<urlencoded-source-id>/meta-caps with { "caps": "ntfs_acl" }.

Backfill already-crawled files so existing files get their ACL bytes (and therefore their grants) computed; a crawl only enqueues enrichment for new listings. Pass the cap you enabled above (nfs4_acl for an NFS source):

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source backfill-meta --source-id '<source-id>' --caps ntfs_acl

Watch the backfill drain:

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source meta-coverage --source-id '<source-id>'

6. Configure directory enrichment

Directory enrichment correlates IdP identities (the caller's token) to file-native principals (POSIX gids, NTFS/AD SIDs), so trimming authorizes the right people. Run this before enabling per_file. POSIX name resolution is automatic; the cross-directory cases below need setup. Full reference: directory.

All directory CLI commands run in direct-DB mode (--postgres-url / KDBL_POSTGRES_URL); secret-bearing ones also need the master key. They are not proxied through the API because the API process does not hold the master key.

a. Entra app registration (portal: the prerequisite people miss)

Before any Graph command works, the app registration in the Entra portal needs Microsoft Graph application permissions and admin consent:

In the Entra admin center, open your app registration → API permissions.
Add Microsoft Graph Application permissions (all read-only):
Group.Read.All
GroupMember.Read.All
User.Read.All
Click Grant admin consent for the tenant. Without consent, the app-only token carries none of these permissions and discovery returns nothing.
Create a client secret under Certificates & secrets; you'll pass it via KDBL_GRAPH_CLIENT_SECRET below.

(Skip this for an unlinked on-prem AD that Entra has never synced — use set-ldap instead. See choosing a strategy.)
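One way to confirm that consent took effect is to look at an app-only token for the registration: in Entra app-only tokens, granted application permissions appear in the roles claim. The sketch below checks a decoded token payload for the three permissions listed above. `missing_graph_roles` is a hypothetical helper, obtaining and decoding the token is out of scope here, and the check is a quoted substring match rather than a JSON parse.

```bash
# Application permissions this runbook asks for.
REQUIRED_GRAPH_ROLES="Group.Read.All GroupMember.Read.All User.Read.All"

# missing_graph_roles PAYLOAD_JSON
# Prints each required permission that is absent from the decoded payload and
# returns non-zero if any is missing.
missing_graph_roles() {
  local payload="$1" role missing=0
  for role in $REQUIRED_GRAPH_ROLES; do
    if ! printf '%s' "$payload" | grep -qF "\"$role\""; then
      echo "$role"
      missing=1
    fi
  done
  return "$missing"
}

# Usage, given the decoded payload of an app-only Graph token in $payload:
# missing_graph_roles "$payload" || echo "grant admin consent for the roles above"
```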

b. Store the directory config (CLI: the secret path)

Secrets are read from env vars, never flags.

Entra Graph — same / AD-synced Entra tenant. Stores non-secret config in oidc_config.graph and encrypts the client secret (KDBL_GRAPH_CLIENT_SECRET) at rest with KDBL_MASTER_KEY. The workers then run Graph discovery automatically:

export KDBL_GRAPH_CLIENT_SECRET='<app-client-secret>'
export KDBL_MASTER_KEY='<base64-master-key>'
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
  directory set-graph \
  --tenant '<tenant-slug>' \
  --entra-tenant '<azure-ad-tenant-id>' \
  --client-id '<app-registration-client-id>'

AD / LDAP — separate, unlinked on-prem AD vs cloud-only Entra. Stores config in oidc_config.ldap and encrypts the bind password (KDBL_LDAP_BIND_PASSWORD). --upn-rewrite (from=to, repeatable) bridges an on-prem UPN suffix to the cloud one; --name-scope sets the NetBIOS-domain scope for name:: fallback refs. The worker must trust the DC's CA (LDAPS):

export KDBL_LDAP_BIND_PASSWORD='<ad-bind-password>'
export KDBL_MASTER_KEY='<base64-master-key>'
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
  directory set-ldap \
  --tenant '<tenant-slug>' \
  --url 'ldaps://dc.demo.kdbl.com:636' \
  --bind-user 'DEMO\svc-kdbl' \
  --base-dn 'DC=demo,DC=kdbl,DC=com' \
  --name-scope DEMO \
  --upn-rewrite demo.kdbl.com=kdbl.co.uk
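To make the effect of --upn-rewrite concrete, the sketch below applies one from=to rule to a UPN. It is an illustrative model only: `rewrite_upn` is a hypothetical function, and it assumes an exact, case-sensitive match on the suffix after `@`, which may differ from how the worker matches.

```bash
# Apply a single from=to suffix rule to a UPN (illustrative model only).
# rewrite_upn UPN RULE
rewrite_upn() {
  local upn="$1" rule="$2"
  local from="${rule%%=*}" to="${rule#*=}"
  local user="${upn%@*}" suffix="${upn##*@}"
  if [ "$suffix" = "$from" ]; then
    printf '%s@%s\n' "$user" "$to"
  else
    printf '%s\n' "$upn"   # suffix does not match: leave the UPN unchanged
  fi
}

rewrite_upn 'alice@demo.kdbl.com' 'demo.kdbl.com=kdbl.co.uk'   # prints alice@kdbl.co.uk
```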

c. UI alternative (non-secret config only)

The tenant detail page's Directory correlation card (cluster-admin) edits the non-secret graph / ldap / declared-mapping blocks and shows a per-block badge for whether the encrypted secret is stored yet. The equivalent API is PATCH /api/tenants/:slug/directory (merges only the directory sub-keys, leaving issuer / audience / mcp_audience untouched).

Secrets stay CLI-only — the encrypted secrets need KDBL_MASTER_KEY, which the API and UI never hold. So the two surfaces compose: set the non-secret config over the UI/API, then store the credential once with set-graph / set-ldap.

d. Verify

Run a one-shot sync to materialize edges immediately (instead of waiting ~10 min for the workers' automatic refresh):

# Graph (uses KDBL_GRAPH_CLIENT_SECRET directly — does not need stored config):
export KDBL_GRAPH_CLIENT_SECRET='<app-client-secret>'
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
  directory sync-graph \
  --tenant '<tenant-slug>' \
  --entra-tenant '<azure-ad-tenant-id>' \
  --client-id '<app-registration-client-id>'

# LDAP (decrypts the stored secret with the master key):
export KDBL_MASTER_KEY='<base64-master-key>'
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
  directory sync-ldap --tenant '<tenant-slug>'

A healthy alias graph after a Graph sync should contain: group objectId ⇄ SID edges (sid::<SID>), synced-user SID ⇄ UPN/email edges, and directed membership edges (upn:<member> → group) that resolve the JWT group-overage case. For LDAP it should contain sid::<AD SID> ⇄ upn:<cloud-UPN> edges from the rewrite. See the alias graph for what high vs medium confidence means.

Once the alias graph is populated and ACL capture + backfill are done, proceed to section 4 and enable per_file.