Enablement runbook

A single "turn it on" guide for an operator standing up KDBL Context Lake (K-Lake)'s major capabilities. Each section is a discrete operation with copy-pasteable steps; the end-to-end ordering below ties them together for the security-trimming path, where order is load-bearing.

This page consolidates enablement that lives in detail elsewhere. It does not duplicate the reference docs; it sequences them. Each section links to the reference page for the full surface of its feature.

Conventions

  • Snippets are ```bash. Replace <placeholders> before running.
  • API examples assume KDBL_URL is your API base (e.g. https://kdbl.example.com) and KDBL_TOKEN is a bearer token. CLI examples assume kdbl-control is on PATH.
  • Some CLI commands run in API mode (--api-url/--api-token, RLS-enforced) and some in direct-DB mode (--postgres-url/KDBL_POSTGRES_URL, master-key-bearing). Each step says which.
  • Secrets are always passed via an environment variable or stdin, never a flag — so credentials stay out of argv, shell history, and logs. This is a hard rule in the CLI: there is no --client-secret, no --password, etc.
  • Deploy-side env vars live on the kdbl-api Deployment. The non-secret ones go in the kdbl-api-config ConfigMap; secrets go in the kdbl-api-secrets Secret. Apply, then roll the Deployment.
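
To illustrate the env-var rule above, here is a minimal sketch of a helper that reads a secret from stdin without echoing it and exports it under a given name. The function name read_secret is hypothetical and not part of kdbl-control; it assumes bash.

```bash
# Hypothetical helper: read one line from stdin into an exported variable.
# The value never appears in argv or shell history.
read_secret() {
  local name=$1 value
  # -s suppresses echo and -p prompts; both only take effect on a terminal.
  IFS= read -rs -p "$name: " value
  [ -t 0 ] && printf '\n' >&2
  export "$name=$value"
}

# Interactive use:
#   read_secret KDBL_GRAPH_CLIENT_SECRET
# From a secret manager that writes to stdout (command name is a placeholder):
#   read_secret KDBL_GRAPH_CLIENT_SECRET < <(your-secret-tool get graph-secret)
```

Unset the variable once the command that consumes it has finished.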

End-to-end ordering

If you are enabling per-file security trimming end to end, order matters. Trimming is fail-closed: the moment a source flips to per_file, every file hides until (a) its grants are computed and (b) the caller is correlated to the file's principals. Do directory enrichment and ACL capture first, or per_file just hides everything.

Do these in this order:

  1. Enable the MCP server (if AI clients are the consumer) — deploy env vars + roll the API.
  2. Set the per-tenant IdP audience: oidc_config.mcp_audience on each tenant, or MCP tokens 401.
  3. Configure directory enrichment, in sub-order: Entra app permissions + admin consent → directory set-graph / set-ldap → sync-graph / sync-ldap to materialize the alias graph.
  4. Enable ACL capture + backfill: meta-caps ntfs_acl / nfs4_acl, then backfill-meta so already-crawled files get their ACL bytes (and grants) computed.
  5. Enable per-file trimming: source security-trim --mode per_file. Only now is it safe: grants exist and callers correlate to them.
  6. Connect a client — point Claude or another MCP client at /mcp and confirm it sees the files it should.

Steps 1–2 are independent of 3–5 if you are not using MCP (a plain API consumer skips them). Steps 3 and 4 are independent of each other and can run in parallel; both must complete before step 5.
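
The ACL-capture and per_file steps of this ordering can be sketched as a small wrapper, so the flip to per_file stays a separate, deliberate call. This is only a sketch: the function names and the DRY_RUN switch are hypothetical, and the subcommands and flags are the ones shown later on this page.

```bash
# Hypothetical wrapper around the API-mode CLI. DRY_RUN=1 prints instead of running.
kdbl_api() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'kdbl-control [api-mode flags] %s\n' "$*"
  else
    kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" "$@"
  fi
}

# Ordering step 4: enable the cap, then backfill already-crawled files.
capture_acls() {
  local source_id=$1 cap=$2
  kdbl_api source meta-caps --source-id "$source_id" --caps "$cap" &&
    kdbl_api source backfill-meta --source-id "$source_id" --caps "$cap"
}

# Ordering step 5: flip to per_file. Call this only after the backfill has
# drained (check source meta-coverage) and the alias graph is populated.
enable_per_file() {
  kdbl_api source security-trim --source-id "$1" --mode per_file
}

# Preview the commands without touching the cluster:
#   DRY_RUN=1 capture_acls '<source-id>' ntfs_acl
#   DRY_RUN=1 enable_per_file '<source-id>'
```

The sketch deliberately does not wait for the backfill itself; confirming the drain is left as a manual check between the two calls.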

1. Add a source

A source is one location K-Lake indexes (an S3 bucket, SMB/SMBFS share, or NFS export). The full per-protocol command surface — required vs optional config, credential handling, the API equivalents — lives in sources. In brief:

| Protocol | CLI subcommand | Credential |
|---|---|---|
| S3 | source add-s3 | access key via --secret-access-key-stdin, or ambient (IRSA) |
| SMB (userspace) | source add-smb | password via stdin |
| SMBFS (kernel mount) | source add-smbfs | password via stdin |
| NFS | source add-nfs | none (server-side /etc/exports) |

Add a source, then trigger a first crawl:

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  crawl --source-id '<source-id>'

For NTFS/NFSv4 ACL trimming, prefer SMBFS over userspace SMB: userspace smb has no ACL surface (see section 5, Enable ACL capture).

2. Enable content extraction on a source

Extraction pulls file contents (not just metadata) into the catalog so they can be searched and retrieved. Enable it per source; it takes effect within ~30 s. All source extract subcommands are API mode only.

  1. Enable extraction. --extensions and --max-bytes are preserved when omitted; pass an empty --extensions "" to mean "any extension".
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source extract enable --source-id '<source-id>' \
  --extensions pdf,docx --max-bytes 26214400

Optional narrowing: --modified-after / --modified-before (RFC3339), --include-path / --exclude-path (repeatable globs; exclude wins).

  2. (Already-crawled sources) Re-enqueue extraction for existing files by re-crawling; the crawl re-enqueues every eligible file:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source extract backfill --source-id '<source-id>'
  3. Watch progress and coverage:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source extract show --source-id '<source-id>'
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source extract progress --source-id '<source-id>' --watch

The UI exposes this as the extraction panel on the source detail page; the API is POST /api/sources/<urlencoded-source-id>/extract. Disable (preserving the allowlist/cap) with source extract disable.
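
API paths on this page take a URL-encoded source id. A minimal sketch of a pure-shell encoder follows; the function name urlencode and the example id are hypothetical, and the sketch assumes ASCII-only ids (multi-byte characters would need byte-wise handling).

```bash
# Hypothetical helper: percent-encode every character outside the RFC 3986
# unreserved set. ASCII input only.
urlencode() {
  local s=$1 out= c rest
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character
    case $c in
      [A-Za-z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

# Building a path segment (the id shape is an assumption):
#   path="/api/sources/$(urlencode "$SOURCE_ID")/extract"
```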

3. Enable the MCP server

The MCP server lets AI clients (Claude and others) search and retrieve over the Model Context Protocol. It ships dark — mounted only when KDBL_MCP_ENABLED is true, and the API fails fast at boot if you enable it without a resource URI. Full detail in mcp; client setup in mcp-clients.

Deploy env vars

Set these on the kdbl-api Deployment (the non-secret ones in the kdbl-api-config ConfigMap):

| Env var | Default | Meaning |
|---|---|---|
| KDBL_MCP_ENABLED | false | Master switch. False → neither /mcp nor the metadata routes are mounted. |
| KDBL_MCP_RESOURCE_URI | | Canonical MCP resource URI, e.g. https://kdbl.example.com/mcp. Advertised as resource and is the aud every token must target (RFC 8707). Required when enabled; boot fails if empty. |
| KDBL_MCP_SCOPES | kdbl.search,kdbl.read | Comma-separated scopes advertised in Protected Resource Metadata. |
| KDBL_MCP_ALLOWED_ORIGINS | empty | Comma-separated Origin allowlist for browser clients. Empty rejects any request carrying an Origin header (DNS-rebinding protection). Server-to-server clients send no Origin and pass. |
| KDBL_MCP_ALLOW_API_AUDIENCE | false | Escape hatch: fall back to the tenant's API audience for IdPs that can't mint resource-scoped tokens. Weakens the no-passthrough guarantee; leave off unless needed. |
  1. Add the vars to the ConfigMap, then apply and roll the API:
kubectl -n kdbl apply -f kdbl-api.yaml
kubectl -n kdbl rollout restart deployment/kdbl-api

Confirm boot logged MCP endpoint enabled and the readiness probe is green.
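
Because the API fails fast at boot when MCP is enabled without a resource URI, it can help to check the values before rolling. The following is a minimal sketch of such a pre-flight check. The function name is hypothetical, and it assumes the switch is the literal string true; how the API actually parses the value is not covered here.

```bash
# Hypothetical pre-flight check mirroring the documented boot rule:
# KDBL_MCP_ENABLED=true requires a non-empty KDBL_MCP_RESOURCE_URI.
mcp_preflight() {
  local enabled=${KDBL_MCP_ENABLED:-false} uri=${KDBL_MCP_RESOURCE_URI:-}
  if [ "$enabled" != true ]; then
    echo "MCP disabled; nothing to check"
    return 0
  fi
  if [ -z "$uri" ]; then
    echo "KDBL_MCP_RESOURCE_URI is required when KDBL_MCP_ENABLED=true" >&2
    return 1
  fi
  echo "ok: resource URI is $uri"
}

# Example:
#   KDBL_MCP_ENABLED=true KDBL_MCP_RESOURCE_URI=https://kdbl.example.com/mcp mcp_preflight
```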

Per-tenant IdP audience

For OIDC tokens, the expected audience is resolved per tenant from oidc_config.mcp_audience. A tenant with no mcp_audience set (and KDBL_MCP_ALLOW_API_AUDIENCE off) is unreachable over MCP — its tokens 401. Mint tokens whose aud is the MCP resource URI so they can't be replayed against the main API.
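
When MCP tokens 401, one quick diagnostic is to look at the aud claim the IdP actually minted. Below is a rough sketch that decodes a JWT payload locally and checks whether the expected audience appears in it as a JSON string. The function names are hypothetical, nothing here verifies the signature, and the string match is deliberately crude (it would miss a payload that escapes slashes).

```bash
# Hypothetical helper: print the decoded payload (middle segment) of a JWT.
jwt_payload() {
  local p=${1#*.}      # drop the header segment
  p=${p%%.*}           # drop the signature segment
  p=${p//-/+}          # base64url -> base64
  p=${p//_//}
  while [ $(( ${#p} % 4 )) -ne 0 ]; do p+='='; done   # restore padding
  printf '%s' "$p" | base64 -d
}

# Crude check: does the payload contain the audience as a quoted string?
token_mentions_aud() {
  jwt_payload "$1" | grep -Fq "\"$2\""
}

# Example:
#   token_mentions_aud "$KDBL_TOKEN" 'https://kdbl.example.com/mcp' || echo "aud mismatch"
```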

There is no dedicated CLI flag for mcp_audience — it is a key inside the tenant's oidc_config. Set it at create time or by patching the tenant.

  • New tenant (CLI, direct-DB mode):
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
  tenant create --slug '<tenant-slug>' --name '<Name>' \
  --oidc-config-json '{"issuer":"<issuer-url>","audience":"<api-aud>","mcp_audience":"https://kdbl.example.com/mcp"}'
  • Existing tenant (API; cluster-admin). Note PATCH /api/tenants/:slug replaces oidc_config wholesale — send the complete block, not just the new key:
curl -X PATCH -H "Authorization: Bearer $KDBL_TOKEN" \
     -H "Content-Type: application/json" \
     "$KDBL_URL/api/tenants/<tenant-slug>" \
     -d '{"oidc_config":{"issuer":"<issuer-url>","audience":"<api-aud>","mcp_audience":"https://kdbl.example.com/mcp"}}'

Smoke check

End-to-end check (fetches Protected Resource Metadata, then runs initialize + tools/list with your token):

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" mcp smoke

Then connect a client per mcp-clients, and review activity with kdbl-control ... mcp audit.

4. Enable per-file security trimming

Per-file trimming makes a caller see only the files their source-native owner/group/ACL already grant them. It is strictly opt-in and fail-closed.

CRITICAL ORDERING. Trimming is fail-closed: the moment a source flips to per_file, every file is hidden until both its grants are computed and the caller is correlated to the file's principals. If you enable per_file before directory enrichment and ACL capture + backfill are in place, the source trims down to nothing. Do those two first. See the end-to-end ordering.

The policy lives in a security_trim block on the source config (mode + fail_closed):

| Field | Values | Default | Meaning |
|---|---|---|---|
| mode | per_file / source_only / open | source_only | per_file turns trimming on; source_only is today's source-level visibility; open marks the source fully public. |
| fail_closed | true / false | true | With per_file: hide files whose grants aren't computed (true) or fall back to source-level visibility (false). |
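
One way to read the table together with the ordering warning is as a small decision function. The sketch below is an interpretation of the documented behavior, not the actual implementation; the function name and the result labels (public, source-level, hidden, per-grant) are made up for illustration.

```bash
# Hypothetical model of the policy. Arguments:
#   mode, fail_closed, grants_computed, caller_correlated (true/false strings)
file_visibility() {
  local mode=$1 fail_closed=$2 grants_computed=$3 caller_correlated=$4
  case $mode in
    open)        echo public ;;
    source_only) echo source-level ;;
    per_file)
      if [ "$grants_computed" != true ]; then
        # No grants yet: fail_closed decides between hiding and falling back.
        if [ "$fail_closed" = true ]; then echo hidden; else echo source-level; fi
      elif [ "$caller_correlated" = true ]; then
        echo per-grant     # the file's own grants decide
      else
        echo hidden        # caller cannot be matched to any principal
      fi ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# file_visibility per_file true false true   prints: hidden
```

The hidden outcomes are why flipping to per_file before grants and correlation exist trims a source down to nothing.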

source security-trim is API mode only; the change applies within ~30 s.

  • UI: the source detail page's Per-file security trimming panel (mode dropdown + fail-closed toggle).
  • CLI (omit --fail-closed to leave the stored value untouched):
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source security-trim --source-id '<source-id>' \
  --mode per_file --fail-closed true
  • API:
curl -X POST -H "Authorization: Bearer $KDBL_TOKEN" \
     -H "Content-Type: application/json" \
     "$KDBL_URL/api/sources/<urlencoded-source-id>/security-trim" \
     -d '{ "mode": "per_file", "fail_closed": true }'

POSIX grants (mode/uid/gid) are derived automatically. ACL grants require ACL capture first. See security-trimming for the grant model, grants_state, and limitations.

5. Enable ACL capture (NTFS / NFSv4)

This is the meta-caps prerequisite for ACL-based grants. POSIX grants need no cap, but an ACL-controlled file has no grants until K-Lake captures its raw ACL bytes — without the cap it falls back to POSIX bits (commonly grants_state = 0 → hidden under fail-closed). Do this before flipping the source to per_file.

| Protocol | Cap | Notes |
|---|---|---|
| SMBFS (kernel mount) | ntfs_acl | Userspace smb has no ACL surface; ntfs_acl is masked out for it at registry time. Use smbfs. |
| NFS | nfs4_acl | |
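
When scripting this across many sources, the table reduces to a small lookup. A minimal sketch follows; the function name and the protocol labels (smbfs, nfs, smb) are hypothetical shorthand, not values the CLI is known to emit.

```bash
# Hypothetical lookup: which ACL cap applies to a source protocol.
# Returns non-zero for protocols the table does not list.
cap_for_protocol() {
  case $1 in
    smbfs) echo ntfs_acl ;;
    nfs)   echo nfs4_acl ;;
    *)     return 1 ;;    # e.g. userspace smb: no ACL surface
  esac
}

# Example:
#   cap=$(cap_for_protocol smbfs) && echo "would pass --caps $cap"
```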

source meta-caps is API mode only; applies within ~30 s.

  1. Enable the cap (UI: the source detail page's enrichment controls toggle NTFS / NFSv4 ACL capture):
# SMBFS:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source meta-caps --source-id '<smbfs-source-id>' --caps ntfs_acl
# NFS:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source meta-caps --source-id '<nfs-source-id>' --caps nfs4_acl

API: POST /api/sources/<urlencoded-source-id>/meta-caps with { "caps": "ntfs_acl" }.

  2. Backfill already-crawled files so existing files get their ACL bytes (and therefore their grants) computed. A crawl only enqueues enrichment for new listings:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source backfill-meta --source-id '<source-id>' --caps ntfs_acl
  3. Watch the backfill drain:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source meta-coverage --source-id '<source-id>'

6. Configure directory enrichment

Directory enrichment correlates IdP identities (the caller's token) to file-native principals (POSIX gids, NTFS/AD SIDs), so trimming authorizes the right people. Run this before enabling per_file. POSIX name resolution is automatic; the cross-directory cases below need setup. Full reference: directory.

All directory CLI commands run in direct-DB mode (--postgres-url / KDBL_POSTGRES_URL); secret-bearing ones also need the master key. They are not proxied through the API because the API process does not hold the master key.

a. Entra app registration (portal — the prerequisite people miss)

Before any Graph command works, the app registration in the Entra portal needs Microsoft Graph application permissions and admin consent:

  1. In the Entra admin center, open your app registration → API permissions.
  2. Add Microsoft Graph Application permissions (all read-only):
       • Group.Read.All
       • GroupMember.Read.All
       • User.Read.All
  3. Click Grant admin consent for the tenant. Without consent, the app-only token carries none of these permissions and discovery returns nothing.
  4. Create a client secret under Certificates & secrets; you'll pass it via KDBL_GRAPH_CLIENT_SECRET below.

(Skip this for an unlinked on-prem AD that Entra has never synced — use set-ldap instead. See choosing a strategy.)

b. Store the directory config (CLI — the secret path)

Secrets are read from env vars, never flags.

  • Entra Graph — same / AD-synced Entra tenant. Stores non-secret config in oidc_config.graph and encrypts the client secret (KDBL_GRAPH_CLIENT_SECRET) at rest with KDBL_MASTER_KEY. The workers then run Graph discovery automatically:
export KDBL_GRAPH_CLIENT_SECRET='<app-client-secret>'
export KDBL_MASTER_KEY='<base64-master-key>'
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
  directory set-graph \
  --tenant '<tenant-slug>' \
  --entra-tenant '<azure-ad-tenant-id>' \
  --client-id '<app-registration-client-id>'
  • AD / LDAP — separate, unlinked on-prem AD vs cloud-only Entra. Stores config in oidc_config.ldap and encrypts the bind password (KDBL_LDAP_BIND_PASSWORD). --upn-rewrite (from=to, repeatable) bridges an on-prem UPN suffix to the cloud one; --name-scope sets the NetBIOS-domain scope for name:: fallback refs. The worker must trust the DC's CA (LDAPS):
export KDBL_LDAP_BIND_PASSWORD='<ad-bind-password>'
export KDBL_MASTER_KEY='<base64-master-key>'
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
  directory set-ldap \
  --tenant '<tenant-slug>' \
  --url 'ldaps://dc.demo.kdbl.com:636' \
  --bind-user 'DEMO\svc-kdbl' \
  --base-dn 'DC=demo,DC=kdbl,DC=com' \
  --name-scope DEMO \
  --upn-rewrite demo.kdbl.com=kdbl.co.uk
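
To make the intended effect of --upn-rewrite concrete, here is a sketch of the suffix mapping it describes. It illustrates the from=to idea only and is not the worker's implementation; the function name is hypothetical, and first-match-wins and case-sensitive matching are assumptions.

```bash
# Hypothetical illustration: apply from=to suffix rules to a UPN.
# Usage: rewrite_upn <upn> <from=to>...
rewrite_upn() {
  local upn=$1 rule from to
  shift
  for rule in "$@"; do
    from=${rule%%=*}
    to=${rule#*=}
    case $upn in
      *@"$from") printf '%s@%s\n' "${upn%@*}" "$to"; return 0 ;;
    esac
  done
  printf '%s\n' "$upn"    # no rule matched: unchanged
}

# With the rule from the command above, alice@demo.kdbl.com
# maps to alice@kdbl.co.uk:
#   rewrite_upn alice@demo.kdbl.com demo.kdbl.com=kdbl.co.uk
```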

c. UI alternative (non-secret config only)

The tenant detail page's Directory correlation card (cluster-admin) edits the non-secret graph / ldap / declared-mapping blocks and shows a per-block badge for whether the encrypted secret is stored yet. The equivalent API is PATCH /api/tenants/:slug/directory (merges only the directory sub-keys, leaving issuer / audience / mcp_audience untouched).

Secrets stay CLI-only — the encrypted secrets need KDBL_MASTER_KEY, which the API and UI never hold. So the two surfaces compose: set the non-secret config over the UI/API, then store the credential once with set-graph / set-ldap.

d. Verify

Run a one-shot sync to materialize edges immediately (instead of waiting ~10 min for the workers' automatic refresh):

# Graph (uses KDBL_GRAPH_CLIENT_SECRET directly — does not need stored config):
export KDBL_GRAPH_CLIENT_SECRET='<app-client-secret>'
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
  directory sync-graph \
  --tenant '<tenant-slug>' \
  --entra-tenant '<azure-ad-tenant-id>' \
  --client-id '<app-registration-client-id>'

# LDAP (decrypts the stored secret with the master key):
export KDBL_MASTER_KEY='<base64-master-key>'
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
  directory sync-ldap --tenant '<tenant-slug>'

A healthy alias graph after a Graph sync should contain: group objectId ⇄ SID edges (sid::<SID>), synced-user SID ⇄ UPN/email edges, and directed membership edges (upn:<member> → group) that resolve the JWT group-overage case. For LDAP it should contain sid::<AD SID> ⇄ upn:<cloud-UPN> edges from the rewrite. See the alias graph for what high vs medium confidence means.

Once the alias graph is populated and ACL capture + backfill are done, proceed to section 4 and enable per_file.