Enablement runbook
A single "turn it on" guide for an operator standing up KDBL Context Lake (K-Lake)'s major capabilities. Each section is a discrete operation with copy-pasteable steps; the end-to-end ordering below ties them together for the security-trimming path, where order is load-bearing.
This page consolidates enablement that lives in detail elsewhere. It does not duplicate the reference docs — it sequences them. Follow the links for the full surface of each feature:
- Sources — adding, listing, enabling, removing sources
- MCP server — the AI-client endpoint, auth model, audit log
- MCP clients — connecting Claude and other clients
- Per-file security trimming — return only files a caller's source-native permissions grant
- Directory enrichment — correlate IdP identities to file-native principals
Conventions
- Snippets are bash. Replace <placeholders> before running.
- API examples assume KDBL_URL is your API base (e.g. https://kdbl.example.com) and KDBL_TOKEN is a bearer token. CLI examples assume kdbl-control is on PATH.
- Some CLI commands run in API mode (--api-url / --api-token, RLS-enforced) and some in direct-DB mode (--postgres-url / KDBL_POSTGRES_URL, master-key-bearing). Each step says which.
- Secrets are always passed via an environment variable or stdin, never a flag, so credentials stay out of argv, shell history, and logs. This is a hard rule in the CLI: there is no --client-secret, no --password, etc.
- Deploy-side env vars live on the kdbl-api Deployment. The non-secret ones go in the kdbl-api-config ConfigMap; secrets go in the kdbl-api-secrets Secret. Apply, then roll the Deployment.
End-to-end ordering
If you are enabling per-file security trimming end to end, order matters. Trimming is fail-closed: the moment a source flips to per_file, every file hides until (a) its grants are computed and (b) the caller is correlated to the file's principals. Do directory enrichment and ACL capture first, or per_file just hides everything.
Do these in this order:
1. Enable the MCP server (if AI clients are the consumer): deploy env vars + roll the API.
2. Set the per-tenant IdP audience: oidc_config.mcp_audience on each tenant, or MCP tokens 401.
3. Configure directory enrichment, in sub-order: Entra app permissions + admin consent → directory set-graph / set-ldap → sync-graph / sync-ldap to materialize the alias graph.
4. Enable ACL capture + backfill: meta-caps ntfs_acl / nfs4_acl, then backfill-meta so already-crawled files get their ACL bytes (and grants) computed.
5. Enable per-file trimming: source security-trim --mode per_file. Only now is it safe: grants exist and callers correlate to them.
6. Connect a client: point Claude or another MCP client at /mcp and confirm it sees the files it should.
Steps 1–2 are independent of 3–5 if you are not using MCP (a plain API consumer skips them). Steps 3 and 4 are independent of each other and can run in parallel; both must complete before step 5.
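The gate in front of step 5 can be sketched as a small wrapper. This is a hedged sketch, not part of K-Lake: the function names, the KDBL_CONTROL override variable, and the interactive confirmation are assumptions made for illustration. The subcommands and flags are the ones used later on this page; passing nfs4_acl to backfill-meta is assumed to mirror the ntfs_acl form.

```bash
#!/usr/bin/env bash
# Sketch: enable ACL capture, backfill, and only then flip a source to per_file.
# KDBL_CONTROL is a hypothetical override for the CLI binary (default: kdbl-control).
KDBL_CONTROL="${KDBL_CONTROL:-kdbl-control}"

# Run one kdbl-control subcommand in API mode.
kdbl_api() {
  "$KDBL_CONTROL" --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" "$@"
}

# enable_per_file <source-id> <ntfs_acl|nfs4_acl>
enable_per_file() {
  local source_id="$1" cap="$2" answer

  # Step 4: capture ACLs, backfill already-crawled files, show coverage.
  kdbl_api source meta-caps --source-id "$source_id" --caps "$cap" || return 1
  kdbl_api source backfill-meta --source-id "$source_id" --caps "$cap" || return 1
  kdbl_api source meta-coverage --source-id "$source_id" || return 1

  # Gate: the operator confirms steps 3 and 4 are complete before trimming.
  read -r -p "Backfill drained and alias graph synced? [y/N] " answer
  if [[ "$answer" != "y" ]]; then
    echo "Leaving the trim mode unchanged." >&2
    return 1
  fi

  # Step 5: flip the source to per_file (stored fail_closed value is kept).
  kdbl_api source security-trim --source-id "$source_id" --mode per_file
}
```

Calling enable_per_file '<source-id>' ntfs_acl stops without changing the mode unless the operator answers y, which keeps the fail-closed flip behind an explicit check.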
1. Add a source
A source is one location K-Lake indexes (an S3 bucket, SMB/SMBFS share, or NFS export). The full per-protocol command surface — required vs optional config, credential handling, the API equivalents — lives in sources. In brief:
| Protocol | CLI subcommand | Credential |
|---|---|---|
| S3 | source add-s3 | access key via --secret-access-key-stdin, or ambient (IRSA) |
| SMB (userspace) | source add-smb | password via stdin |
| SMBFS (kernel mount) | source add-smbfs | password via stdin |
| NFS | source add-nfs | none (server-side /etc/exports) |
Add a source, then trigger a first crawl; the per-protocol commands are in sources.
For NTFS ACL trimming, prefer SMBFS over userspace SMB: userspace SMB has no ACL surface (see step 5).
2. Enable content extraction on a source
Extraction pulls file contents (not just metadata) into the catalog so they can be searched and retrieved. Enable it per source; it takes effect within ~30 s. All source extract subcommands are API mode only.
- Enable extraction. --extensions and --max-bytes are preserved when omitted; pass an empty --extensions "" to mean "any extension".
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source extract enable --source-id '<source-id>' \
--extensions pdf,docx --max-bytes 26214400
Optional narrowing: --modified-after / --modified-before (RFC3339), --include-path / --exclude-path (repeatable globs; exclude wins).
- (Already-crawled sources) Re-enqueue extraction for existing files by re-crawling — the crawl re-enqueues every eligible file:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source extract backfill --source-id '<source-id>'
- Watch progress and coverage:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source extract show --source-id '<source-id>'
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source extract progress --source-id '<source-id>' --watch
The UI exposes this as the extraction panel on the source detail page; the API is POST /api/sources/<urlencoded-source-id>/extract. Disable (preserving the allowlist/cap) with source extract disable.
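The API path takes a URL-encoded source id. A minimal sketch of that encoding in plain bash follows, assuming source ids can contain reserved characters such as : or /; the urlencode helper and the example id are hypothetical, and the sketch targets ASCII ids.

```bash
#!/usr/bin/env bash
# Percent-encode one URL path segment. RFC 3986 unreserved characters
# (letters, digits, '.', '_', '~', '-') pass through unchanged.
urlencode() {
  local LC_ALL=C            # byte-wise iteration and ASCII-only ranges
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical source id, for illustration only.
source_id='s3://example-bucket/team a'
echo "${KDBL_URL:-https://kdbl.example.com}/api/sources/$(urlencode "$source_id")/extract"
```

The same helper applies to the other per-source endpoints on this page that take a URL-encoded source id.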
3. Enable the MCP server
The MCP server lets AI clients (Claude and others) search and retrieve over the Model Context Protocol. It ships dark — mounted only when KDBL_MCP_ENABLED is true, and the API fails fast at boot if you enable it without a resource URI. Full detail in mcp; client setup in mcp-clients.
Deploy env vars
Set these on the kdbl-api Deployment (the non-secret ones in the kdbl-api-config ConfigMap):
| Env var | Default | Meaning |
|---|---|---|
| KDBL_MCP_ENABLED | false | Master switch. False → neither /mcp nor the metadata routes are mounted. |
| KDBL_MCP_RESOURCE_URI | (none) | Canonical MCP resource URI, e.g. https://kdbl.example.com/mcp. Advertised as resource and is the aud every token must target (RFC 8707). Required when enabled; boot fails if empty. |
| KDBL_MCP_SCOPES | kdbl.search,kdbl.read | Comma-separated scopes advertised in Protected Resource Metadata. |
| KDBL_MCP_ALLOWED_ORIGINS | empty | Comma-separated Origin allowlist for browser clients. Empty rejects any request carrying an Origin header (DNS-rebinding protection). Server-to-server clients send no Origin and pass. |
| KDBL_MCP_ALLOW_API_AUDIENCE | false | Escape hatch: fall back to the tenant's API audience for IdPs that can't mint resource-scoped tokens. Weakens the no-passthrough guarantee; leave off unless needed. |
- Add the vars to the ConfigMap, then apply and roll the API:
Confirm boot logged MCP endpoint enabled and the readiness probe is green.
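A minimal sketch of the apply-and-roll step, using standard kubectl commands. The namespace, the manifest path, the KUBECTL override variable, and the two function names are assumptions; the Deployment name and the log line come from this page.

```bash
#!/usr/bin/env bash
# Sketch: apply the updated ConfigMap, restart kdbl-api, wait for readiness.
# KUBECTL is a hypothetical override for the kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

# roll_api <namespace> <configmap-manifest>
roll_api() {
  local ns="$1" manifest="$2"
  "$KUBECTL" --namespace "$ns" apply -f "$manifest" || return 1
  "$KUBECTL" --namespace "$ns" rollout restart deployment/kdbl-api || return 1
  # Blocks until the new pods pass their readiness probe (or the timeout hits).
  "$KUBECTL" --namespace "$ns" rollout status deployment/kdbl-api --timeout=180s
}

# mcp_boot_logged <namespace>: succeeds if the boot line appears in the logs.
mcp_boot_logged() {
  "$KUBECTL" --namespace "$1" logs deployment/kdbl-api | grep -qF 'MCP endpoint enabled'
}
```

For example, roll_api '<namespace>' '<configmap-manifest>.yaml' && mcp_boot_logged '<namespace>' covers both confirmations in one line.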
Per-tenant IdP audience
For OIDC tokens, the expected audience is resolved per tenant from oidc_config.mcp_audience. A tenant with no mcp_audience set (and KDBL_MCP_ALLOW_API_AUDIENCE off) is unreachable over MCP — its tokens 401. Mint tokens whose aud is the MCP resource URI so they can't be replayed against the main API.
There is no dedicated CLI flag for mcp_audience — it is a key inside the tenant's oidc_config. Set it at create time or by patching the tenant.
- New tenant (CLI, direct-DB mode):
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
tenant create --slug '<tenant-slug>' --name '<Name>' \
--oidc-config-json '{"issuer":"<issuer-url>","audience":"<api-aud>","mcp_audience":"https://kdbl.example.com/mcp"}'
- Existing tenant (API; cluster-admin). Note that PATCH /api/tenants/:slug replaces oidc_config wholesale, so send the complete block, not just the new key:
curl -X PATCH -H "Authorization: Bearer $KDBL_TOKEN" \
-H "Content-Type: application/json" \
"$KDBL_URL/api/tenants/<tenant-slug>" \
-d '{"oidc_config":{"issuer":"<issuer-url>","audience":"<api-aud>","mcp_audience":"https://kdbl.example.com/mcp"}}'
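Because the PATCH replaces oidc_config wholesale, one way to avoid dropping keys is to build the body from the tenant's current block. The sketch below assumes jq is installed and that you already have the current oidc_config object in a local file; the function name and the oidc_config.json file are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: read the tenant's current oidc_config (a JSON object) on stdin and
# print a complete PATCH body with mcp_audience set and every other key kept.
# patch_body_with_mcp_audience <mcp-audience>
patch_body_with_mcp_audience() {
  jq -c --arg aud "$1" '{oidc_config: (. + {mcp_audience: $aud})}'
}

# Usage (oidc_config.json is a hypothetical local copy of the current block):
#   patch_body_with_mcp_audience 'https://kdbl.example.com/mcp' < oidc_config.json |
#     curl -X PATCH -H "Authorization: Bearer $KDBL_TOKEN" \
#       -H "Content-Type: application/json" \
#       "$KDBL_URL/api/tenants/<tenant-slug>" -d @-
```

Merging with jq rather than hand-editing the JSON keeps issuer and audience exactly as stored, and an existing mcp_audience is simply overwritten.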
Smoke check
End-to-end check (fetches Protected Resource Metadata, then runs initialize + tools/list with your token):
Then connect a client per mcp-clients, and review activity with kdbl-control ... mcp audit.
4. Enable per-file security trimming
Per-file trimming makes a caller see only the files their source-native owner/group/ACL already grant them. It is strictly opt-in and fail-closed.
CRITICAL ORDERING. Trimming is fail-closed: the moment a source flips to per_file, every file is hidden until both its grants are computed and the caller is correlated to the file's principals. If you enable per_file before directory enrichment and ACL capture + backfill are in place, the source trims down to nothing. Do those two first. See the end-to-end ordering.
The policy lives in a security_trim block on the source config (mode + fail_closed):
| Field | Values | Default | Meaning |
|---|---|---|---|
| mode | per_file \| source_only \| open | source_only | per_file turns trimming on; source_only is today's source-level visibility; open marks the source fully public. |
| fail_closed | true \| false | true | With per_file: hide files whose grants aren't computed (true) or fall back to source-level visibility (false). |
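The table can be read as a small decision function. The sketch below is an illustrative model of the documented behavior, not K-Lake's implementation: the function name, its arguments, and the three outcome labels are assumptions, and caller_granted stands for "the caller correlates to a principal the file's grants allow".

```bash
#!/usr/bin/env bash
# Sketch: outcome for one file and one caller under a security_trim policy.
# file_visibility <mode> <fail_closed> <grants_computed> <caller_granted>
# Prints one of: visible | hidden | source_level
file_visibility() {
  local mode="$1" fail_closed="$2" grants_computed="$3" caller_granted="$4"
  case "$mode" in
    open)        echo visible ;;        # source is fully public
    source_only) echo source_level ;;   # source-level visibility applies
    per_file)
      if [[ "$grants_computed" != true ]]; then
        # Grants not computed yet: fail_closed decides the fallback.
        if [[ "$fail_closed" == true ]]; then echo hidden; else echo source_level; fi
      elif [[ "$caller_granted" == true ]]; then
        echo visible
      else
        echo hidden
      fi
      ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}
```

With the defaults, file_visibility per_file true false false yields hidden, which is why flipping the mode before grants exist empties the source.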
source security-trim is API mode only; the change applies within ~30 s.
- UI: the source detail page's Per-file security trimming panel (mode dropdown + fail-closed toggle).
- CLI (omit --fail-closed to leave the stored value untouched):
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source security-trim --source-id '<source-id>' \
--mode per_file --fail-closed true
- API:
curl -X POST -H "Authorization: Bearer $KDBL_TOKEN" \
-H "Content-Type: application/json" \
"$KDBL_URL/api/sources/<urlencoded-source-id>/security-trim" \
-d '{ "mode": "per_file", "fail_closed": true }'
POSIX grants (mode/uid/gid) are derived automatically. ACL grants require ACL capture first. See security-trimming for the grant model, grants_state, and limitations.
5. Enable ACL capture (NTFS / NFSv4)
This is the meta-caps prerequisite for ACL-based grants. POSIX grants need no cap, but an ACL-controlled file has no grants until K-Lake captures its raw ACL bytes — without the cap it falls back to POSIX bits (commonly grants_state = 0 → hidden under fail-closed). Do this before flipping the source to per_file.
| Protocol | Cap | Notes |
|---|---|---|
| SMBFS (kernel mount) | ntfs_acl | Userspace smb has no ACL surface; ntfs_acl is masked out for it at registry time; use smbfs. |
| NFS | nfs4_acl | |
source meta-caps is API mode only; applies within ~30 s.
- Enable the cap (UI: the source detail page's enrichment controls toggle NTFS / NFSv4 ACL capture):
# SMBFS:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source meta-caps --source-id '<smbfs-source-id>' --caps ntfs_acl
# NFS:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source meta-caps --source-id '<nfs-source-id>' --caps nfs4_acl
API: POST /api/sources/<urlencoded-source-id>/meta-caps with { "caps": "ntfs_acl" }.
- Backfill already-crawled files so existing files get their ACL bytes (and therefore their grants) computed — a crawl only enqueues enrichment for new listings:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source backfill-meta --source-id '<source-id>' --caps ntfs_acl
- Watch the backfill drain:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source meta-coverage --source-id '<source-id>'
6. Configure directory enrichment
Directory enrichment correlates IdP identities (the caller's token) to file-native principals (POSIX gids, NTFS/AD SIDs), so trimming authorizes the right people. Run this before enabling per_file. POSIX name resolution is automatic; the cross-directory cases below need setup. Full reference: directory.
All directory CLI commands run in direct-DB mode (--postgres-url / KDBL_POSTGRES_URL); secret-bearing ones also need the master key. They are not proxied through the API because the API process does not hold the master key.
a. Entra app registration (portal: the prerequisite people miss)
Before any Graph command works, the app registration in the Entra portal needs Microsoft Graph application permissions and admin consent:
- In the Entra admin center, open your app registration → API permissions.
- Add Microsoft Graph Application permissions (all read-only):
  - Group.Read.All
  - GroupMember.Read.All
  - User.Read.All
- Click Grant admin consent for the tenant. Without consent, the app-only token carries none of these permissions and discovery returns nothing.
- Create a client secret under Certificates & secrets; you'll pass it via KDBL_GRAPH_CLIENT_SECRET below.
(Skip this for an unlinked on-prem AD that Entra has never synced — use set-ldap instead. See choosing a strategy.)
b. Store the directory config (CLI: the secret path)
Secrets are read from env vars, never flags.
- Entra Graph (same / AD-synced Entra tenant). Stores non-secret config in oidc_config.graph and encrypts the client secret (KDBL_GRAPH_CLIENT_SECRET) at rest with KDBL_MASTER_KEY. The workers then run Graph discovery automatically:
export KDBL_GRAPH_CLIENT_SECRET='<app-client-secret>'
export KDBL_MASTER_KEY='<base64-master-key>'
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
directory set-graph \
--tenant '<tenant-slug>' \
--entra-tenant '<azure-ad-tenant-id>' \
--client-id '<app-registration-client-id>'
- AD / LDAP (separate, unlinked on-prem AD vs cloud-only Entra). Stores config in oidc_config.ldap and encrypts the bind password (KDBL_LDAP_BIND_PASSWORD). --upn-rewrite (from=to, repeatable) bridges an on-prem UPN suffix to the cloud one; --name-scope sets the NetBIOS-domain scope for name:: fallback refs. The worker must trust the DC's CA (LDAPS):
export KDBL_LDAP_BIND_PASSWORD='<ad-bind-password>'
export KDBL_MASTER_KEY='<base64-master-key>'
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
directory set-ldap \
--tenant '<tenant-slug>' \
--url 'ldaps://dc.demo.kdbl.com:636' \
--bind-user 'DEMO\svc-kdbl' \
--base-dn 'DC=demo,DC=kdbl,DC=com' \
--name-scope DEMO \
--upn-rewrite demo.kdbl.com=kdbl.co.uk
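When you work interactively, one way to keep the literal secret out of shell history is to prompt for it instead of typing it into an export line. The read_secret helper below is a hypothetical convenience, not part of kdbl-control; it relies only on the bash read builtin.

```bash
#!/usr/bin/env bash
# read_secret <VAR_NAME>: read one line without echoing it, then export it
# under the given name so the next kdbl-control invocation can see it.
read_secret() {
  local var_name="$1" secret_value
  IFS= read -r -s -p "$var_name: " secret_value || return 1
  echo >&2    # finish the prompt line, since the input was not echoed
  export "$var_name=$secret_value"
}

# Usage:
#   read_secret KDBL_GRAPH_CLIENT_SECRET
#   read_secret KDBL_MASTER_KEY
#   kdbl-control --postgres-url "$KDBL_POSTGRES_URL" directory set-graph ...
```

Run unset on both variables once the command finishes so the values do not linger in the session.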
c. UI alternative (non-secret config only)
The tenant detail page's Directory correlation card (cluster-admin) edits the non-secret graph / ldap / declared-mapping blocks and shows a per-block badge for whether the encrypted secret is stored yet. The equivalent API is PATCH /api/tenants/:slug/directory (merges only the directory sub-keys, leaving issuer / audience / mcp_audience untouched).
Secrets stay CLI-only — the encrypted secrets need KDBL_MASTER_KEY, which the API and UI never hold. So the two surfaces compose: set the non-secret config over the UI/API, then store the credential once with set-graph / set-ldap.
d. Verify
Run a one-shot sync to materialize edges immediately (instead of waiting ~10 min for the workers' automatic refresh):
# Graph (uses KDBL_GRAPH_CLIENT_SECRET directly — does not need stored config):
export KDBL_GRAPH_CLIENT_SECRET='<app-client-secret>'
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
directory sync-graph \
--tenant '<tenant-slug>' \
--entra-tenant '<azure-ad-tenant-id>' \
--client-id '<app-registration-client-id>'
# LDAP (decrypts the stored secret with the master key):
export KDBL_MASTER_KEY='<base64-master-key>'
kdbl-control --postgres-url "$KDBL_POSTGRES_URL" \
directory sync-ldap --tenant '<tenant-slug>'
A healthy alias graph after a Graph sync should contain: group objectId ⇄ SID edges (sid::<SID>), synced-user SID ⇄ UPN/email edges, and directed membership edges (upn:<member> → group) that resolve the JWT group-overage case. For LDAP it should contain sid::<AD SID> ⇄ upn:<cloud-UPN> edges from the rewrite. See the alias graph for what high vs medium confidence means.
Once the alias graph is populated and ACL capture + backfill are done, proceed to step 4 — enable per_file.