Security trimming
Per-file security trimming makes a caller see only the files their source-native permissions already grant them — the file's own owner, group, or ACL. A user who can't read a file on the NAS can't see it, its metadata, or its extracted content through KDBL Context Lake (K-Lake) either.
Enforcement lives in the datastore's row-level security (RLS), so it applies everywhere automatically: list, file detail, and content search are all trimmed by the same predicate. There is no separate "filter the search results" code path to keep in sync.
Trimming is fail-closed and strictly opt-in. A source you don't configure for it keeps today's source-level visibility. A source you do configure for per-file trimming hides any file whose grants haven't been computed yet — K-Lake would rather hide a file you could read than leak one you can't.
How it works
Both sides of the question — "who is this caller?" and "who may read this file?" — are collapsed into one namespace, the principal ref, of the form <kind>:<scope>:<value>. Each distinct ref is interned to an integer principal_id.
- File side. When a file is indexed, K-Lake derives the set of principal refs that may read it from its POSIX mode bits and (if captured) its ACL, and records a read-grant for each granted principal.
- Caller side. When a caller hits the API, their IdP claims (object id, email, UPN, group memberships) are turned into principal refs, resolved to the principal_id set, and handed to the datastore.
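The caller side can be sketched as follows. This is a minimal illustration, not K-Lake's code: the claim names (iss, oid, upn, email, groups) assume Entra-style ID-token claims, and PrincipalRegistry and caller_refs are hypothetical names standing in for the interning table and the claim-mapping step. Ref formats follow the principal-ref kinds table in this section.

```python
from dataclasses import dataclass, field


@dataclass
class PrincipalRegistry:
    """Hypothetical in-memory stand-in for the ref -> principal_id interning table."""
    ids: dict = field(default_factory=dict)

    def intern(self, ref: str) -> int:
        # File side: a ref gets the next integer id the first time it is seen.
        if ref not in self.ids:
            self.ids[ref] = len(self.ids) + 1
        return self.ids[ref]

    def resolve(self, refs) -> frozenset:
        # Caller side: a ref with no id yet cannot match any grant, so it is dropped.
        return frozenset(self.ids[r] for r in refs if r in self.ids)


def caller_refs(claims: dict) -> set:
    """Map IdP claims (assumed Entra-style claim names) to principal refs."""
    issuer = claims["iss"]
    refs = set()
    if claims.get("oid"):
        refs.add(f"oid:{issuer}:{claims['oid']}")
    if claims.get("upn"):
        refs.add(f"upn:{claims['upn']}")
    if claims.get("email"):
        refs.add(f"email:{claims['email'].lower()}")  # email refs are lowercased
    for group_id in claims.get("groups", []):
        refs.add(f"oid:{issuer}:{group_id}")  # group memberships arrive as object ids
    return refs
```

Because every ref is interned to an integer, the read-time check reduces to intersecting two small integer sets rather than comparing strings.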
A file is authorized for a caller when the caller's resolved principal id set intersects the file's grant set — a positive set-intersection. The same authorization predicate is applied to file listings, metadata, and content, ANDed on top of the existing tenant + source-visibility checks. It short-circuits in this order:
- Tenant admin, or source owner/editor → unrestricted (admin path).
- Source not in per_file mode → source-level visibility (today's behavior).
- File is world/everyone-readable (grants_state = 2) → visible.
- Caller holds a read grant on the exact file → visible.
- fail_closed is off and the file's grants aren't computed yet → visible (fail-open fallback).
- Otherwise → hidden.
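A minimal Python sketch of that short-circuit order follows. The real predicate lives in the datastore's RLS; file_visible, Caller, TrimPolicy, and GrantsState are hypothetical names, and source_visible stands in for the existing tenant + source-visibility checks that the trim predicate is ANDed onto.

```python
from dataclasses import dataclass
from enum import IntEnum


class GrantsState(IntEnum):
    UNCOMPUTED = 0  # no per-file grants computed yet
    PRESENT = 1     # grants computed and present
    WORLD = 2       # grants computed; world/everyone-readable


@dataclass(frozen=True)
class Caller:
    principal_ids: frozenset  # the caller's resolved principal_id set
    is_tenant_admin: bool = False
    is_source_owner_or_editor: bool = False


@dataclass(frozen=True)
class TrimPolicy:
    mode: str = "source_only"
    fail_closed: bool = True


def file_visible(caller: Caller, policy: TrimPolicy, grants_state: GrantsState,
                 file_grant_ids: frozenset, source_visible: bool) -> bool:
    """Return True if the caller may see the file, checked in the documented order."""
    if not source_visible:
        return False  # trimming is ANDed onto tenant + source visibility
    if caller.is_tenant_admin or caller.is_source_owner_or_editor:
        return True   # admin path: unrestricted
    if policy.mode != "per_file":
        return True   # source_only / open: source-level visibility decides
    if grants_state == GrantsState.WORLD:
        return True   # world/everyone-readable
    if caller.principal_ids & file_grant_ids:
        return True   # positive set-intersection with the file's grants
    if not policy.fail_closed and grants_state == GrantsState.UNCOMPUTED:
        return True   # fail-open fallback to source-level visibility
    return False      # otherwise hidden
```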
The principal-ref kinds are:
| Kind | Scope | Example | Origin |
|---|---|---|---|
| sid | AD/domain (often empty) | sid::S-1-5-21-… | NTFS owner / ACE SID |
| upn | (none) | upn:alice@corp.com | User principal name (the cross-directory bridge key) |
| email | (none) | email:alice@corp.com | Email / Google Workspace group address (lowercased) |
| oid | issuer | oid:<issuer>:<objectId> | Entra/Azure object GUID (user or group) |
| posixuid | source_id | posixuid:<source_id>:<uid> | Numeric POSIX owner, source-scoped (uid spaces are server-local) |
| posixgid | source_id | posixgid:<source_id>:<gid> | Numeric POSIX group, source-scoped |
| name | directory | name:<dir>:<group> | Resolved friendly user/group name |
| nfs4who | source_id | nfs4who:<source_id>:<who> | NFSv4 who-string, pre-resolution |
The same ref can appear on both sides — e.g. a POSIX group on the file and the caller's resolved group — and that's the common case. When the file-side and caller-side refs are in different namespaces (an on-prem AD SID on the file vs. an Entra group object id in the token), they're bridged by directory enrichment, which builds the alias graph.
Enabling it on a source
The trim policy lives in a security_trim block inside the source's config JSON, read by both the workers and the authorization predicate. The block has two fields:
| Field | Values | Default | Meaning |
|---|---|---|---|
| mode | per_file, source_only, open | source_only | per_file turns on per-file trimming. source_only is today's source-level visibility. open explicitly marks the source fully public (no trimming, even if other sources in the tenant trim). An absent block or any unrecognized value falls back to source_only. |
| fail_closed | true, false | true | With per_file, decides what happens to files whose grants aren't computed yet: hide them (true) or fall back to source-level visibility (false). |
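The fallback rules in the table can be sketched as a small parser. parse_trim_policy and TrimPolicy are hypothetical names for illustration, not the workers' actual code; treating a non-boolean fail_closed as true is an assumption chosen to stay fail-closed.

```python
import json
from dataclasses import dataclass

VALID_MODES = {"per_file", "source_only", "open"}


@dataclass(frozen=True)
class TrimPolicy:
    mode: str = "source_only"
    fail_closed: bool = True


def parse_trim_policy(source_config: dict) -> TrimPolicy:
    """Read the security_trim block from a source config with the documented fallbacks."""
    block = source_config.get("security_trim")
    if not isinstance(block, dict):
        return TrimPolicy()  # absent block: source_only, fail_closed = true
    mode = block.get("mode")
    if not isinstance(mode, str) or mode not in VALID_MODES:
        mode = "source_only"  # any unrecognized value falls back to source_only
    fail_closed = block.get("fail_closed", True)
    if not isinstance(fail_closed, bool):
        fail_closed = True  # assumption: a malformed flag keeps the safe default
    return TrimPolicy(mode=mode, fail_closed=fail_closed)


# Example: the block written by the API request in this section.
config = json.loads('{"security_trim": {"mode": "per_file", "fail_closed": true}}')
policy = parse_trim_policy(config)
```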
Set it through the dedicated endpoint, which merges the block into the source config without disturbing the rest (the same merge pattern as meta-caps / subtree). The change takes effect within ~30 s.
CLI:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source security-trim --source-id 'smbfs://nas.corp/finance' \
--mode per_file --fail-closed true
--fail-closed is optional — omit it to flip the mode while leaving the stored fail_closed untouched.
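The merge behavior (only the security_trim block changes, and an omitted fail_closed keeps its stored value) can be sketched as a pure function over the config dict. merge_security_trim is a hypothetical helper that illustrates the semantics; it is not the server's implementation.

```python
import copy
from typing import Optional


def merge_security_trim(config: dict, mode: str, fail_closed: Optional[bool] = None) -> dict:
    """Return a copy of config with only its security_trim block updated."""
    merged = copy.deepcopy(config)  # never mutate the caller's config
    existing = merged.get("security_trim")
    block = dict(existing) if isinstance(existing, dict) else {}
    block["mode"] = mode
    if fail_closed is not None:
        # Like --fail-closed: overwrite the stored flag only when one is given.
        block["fail_closed"] = fail_closed
    merged["security_trim"] = block
    return merged
```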
API:
curl -X POST -H "Authorization: Bearer $KDBL_TOKEN" \
-H "Content-Type: application/json" \
"$KDBL_URL/api/sources/<urlencoded-source-id>/security-trim" \
-d '{ "mode": "per_file", "fail_closed": true }'
The response echoes the stored policy ({ "mode": ..., "fail_closed": ... }). Auth is tenant-admin or the source's owner/editor; an unauthorized caller gets 404 (the row isn't visible under RLS), not 403.
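For scripting outside curl, the same request can be built with Python's standard library. The main point it shows is the <urlencoded-source-id> step: a source id such as smbfs://nas.corp/finance contains ':' and '/', so it has to be percent-encoded as a single path segment. security_trim_request is a hypothetical helper name; the URL shape and JSON body are taken from the curl example.

```python
import json
import urllib.parse
import urllib.request


def security_trim_request(base_url: str, token: str, source_id: str,
                          mode: str, fail_closed: bool) -> urllib.request.Request:
    """Build (but do not send) the same POST as the curl example."""
    # safe="" makes quote() encode '/' as well, so the id stays one path segment.
    encoded_id = urllib.parse.quote(source_id, safe="")
    body = json.dumps({"mode": mode, "fail_closed": fail_closed}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/sources/{encoded_id}/security-trim",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Sending it (needs a reachable API; KDBL_URL / KDBL_TOKEN as in the CLI example):
#   req = security_trim_request(os.environ["KDBL_URL"], os.environ["KDBL_TOKEN"],
#                               "smbfs://nas.corp/finance", "per_file", True)
#   with urllib.request.urlopen(req) as resp:
#       stored = json.load(resp)  # {"mode": ..., "fail_closed": ...}
```

Note that urlopen raises urllib.error.HTTPError for 4xx responses, so the 404 an unauthorized caller receives surfaces as an exception rather than an empty body.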
In the UI, the source detail page has a Per-file security trimming panel (mode dropdown + fail-closed toggle) that posts to this endpoint.
Prerequisite for ACL-based trimming: capture the ACL bytes
POSIX grants are derived automatically from the mode/uid/gid the crawl already captures — no extra configuration. ACL grants are not: the source must first capture the raw ACL bytes via a metadata-capture cap (ntfs_acl for SMB/SMBFS, nfs4_acl for NFS). Without the cap, there are no ACL bytes to derive grants from, and an ACL-controlled file falls back to its POSIX bits (commonly grants_state = 0 → hidden under fail-closed).
Enable the cap the same way as any other enrichment — see sources.
UI: the source detail page's enrichment controls let you toggle NTFS / NFSv4 ACL capture.
CLI:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source meta-caps --source-id 'smbfs://nas.corp/finance' --caps ntfs_acl
# NFS export:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source meta-caps --source-id 'nfs://nas.corp/export/data' --caps nfs4_acl
API:
curl -X POST -H "Authorization: Bearer $KDBL_TOKEN" \
-H "Content-Type: application/json" \
"$KDBL_URL/api/sources/<urlencoded-source-id>/meta-caps" \
-d '{ "caps": "ntfs_acl" }'
The SMB userspace backend (smbprotocol) has no ACL surface; ntfs_acl is masked out for it at registry time. Use the smbfs (kernel-mount) backend for NTFS ACL capture.
After enabling a cap on a source that's already indexed, run a backfill so existing files get their ACL bytes (and therefore their grants) computed.
POSIX vs ACL grants
K-Lake derives read-grants from whichever permission model the source exposes. Each cap that produced a grant is tracked against that grant, so re-crawling one model's grants never clobbers another's.
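One way to picture per-cap tracking is to key each grant row by the cap that produced it, so recomputing one model replaces only its own rows. The row shape, the helper names, and the "posix" label for POSIX-derived grants are hypothetical; this is a sketch of the idea, not the datastore schema.

```python
from typing import Iterable, Set, Tuple

GrantRow = Tuple[int, str]  # (principal_id, cap that produced the grant)


def replace_cap_grants(rows: Set[GrantRow], cap: str,
                       principal_ids: Iterable[int]) -> Set[GrantRow]:
    """Recompute one cap's grants for a file without touching other caps' rows."""
    kept = {(pid, c) for pid, c in rows if c != cap}
    return kept | {(pid, cap) for pid in principal_ids}


def granted_principal_ids(rows: Set[GrantRow]) -> Set[int]:
    # Reads only care which principals hold a grant, not which cap produced it.
    return {pid for pid, _ in rows}
```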
POSIX (automatic). Derived from the file's mode bits and owner/group:
| Mode bit | Octal | Effect |
|---|---|---|
| other-read | 0o004 | File is world-readable → grants_state = 2, no per-principal rows needed |
| owner-read | 0o400 | Grant to posixuid:<source_id>:<uid> |
| group-read | 0o040 | Grant to posixgid:<source_id>:<gid> |
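The POSIX table translates almost directly into code. posix_grants is a hypothetical helper; returning grants_state 0 when no read bit yields a grant is an assumption, consistent with ACL-controlled files commonly landing at grants_state = 0 when only their POSIX bits are available.

```python
import stat


def posix_grants(source_id: str, mode: int, uid: int, gid: int):
    """Return (grants_state, principal refs) derived from POSIX mode bits."""
    if mode & stat.S_IROTH:  # 0o004 other-read: world-readable, no per-principal rows
        return 2, set()
    refs = set()
    if mode & stat.S_IRUSR:  # 0o400 owner-read
        refs.add(f"posixuid:{source_id}:{uid}")
    if mode & stat.S_IRGRP:  # 0o040 group-read
        refs.add(f"posixgid:{source_id}:{gid}")
    # Assumption: no read grant at all leaves the file at grants_state 0.
    return (1 if refs else 0), refs
```

For example, mode 0o640 with uid 1000 and gid 50 yields grants_state 1 with one posixuid and one posixgid ref, while 0o644 short-circuits to world-readable.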
ACL (requires the cap). Derivation is deliberately conservative — it only emits a grant it is certain of, and any parse error or malformed field yields no grants (fail-closed):
- NFSv4 (nfs4_acl, RFC 7530 §5.11 wire format). Allow-read ACEs become grants; OWNER@/GROUP@ resolve to the file's POSIX owner/group principals, and named whos become nfs4who:<source>:<who> refs for directory enrichment to alias later. An EVERYONE@ allow-read marks the file world-readable; a DENY-read on EVERYONE@ yields no grants.
- NTFS (ntfs_acl, parsed from the self-relative SECURITY_DESCRIPTOR, MS-DTYP §2.4.6). DACL allow-read ACEs become sid::<SID> grants; the well-known Everyone SID (S-1-1-0) marks the file world-readable. Deny-overrides-allow is resolved at compute time so reads stay a positive set-intersection.
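Two pieces of the ACL derivation can be sketched in isolation: mapping decoded NFSv4 ACEs to refs, and turning a binary NTFS SID into the string used in sid::<SID> refs. Both functions are hypothetical illustrations. The NFSv4 sketch starts from already-decoded (acetype, access_mask, who) tuples rather than the wire format, and it takes a blunter conservative route than the description above: any deny-read ACE, not only one on EVERYONE@, yields no grants, which avoids reasoning about ACE order and group membership. The NTFS deny-overrides-allow step is not shown.

```python
import struct

# NFSv4 values from RFC 7530: ACE types and the READ_DATA access-mask bit.
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0
ACE4_ACCESS_DENIED_ACE_TYPE = 1
ACE4_READ_DATA = 0x00000001


def nfs4_grants(source_id: str, aces, uid: int, gid: int):
    """Derive (grants_state, refs) from decoded (acetype, access_mask, who) ACEs."""
    refs, world = set(), False
    for acetype, mask, who in aces:
        if not mask & ACE4_READ_DATA:
            continue  # this ACE says nothing about reading
        if acetype == ACE4_ACCESS_DENIED_ACE_TYPE:
            return 0, set()  # conservative: any deny-read yields no grants
        if acetype != ACE4_ACCESS_ALLOWED_ACE_TYPE:
            continue  # audit/alarm ACEs grant nothing
        if who == "EVERYONE@":
            world = True
        elif who == "OWNER@":
            refs.add(f"posixuid:{source_id}:{uid}")
        elif who == "GROUP@":
            refs.add(f"posixgid:{source_id}:{gid}")
        else:
            refs.add(f"nfs4who:{source_id}:{who}")  # aliased later by enrichment
    if world:
        return 2, set()
    return (1 if refs else 0), refs


def sid_to_string(raw: bytes) -> str:
    """Decode a binary SID (MS-DTYP §2.4.2) into its S-R-I-S... string form."""
    if len(raw) < 8:
        raise ValueError("SID too short")
    revision, count = raw[0], raw[1]
    if len(raw) != 8 + 4 * count:
        raise ValueError("SID length does not match sub-authority count")
    authority = int.from_bytes(raw[2:8], "big")  # 48-bit identifier authority, big-endian
    subs = struct.unpack(f"<{count}I", raw[8:])   # 32-bit sub-authorities, little-endian
    return "-".join(["S", str(revision), str(authority), *map(str, subs)])
```

A malformed SID raises ValueError here; a caller following the fail-closed rule would catch it and emit no grants for the file.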
grants_state and fail-closed behavior
Each file's grants_state records whether grants have been computed and what they say. It's how the read-time predicate distinguishes "no grants yet" (fail-closed input) from "grants present" without scanning the full grant set on every hit.
| grants_state | Meaning | Visible under per_file? |
|---|---|---|
| 0 | No per-file grants computed (uncomputed, or no metadata captured yet) | Hidden when fail_closed = true; falls back to source-level visibility when fail_closed = false |
| 1 | Grants computed and present | Visible only to callers holding a matching grant |
| 2 | Grants computed; file is world / everyone-readable | Visible to everyone with source access |
The default for every new file is 0. So the moment you flip a source to mode = per_file (with the default fail_closed = true), its files trim down to nothing until their grants are computed. That is exactly the fail-closed guarantee, but it means you should backfill (and, for ACL sources, enable the capture cap first) before relying on the source being usefully visible.
Correlating callers to files
POSIX trimming works out of the box when the caller's IdP groups resolve to the same names/ids as the file owners. Cross-directory cases — an on-prem AD SID on a Windows ACL vs. an Entra object id in an OIDC token, or a numeric POSIX gid vs. a name-based IdP group — need the two namespaces bridged.
That bridging is directory enrichment: it materializes the alias graph from declared operator mappings, POSIX name resolution, Entra Graph discovery (group + user SIDs, with group-membership/overage resolution), and AD/LDAP correlation for unlinked on-prem directories. Only high-confidence alias edges are used to expand grants, so a fuzzy match can never produce a false grant. See directory enrichment for how to configure and run it.
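To see why restricting expansion to high-confidence edges matters, here is a sketch of grant expansion over an alias graph. The (ref_a, ref_b, confidence) edge shape, the "high" label, and expand_grants are hypothetical. The sketch also assumes alias edges are symmetric "same identity" links that may be followed transitively; that is an assumption about how enrichment models identity, not documented behavior.

```python
from collections import defaultdict, deque


def expand_grants(file_refs, alias_edges):
    """Expand file-side refs through high-confidence alias edges only."""
    graph = defaultdict(set)
    for a, b, confidence in alias_edges:
        if confidence != "high":
            continue  # a fuzzy or medium match never widens a grant
        graph[a].add(b)
        graph[b].add(a)
    expanded = set(file_refs)
    queue = deque(file_refs)
    while queue:  # breadth-first walk over the high-confidence component
        ref = queue.popleft()
        for alias in graph[ref]:
            if alias not in expanded:
                expanded.add(alias)
                queue.append(alias)
    return expanded
```

With an on-prem group SID aliased at high confidence to an Entra group object id, a file granted to the SID also matches callers whose token carries that object id, while a low-confidence email match adds nothing.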
Limitations and current management surface
- Set via UI, CLI, or API. The source detail page's Per-file security trimming panel, the source security-trim CLI command, and the POST /api/sources/:id/security-trim endpoint (above) all set the same config.security_trim block.
- ACL trimming needs the capture cap first. Enable ntfs_acl / nfs4_acl (and backfill) before trusting ACL-controlled files to be visible. POSIX-only trimming needs no cap.
- Cross-directory trimming needs directory enrichment. Without the alias graph, only same-namespace (typically POSIX) grants match. Entra Graph and AD/LDAP discovery run automatically on the worker's directory sweep once configured; see directory enrichment.
- Fail-closed means "hidden until computed." With the default fail_closed = true, flipping a source to per_file hides its files until grants land. Plan a backfill as part of the cutover.
- The trim layer is consumed by both the API and MCP. Both resolve the caller's principal set into the same authorization context, so a caller sees the same trimmed view through either surface.