Security trimming

Per-file security trimming ensures a caller sees only the files that the source's own permissions (the file's owner, group, or ACL) already let them read. A user who can't read a file on the NAS can't see it, its metadata, or its extracted content through KDBL Context Lake (K-Lake) either.

Enforcement lives in the datastore's row-level security (RLS), so it applies everywhere automatically: list, file detail, and content search are all trimmed by the same predicate. There is no separate "filter the search results" code path to keep in sync.

Trimming is fail-closed and strictly opt-in. A source you don't configure for it keeps today's source-level visibility. A source you do configure for per-file trimming hides any file whose grants haven't been computed yet — K-Lake would rather hide a file you could read than leak one you can't.

How it works

Both sides of the question — "who is this caller?" and "who may read this file?" — are collapsed into one namespace, the principal ref, of the form <kind>:<scope>:<value>. Each distinct ref is interned to an integer principal_id.

  • File side. When a file is indexed, K-Lake derives the set of principal refs that may read it from its POSIX mode bits and (if captured) its ACL, and records a read-grant for each granted principal.
  • Caller side. When a caller hits the API, their IdP claims (object id, email, UPN, group memberships) are turned into principal refs, resolved to the principal_id set, and handed to the datastore.
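The two sides can be pictured with a minimal Python sketch. It is illustrative only: `make_ref`, `PrincipalInterner`, the in-memory dictionary, and all the sample values are hypothetical stand-ins, not K-Lake's actual schema or API. It shows refs formatted as <kind>:<scope>:<value>, interned to integer ids, and compared as sets.

```python
from typing import Dict, Iterable, Set


def make_ref(kind: str, scope: str, value: str) -> str:
    """Format a principal ref as <kind>:<scope>:<value> (scope may be empty)."""
    return f"{kind}:{scope}:{value}"


class PrincipalInterner:
    """Hypothetical in-memory stand-in for the principal table."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def intern(self, ref: str) -> int:
        # Assign the next integer id the first time a ref is seen.
        return self._ids.setdefault(ref, len(self._ids) + 1)

    def resolve(self, refs: Iterable[str]) -> Set[int]:
        # Caller side: only refs that already exist can match a grant.
        return {self._ids[r] for r in refs if r in self._ids}


interner = PrincipalInterner()

# File side: grants derived from the file's owner and group (made-up values).
file_grants = {
    interner.intern(make_ref("posixuid", "src1", "1000")),
    interner.intern(make_ref("posixgid", "src1", "2000")),
}

# Caller side: refs derived from IdP claims (made-up values).
caller_refs = [
    make_ref("oid", "issuer-a", "0000-aaaa"),
    make_ref("posixgid", "src1", "2000"),
]
caller_ids = interner.resolve(caller_refs)

# Authorized when the two id sets intersect.
authorized = bool(caller_ids & file_grants)
```

Here the caller matches through the shared POSIX group ref; the `oid` ref was never interned on the file side, so it contributes nothing.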

A file is authorized for a caller when the caller's resolved principal id set intersects the file's grant set — a positive set-intersection. The same authorization predicate is applied to file listings, metadata, and content, ANDed on top of the existing tenant + source-visibility checks. It short-circuits in this order:

  1. Tenant admin, or source owner/editor → unrestricted (admin path).
  2. Source not in per_file mode → source-level visibility (today's behavior).
  3. File is world/everyone-readable (grants_state = 2) → visible.
  4. Caller holds a read grant on the exact file → visible.
  5. fail_closed is off and the file's grants aren't computed yet → falls back to source-level visibility (fail-open fallback).
  6. Otherwise → hidden.
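The short-circuit order above can be sketched as a plain Python function. This is an illustration of the documented order, not the datastore's actual RLS predicate; the `Caller`, `Source`, and `File` types and their field names are hypothetical. A True result means only that the per-file check passes; the tenant and source-visibility checks still apply on top.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Caller:
    is_tenant_admin: bool = False
    is_source_owner_or_editor: bool = False
    principal_ids: Set[int] = field(default_factory=set)


@dataclass
class Source:
    mode: str = "source_only"
    fail_closed: bool = True


@dataclass
class File:
    grants_state: int = 0
    grant_ids: Set[int] = field(default_factory=set)


def passes_per_file_check(caller: Caller, source: Source, file: File) -> bool:
    # 1. Admin path: tenant admin or source owner/editor.
    if caller.is_tenant_admin or caller.is_source_owner_or_editor:
        return True
    # 2. Source is not in per_file mode: source-level visibility decides.
    if source.mode != "per_file":
        return True
    # 3. World/everyone-readable file.
    if file.grants_state == 2:
        return True
    # 4. Positive set-intersection between caller ids and file grants.
    if caller.principal_ids & file.grant_ids:
        return True
    # 5. Fail-open fallback for files whose grants aren't computed yet.
    if not source.fail_closed and file.grants_state == 0:
        return True
    # 6. Otherwise hidden.
    return False
```

For example, a non-admin caller with ids {7} passes for a file with grant ids {7, 9}, and fails for an uncomputed file (grants_state 0) unless fail_closed is off.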

The principal-ref kinds are:

| Kind | Scope | Example | Origin |
|---|---|---|---|
| sid | AD/domain (often empty) | sid::S-1-5-21-… | NTFS owner / ACE SID |
| upn | (none) | upn:alice@corp.com | User principal name; the cross-directory bridge key |
| email | (none) | email:alice@corp.com | Email / Google Workspace group address (lowercased) |
| oid | issuer | oid:<issuer>:<objectId> | Entra/Azure object GUID (user or group) |
| posixuid | source_id | posixuid:<source_id>:<uid> | Numeric POSIX owner, source-scoped (uid spaces are server-local) |
| posixgid | source_id | posixgid:<source_id>:<gid> | Numeric POSIX group, source-scoped |
| name | directory | name:<dir>:<group> | Resolved friendly user/group name |
| nfs4who | source_id | nfs4who:<source_id>:<who> | NFSv4 who-string, pre-resolution |

The same ref can appear on both sides — e.g. a POSIX group on the file and the caller's resolved group — and that's the common case. When the file-side and caller-side refs are in different namespaces (an on-prem AD SID on the file vs. an Entra group object id in the token), they're bridged by directory enrichment, which builds the alias graph.

Enabling it on a source

The trim policy lives in a security_trim block inside the source's config JSON, read by both the workers and the authorization predicate. The block has two fields:

| Field | Values | Default | Meaning |
|---|---|---|---|
| mode | per_file, source_only, or open | source_only | per_file turns on per-file trimming. source_only is today's source-level visibility. open explicitly marks the source fully public (no trimming, even if other sources in the tenant trim). An absent block or any unrecognised value falls back to source_only. |
| fail_closed | true or false | true | With per_file, decides what happens to files whose grants aren't computed yet: hide them (true) or fall back to source-level visibility (false). |
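The defaulting rules in the table can be sketched in Python as follows. This is a hedged illustration: `effective_policy` is a hypothetical helper, not a K-Lake function, and treating a non-boolean fail_closed as the default is an assumption the source does not state.

```python
from typing import Any, Dict, Tuple

VALID_MODES = {"per_file", "source_only", "open"}


def effective_policy(source_config: Dict[str, Any]) -> Tuple[str, bool]:
    """Return (mode, fail_closed) after applying the documented defaults."""
    block = source_config.get("security_trim")
    if not isinstance(block, dict):
        # Absent block: source-level visibility, fail_closed defaults to true.
        return ("source_only", True)

    mode = block.get("mode")
    if not isinstance(mode, str) or mode not in VALID_MODES:
        # Any unrecognised value falls back to source_only.
        mode = "source_only"

    fail_closed = block.get("fail_closed", True)
    if not isinstance(fail_closed, bool):
        # Assumption: a non-boolean value is treated like the default.
        fail_closed = True

    return (mode, fail_closed)
```

With this sketch, `{"security_trim": {"mode": "per_file"}}` resolves to per_file with fail_closed on, and a config with no security_trim block resolves to source_only.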

Set it through the dedicated endpoint, which merges the block into the source config without disturbing the rest (the same merge pattern as meta-caps / subtree). The change takes effect within ~30 s.

CLI:

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source security-trim --source-id 'smbfs://nas.corp/finance' \
  --mode per_file --fail-closed true

--fail-closed is optional — omit it to flip the mode while leaving the stored fail_closed untouched.

API:

curl -X POST -H "Authorization: Bearer $KDBL_TOKEN" \
     -H "Content-Type: application/json" \
     "$KDBL_URL/api/sources/<urlencoded-source-id>/security-trim" \
     -d '{ "mode": "per_file", "fail_closed": true }'

The response echoes the stored policy ({ "mode": ..., "fail_closed": ... }). Auth is tenant-admin or the source's owner/editor; an unauthorised caller gets 404 (the row isn't visible under RLS), not 403.
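For scripted use, the same request can be assembled with Python's standard library. This is a sketch under assumptions: `build_security_trim_request` is a hypothetical helper, and the base URL and token are placeholders. It mainly shows the source id being URL-encoded as a single path segment, mirroring the curl call above.

```python
import json
import urllib.parse
import urllib.request


def build_security_trim_request(
    base_url: str, token: str, source_id: str, mode: str, fail_closed: bool
) -> urllib.request.Request:
    # safe="" also encodes "/" and ":", so the id stays one path segment.
    encoded_id = urllib.parse.quote(source_id, safe="")
    url = f"{base_url.rstrip('/')}/api/sources/{encoded_id}/security-trim"
    body = {"mode": mode, "fail_closed": fail_closed}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder base URL and token; nothing is sent here.
req = build_security_trim_request(
    "https://klake.example", "TOKEN", "smbfs://nas.corp/finance", "per_file", True
)
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; a 404 for an unauthorised caller surfaces there as `urllib.error.HTTPError`.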

In the UI, the source detail page has a Per-file security trimming panel (mode dropdown + fail-closed toggle) that posts to this endpoint.

Prerequisite for ACL-based trimming: capture the ACL bytes

POSIX grants are derived automatically from the mode/uid/gid the crawl already captures — no extra configuration. ACL grants are not: the source must first capture the raw ACL bytes via a metadata-capture cap (ntfs_acl for SMB/SMBFS, nfs4_acl for NFS). Without the cap, there are no ACL bytes to derive grants from, and an ACL-controlled file falls back to its POSIX bits (commonly grants_state = 0 → hidden under fail-closed).

Enable the cap the same way as any other enrichment — see sources.

UI: the source detail page's enrichment controls let you toggle NTFS / NFSv4 ACL capture.

CLI:

# SMB share (smbfs kernel-mount backend):
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source meta-caps --source-id 'smbfs://nas.corp/finance' --caps ntfs_acl
# NFS export:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source meta-caps --source-id 'nfs://nas.corp/export/data' --caps nfs4_acl

API:

curl -X POST -H "Authorization: Bearer $KDBL_TOKEN" \
     -H "Content-Type: application/json" \
     "$KDBL_URL/api/sources/<urlencoded-source-id>/meta-caps" \
     -d '{ "caps": "ntfs_acl" }'

SMB userspace (the smb protocol) has no ACL surface; ntfs_acl is masked out for it at registry time. Use the smbfs (kernel-mount) backend for NTFS ACL capture.

After enabling a cap on a source that's already indexed, run a backfill so existing files get their ACL bytes (and therefore their grants) computed:

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source backfill-meta --source-id 'smbfs://nas.corp/finance'

POSIX vs ACL grants

K-Lake derives read-grants from whichever permission model the source exposes. Each cap that produced a grant is tracked against that grant, so re-crawling one model's grants never clobbers another's.

POSIX (automatic). Derived from the file's mode bits and owner/group:

| Mode bit | Octal | Effect |
|---|---|---|
| other-read | 0o004 | File is world-readable → grants_state = 2, no per-principal rows needed |
| owner-read | 0o400 | Grant to posixuid:<source_id>:<uid> |
| group-read | 0o040 | Grant to posixgid:<source_id>:<gid> |
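The mode-bit table translates into a few lines of Python. This is a minimal sketch, not K-Lake's implementation: `posix_read_grants` is a hypothetical name, and it returns a world-readable flag plus the set of principal refs rather than writing grants_state, since the source doesn't say how a file with no read bits at all is recorded.

```python
import stat
from typing import Set, Tuple


def posix_read_grants(
    source_id: str, mode: int, uid: int, gid: int
) -> Tuple[bool, Set[str]]:
    """Return (world_readable, principal refs granted read)."""
    # other-read (0o004): world-readable, no per-principal rows needed.
    if mode & stat.S_IROTH:
        return (True, set())

    refs: Set[str] = set()
    # owner-read (0o400): grant to the source-scoped numeric owner.
    if mode & stat.S_IRUSR:
        refs.add(f"posixuid:{source_id}:{uid}")
    # group-read (0o040): grant to the source-scoped numeric group.
    if mode & stat.S_IRGRP:
        refs.add(f"posixgid:{source_id}:{gid}")
    return (False, refs)
```

For a file with mode 0o640, uid 1000, and gid 2000 on source `src1`, this yields grants for `posixuid:src1:1000` and `posixgid:src1:2000`; mode 0o644 is world-readable instead.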

ACL (requires the cap). Derivation is deliberately conservative — it only emits a grant it is certain of, and any parse error or malformed field yields no grants (fail-closed):

  • NFSv4 (nfs4_acl, RFC 7530 §5.11 wire format). Allow-read ACEs become grants; OWNER@/GROUP@ resolve to the file's POSIX owner/group principals, named whos become nfs4who:<source>:<who> refs for directory enrichment to alias later. EVERYONE@ allow-read marks the file world-readable; a DENY-read on EVERYONE@ yields no grants.
  • NTFS (ntfs_acl, parsed from the self-relative SECURITY_DESCRIPTOR, MS-DTYP §2.4.6). DACL allow-read ACEs become sid::<SID> grants; the well-known Everyone SID (S-1-1-0) marks the file world-readable. Deny-overrides-allow is resolved at compute time so reads stay a positive set-intersection.
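The NFSv4 rules can be illustrated on already-parsed ACEs. This Python sketch makes several assumptions: the `Ace` type and `nfs4_read_grants` are hypothetical, wire-format parsing is out of scope, and any DENY on read (not only on EVERYONE@) yields no grants, which is stricter than the text requires. The ACE type and mask constants are the values defined by the NFSv4 specification.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001
ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002
ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003
ACE4_READ_DATA = 0x00000001


@dataclass
class Ace:
    ace_type: int
    access_mask: int
    who: str


def nfs4_read_grants(
    aces: List[Ace], source_id: str, uid: int, gid: int
) -> Tuple[bool, Set[str]]:
    """Return (world_readable, principal refs); (False, set()) on any doubt."""
    world_readable = False
    refs: Set[str] = set()
    for ace in aces:
        if ace.ace_type in (ACE4_SYSTEM_AUDIT_ACE_TYPE, ACE4_SYSTEM_ALARM_ACE_TYPE):
            continue  # audit/alarm ACEs don't affect access
        if not ace.who or ace.ace_type not in (
            ACE4_ACCESS_ALLOWED_ACE_TYPE,
            ACE4_ACCESS_DENIED_ACE_TYPE,
        ):
            return (False, set())  # malformed field: no grants
        if not ace.access_mask & ACE4_READ_DATA:
            continue  # not a read ACE
        if ace.ace_type == ACE4_ACCESS_DENIED_ACE_TYPE:
            return (False, set())  # assumption: any read DENY yields no grants
        if ace.who == "EVERYONE@":
            world_readable = True
        elif ace.who == "OWNER@":
            refs.add(f"posixuid:{source_id}:{uid}")
        elif ace.who == "GROUP@":
            refs.add(f"posixgid:{source_id}:{gid}")
        else:
            # Named who: left for directory enrichment to alias later.
            refs.add(f"nfs4who:{source_id}:{ace.who}")
    return (True, set()) if world_readable else (False, refs)
```

Returning an empty result on any doubt keeps the sketch on the fail-closed side: a file it can't interpret is never granted to anyone.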

grants_state and fail-closed behavior

Each file's grants_state records whether grants have been computed and what they say. It's how the read-time predicate distinguishes "no grants yet" (fail-closed input) from "grants present" without scanning the full grant set on every hit.

| grants_state | Meaning | Visible under per_file? |
|---|---|---|
| 0 | No per-file grants computed (uncomputed, or no metadata captured yet) | Hidden when fail_closed = true; falls back to source-level visibility when fail_closed = false |
| 1 | Grants computed and present | Visible only to callers holding a matching grant |
| 2 | Grants computed; file is world / everyone-readable | Visible to everyone with source access |

The default for every new file is 0. So the moment you flip a source to mode = per_file, files trim down to nothing until their grants are computed — which is exactly the fail-closed guarantee, but it means you should backfill (and, for ACL sources, enable the capture cap first) before relying on the source being usefully visible.

Correlating callers to files

POSIX trimming works out of the box when the caller's IdP identity and groups resolve to the same names/ids as the file's owner and group. Cross-directory cases (an on-prem AD SID on a Windows ACL vs. an Entra object id in an OIDC token, or a numeric POSIX gid vs. a name-based IdP group) need the two namespaces bridged.

That bridging is directory enrichment: it materializes the alias graph from declared operator mappings, POSIX name resolution, Entra Graph discovery (group + user SIDs, with group-membership/overage resolution), and AD/LDAP correlation for unlinked on-prem directories. Only high-confidence alias edges are used to expand grants, so a fuzzy match can never produce a false grant. See directory enrichment for how to configure and run it.
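One way to picture the alias graph is as a traversal that only follows high-confidence edges. The Python sketch below is an assumption-laden illustration, not the enrichment implementation: `expand_refs`, the numeric confidence score, the 0.9 threshold, and the symmetric edges are all hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# (ref_a, ref_b, confidence): both refs are taken to name the same principal.
AliasEdge = Tuple[str, str, float]


def expand_refs(
    start_refs: Iterable[str], edges: List[AliasEdge], min_confidence: float = 0.9
) -> Set[str]:
    """Return start_refs plus every ref reachable over high-confidence edges."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b, confidence in edges:
        if confidence < min_confidence:
            continue  # a fuzzy match never contributes an edge
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen: Set[str] = set(start_refs)
    queue = deque(seen)
    while queue:
        ref = queue.popleft()
        for neighbour in adjacency.get(ref, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Made-up edges: one exact SID-to-oid link and one fuzzy name match.
edges = [
    ("sid::S-1-5-21-1", "oid:issuer-a:g1", 1.0),
    ("oid:issuer-a:g1", "name:corp:finance", 0.5),
]
expanded = expand_refs({"sid::S-1-5-21-1"}, edges)
```

In this example the SID expands to the Entra object id, but the 0.5-confidence name match is dropped, so it cannot turn into a grant.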

Limitations and current management surface

  • Set via UI, CLI, or API. The source detail page's Per-file security trimming panel, the source security-trim CLI command, and the POST /api/sources/:id/security-trim endpoint (above) all set the same config.security_trim block.
  • ACL trimming needs the capture cap first. Enable ntfs_acl / nfs4_acl (and backfill) before trusting ACL-controlled files to be visible. POSIX-only trimming needs no cap.
  • Cross-directory trimming needs directory enrichment. Without the alias graph, only same-namespace (typically POSIX) grants match. Entra Graph and AD/LDAP discovery run automatically on the worker's directory sweep once configured — see directory enrichment.
  • Fail-closed means "hidden until computed." Flipping a source to per_file hides its files until grants land. Plan a backfill as part of the cutover.
  • The trim layer is consumed by the API and MCP — both resolve the caller's principal set into the same authorization context, so a caller sees the same trimmed view through either surface.