CLI reference¶
kdbl-control is the command-line interface to KDBL Context Lake (K-Lake). Most commands call the same REST API as the UI (so they respect tenant isolation and the audit trail), but a few run in direct-DB mode — see the per-command notes below.
Installing¶
The CLI ships as a static binary inside the K-Lake container image at /usr/local/bin/kdbl-control. Pull it from the image you run in your cluster, or ask your administrator for a host binary.
Once on PATH:
Authentication¶
The CLI has two modes:
| Mode | Flags | When to use |
|---|---|---|
| API | --api-url <URL> --api-token <PAT> | Day-to-day. Goes through the REST API and respects tenant isolation. Required by init, onboard, doctor, source extract / security-trim / multichannel / schedule, mcp, and status. |
| Direct database | --postgres-url <URL> (+ KDBL_MASTER_KEY for secrets) | Bootstrap, disaster-recovery, and the directory and tenant families (the API process holds no master key, so storing encrypted directory secrets is direct-DB only). |
For most work, set KDBL_URL and KDBL_TOKEN in your environment and pass --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" (or rely on the environment if your build supports it).
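As a minimal sketch of that setup, the two variables can be exported once per shell session. The URL is the example host that appears in the doctor output on this page, the token value is a placeholder, and the kdbl wrapper function is a hypothetical convenience rather than something the CLI ships:

```shell
# Example values only: substitute your own deployment URL and PAT.
export KDBL_URL='https://kdbl.example.com'
export KDBL_TOKEN='<your-PAT>'

# Hypothetical wrapper (not part of kdbl-control): prepends the API-mode flags
# so each call does not have to repeat them.
kdbl() {
  kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" "$@"
}
```

With the wrapper defined, kdbl status expands to the full API-mode invocation of status.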
Subcommands¶
crawl¶
Enqueue a crawl for a source. Workers pick up the work and expand it from the root prefix.
kdbl-control crawl --source-id 's3://my-bucket'
kdbl-control crawl --source-id 's3://my-bucket' --prefix 'reports/2026/'
Flags:
- --source-id <ID> or --bucket <NAME>: one is required (--bucket is a shorthand for an S3 source)
- --prefix <PATH>: narrow the crawl to a subtree (default: full source)
- --mode <hierarchical|flat>: listing strategy (default hierarchical)
- --label <NAME>: tag the run for later auditing
- --force-reextract: re-extract every eligible file, even unchanged ones (default off)
Recrawls skip unchanged content. By default a recrawl only extracts files that are new or changed since they were last extracted; unchanged content is not re-run, because extraction is GPU-heavy. "Changed" is decided per file by its etag (S3) or, for sources without one (nfs/smbfs/smb), its mtime+size. Use --force-reextract for a deliberate full re-run, e.g. after upgrading the extractor or changing the extract policy. The kdbl_extract_skipped_unchanged_total metric counts how many files a recrawl skips.
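The per-file rule can be illustrated with a small shell function. This is only a sketch of the decision as described above, not K-Lake's actual implementation; the function name and argument order are assumptions made for the example:

```shell
# Illustrative helper, not K-Lake code. Exit status 0 means "changed" (extract again).
# Args: old_etag new_etag old_mtime old_size new_mtime new_size
is_changed() {
  if [ -n "$2" ]; then
    # The source exposes an etag (S3): compare etags only.
    # An empty old etag (file never extracted) counts as changed.
    [ "$1" != "$2" ]
  else
    # No etag (nfs/smbfs/smb): fall back to comparing mtime and size.
    [ "$3" != "$5" ] || [ "$4" != "$6" ]
  fi
}
```

A file whose etag, or whose mtime and size, match the previously recorded values returns a non-zero status here, which corresponds to the case a recrawl skips.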
status¶
Show queue depth and per-source rollup.
doctor¶
Health-check a deployment end-to-end and print a ✓/⚠/• checklist: API
liveness (/healthz) and readiness (/readyz), token validity + kind, the
queue/source rollup, the extractor fleet, and whether the MCP endpoint is
enabled. Use it as the "you're done" gate right after scripts/bootstrap.sh,
or any time to triage. API mode only.
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" doctor
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" doctor --mcp # also smoke-test /mcp
kdbl doctor — https://kdbl.example.com
✓ API reachable — /healthz 200
✓ API ready (DB reachable) — /readyz 200
✓ token valid (cluster-admin)
✓ queue reachable — pending=0 running=0 done=0 failed=0
• no sources indexed yet — next: add one with `source add-*` + `crawl`
✓ extractor fleet healthy (2/2)
✓ MCP endpoint enabled — PRM 200
✓ all checks passed
✗ marks a critical failure (unreachable, not ready, bad token, MCP smoke
failure) and makes doctor exit non-zero — so it's CI/scriptable. ⚠ flags
something degraded (failed queue tasks, sources with errors, a stale extractor
heartbeat); • is informational and expected on a fresh cluster (no sources
yet, extraction not deployed, MCP off) — neither fails the gate. Works with a
cluster-admin token (cluster-wide rollup) or a user PAT (tenant-scoped).
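Because doctor exits non-zero on a critical failure, it can gate a CI step. The following is a sketch of a retry loop around it; the function name, the retry parameters, and the KDBL_BIN override (used only so the loop can be dry-run with a stand-in command) are assumptions for illustration, not features of kdbl-control:

```shell
# KDBL_BIN is a hypothetical override for dry runs; it defaults to the real CLI.
KDBL_BIN="${KDBL_BIN:-kdbl-control}"

# Retry doctor until it exits 0.
# Args: max attempts, seconds to sleep between attempts.
wait_for_doctor() {
  attempts="$1"
  delay="$2"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$KDBL_BIN" --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" doctor; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep if another attempt is coming.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

A pipeline step could then call wait_for_doctor 10 15 and fail the job if it returns non-zero. Warnings (⚠) and informational items (•) do not change the exit status, so they will not trip this gate.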
init¶
Interactive first-source wizard — the guided Day-1 path that does what source
add-* + source extract enable + crawl do, but walks you through it and
surfaces the bit newcomers miss: search needs extraction. It connects a
source (S3 / SMB / SMBFS), offers to enable content extraction, and offers to
start the first crawl. Runs as a tenant user PAT (a cluster-admin token has
no tenant — onboard one first and use the PAT it prints).
kdbl init — connecting your first source as alice@acme.com.
What kind of source do you want to connect?
1) S3 / object store (s3://)
2) SMB / Windows share — kernel mount (smbfs://)
3) SMB / Windows share — userspace (smb://)
choice [1]: 1
S3 bucket: acme-docs
source id [s3://acme-docs]:
…
✓ source created
Enable content extraction? (full-text search needs extracted text) [Y/n]: y
✓ extraction enabled — PDFs/Office docs etc. are parsed as they're crawled
Start the first crawl now? [Y/n]: y
✓ crawl started (task 1042)
It prompts on a TTY (passwords are read without echo); for scripting you can
pipe answers on stdin in order. NFS sources aren't offered here yet — register
those with source add-nfs (direct-DB mode). When you'd rather not be prompted,
the non-interactive equivalents are source add-*, source extract enable, and
crawl.
source¶
Manage the source registry.
| Subcommand | What it does |
|---|---|
| add-s3 | Register an S3 source |
| add-smb | Register an SMB (userspace) source |
| add-smbfs | Register an SMB source mounted via the kernel |
| add-nfs | Register an NFS source |
| list | List sources in your tenant |
| show | Show a single source in detail |
| enable / disable | Toggle the enabled flag |
| remove | Delete a source and its indexed files |
| bulk-ingest | Toggle the bulk-ingest fast path |
| meta-caps | Choose which optional enrichments to gather (incl. ntfs_acl / nfs4_acl ACL capture) |
| backfill-meta | Enqueue enrichment for files indexed before a cap was enabled |
| meta-coverage | Show how many files have each enrichment populated |
| subtree | Tune per-source concurrency hints |
| security-trim | Set the per-file trimming policy (--mode per_file\|source_only\|open) |
| multichannel | SMB3 multi-channel on an smbfs source |
| extract | Content-extraction control: enable / disable / show / backfill / progress |
| schedule | Scheduled crawls/backfills: add / list / rm / pause / resume / run |
| progress | Live crawl/extract progress for a source |
See Sources for add-* examples and Enablement for extract / security-trim / schedule.
Tokens and users are managed through the REST API and the UI, not the CLI: there is no kdbl-control tokens or users command. Mint a PAT and manage tenant users from the web console (or POST /api/users; the user-create response returns the one-time token).
directory (direct-DB)¶
Build the alias graph that ties IdP identities to file-native principals (for per-file trimming). Runs in direct-DB mode and needs KDBL_MASTER_KEY to encrypt secrets; secrets come from env vars, never flags. See Directory enrichment.
kdbl-control directory set-graph --tenant acme --entra-tenant <id> --client-id <id> # KDBL_GRAPH_CLIENT_SECRET in env
kdbl-control directory set-ldap --tenant acme --url ldaps://dc:636 --bind-user 'CORP\svc' --base-dn 'DC=corp' --upn-rewrite corp.local=corp.com
kdbl-control directory sync-graph --tenant acme # one-shot run
kdbl-control directory sync-ldap --tenant acme
files¶
Browse a source's files and mint signed links that open the original file (re-fetched on demand from the source, access re-checked, audited). API mode only; the calling token's tenant scopes what's visible.
kdbl-control files list --source-id s3://docs-bucket --limit 50 # paginated by key
kdbl-control files show --source-id s3://docs-bucket --key reports/q3.pdf # detail + signed links
kdbl-control files link --source-id s3://docs-bucket --key reports/q3.pdf # print an inline preview link
kdbl-control files link --source-id s3://docs-bucket --key reports/q3.pdf --download # attachment link
files link prints just the URL (so it's pipe-friendly); files show returns
the full detail including preview_url / download_url. The links require the
server to have signed downloads enabled (KDBL_DOWNLOAD_SIGNING_SECRET /
KDBL_API_PUBLIC_URL / KDBL_INTERNAL_FETCH_TOKEN); they expire in ~15 minutes.
mcp¶
kdbl-control mcp smoke # fetch PRM + run initialize + tools/list against /mcp
kdbl-control mcp audit --tool search_content --limit 50 # query the audit trail
extractors¶
onboard (cluster admin)¶
Stand up a new customer in one step: create the tenant (idempotent — reuses an
existing one), create its first tenant-admin user, and mint that user's initial
PAT. The token is printed once — hand it to the customer over a secure
channel. This is the secure self-service path: it goes through the API's auth +
audit boundary, so you don't need direct DB or kubectl access to onboard.
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_CLUSTER_ADMIN_TOKEN" \
onboard acme --name "ACME Corp" --admin-name "Jane Admin" --admin-email jane@acme.com
✓ created tenant acme (id=019eb528-…)
✓ created tenant-admin user Jane Admin (id=375e848c-…)
Admin PAT for acme (shown ONCE — store it securely):
kdblpat_…
Flags:
- <slug>: tenant slug (positional); also the default display name. Keep it DNS/URL-safe.
- --name <NAME>: tenant display name (defaults to the slug)
- --admin-name <NAME>: the first tenant-admin user's display name (default admin)
- --admin-email <EMAIL>: the admin user's email (optional)
- --oidc-config-json <JSON>: inline OIDC config so the customer can federate sign-in from day one (optional)
API mode only, with a cluster-admin token.
onboard needs --api-url / --api-token, and the token must be the KDBL_CLUSTER_ADMIN_TOKEN. The admin API is sealed from the public MCP tunnel by the path-scoping gateway, so run it against the in-cluster API, e.g. kubectl -n kdbl port-forward svc/kdbl-api 18080:80 and --api-url http://localhost:18080.
The token is shown once and is never recoverable. Re-running onboard for an existing tenant is safe: it reuses the tenant and mints a new admin user.
tenant (cluster admin)¶
For the common case (new tenant + first admin + token in one shot) use
onboard. The tenant subcommands below are the
lower-level CRUD primitives.
kdbl-control tenant create --slug acme --name "ACME Corp"
kdbl-control tenant list
kdbl-control tenant show --slug acme
kdbl-control tenant retention --slug acme # read the current override
kdbl-control tenant retention --slug acme --days 90 # set
kdbl-control tenant retention --slug acme --clear # revert to the cluster default
secret keygen¶
Generate a fresh master key for credential encryption (used during cluster bootstrap).
bench, bench-sink, bench-queries¶
Operator-side benchmarking helpers. See --help for each. These are diagnostic — they exercise the live system. Run them against a non-production tenant.
Exit codes¶
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Generic error |
| 2 | Authentication failed |
| 3 | Authorization denied |
| 4 | Resource not found |
| 5 | Validation error (bad arguments) |
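In scripts, these codes can be turned into readable messages. A minimal sketch that mirrors the table above; the describe_exit helper is hypothetical and not part of the CLI:

```shell
# Hypothetical helper: map a kdbl-control exit code to its documented meaning.
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "generic error" ;;
    2) echo "authentication failed" ;;
    3) echo "authorization denied" ;;
    4) echo "resource not found" ;;
    5) echo "validation error (bad arguments)" ;;
    *) echo "unknown exit code $1" ;;
  esac
}
```

After any kdbl-control invocation, describe_exit "$?" prints the matching description, which makes it easier to tell, for example, a bad token (2) from a missing source (4) in CI logs.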
Tips¶
- Pipe secrets in over stdin (--secret-access-key-stdin, --password-stdin) so they don't appear in shell history or process listings.
- Source IDs containing / are fine in CLI flags; kdbl-control handles URL encoding for you when talking to the API.
- Run kdbl-control <command> --help for the full flag list on any subcommand.