CLI reference

kdbl-control is the command-line interface to KDBL Context Lake (K-Lake). Most commands call the same REST API as the UI (so they respect tenant isolation and the audit trail), but a few run in direct-DB mode — see the per-command notes below.

Installing

The CLI ships as a static binary inside the K-Lake container image at /usr/local/bin/kdbl-control. Pull it from the image you run in your cluster, or ask your administrator for a host binary.

Once on PATH:

kdbl-control --help

Authentication

The CLI has two modes:

Mode | Flags | When to use
API | --api-url <URL> --api-token <PAT> | Day-to-day. Goes through the REST API and respects tenant isolation. Required by init, onboard, doctor, source extract / security-trim / multichannel / schedule, mcp, and status.
Direct database | --postgres-url <URL> (+ KDBL_MASTER_KEY for secrets) | Bootstrap, disaster-recovery, and the directory and tenant families (the API process holds no master key, so storing encrypted directory secrets is direct-DB only).

For most work, set:

export KDBL_URL=https://kdbl.example.com
export KDBL_TOKEN=kdblpat_...

and pass --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" (or rely on the environment if your build supports it).
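
To avoid repeating the two API-mode flags on every call, a small shell wrapper can inject them from the environment. This is a sketch, not a feature of kdbl-control: the function name kc and the KDBL_BIN override variable are hypothetical, introduced here only for illustration and so the wrapper can be dry-run against a stub.

```shell
# Hypothetical wrapper: `kc` and KDBL_BIN are illustrative names, not CLI features.
# It prepends the API-mode flags from KDBL_URL / KDBL_TOKEN to any subcommand.
kc() {
  # Refuse to run if either variable is unset or empty.
  if [ -z "${KDBL_URL:-}" ] || [ -z "${KDBL_TOKEN:-}" ]; then
    echo "kc: set KDBL_URL and KDBL_TOKEN first" >&2
    return 1
  fi
  "${KDBL_BIN:-kdbl-control}" --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" "$@"
}

# Usage:
#   kc status
#   kc crawl --source-id 's3://my-bucket'
```

The global flags are placed before the subcommand, matching the invocations shown elsewhere in this reference.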

Subcommands

crawl

Enqueue a crawl for a source. Workers pick up the work and expand it from the root prefix.

kdbl-control crawl --source-id 's3://my-bucket'
kdbl-control crawl --source-id 's3://my-bucket' --prefix 'reports/2026/'

Flags:

  • --source-id <ID> or --bucket <NAME> — one is required (--bucket is a shorthand for an S3 source)
  • --prefix <PATH> — narrow the crawl to a subtree (default: full source)
  • --mode <hierarchical|flat> — listing strategy (default hierarchical)
  • --label <NAME> — tag the run for later auditing
  • --force-reextract — re-extract every eligible file, even unchanged ones (default off)

Recrawls skip unchanged content. By default a recrawl only extracts files that are new or changed since they were last extracted — unchanged content is not re-run, because extraction is GPU-heavy. "Changed" is decided per file by its etag (S3) or, for sources without one (nfs/smbfs/smb), its mtime+size. Use --force-reextract for a deliberate full re-run, e.g. after upgrading the extractor or changing the extract policy. The kdbl_extract_skipped_unchanged_total metric counts how many files a recrawl skips.
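
The per-file rule can be pictured as comparing a fingerprint stored at the last extraction with the current one. The sketch below is illustrative only and is not K-Lake's implementation: the function names and the fingerprint string format are assumptions. Only the rule itself (etag when the source provides one, otherwise mtime plus size, with --force-reextract overriding the skip) comes from the paragraph above.

```shell
# Illustrative only: fingerprint format and function names are assumptions.
# Build a change fingerprint: the etag if there is one, else mtime+size.
fingerprint() {
  etag=${1:-}
  mtime=${2:-}
  size=${3:-}
  if [ -n "$etag" ]; then
    printf 'etag:%s\n' "$etag"
  else
    printf 'mtime:%s,size:%s\n' "$mtime" "$size"
  fi
}

# Decide whether a recrawl would extract the file.
# $1 = fingerprint stored at last extraction (empty for a new file)
# $2 = current fingerprint
# $3 = "force" to mimic --force-reextract (anything else means default behaviour)
needs_extract() {
  if [ "${3:-}" = "force" ] || [ -z "${1:-}" ] || [ "$1" != "$2" ]; then
    return 0   # new, changed, or forced: extract
  fi
  return 1     # unchanged: skip
}
```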

status

Show queue depth and per-source rollup.

kdbl-control status
kdbl-control status --include-cluster   # cluster-wide, requires admin

doctor

Health-check a deployment end-to-end and print a pass/warn/fail checklist: API liveness (/healthz) and readiness (/readyz), token validity + kind, the queue/source rollup, the extractor fleet, and whether the MCP endpoint is enabled. Use it as the "you're done" gate right after scripts/bootstrap.sh, or any time to triage. API mode only.

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" doctor
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" doctor --mcp   # also smoke-test /mcp
kdbl doctor — https://kdbl.example.com

  ✓ API reachable — /healthz 200
  ✓ API ready (DB reachable) — /readyz 200
  ✓ token valid (cluster-admin)
  ✓ queue reachable — pending=0 running=0 done=0 failed=0
  • no sources indexed yet — next: add one with `source add-*` + `crawl`
  ✓ extractor fleet healthy (2/2)
  ✓ MCP endpoint enabled — PRM 200

✓ all checks passed

A failed check marks a critical failure (unreachable, not ready, bad token, MCP smoke failure) and makes doctor exit non-zero, so it can gate CI jobs and scripts. A warning flags something degraded (failed queue tasks, sources with errors, a stale extractor heartbeat). A • line is informational and expected on a fresh cluster (no sources yet, extraction not deployed, MCP off). Neither warnings nor informational lines fail the gate. doctor works with a cluster-admin token (cluster-wide rollup) or a user PAT (tenant-scoped).
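
Because doctor exits non-zero only on a critical failure, it can serve directly as a pipeline gate. A minimal sketch, assuming KDBL_URL and KDBL_TOKEN are exported; the function name doctor_gate and the KDBL_BIN override are hypothetical and exist only for illustration.

```shell
# Sketch of a CI gate. KDBL_BIN is a hypothetical override (defaults to kdbl-control).
doctor_gate() {
  if "${KDBL_BIN:-kdbl-control}" --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" doctor "$@"; then
    echo "doctor: no critical failures"
  else
    status=$?   # exit status of the doctor command above
    echo "doctor: critical failure (exit $status)" >&2
    return "$status"
  fi
}

# Usage in a pipeline step (also smoke-tests /mcp):
#   doctor_gate --mcp || exit 1
```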

init

Interactive first-source wizard: the guided Day-1 path. It does what source add-* + source extract enable + crawl do, but walks you through each step and surfaces the point newcomers miss: search needs extraction. It connects a source (S3 / SMB / SMBFS), offers to enable content extraction, and offers to start the first crawl. Run it with a tenant user PAT; a cluster-admin token has no tenant, so onboard a tenant first and use the PAT that onboard prints.

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" init
kdbl init — connecting your first source as alice@acme.com.

What kind of source do you want to connect?
  1) S3 / object store (s3://)
  2) SMB / Windows share — kernel mount (smbfs://)
  3) SMB / Windows share — userspace (smb://)
choice [1]: 1
S3 bucket: acme-docs
source id [s3://acme-docs]:
✓ source created
Enable content extraction? (full-text search needs extracted text) [Y/n]: y
✓ extraction enabled — PDFs/Office docs etc. are parsed as they're crawled
Start the first crawl now? [Y/n]: y
✓ crawl started (task 1042)

It prompts on a TTY (passwords are read without echo); for scripting you can pipe answers on stdin in order. NFS sources aren't offered here yet — register those with source add-nfs (direct-DB mode). When you'd rather not be prompted, the non-interactive equivalents are source add-*, source extract enable, and crawl.
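
For the piped-stdin route, the answers can be generated in the same order as the S3 transcript above. This is a sketch under that assumption: the function name is hypothetical, and other source types or builds may ask additional questions (credentials, for example), so walk through the prompts interactively once before scripting them.

```shell
# Sketch: answers in the order of the S3 transcript above.
# The function name is hypothetical; verify the prompt order on your build first.
init_s3_answers() {
  bucket=$1
  # 1     = S3 / object store
  # name  = bucket
  # blank = accept the default source id
  # y     = enable content extraction
  # y     = start the first crawl
  printf '%s\n' 1 "$bucket" "" y y
}

# Usage:
#   init_s3_answers acme-docs | kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" init
```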

source

Manage the source registry.

Subcommand What it does
add-s3 Register an S3 source
add-smb Register an SMB (userspace) source
add-smbfs Register an SMB source mounted via the kernel
add-nfs Register an NFS source
list List sources in your tenant
show Show a single source in detail
enable / disable Toggle the enabled flag
remove Delete a source and its indexed files
bulk-ingest Toggle the bulk-ingest fast path
meta-caps Choose which optional enrichments to gather (incl. ntfs_acl / nfs4_acl ACL capture)
backfill-meta Enqueue enrichment for files indexed before a cap was enabled
meta-coverage Show how many files have each enrichment populated
subtree Tune per-source concurrency hints
security-trim Set the per-file trimming policy (--mode per_file|source_only|open)
multichannel SMB3 multi-channel on an smbfs source
extract Content-extraction control: enable / disable / show / backfill / progress
schedule Scheduled crawls/backfills: add / list / rm / pause / resume / run
progress Live crawl/extract progress for a source

See Sources for add-* examples and Enablement for extract / security-trim / schedule.

Tokens and users are managed through the REST API and the UI, not the CLI: there is no kdbl-control tokens or users command. Mint a PAT and manage tenant users from the web console, or call POST /api/users; the user-create response returns the one-time token.

directory (direct-DB)

Build the alias graph that ties IdP identities to file-native principals (for per-file trimming). Runs in direct-DB mode and needs KDBL_MASTER_KEY to encrypt secrets; secrets come from env vars, never flags. See Directory enrichment.

kdbl-control directory set-graph --tenant acme --entra-tenant <id> --client-id <id>   # KDBL_GRAPH_CLIENT_SECRET in env
kdbl-control directory set-ldap  --tenant acme --url ldaps://dc:636 --bind-user 'CORP\svc' --base-dn 'DC=corp' --upn-rewrite corp.local=corp.com
kdbl-control directory sync-graph --tenant acme   # one-shot run
kdbl-control directory sync-ldap  --tenant acme

files

Browse a source's files and mint signed links that open the original file (re-fetched on demand from the source, access re-checked, audited). API mode only; the calling token's tenant scopes what's visible.

kdbl-control files list --source-id s3://docs-bucket --limit 50          # paginated by key
kdbl-control files show --source-id s3://docs-bucket --key reports/q3.pdf # detail + signed links
kdbl-control files link --source-id s3://docs-bucket --key reports/q3.pdf # print an inline preview link
kdbl-control files link --source-id s3://docs-bucket --key reports/q3.pdf --download   # attachment link

files link prints just the URL (so it's pipe-friendly); files show returns the full detail including preview_url / download_url. The links require the server to have signed downloads enabled (KDBL_DOWNLOAD_SIGNING_SECRET / KDBL_API_PUBLIC_URL / KDBL_INTERNAL_FETCH_TOKEN); they expire in ~15 minutes.
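
Since files link prints only the URL, it can be captured in a variable and handed to a downloader. A minimal sketch, assuming KDBL_URL and KDBL_TOKEN are exported and signed downloads are enabled on the server; the function name and the KDBL_BIN override are hypothetical.

```shell
# Sketch: KDBL_BIN and the function name are hypothetical.
# Prints a signed attachment link for one file ($1 = source id, $2 = key).
signed_download_link() {
  "${KDBL_BIN:-kdbl-control}" --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
    files link --source-id "$1" --key "$2" --download
}

# Usage: fetch promptly, because links expire in about 15 minutes.
#   url=$(signed_download_link s3://docs-bucket reports/q3.pdf) && curl -fsS -o q3.pdf "$url"
```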

mcp

kdbl-control mcp smoke   # fetch PRM + run initialize + tools/list against /mcp
kdbl-control mcp audit --tool search_content --limit 50   # query the audit trail

extractors

kdbl-control extractors list   # extractor pods and their health

onboard (cluster admin)

Stand up a new customer in one step: create the tenant (idempotent — reuses an existing one), create its first tenant-admin user, and mint that user's initial PAT. The token is printed once — hand it to the customer over a secure channel. This is the secure self-service path: it goes through the API's auth + audit boundary, so you don't need direct DB or kubectl access to onboard.

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_CLUSTER_ADMIN_TOKEN" \
  onboard acme --name "ACME Corp" --admin-name "Jane Admin" --admin-email jane@acme.com
✓ created tenant acme (id=019eb528-…)
✓ created tenant-admin user Jane Admin (id=375e848c-…)

Admin PAT for acme (shown ONCE — store it securely):
kdblpat_…

Flags:

  • <slug> — tenant slug (positional); also the default display name. Keep it DNS/URL-safe.
  • --name <NAME> — tenant display name (defaults to the slug)
  • --admin-name <NAME> — the first tenant-admin user's display name (default admin)
  • --admin-email <EMAIL> — the admin user's email (optional)
  • --oidc-config-json <JSON> — inline OIDC config so the customer can federate sign-in from day one (optional)

API mode only, with a cluster-admin token. onboard needs --api-url/--api-token and the token must be the KDBL_CLUSTER_ADMIN_TOKEN. The admin API is sealed from the public MCP tunnel by the path-scoping gateway, so run it against the in-cluster API — e.g. kubectl -n kdbl port-forward svc/kdbl-api 18080:80 and --api-url http://localhost:18080.

The token is shown once and is never recoverable. Re-running onboard for an existing tenant is safe — it reuses the tenant and mints a new admin user.

tenant (cluster admin)

For the common case (new tenant + first admin + token in one shot) use onboard. The tenant subcommands below are the lower-level CRUD primitives.

kdbl-control tenant create --slug acme --name "ACME Corp"
kdbl-control tenant list
kdbl-control tenant show --slug acme
kdbl-control tenant retention --slug acme            # read the current override
kdbl-control tenant retention --slug acme --days 90  # set
kdbl-control tenant retention --slug acme --clear    # revert to the cluster default

secret keygen

Generate a fresh master key for credential encryption (used during cluster bootstrap).

kdbl-control secret keygen

bench, bench-sink, bench-queries

Operator-side benchmarking helpers. See --help for each. These are diagnostic — they exercise the live system. Run them against a non-production tenant.

Exit codes

Code Meaning
0 Success
1 Generic error
2 Authentication failed
3 Authorization denied
4 Resource not found
5 Validation error (bad arguments)
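
In scripts, these codes can be branched on to separate failures that are worth retrying from ones that need a fix. A small sketch: the function name is hypothetical, while the codes and meanings are taken from the table above.

```shell
# Map a kdbl-control exit code to the meaning in the table above.
# The function name is hypothetical; codes and meanings come from the table.
explain_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "generic error" ;;
    2) echo "authentication failed" ;;
    3) echo "authorization denied" ;;
    4) echo "resource not found" ;;
    5) echo "validation error (bad arguments)" ;;
    *) echo "unknown exit code $1" ;;
  esac
}

# Usage:
#   kdbl-control status; explain_exit $?
```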

Tips

  • Pipe secrets in over stdin (--secret-access-key-stdin, --password-stdin) so they don't appear in shell history or process listings.
  • Source IDs containing / are fine in CLI flags — kdbl-control handles URL encoding for you when talking to the API.
  • Run kdbl-control <command> --help for the full flag list on any subcommand.