CLI reference¶

kdbl-control is the command-line interface to KDBL Context Lake (K-Lake). Most commands call the same REST API as the UI (so they respect tenant isolation and the audit trail), but a few run in direct-DB mode — see the per-command notes below.

Installing¶

The CLI ships as a static binary inside the K-Lake container image at /usr/local/bin/kdbl-control. Pull it from the image you run in your cluster, or ask your administrator for a host binary.

Once on PATH:

kdbl-control --help

Authentication¶

The CLI has two modes:

Mode	Flags	When to use
API	`--api-url <URL> --api-token <PAT>`	Day-to-day. Goes through the REST API and respects tenant isolation. Required by `init`, `onboard`, `doctor`, `source extract` / `security-trim` / `multichannel` / `schedule`, `mcp`, and `status`.
Direct database	`--postgres-url <URL>` (+ `KDBL_MASTER_KEY` for secrets)	Bootstrap, disaster-recovery, and the `directory` and `tenant` families (the API process holds no master key, so storing encrypted directory secrets is direct-DB only).

For most work, set:

export KDBL_URL=https://kdbl.example.com
export KDBL_TOKEN=kdblpat_...

and pass --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" (or rely on the environment if your build supports it).
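To avoid repeating the two flags on every call, a small shell wrapper can help. The sketch below is a convenience, not part of the CLI: `kdbl_api` and the `KDBL_BIN` override are hypothetical names introduced here, while `KDBL_URL` and `KDBL_TOKEN` are the variables exported above.

```shell
# Hypothetical helper: run kdbl-control in API mode using the exported variables.
# KDBL_BIN is an illustrative override; it defaults to kdbl-control on PATH.
kdbl_api() {
  # Stop with a clear message if either variable is unset or empty.
  : "${KDBL_URL:?set KDBL_URL to the API base URL}"
  : "${KDBL_TOKEN:?set KDBL_TOKEN to a personal access token}"
  "${KDBL_BIN:-kdbl-control}" --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" "$@"
}
```

With this in your shell profile, `kdbl_api status` or `kdbl_api doctor` would expand to the full API-mode invocation.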

Subcommands¶

`crawl`¶

Enqueue a crawl for a source. Workers pick up the work and expand it from the root prefix.

kdbl-control crawl --source-id 's3://my-bucket'
kdbl-control crawl --source-id 's3://my-bucket' --prefix 'reports/2026/'

Flags:

--source-id <ID> or --bucket <NAME> — one is required (--bucket is a shorthand for an S3 source)
--prefix <PATH> — narrow the crawl to a subtree (default: full source)
--mode <hierarchical|flat> — listing strategy (default hierarchical)
--label <NAME> — tag the run for later auditing
--force-reextract — re-extract every eligible file, even unchanged ones (default off)
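
The flags above combine naturally when several subtrees of one source need a labelled crawl. The following is a minimal sketch that uses only the documented `--source-id`, `--prefix`, and `--label` flags; the `crawl_prefixes` function and the `KDBL_BIN` override are hypothetical names, not part of the CLI.

```shell
# Hypothetical helper: enqueue one labelled crawl per prefix of a single source.
# Connection flags are omitted, as in the examples above; add them as needed.
# Usage: crawl_prefixes <source-id> <label> <prefix>...
crawl_prefixes() {
  source_id="$1"
  label="$2"
  shift 2
  for prefix in "$@"; do
    # Stop at the first crawl that fails to enqueue.
    "${KDBL_BIN:-kdbl-control}" crawl --source-id "$source_id" \
      --prefix "$prefix" --label "$label" || return 1
  done
}
```

For example, `crawl_prefixes 's3://my-bucket' nightly 'reports/2026/' 'invoices/2026/'` would enqueue two crawls that share the label `nightly`.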

Recrawls skip unchanged content. By default, a recrawl extracts only files that are new or have changed since their last extraction, because extraction is GPU-heavy. "Changed" is decided per file by its etag (S3) or, for sources without one (nfs/smbfs/smb), by its mtime and size. Use --force-reextract for a deliberate full re-run, for example after upgrading the extractor or changing the extract policy. The kdbl_extract_skipped_unchanged_total metric counts how many files a recrawl skips.
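
The per-file rule described above can be pictured with a few lines of shell. This is only an illustration of the stated rule (etag when available, otherwise mtime plus size), not K-Lake's actual implementation; the function name and argument order are assumptions made for the example.

```shell
# Illustrative only: decide whether a file counts as "changed" between two crawls.
# Arguments: old_etag new_etag old_mtime new_mtime old_size new_size
# Pass empty strings for the etags when the source has none (nfs/smbfs/smb).
# Returns 0 (true) when the file changed, 1 when it did not.
file_changed() {
  if [ -n "$1" ] && [ -n "$2" ]; then
    # Source with etags (S3): the etag alone decides.
    [ "$1" != "$2" ]
  else
    # No etag: fall back to comparing mtime and size.
    [ "$3" != "$4" ] || [ "$5" != "$6" ]
  fi
}
```

Under this sketch, a file whose etag is unchanged is skipped even if its recorded mtime differs, which matches the etag-first rule.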

`status`¶

Show queue depth and per-source rollup.

kdbl-control status
kdbl-control status --include-cluster   # cluster-wide, requires admin

`doctor`¶

Health-check a deployment end-to-end and print a checklist of ✓/⚠/•/✗ marks: API liveness (/healthz) and readiness (/readyz), token validity and kind, the queue/source rollup, the extractor fleet, and whether the MCP endpoint is enabled. Use it as the "you're done" gate right after scripts/bootstrap.sh, or at any time for triage. API mode only.

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" doctor
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" doctor --mcp   # also smoke-test /mcp

kdbl doctor — https://kdbl.example.com

  ✓ API reachable — /healthz 200
  ✓ API ready (DB reachable) — /readyz 200
  ✓ token valid (cluster-admin)
  ✓ queue reachable — pending=0 running=0 done=0 failed=0
  • no sources indexed yet — next: add one with `source add-*` + `crawl`
  ✓ extractor fleet healthy (2/2)
  ✓ MCP endpoint enabled — PRM 200

✓ all checks passed

✗ marks a critical failure (unreachable, not ready, bad token, MCP smoke failure) and makes doctor exit non-zero, so it can gate CI jobs and scripts. ⚠ flags something degraded (failed queue tasks, sources with errors, a stale extractor heartbeat); • is informational and expected on a fresh cluster (no sources yet, extraction not deployed, MCP off). Neither ⚠ nor • fails the gate. doctor works with a cluster-admin token (cluster-wide rollup) or a user PAT (tenant-scoped).
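
Because doctor exits non-zero on a critical failure, it can be wrapped in a retry loop for CI. The sketch below assumes only that exit-code behavior; `wait_for_healthy`, its arguments, and the `KDBL_BIN` override are hypothetical names, and the default attempt count and delay are arbitrary.

```shell
# Hypothetical CI gate: retry `doctor` until it exits 0 or attempts run out.
# Usage: wait_for_healthy [attempts] [delay_seconds]
wait_for_healthy() {
  attempts="${1:-5}"
  delay="${2:-10}"
  try=1
  while :; do
    if "${KDBL_BIN:-kdbl-control}" --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" doctor; then
      return 0
    fi
    # Give up once the last allowed attempt has failed.
    if [ "$try" -ge "$attempts" ]; then
      return 1
    fi
    try=$((try + 1))
    sleep "$delay"
  done
}
```

A pipeline step could then run `wait_for_healthy 6 15 || exit 1` after bootstrap to wait for the deployment to settle.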

`init`¶

Interactive first-source wizard: the guided Day-1 path. It does what source add-*, source extract enable, and crawl do, but walks you through the steps and surfaces the point newcomers miss: search needs extraction. It connects a source (S3 / SMB / SMBFS), offers to enable content extraction, and offers to start the first crawl. Run it with a tenant user PAT; a cluster-admin token has no tenant, so onboard a tenant first and use the PAT that onboard prints.

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" init

kdbl init — connecting your first source as alice@acme.com.

What kind of source do you want to connect?
  1) S3 / object store (s3://)
  2) SMB / Windows share — kernel mount (smbfs://)
  3) SMB / Windows share — userspace (smb://)
choice [1]: 1
S3 bucket: acme-docs
source id [s3://acme-docs]:
…
✓ source created
Enable content extraction? (full-text search needs extracted text) [Y/n]: y
✓ extraction enabled — PDFs/Office docs etc. are parsed as they're crawled
Start the first crawl now? [Y/n]: y
✓ crawl started (task 1042)

It prompts on a TTY (passwords are read without echo); for scripting you can pipe answers on stdin in order. NFS sources aren't offered here yet — register those with source add-nfs (direct-DB mode). When you'd rather not be prompted, the non-interactive equivalents are source add-*, source extract enable, and crawl.

`source`¶

Manage the source registry.

Subcommand	What it does
`add-s3`	Register an S3 source
`add-smb`	Register an SMB (userspace) source
`add-smbfs`	Register an SMB source mounted via the kernel
`add-nfs`	Register an NFS source
`list`	List sources in your tenant
`show`	Show a single source in detail
`enable` / `disable`	Toggle the `enabled` flag
`remove`	Delete a source and its indexed files
`bulk-ingest`	Toggle the bulk-ingest fast path
`meta-caps`	Choose which optional enrichments to gather (incl. `ntfs_acl` / `nfs4_acl` ACL capture)
`backfill-meta`	Enqueue enrichment for files indexed before a cap was enabled
`meta-coverage`	Show how many files have each enrichment populated
`subtree`	Tune per-source concurrency hints
`security-trim`	Set the per-file trimming policy (`--mode per_file|source_only|open`)
`multichannel`	SMB3 multi-channel on an `smbfs` source
`extract`	Content-extraction control: `enable` / `disable` / `show` / `backfill` / `progress`
`schedule`	Scheduled crawls/backfills: `add` / `list` / `rm` / `pause` / `resume` / `run`
`progress`	Live crawl/extract progress for a source

See Sources for add-* examples and Enablement for extract / security-trim / schedule.

Tokens and users are managed through the REST API and the UI, not the CLI; there is no kdbl-control tokens or users command. Mint a PAT and manage tenant users from the web console, or call POST /api/users; the user-create response returns the one-time token.

`directory` (direct-DB)¶

Build the alias graph that ties IdP identities to file-native principals (for per-file trimming). Runs in direct-DB mode and needs KDBL_MASTER_KEY to encrypt secrets; secrets come from env vars, never flags. See Directory enrichment.

kdbl-control directory set-graph --tenant acme --entra-tenant <id> --client-id <id>   # KDBL_GRAPH_CLIENT_SECRET in env
kdbl-control directory set-ldap  --tenant acme --url ldaps://dc:636 --bind-user 'CORP\svc' --base-dn 'DC=corp' --upn-rewrite corp.local=corp.com
kdbl-control directory sync-graph --tenant acme   # one-shot run
kdbl-control directory sync-ldap  --tenant acme
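
Since these commands read secrets from the environment, a preflight check can catch a missing variable before anything touches the database. This is a minimal sketch; `require_env` is a hypothetical helper, and the variable names passed to it are the ones mentioned in this section.

```shell
# Hypothetical preflight: verify that each named environment variable is non-empty.
# Usage: require_env NAME...   (returns 1 if any variable is unset or empty)
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name; an unset variable yields "".
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      printf 'missing required environment variable: %s\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_env KDBL_MASTER_KEY KDBL_GRAPH_CLIENT_SECRET && kdbl-control directory sync-graph --tenant acme` would run the sync only when both variables are set.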

`files`¶

Browse a source's files and mint signed links that open the original file (re-fetched on demand from the source, access re-checked, audited). API mode only; the calling token's tenant scopes what's visible.

kdbl-control files list --source-id s3://docs-bucket --limit 50          # paginated by key
kdbl-control files show --source-id s3://docs-bucket --key reports/q3.pdf # detail + signed links
kdbl-control files link --source-id s3://docs-bucket --key reports/q3.pdf # print an inline preview link
kdbl-control files link --source-id s3://docs-bucket --key reports/q3.pdf --download   # attachment link

files link prints just the URL (so it's pipe-friendly); files show returns the full detail including preview_url / download_url. The links require the server to have signed downloads enabled (KDBL_DOWNLOAD_SIGNING_SECRET / KDBL_API_PUBLIC_URL / KDBL_INTERNAL_FETCH_TOKEN); they expire in ~15 minutes.
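
Because files link prints only the URL, its output can be captured and handed straight to a download tool. The sketch below assumes curl is installed; `signed_link`, `fetch_original`, and the `KDBL_BIN` override are hypothetical names, and the flags passed to kdbl-control are the ones shown above.

```shell
# Hypothetical helper: print a signed link for one file.
# Usage: signed_link <source-id> <key> [extra flags, e.g. --download]
signed_link() {
  src="$1"
  key="$2"
  shift 2
  "${KDBL_BIN:-kdbl-control}" --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
    files link --source-id "$src" --key "$key" "$@"
}

# Hypothetical helper: mint an attachment link and download it right away,
# since the link expires after roughly 15 minutes.
# Usage: fetch_original <source-id> <key> <destination-file>
fetch_original() {
  url="$(signed_link "$1" "$2" --download)" || return 1
  curl -fsS -o "$3" "$url"
}
```

For example, `fetch_original s3://docs-bucket reports/q3.pdf ./q3.pdf` would save the original file locally, provided signed downloads are enabled on the server.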

`mcp`¶

kdbl-control mcp smoke   # fetch PRM + run initialize + tools/list against /mcp
kdbl-control mcp audit --tool search_content --limit 50   # query the audit trail

`extractors`¶

kdbl-control extractors list   # extractor pods and their health

`onboard` (cluster admin)¶

Stand up a new customer in one step: create the tenant (idempotent — reuses an existing one), create its first tenant-admin user, and mint that user's initial PAT. The token is printed once — hand it to the customer over a secure channel. This is the secure self-service path: it goes through the API's auth + audit boundary, so you don't need direct DB or kubectl access to onboard.

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_CLUSTER_ADMIN_TOKEN" \
  onboard acme --name "ACME Corp" --admin-name "Jane Admin" --admin-email jane@acme.com

✓ created tenant acme (id=019eb528-…)
✓ created tenant-admin user Jane Admin (id=375e848c-…)

Admin PAT for acme (shown ONCE — store it securely):
kdblpat_…

Flags:

<slug> — tenant slug (positional); also the default display name. Keep it DNS/URL-safe.
--name <NAME> — tenant display name (defaults to the slug)
--admin-name <NAME> — the first tenant-admin user's display name (default admin)
--admin-email <EMAIL> — the admin user's email (optional)
--oidc-config-json <JSON> — inline OIDC config so the customer can federate sign-in from day one (optional)

API mode only, with a cluster-admin token. onboard needs --api-url/--api-token and the token must be the KDBL_CLUSTER_ADMIN_TOKEN. The admin API is sealed from the public MCP tunnel by the path-scoping gateway, so run it against the in-cluster API — e.g. kubectl -n kdbl port-forward svc/kdbl-api 18080:80 and --api-url http://localhost:18080.

The token is shown once and is never recoverable. Re-running onboard for an existing tenant is safe — it reuses the tenant and mints a new admin user.
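
Since the token cannot be recovered later, it is worth capturing it as the command runs. The sketch below assumes the token appears on its own line starting with `kdblpat_`, as in the sample output above; `extract_pat` and `save_pat` are hypothetical helper names.

```shell
# Illustrative: print the first line of stdin that starts with "kdblpat_".
extract_pat() {
  awk '/^kdblpat_/ { print; exit }'
}

# Illustrative: write the token from stdin to a file readable only by its owner.
# Usage: <onboard output> | save_pat <file>   (returns 1 if no token was found)
save_pat() {
  # The subshell keeps the restrictive umask from leaking into the caller.
  ( umask 077; extract_pat > "$1" )
  [ -s "$1" ]
}
```

Piping the onboard command into `save_pat acme-admin.pat` would store the token without echoing it to the terminal; the trade-off is that the ✓ status lines are not shown either.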

`tenant` (cluster admin)¶

For the common case (new tenant + first admin + token in one shot) use onboard. The tenant subcommands below are the lower-level CRUD primitives.

kdbl-control tenant create --slug acme --name "ACME Corp"
kdbl-control tenant list
kdbl-control tenant show --slug acme
kdbl-control tenant retention --slug acme            # read the current override
kdbl-control tenant retention --slug acme --days 90  # set
kdbl-control tenant retention --slug acme --clear    # revert to the cluster default

`secret keygen`¶

Generate a fresh master key for credential encryption (used during cluster bootstrap).

kdbl-control secret keygen

`bench`, `bench-sink`, `bench-queries`¶

Operator-side benchmarking helpers; see --help for each. They are diagnostic tools that exercise the live system, so run them against a non-production tenant.

Exit codes¶

Code	Meaning
0	Success
1	Generic error
2	Authentication failed
3	Authorization denied
4	Resource not found
5	Validation error (bad arguments)
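
Scripts can branch on these codes to produce clearer failures. The sketch below simply mirrors the table above; `describe_exit` is a hypothetical helper and the message wording is illustrative.

```shell
# Hypothetical helper: translate a kdbl-control exit code into a short message.
# Usage: describe_exit <code>
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "generic error" ;;
    2) echo "authentication failed" ;;
    3) echo "authorization denied" ;;
    4) echo "resource not found" ;;
    5) echo "validation error (bad arguments)" ;;
    *) echo "unknown exit code $1" ;;
  esac
}
```

A typical use is to run a command, save `$?` into a variable immediately, and log `describe_exit "$rc"` alongside it.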

Tips¶

Pipe secrets in over stdin (--secret-access-key-stdin, --password-stdin) so they don't appear in shell history or process listings.
Source IDs containing / are fine in CLI flags — kdbl-control handles URL encoding for you when talking to the API.
Run kdbl-control <command> --help for the full flag list on any subcommand.
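
The CLI encodes source IDs for you, so the following matters only if you build REST API requests by hand. It is a minimal sketch of percent-encoding in POSIX shell for ASCII input; `urlencode` is a hypothetical helper, and no particular API path is assumed.

```shell
# Illustrative: percent-encode an ASCII string, keeping only unreserved characters.
# Usage: urlencode <string>
urlencode() {
  s="$1"
  out=""
  while [ -n "$s" ]; do
    rest="${s#?}"        # everything after the first character
    c="${s%"$rest"}"     # the first character itself
    case "$c" in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      # "'$c" makes printf use the character's numeric code.
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s="$rest"
  done
  printf '%s\n' "$out"
}
```

For example, `urlencode 's3://my-bucket'` prints `s3%3A%2F%2Fmy-bucket`.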
