Quick start
This guide takes you from a running KDBL Context Lake (K-Lake) deployment to your first grounded, verifiable answer — indexing a source, extracting its content, searching it, and opening the original file behind any result to confirm it.
It assumes your administrator has deployed K-Lake and given you two things: the
URL of your K-Lake instance (for example https://kdbl.example.com) and a
personal access token (PAT). If you are the administrator and still need to
deploy, contact your KDBL representative for the installation bundle, or stand up
an evaluation stack on a single VM in one command (see the
single-VM evaluation guide); the rest of this guide picks up once K-Lake
is running.
Administrators: once the cluster is up, create a customer's tenant, first admin user, and PAT in one step with
kdbl-control onboard <slug> --name "<Display Name>" (cluster-admin token; see CLI reference → onboard). Hand the printed PAT to the user; that is the token this guide expects below.
You will use one or more of three interchangeable interfaces — the web
console, the kdbl-control CLI, and the REST API. They act on the
same data, so pick whichever suits you. Every action is scoped to your tenant
and audited.
1. Sign in
Open your K-Lake URL in a browser. Sign in with your PAT (paste it into the login form) or, if your tenant uses single sign-on, the SSO button. You land on the dashboard, which shows queue depth and a per-source summary.
To use the CLI or API, set KDBL_URL (your instance URL) and KDBL_TOKEN (your PAT) once in your shell; the commands in this guide read both.
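A minimal sketch of that setup, assuming the example URL from the introduction and a placeholder token. The kdbl wrapper function is a hypothetical convenience for this guide, not something K-Lake ships:

```shell
# Placeholder values: substitute the URL and PAT your administrator gave you.
export KDBL_URL="https://kdbl.example.com"
export KDBL_TOKEN="replace-with-your-pat"

# Hypothetical convenience wrapper (not part of K-Lake): passes the
# connection flags so you only type the subcommand and its options.
kdbl() {
  kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" "$@"
}
```

With the wrapper defined, kdbl crawl --source-id 's3://docs-bucket' runs the same command as the full form used in the steps below.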
Need a fresh token? In the console, open the menu under your name → Tokens → New token. The value is shown once — copy it immediately.
2. Add a source
A source is one location K-Lake tracks — an S3 bucket, an SMB/CIFS share, or an NFS export. The quickest start is a small S3 bucket.
Guided (CLI): kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN"
init walks you through this whole section — connect a source, enable
extraction, and start the first crawl — in one interactive flow. The manual
steps below are the same thing, broken out.
Console: Sources → Add source, choose s3, give it a stable ID (e.g.
s3://docs-bucket), enter the bucket, region, and credentials, and Save.
CLI — S3:
printf '%s' "$AWS_SECRET_ACCESS_KEY" | kdbl-control \
--api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source add-s3 \
--source-id 's3://docs-bucket' \
--bucket docs-bucket --region us-east-1 \
--access-key-id AKIA... --secret-access-key-stdin
CLI — SMB/CIFS: two backends — smb (userspace, no special privileges) and
smbfs (kernel CIFS mount; adds NTFS ACLs, SMB3 multichannel, and
backup-operator intent). The password is read from stdin, like the S3 secret.
# userspace backend
printf '%s' "$SMB_PASSWORD" | kdbl-control \
--api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source add-smb \
--source-id 'smb://fileserver/share' \
--server fileserver --share share --username svc-kdbl
# kernel-mount backend (NTFS ACLs etc.)
printf '%s' "$SMB_PASSWORD" | kdbl-control \
--api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source add-smbfs \
--source-id 'smbfs://fileserver/share' \
--server fileserver --share share --username svc-kdbl --backup-intent
The secret (S3) or password (SMB) is read from stdin so it never lands in your shell history. Credentials are encrypted at rest. See Sources for NFS, and for narrowing a source to a subtree.
The web console's Add source form and the
source add-* CLI commands turn content extraction on by default (any file type), so the source you just added is already set to become searchable after its first crawl. Pass --no-extract (CLI) or untick the box (console) to opt out. The raw REST POST /sources does not auto-enable extraction; use step 3 for that.
3. Tune extraction (optional)
Extraction reads each file's content so it becomes searchable. It's on by
default from step 2, so you can skip straight to the crawl — use this step
only to restrict by file type, or to enable extraction if you added the
source with --no-extract or via the raw API.
Console: on the source, open the Extraction tab; toggle it and
(optionally) set an extension allowlist such as pdf, docx.
CLI:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
source extract enable --source-id 's3://docs-bucket' --extensions pdf,docx
An extraction config change (whether from
add-* or this command) takes effect within ~30 s. If you crawl within that window, the files are listed but not yet queued for extraction; recrawl with --force-reextract (see the next step) and they will be picked up.
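One way to avoid crawling inside the propagation window is to pause between the two commands. A rough sketch, assuming the ~30 s figure is a safe upper bound; the function name and the KDBL_SETTLE_SECONDS variable are hypothetical and exist only in this example:

```shell
# Hypothetical helper: enable extraction, wait for the config change to take
# effect, then crawl. $1 is the source ID, $2 the extension allowlist.
# KDBL_SETTLE_SECONDS is an assumed knob; it defaults to 30 seconds.
enable_extract_then_crawl() {
  kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
    source extract enable --source-id "$1" --extensions "$2" || return 1
  sleep "${KDBL_SETTLE_SECONDS:-30}"
  kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
    crawl --source-id "$1"
}

# Usage: enable_extract_then_crawl 's3://docs-bucket' 'pdf,docx'
```

Waiting is the simpler choice for a first crawl; a forced re-extraction remains the fallback if files were crawled too early.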
4. Crawl the source
A crawl walks the source, records every file it finds, and — with extraction enabled above — queues each eligible file for content extraction.
Console: open the source and click Crawl.
CLI / API:
kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" crawl --source-id 's3://docs-bucket'
curl -X POST -H "Authorization: Bearer $KDBL_TOKEN" \
"$KDBL_URL/api/sources/s3%3A%2F%2Fdocs-bucket/crawl"
The source page shows live file and byte counts as the crawl runs, and extracted
content becomes searchable as soon as each file finishes. (Source IDs are
URL-encoded in API paths: s3://docs-bucket → s3%3A%2F%2Fdocs-bucket.)
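The encoding rule in the parenthetical can be scripted rather than done by hand. A minimal sketch, assuming ASCII source IDs like the ones in this guide; urlencode is a hypothetical helper written for illustration, not a K-Lake command, and non-ASCII IDs may need a more robust encoder:

```shell
# Hypothetical helper: percent-encode everything except RFC 3986 unreserved
# characters (letters, digits, '-', '.', '_', '~'). The body runs in a
# subshell so LC_ALL and the scratch variables do not leak into the caller.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

source_id='s3://docs-bucket'
encoded_id=$(urlencode "$source_id")
printf '%s\n' "$encoded_id"   # prints s3%3A%2F%2Fdocs-bucket
```

The encoded value drops straight into an API path, for example "$KDBL_URL/api/sources/$encoded_id/crawl".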
Enabled extraction on an already-crawled source? A normal recrawl skips unchanged files, so it won't pick them up for extraction. Force a one-time full re-extraction by recrawling with --force-reextract.
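A minimal sketch of that forced recrawl, assuming --force-reextract is passed to the same crawl subcommand shown above (the flag name comes from the note in step 3; its exact placement is an assumption). The function name is hypothetical:

```shell
# Hypothetical helper: recrawl one source and re-queue every file for
# extraction, including files a normal recrawl would skip as unchanged.
# Assumes --force-reextract is accepted by the crawl subcommand.
force_reextract() {
  kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
    crawl --source-id "$1" --force-reextract
}

# Usage: force_reextract 's3://docs-bucket'
```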
5. Search your content
Console: open Content search, pick the source, and enter a query. Each result shows the file, an in-document locator (page / timestamp), and a highlighted snippet.
API:
curl -H "Authorization: Bearer $KDBL_TOKEN" \
"$KDBL_URL/api/sources/s3%3A%2F%2Fdocs-bucket/content/search?q=invoice"
Queries use a familiar syntax: bare words are combined with AND, OR separates alternatives,
"quoted phrases" match as a unit, and a leading minus (-exclude) removes matches for that term. Lead with the most distinctive term.
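Queries that contain spaces or quotes need to be percent-encoded before they go into the q parameter. A small sketch that lets curl do the encoding, assuming the search endpoint shown above; kdbl_search is a hypothetical helper and the queries are illustrative:

```shell
# Hypothetical helper: search one source. $1 is the URL-encoded source ID,
# $2 is the raw query. -G sends the data as a query string on a GET request,
# and --data-urlencode percent-encodes the value after "q=".
kdbl_search() {
  curl -sS -G -H "Authorization: Bearer $KDBL_TOKEN" \
    --data-urlencode "q=$2" \
    "$KDBL_URL/api/sources/$1/content/search"
}

# Usage:
#   kdbl_search 's3%3A%2F%2Fdocs-bucket' '"purchase order" invoice -draft'
#   kdbl_search 's3%3A%2F%2Fdocs-bucket' 'invoice OR receipt'
```

The first query asks for the phrase and the word together while excluding drafts; the second accepts either word.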
6. Open the original to verify
Every result is traceable to its source. In the console, each search hit and the
file-detail page carry an Open original link (and Download); the API
returns the same link as a signed url on each hit. Opening it re-fetches the
original file on demand so you can confirm exactly where an answer came from.
Access is always enforced: the link is short-lived, re-checks your permissions at open time, and is audited — you only ever see files you're authorized to see.
7. (Optional) Connect an AI assistant
K-Lake can serve your content to an AI assistant over the Model Context Protocol (MCP), so the assistant answers only from your knowledge base and cites a source for every fact — with the same per-user security and auditing.
Ask your administrator to enable MCP on your instance, then point your client at it. See Connecting MCP clients for Claude Desktop and other clients, and the MCP skills reference for the available tools.
Next steps
- Sources — protocol-specific configuration and lifecycle
- Security trimming — per-file access control
- MCP server — grounded AI retrieval over your content
- Sizing guide — plan resources as your fleet grows