Quick start

This guide takes you from a running KDBL Context Lake (K-Lake) deployment to your first grounded, verifiable answer — indexing a source, extracting its content, searching it, and opening the original file behind any result to confirm it.

It assumes your administrator has deployed K-Lake and given you two things: the URL of your K-Lake instance (for example https://kdbl.example.com) and a personal access token (PAT). If you are the administrator and still need to deploy, contact your KDBL representative for the installation bundle, or stand up an evaluation stack on a single VM in one command (see the single-VM evaluation guide); the rest of this guide picks up once K-Lake is running.

Administrators: once the cluster is up, create a customer's tenant, first admin user, and PAT in one step with kdbl-control onboard <slug> --name "<Display Name>" (this requires a cluster-admin token; see CLI reference → onboard). Hand the printed PAT to the user; it is the token this guide expects below.

You will use one or more of three interchangeable interfaces — the web console, the kdbl-control CLI, and the REST API. They act on the same data, so pick whichever suits you. Every action is scoped to your tenant and audited.

1. Sign in

Open your K-Lake URL in a browser. Sign in with your PAT (paste it into the login form) or, if your tenant uses single sign-on, with the SSO button. You land on the dashboard, which shows queue depth and a per-source summary.

To use the CLI or API, set these once in your shell:

export KDBL_URL=https://kdbl.example.com
export KDBL_TOKEN=kdblpat_<your-token>
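
Every CLI and API command in this guide depends on these two variables, so it can help to confirm they are set in the current shell before going further. The following is a minimal sketch; the function name kdbl_env_ok is hypothetical and is not part of kdbl-control.

```shell
# Hypothetical helper (not part of kdbl-control): checks that the two
# variables this guide relies on are set and non-empty.
kdbl_env_ok() {
  rc=0
  [ -n "${KDBL_URL:-}" ]   || { echo "KDBL_URL is not set" >&2; rc=1; }
  [ -n "${KDBL_TOKEN:-}" ] || { echo "KDBL_TOKEN is not set" >&2; rc=1; }
  return "$rc"
}
```

Running kdbl_env_ok && echo ready prints ready only when both values are present; otherwise it names the missing variable on stderr.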

Need a fresh token? In the console, open the menu under your name → Tokens → New token. The value is shown once — copy it immediately.

2. Add a source

A source is one location K-Lake tracks — an S3 bucket, an SMB/CIFS share, or an NFS export. The quickest start is a small S3 bucket.

Guided (CLI): kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" init walks you through steps 2 to 4 (connect a source, enable extraction, and start the first crawl) in one interactive flow. The manual steps below do the same thing, broken out.

Console: Sources → Add source, choose s3, give it a stable ID (e.g. s3://docs-bucket), enter the bucket, region, and credentials, and Save.

CLI — S3:

printf '%s' "$AWS_SECRET_ACCESS_KEY" | kdbl-control \
  --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source add-s3 \
  --source-id 's3://docs-bucket' \
  --bucket docs-bucket --region us-east-1 \
  --access-key-id AKIA... --secret-access-key-stdin

CLI — SMB/CIFS: two backends — smb (userspace, no special privileges) and smbfs (kernel CIFS mount; adds NTFS ACLs, SMB3 multichannel, and backup-operator intent). The password is read from stdin, like the S3 secret.

# userspace backend
printf '%s' "$SMB_PASSWORD" | kdbl-control \
  --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source add-smb \
  --source-id 'smb://fileserver/share' \
  --server fileserver --share share --username svc-kdbl

# kernel-mount backend (NTFS ACLs etc.)
printf '%s' "$SMB_PASSWORD" | kdbl-control \
  --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source add-smbfs \
  --source-id 'smbfs://fileserver/share' \
  --server fileserver --share share --username svc-kdbl --backup-intent

The secret (S3) or password (SMB) is read from stdin so it never lands in your shell history. Credentials are encrypted at rest. See Sources for NFS, and for narrowing a source to a subtree.
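
The commands above assume the secret is already in a shell variable. Typing export SMB_PASSWORD=... at the prompt would itself be recorded in history, so one option is to prompt for the value instead. This is a sketch only; read_secret is a hypothetical helper, not a kdbl-control feature, and it relies on the standard stty utility to hide typing on a terminal.

```shell
# Hypothetical helper: prompt for a secret without echoing it.
# The prompt goes to stderr; the secret is written to stdout with no
# trailing newline, so it can be captured with $(...).
read_secret() {
  printf '%s' "$1" >&2
  if [ -t 0 ]; then stty -echo; fi      # hide typing on a real terminal
  IFS= read -r secret_value
  if [ -t 0 ]; then stty echo; printf '\n' >&2; fi
  printf '%s' "$secret_value"
}
```

With this defined, SMB_PASSWORD=$(read_secret 'SMB password: ') fills the variable used by the printf pipelines above. If the prompt is interrupted with Ctrl-C, terminal echo may stay off until you run stty echo.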

The web console's Add source form and the source add-* CLI commands turn content extraction on by default (any file type), so the source you just added is already set to become searchable after its first crawl. Pass --no-extract (CLI) or untick the box (console) to opt out. The raw REST POST /sources call does not enable extraction automatically; use step 3 for that.

3. Tune extraction (optional)

Extraction reads each file's content so it becomes searchable. It's on by default from step 2, so you can skip straight to the crawl — use this step only to restrict by file type, or to enable extraction if you added the source with --no-extract or via the raw API.

Console: on the source, open the Extraction tab; toggle it and (optionally) set an extension allowlist such as pdf, docx.

CLI:

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  source extract enable --source-id 's3://docs-bucket' --extensions pdf,docx

An extraction config change (whether from add-* or this command) takes effect within ~30 s. If you crawl within that window the files are listed but not yet queued for extraction — just recrawl with --force-reextract (see the next step) and they'll be picked up.
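
When scripting these steps, one way to avoid the forced recrawl is to pause between the config change and the crawl. The sketch below uses only the two commands shown in this guide; the function name enable_then_crawl is hypothetical, and the 30-second default is taken from the approximate figure above, which is not a guaranteed bound.

```shell
# Hypothetical wrapper: enable extraction, wait for the change to take
# effect, then start a crawl.
#   $1 = source ID, $2 = comma-separated extension allowlist,
#   $3 = seconds to wait (defaults to 30, the approximate figure above)
enable_then_crawl() {
  kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
    source extract enable --source-id "$1" --extensions "$2" || return 1
  sleep "${3:-30}"
  kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
    crawl --source-id "$1"
}
```

For example, enable_then_crawl 's3://docs-bucket' pdf,docx 45 leaves some margin over the stated window.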

4. Crawl the source

A crawl walks the source, records every file it finds, and — with extraction enabled above — queues each eligible file for content extraction.

Console: open the source and click Crawl.

CLI / API:

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" crawl --source-id 's3://docs-bucket'

curl -X POST -H "Authorization: Bearer $KDBL_TOKEN" \
  "$KDBL_URL/api/sources/s3%3A%2F%2Fdocs-bucket/crawl"

The source page shows live file and byte counts as the crawl runs, and extracted content becomes searchable as soon as each file finishes. (Source IDs are URL-encoded in API paths: s3://docs-bucket → s3%3A%2F%2Fdocs-bucket.)
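
If you build API paths in a script, the encoding can be done in the shell rather than by hand. The following is a minimal sketch in portable sh; urlencode is a hypothetical helper name, and it handles ASCII input only, which covers source IDs like the ones in this guide.

```shell
# Hypothetical helper: percent-encode every character except the
# RFC 3986 unreserved set (letters, digits, '.', '_', '~', '-').
# ASCII input only. The body runs in a subshell so no variables leak.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}            # everything after the first character
    c=${s%"$rest"}         # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # e.g. ':' becomes %3A
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

urlencode 's3://docs-bucket'   # prints s3%3A%2F%2Fdocs-bucket
```

The result can be dropped straight into a path, for example "$KDBL_URL/api/sources/$(urlencode 's3://docs-bucket')/crawl".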

Enabled extraction on an already-crawled source? A normal recrawl skips unchanged files, so it won't pick them up for extraction. Force a one-time full re-extraction:

kdbl-control --api-url "$KDBL_URL" --api-token "$KDBL_TOKEN" \
  crawl --source-id 's3://docs-bucket' --force-reextract

5. Search your content

Console: open Content search, pick the source, and enter a query. Each result shows the file, an in-document locator (page / timestamp), and a highlighted snippet.

API:

curl -H "Authorization: Bearer $KDBL_TOKEN" \
  "$KDBL_URL/api/sources/s3%3A%2F%2Fdocs-bucket/content/search?q=invoice"

Queries use a familiar syntax: bare words are combined with AND, OR separates alternatives, "quoted phrases" match as a phrase, and a leading - excludes a term. Lead with the most distinctive term.
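
Queries that contain spaces or quotes must be URL-encoded before they go into the q parameter. One way to do that is to let curl handle it with its -G and --data-urlencode options. This is a sketch; search_content is a hypothetical wrapper name, the endpoint is the one shown above, and the source ID is expected to be URL-encoded already.

```shell
# Hypothetical wrapper around the content search endpoint shown above.
#   $1 = source ID, already URL-encoded
#   $2 = raw query string; curl encodes it into ?q=...
search_content() {
  curl -sS -G -H "Authorization: Bearer $KDBL_TOKEN" \
    --data-urlencode "q=$2" \
    "$KDBL_URL/api/sources/$1/content/search"
}
```

For example, search_content 's3%3A%2F%2Fdocs-bucket' 'invoice "net 30" -draft' combines a bare word, a quoted phrase, and an exclusion in one request.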

6. Open the original to verify

Every result is traceable to its source. In the console, each search hit and the file-detail page carry an Open original link (and Download); the API returns the same as a signed url on each hit. Clicking it re-fetches the original file on demand so you can confirm exactly where an answer came from.

Access is always enforced: the link is short-lived, re-checks your permissions at open time, and is audited — you only ever see files you're authorized to see.

7. (Optional) Connect an AI assistant

K-Lake can serve your content to an AI assistant over the Model Context Protocol (MCP), so the assistant answers only from your knowledge base and cites a source for every fact — with the same per-user security and auditing.

Ask your administrator to enable MCP on your instance, then point your client at it. See Connecting MCP clients for Claude Desktop and other clients, and the MCP skills reference for the available tools.

Next steps

Sources — protocol-specific configuration and lifecycle
Security trimming — per-file access control
MCP server — grounded AI retrieval over your content
Sizing guide — plan resources as your fleet grows