Introduction

KDBL Context Lake (K-Lake) is a Kubernetes-native indexer that catalogs the files in your object stores and filesystems so they can be searched, audited, and reasoned about as a single inventory.

It runs as a small set of services in your cluster, pulls metadata from the sources you configure, stores that metadata in a managed database, and presents it through a web UI, a command-line tool, and a REST API.

What K-Lake indexes

A source is a single location that K-Lake is configured to track. K-Lake currently supports the following protocols:

Protocol	Typical use
`s3`	S3-compatible object stores (AWS S3, MinIO, Wasabi, on-prem gateways)
`azblob`	Azure Blob Storage containers (incl. ADLS Gen2 and the Azurite emulator)
`smb`	SMB / CIFS shares accessed through a userspace SMB client
`smbfs`	SMB / CIFS shares mounted into the worker via the kernel CIFS client
`nfs`	NFSv3 / NFSv4 exports mounted into the worker

Each source has its own credentials (encrypted at rest), its own enable/disable flag, and its own metadata-enrichment policy.
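
As a rough illustration of the per-source settings described above, a source could be modeled like this. The class and field names are hypothetical and are not K-Lake's actual schema; only the five protocol identifiers come from the table.

```python
from dataclasses import dataclass
from enum import Enum


class Protocol(str, Enum):
    # The five protocol identifiers listed in the table above.
    S3 = "s3"
    AZBLOB = "azblob"
    SMB = "smb"
    SMBFS = "smbfs"
    NFS = "nfs"


@dataclass
class Source:
    # All field names are hypothetical, chosen for illustration only.
    name: str
    protocol: Protocol
    root: str                         # location to track, e.g. a bucket or an export path
    credentials_ref: str              # opaque reference; the secret itself is encrypted at rest
    enabled: bool = True              # per-source enable/disable flag
    enrich: frozenset = frozenset()   # per-source enrichment policy, e.g. {"tags", "acls"}


src = Source(
    name="archive",
    protocol=Protocol("s3"),          # an unknown identifier raises ValueError
    root="s3://example-bucket/",
    credentials_ref="cred-1",
    enrich=frozenset({"tags"}),
)
```

Using an enum for the protocol means a typo such as `"s4"` fails when the source is defined rather than when a crawl starts.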

Core concepts

Tenant. Every source, user, and indexed file belongs to a tenant. Tenants are fully isolated from each other: a user in one tenant cannot see another tenant's sources or files.

Crawl. A crawl walks a source from a root prefix and records what it finds. Crawls can run on demand or be triggered by the API.

File record. For each file K-Lake discovers, it stores a record with path, size, timestamps, content hash (when available), and optional protocol-specific metadata (S3 tags, Azure blob tags / content-type, NTFS ACLs, NFSv4 ACLs, extended attributes).
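
To make the crawl and file-record ideas concrete, here is a minimal sketch that walks a locally mounted directory (as an `nfs` or `smbfs` source would appear inside a worker) and yields one record per file. The `FileRecord` fields and the `crawl` function are hypothetical names, not K-Lake's internal types.

```python
import os
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class FileRecord:
    # Hypothetical field names mirroring the attributes described above.
    path: str
    size: int
    mtime: float
    content_hash: Optional[str] = None          # only set when the source provides one
    extra: dict = field(default_factory=dict)   # protocol-specific metadata, if any


def crawl(root: str) -> Iterator[FileRecord]:
    """Walk a mounted filesystem from `root` and yield one record per file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            yield FileRecord(
                path=os.path.relpath(full, root),  # path relative to the crawl root
                size=st.st_size,
                mtime=st.st_mtime,
            )
```

For example, `list(crawl("/mnt/share"))` would return the records for everything under that mount point; the hash and protocol-specific fields stay empty in this sketch.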

Metadata caps. Each source declares which kinds of enrichment it wants (tags, ACLs, xattrs). Cheap fields are recorded inline during the crawl; heavier fields are enriched in the background. You can adjust caps per source at any time.
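
The inline/background split can be sketched as a simple partition of a source's requested caps. Which kinds count as cheap or heavy is an assumption made here for illustration; the names `INLINE_KINDS`, `BACKGROUND_KINDS`, and `plan_enrichment` are hypothetical.

```python
# Hypothetical cost classification; this page does not say which kinds are cheap.
INLINE_KINDS = {"tags"}
BACKGROUND_KINDS = {"acls", "xattrs"}


def plan_enrichment(caps: set[str]) -> tuple[set[str], set[str]]:
    """Split a source's requested caps into (inline, background) work."""
    unknown = caps - INLINE_KINDS - BACKGROUND_KINDS
    if unknown:
        raise ValueError(f"unknown caps: {sorted(unknown)}")
    return caps & INLINE_KINDS, caps & BACKGROUND_KINDS


inline, background = plan_enrichment({"tags", "acls"})
```

With this partition, the crawl would record `inline` kinds as it walks and leave `background` kinds for later enrichment.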

Token. API and CLI calls are authenticated with either a personal access token (PAT) or an OIDC bearer token. PATs are minted from the UI or the API and are tied to the user who created them.
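
A token-authenticated request might be built as follows. This is a sketch under assumptions: the host name, the `/api/v1/sources` path, and the choice to send a PAT in a standard `Authorization: Bearer` header are all hypothetical, so check the API reference for the real endpoint and header format.

```python
import urllib.request


def build_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the token in the Authorization header."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",  # assumed header scheme
            "Accept": "application/json",
        },
        method="GET",
    )


# Hypothetical host, path, and token value.
req = build_request("https://klake.example.internal", "/api/v1/sources", "example-token")
```

Passing `req` to `urllib.request.urlopen` would send the request; the sketch stops before that so it runs without a live deployment.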

What you'll see running

A typical K-Lake deployment includes:

API — serves the UI and answers REST requests
Web UI — the operator console
Workers — stateless pods that crawl sources and write metadata
Database — a managed database used as both the metadata store and the work queue
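
To illustrate how a database can double as a work queue for stateless workers, here is a generic sketch using an in-memory SQLite database. It is not K-Lake's implementation: the `crawl_jobs` table, its columns, and the function names are hypothetical, and a production database would typically rely on its own row-locking features.

```python
import sqlite3


def open_queue() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crawl_jobs ("
        " id INTEGER PRIMARY KEY,"
        " source TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued',"
        " worker TEXT)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, source: str) -> int:
    cur = conn.execute("INSERT INTO crawl_jobs (source) VALUES (?)", (source,))
    conn.commit()
    return cur.lastrowid


def claim(conn: sqlite3.Connection, worker: str):
    """Claim the oldest queued job for `worker`; return (id, source) or None."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id, source FROM crawl_jobs "
            "WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status check in WHERE keeps a job from being claimed twice.
        updated = conn.execute(
            "UPDATE crawl_jobs SET status = 'running', worker = ? "
            "WHERE id = ? AND status = 'queued'",
            (worker, row[0]),
        )
        return row if updated.rowcount == 1 else None
```

Because the claim is recorded in the database rather than in worker memory, any worker pod can pick up the next job, which is what lets the workers stay stateless.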

You can manage sources, users, and tokens through any of the three interfaces (UI, CLI, or API). All three act on the same data.

Next steps

Quick start — sign in, add your first source, watch it index
Sources — protocol-specific configuration
Sizing guide — plan resources for your fleet size