Introduction
KDBL Context Lake (K-Lake) is a Kubernetes-native indexer that catalogs the files in your object stores and filesystems so they can be searched, audited, and reasoned about as a single inventory.
It runs as a small set of services in your cluster, pulls metadata from the sources you configure, stores that metadata in a managed database, and presents it through a web UI, a command-line tool, and a REST API.
What K-Lake indexes
A source is one location K-Lake is told to track. Today K-Lake supports:
| Protocol | Typical use |
|---|---|
| `s3` | S3-compatible object stores (AWS S3, MinIO, Wasabi, on-prem gateways) |
| `smb` | SMB / CIFS shares accessed over userspace SMB |
| `smbfs` | SMB / CIFS shares mounted into the worker via the kernel CIFS client |
| `nfs` | NFSv3 / NFSv4 exports mounted into the worker |
Each source has its own credentials (encrypted at rest), its own enable/disable flag, and its own metadata-enrichment policy.
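As a rough illustration of these per-source settings, a source could be modelled as below. This is only a sketch: the class name `Source` and the fields `root`, `credentials_ref`, and `enrichment` are hypothetical and do not reflect K-Lake's actual schema. Only the four protocol names come from the table above.

```python
from dataclasses import dataclass, field

# Protocol names taken from the table above; everything else is illustrative.
SUPPORTED_PROTOCOLS = {"s3", "smb", "smbfs", "nfs"}


@dataclass
class Source:
    """Hypothetical in-memory model of one tracked location."""
    name: str
    protocol: str
    root: str                  # e.g. a bucket prefix or an export path
    credentials_ref: str       # pointer to an encrypted secret, never the secret itself
    enabled: bool = True       # per-source enable/disable flag
    enrichment: set = field(default_factory=set)  # e.g. {"tags", "acls", "xattrs"}

    def __post_init__(self):
        # Reject protocols that are not in the supported list.
        if self.protocol not in SUPPORTED_PROTOCOLS:
            raise ValueError(f"unsupported protocol: {self.protocol!r}")


archive = Source(
    name="archive",
    protocol="s3",
    root="s3://example-bucket/archive/",
    credentials_ref="secret/archive",
    enrichment={"tags"},
)
```

Keeping a reference to the secret rather than the secret itself is one way to honor the "encrypted at rest" property in a model like this; how K-Lake actually stores credentials is not specified here.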
Core concepts
Tenant. Every source, user, and indexed file belongs to a tenant. Tenants are fully isolated from each other — a user in one tenant cannot see another tenant's sources or files.
Crawl. A crawl walks a source from a root prefix and records what it finds. Crawls can run on demand or be triggered by the API.
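To make the idea of walking a source from a root concrete, here is a minimal sketch for a mounted filesystem. The function `crawl` is hypothetical and uses only the Python standard library; it is not how K-Lake's workers are implemented, and object stores such as `s3` would list keys under a prefix instead of walking directories.

```python
import os
import tempfile


def crawl(root):
    """Walk `root` recursively and yield (relative_path, size_bytes) per file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            full_path = os.path.join(dirpath, filename)
            stat = os.stat(full_path)
            yield os.path.relpath(full_path, root), stat.st_size


# Demo against a throwaway directory standing in for a mounted export.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "reports"))
    with open(os.path.join(root, "reports", "q1.csv"), "wb") as fh:
        fh.write(b"a,b\n")
    found = list(crawl(root))
```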
File record. For each file K-Lake discovers, it stores a record with path, size, timestamps, content hash (when available), and optional protocol-specific metadata (S3 tags, NTFS ACLs, NFSv4 ACLs, extended attributes).
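A file record of this shape might look like the following sketch. The class `FileRecord` and its field names are assumptions made for illustration, and only one timestamp is shown for brevity; the actual record layout is not documented here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FileRecord:
    """Hypothetical shape of one indexed file."""
    path: str
    size: int
    modified_at: datetime
    content_hash: Optional[str] = None         # absent when the source cannot supply one
    extra: dict = field(default_factory=dict)  # protocol-specific metadata, if any


record = FileRecord(
    path="archive/2024/report.pdf",
    size=48_213,
    modified_at=datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc),
    extra={"s3_tags": {"team": "finance"}},
)
```

Making the hash and the protocol-specific metadata optional mirrors the "when available" and "optional" wording above: a record is still valid with only path, size, and timestamps.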
Metadata caps. Each source declares which kinds of enrichment it wants (tags, ACLs, xattrs). Cheap fields are recorded inline during the crawl; heavier fields are enriched in the background. You can adjust caps per source at any time.
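The split between inline and background enrichment can be sketched as a small planning step. Which kinds count as cheap is an assumption made purely for this example (the text above does not say), and `plan_enrichment` is a hypothetical helper, not part of K-Lake.

```python
# Assumed cost classification; the documentation does not state which kinds are cheap.
INLINE_KINDS = {"tags"}
BACKGROUND_KINDS = {"acls", "xattrs"}


def plan_enrichment(requested):
    """Split a source's requested kinds into inline and background work."""
    requested = set(requested)
    unknown = requested - INLINE_KINDS - BACKGROUND_KINDS
    if unknown:
        raise ValueError(f"unknown enrichment kinds: {sorted(unknown)}")
    inline = sorted(requested & INLINE_KINDS)
    background = sorted(requested & BACKGROUND_KINDS)
    return inline, background


inline, background = plan_enrichment({"tags", "acls"})
```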
Token. Calls to the API and the CLI are authenticated with a personal access token (PAT) or an OIDC bearer token. PATs are minted from the UI or API and are tied to the user that created them.
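As a sketch of what an authenticated call could look like, the snippet below builds (but does not send) a request with the token in an `Authorization: Bearer` header. The host name, the `/api/v1/sources` path, and the assumption that a PAT is sent the same way as an OIDC bearer token are all illustrative; check the API reference for the real endpoints and header format.

```python
import urllib.request


def build_request(base_url, path, token):
    """Return a GET request carrying the token as a bearer credential."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


# Placeholder values; a real token would be minted from the UI or API.
req = build_request("https://klake.example.internal", "/api/v1/sources", "example-token")
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out so the example has no network dependency.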
What you'll see running
A typical K-Lake deployment includes:
- API — serves the UI and answers REST requests
- Web UI — the operator console
- Workers — stateless pods that crawl sources and write metadata
- Database — a managed database used as both the metadata store and the work queue
You manage sources, users, and tokens through any of the three interfaces — UI, CLI, or API. They all act on the same data.
Next steps
- Quick start — sign in, add your first source, watch it index
- Sources — protocol-specific configuration
- Sizing guide — plan resources for your fleet size