Introduction
KDBL Context Lake (K-Lake) is a Kubernetes-native indexer that catalogs the files in your object stores and filesystems so they can be searched, audited, and reasoned about as a single inventory.
It runs as a small set of services in your cluster, pulls metadata from the sources you configure, stores that metadata in a managed database, and presents it through a web UI, a command-line tool, and a REST API.
What K-Lake indexes
A source is one location K-Lake is told to track. Today K-Lake supports:
| Protocol | Typical use |
|---|---|
| s3 | S3-compatible object stores (AWS S3, MinIO, Wasabi, on-prem gateways) |
| azblob | Azure Blob Storage containers (incl. ADLS Gen2 and the Azurite emulator) |
| smb | SMB / CIFS shares accessed over userspace SMB |
| smbfs | SMB / CIFS shares mounted into the worker via the kernel CIFS client |
| nfs | NFSv3 / NFSv4 exports mounted into the worker |
Each source has its own credentials (encrypted at rest), its own enable/disable flag, and its own metadata-enrichment policy.
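As a rough illustration of what a source carries, the sketch below models one in plain Python. The `Source` class and its field names are hypothetical and are not K-Lake's actual schema; only the protocol identifiers come from the table above. Credentials are left out on purpose, since this page does not describe how they are supplied.

```python
from dataclasses import dataclass, field

# Protocol identifiers taken from the table above.
SUPPORTED_PROTOCOLS = {"s3", "azblob", "smb", "smbfs", "nfs"}


@dataclass
class Source:
    """Hypothetical in-memory model of a source; not K-Lake's real schema."""
    name: str
    protocol: str
    root: str                                      # location to track (illustrative)
    enabled: bool = True                           # per-source enable/disable flag
    enrichment: set = field(default_factory=set)   # per-source enrichment policy

    def __post_init__(self):
        # Reject protocols that are not in the supported list.
        if self.protocol not in SUPPORTED_PROTOCOLS:
            raise ValueError(f"unsupported protocol: {self.protocol!r}")


archive = Source(name="archive", protocol="s3", root="archive-bucket/",
                 enrichment={"tags"})
```

Constructing a `Source` with a protocol outside the table, such as `"ftp"`, raises `ValueError` in this sketch.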
Core concepts
Tenant. Every source, user, and indexed file belongs to a tenant. Tenants are fully isolated from each other — a user in one tenant cannot see another tenant's sources or files.
Crawl. A crawl walks a source from a root prefix and records what it finds. Crawls can run on demand or be triggered by the API.
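To make the idea of walking from a root prefix concrete, here is a minimal sketch that filters an in-memory listing. The `crawl` function and the dictionary standing in for a source are illustrative assumptions; a real crawl talks to an object store or filesystem and records far more than path and size.

```python
def crawl(listing, root_prefix):
    """Return a record for every path in `listing` under `root_prefix`."""
    records = []
    for path in sorted(listing):
        if path.startswith(root_prefix):
            records.append({"path": path, "size": listing[path]})
    return records


# A dict of path -> size stands in for a real object store or filesystem.
listing = {
    "logs/2024/app.log": 1200,
    "logs/2024/db.log": 800,
    "images/logo.png": 5400,
}
found = crawl(listing, "logs/")
```

Here `found` holds the two entries under `logs/`, and `images/logo.png` is skipped because it lies outside the root prefix.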
File record. For each file K-Lake discovers, it stores a record with path, size, timestamps, content hash (when available), and optional protocol-specific metadata (S3 tags, Azure blob tags / content-type, NTFS ACLs, NFSv4 ACLs, extended attributes).
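The shape of a file record might be sketched as follows. The `FileRecord` class, its field names, and the sample values are assumptions made for illustration; the source only states which kinds of information a record holds.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FileRecord:
    """Hypothetical shape of a file record; field names are illustrative."""
    path: str
    size: int
    modified_at: datetime
    content_hash: Optional[str] = None             # only when the source exposes one
    metadata: dict = field(default_factory=dict)   # protocol-specific extras


record = FileRecord(
    path="reports/2024/q1.pdf",
    size=48213,
    modified_at=datetime(2024, 4, 2, 9, 30, tzinfo=timezone.utc),
    metadata={"s3_tags": {"team": "finance"}},
)
```

Keeping the protocol-specific fields in a free-form `metadata` mapping is one way to let S3 tags, ACLs, and extended attributes share a single record type.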
Metadata caps. Each source declares which kinds of enrichment it wants (tags, ACLs, xattrs). Cheap fields are recorded inline during the crawl; heavier fields are enriched in the background. You can adjust caps per source at any time.
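The split between inline and background enrichment can be pictured with a small planner. This is a sketch under stated assumptions: the function name is hypothetical, and which kinds count as "cheap" is not specified here, so the `ASSUMED_INLINE` set is invented purely for the example.

```python
# Enrichment kinds named in the text.
KNOWN_CAPS = {"tags", "acls", "xattrs"}

# ASSUMPTION: which kinds count as "cheap" is not stated in this document;
# this split is made up for illustration.
ASSUMED_INLINE = {"tags"}


def plan_enrichment(requested):
    """Split a source's requested caps into inline and background work."""
    unknown = set(requested) - KNOWN_CAPS
    if unknown:
        raise ValueError(f"unknown caps: {sorted(unknown)}")
    inline = sorted(set(requested) & ASSUMED_INLINE)
    background = sorted(set(requested) - ASSUMED_INLINE)
    return inline, background


inline, background = plan_enrichment({"tags", "acls"})
print(inline, background)  # ['tags'] ['acls']
```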
Token. Calls to the API and the CLI are authenticated with a personal access token (PAT) or an OIDC bearer token. PATs are minted from the UI or API and are tied to the user that created them.
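A token-authenticated request could be assembled along these lines. The base URL, the `/api/v1/sources` path, and the token value are placeholders, and the `Authorization: Bearer` header is the common convention for bearer tokens, not something this page confirms for PATs. The request is built but not sent, so the sketch runs without a K-Lake instance.

```python
import urllib.request

# ASSUMPTIONS: the base URL and API path are placeholders, and the
# "Authorization: Bearer" scheme is the usual bearer-token convention,
# not a header confirmed by this document.
BASE_URL = "https://klake.example.com"


def build_request(path, token):
    """Return a GET request carrying the token as a bearer credential."""
    req = urllib.request.Request(BASE_URL + path, method="GET")
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    return req


req = build_request("/api/v1/sources", "example-token")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```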
What you'll see running
A typical K-Lake deployment includes:
- API — serves the UI and answers REST requests
- Web UI — the operator console
- Workers — stateless pods that crawl sources and write metadata
- Database — a managed database used as both the metadata store and the work queue
You manage sources, users, and tokens through any of the three interfaces — UI, CLI, or API. They all act on the same data.
Next steps
- Quick start — sign in, add your first source, watch it index
- Sources — protocol-specific configuration
- Sizing guide — plan resources for your fleet size