
Introduction

KDBL Context Lake (K-Lake) is a Kubernetes-native indexer that catalogs the files in your object stores and filesystems so they can be searched, audited, and reasoned about as a single inventory.

It runs as a small set of services in your cluster, pulls metadata from the sources you configure, stores that metadata in a managed database, and presents it through a web UI, a command-line tool, and a REST API.

What K-Lake indexes

A source is a single location that K-Lake is configured to track. K-Lake currently supports the following protocols:

Protocol   Typical use
s3         S3-compatible object stores (AWS S3, MinIO, Wasabi, on-prem gateways)
azblob     Azure Blob Storage containers (including ADLS Gen2 and the Azurite emulator)
smb        SMB/CIFS shares accessed through a userspace SMB client
smbfs      SMB/CIFS shares mounted into the worker by the kernel CIFS client
nfs        NFSv3/NFSv4 exports mounted into the worker

Each source has its own credentials (encrypted at rest), its own enable/disable flag, and its own metadata-enrichment policy.
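To make the per-source settings concrete, here is a minimal sketch of how a source definition could be modelled and validated. Only the five protocol names come from the table above; the `Source` class, its field names, the `credentials_ref` indirection, and the cap names `tags`, `acls`, and `xattrs` are hypothetical illustrations, not K-Lake's actual configuration schema.

```python
from dataclasses import dataclass, field

# Protocol names from the table above; everything else here is hypothetical.
SUPPORTED_PROTOCOLS = frozenset({"s3", "azblob", "smb", "smbfs", "nfs"})
# Hypothetical cap names mirroring the enrichment kinds described under "Metadata caps".
KNOWN_CAPS = frozenset({"tags", "acls", "xattrs"})


@dataclass
class Source:
    """Hypothetical in-memory model of one configured source."""
    name: str
    protocol: str
    root: str
    credentials_ref: str  # reference to an encrypted secret, never the plaintext credential
    enabled: bool = True
    caps: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Reject protocols outside the supported list.
        if self.protocol not in SUPPORTED_PROTOCOLS:
            raise ValueError(f"unsupported protocol: {self.protocol!r}")
        # Reject enrichment kinds this sketch does not know about.
        unknown = self.caps - KNOWN_CAPS
        if unknown:
            raise ValueError(f"unknown metadata caps: {sorted(unknown)}")
```

For example, `Source(name="archive", protocol="s3", root="archive-bucket/", credentials_ref="secret/archive", caps={"tags"})` would be accepted, while a protocol such as `"ftp"` would raise `ValueError`. Keeping credentials as a reference rather than a field value is one way to keep secrets out of the source record itself.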

Core concepts

Tenant. Every source, user, and indexed file belongs to a tenant. Tenants are fully isolated from each other — a user in one tenant cannot see another tenant's sources or files.
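This page does not describe how K-Lake enforces tenant isolation. One common pattern is to apply the tenant filter inside the data-access layer so that no caller can forget it. The sketch below illustrates that idea; the `FileRef` and `TenantScopedIndex` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRef:
    """Hypothetical minimal file entry tagged with its owning tenant."""
    tenant_id: str
    path: str


class TenantScopedIndex:
    """Hypothetical index in which every read is filtered by the caller's tenant."""

    def __init__(self) -> None:
        self._files: list[FileRef] = []

    def add(self, ref: FileRef) -> None:
        self._files.append(ref)

    def list_files(self, caller_tenant: str) -> list[str]:
        # The filter lives inside the index rather than in each caller,
        # so no read path can return another tenant's entries.
        return [f.path for f in self._files if f.tenant_id == caller_tenant]
```

Two tenants can hold files with identical paths, and each still sees only its own entry.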

Crawl. A crawl walks a source from a root prefix and records what it finds. Crawls can run on demand or be triggered by the API.
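As a rough illustration of "walk from a root and record what you find", the sketch below crawls a directory tree such as an `nfs` or `smbfs` mount using only the standard library. It is not K-Lake's worker code. An object-store crawl would list keys under a prefix rather than walk directories, and the `crawl` function name is hypothetical.

```python
import os
from typing import Iterator


def crawl(root: str) -> Iterator[tuple[str, int]]:
    """Walk a directory tree from `root` and yield (relative_path, size_bytes).

    Hypothetical sketch of a crawl over a mounted filesystem.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk descend in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full, follow_symlinks=False)
            except OSError:
                continue  # the file vanished or is unreadable; skip it
            yield os.path.relpath(full, root), st.st_size
```

Yielding results lazily means a worker can stream records to the metadata store without holding an entire tree in memory.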

File record. For each file K-Lake discovers, it stores a record with path, size, timestamps, content hash (when available), and optional protocol-specific metadata (S3 tags, Azure blob tags / content-type, NTFS ACLs, NFSv4 ACLs, extended attributes).
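The record described above can be pictured as a small data structure. In this sketch, the `FileRecord` and `record_for` names, the field names, and the choice of SHA-256 are assumptions for illustration. The page only says a content hash is stored "when available", without naming an algorithm.

```python
import hashlib
import os
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileRecord:
    """Hypothetical shape of a per-file record; field names are illustrative."""
    path: str
    size: int
    mtime: float
    content_hash: Optional[str] = None  # None when no hash is available
    extra: dict[str, object] = field(default_factory=dict)  # protocol-specific metadata


def record_for(path: str, hash_contents: bool = True) -> FileRecord:
    """Build a record for a local file, optionally hashing its contents (SHA-256 assumed)."""
    st = os.stat(path)
    digest = None
    if hash_contents:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in 1 MiB chunks so large files are not loaded into memory at once.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        digest = h.hexdigest()
    return FileRecord(path=path, size=st.st_size, mtime=st.st_mtime, content_hash=digest)
```

Protocol-specific fields such as S3 tags or NFSv4 ACLs would go into `extra` in this sketch, keeping the common fields uniform across sources.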

Metadata caps. Each source declares which kinds of enrichment it wants (tags, ACLs, xattrs). Cheap fields are recorded inline during the crawl; heavier fields are enriched in the background. You can adjust caps per source at any time.
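The split between inline and background enrichment can be sketched as partitioning a source's caps by cost. This page does not say which enrichments K-Lake treats as cheap or heavy (that likely depends on the protocol), so `EXAMPLE_COST` below is a hypothetical classification. `fetch` stands in for whatever call retrieves a given field.

```python
from collections import deque
from typing import Callable

# Hypothetical cost classes; the real classification is not documented here.
EXAMPLE_COST = {"xattrs": "cheap", "tags": "heavy", "acls": "heavy"}


def split_caps(caps: set[str], cost: dict[str, str]) -> tuple[set[str], set[str]]:
    """Partition requested caps into (inline, background); unclassified caps go to background."""
    inline = {c for c in caps if cost.get(c) == "cheap"}
    return inline, caps - inline


def process_file(path: str, caps: set[str], cost: dict[str, str],
                 fetch: Callable[[str, str], object],
                 backlog: deque) -> dict[str, object]:
    inline, deferred = split_caps(caps, cost)
    # Cheap fields are fetched during the crawl itself...
    record = {cap: fetch(path, cap) for cap in sorted(inline)}
    # ...while heavier ones are queued for a background enricher.
    for cap in sorted(deferred):
        backlog.append((path, cap))
    return record
```

Defaulting unclassified caps to the background path keeps an unexpected enrichment kind from slowing down the crawl.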

Token. Calls to the API and the CLI are authenticated with a personal access token (PAT) or an OIDC bearer token. PATs are minted from the UI or API and are tied to the user that created them.
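A client passes its token on every request. The sketch below builds, but does not send, an authenticated request with the standard library. The base URL, the `/v1/sources` path, and the use of the `Bearer` scheme for PATs are assumptions. This page only states that OIDC tokens are bearer tokens.

```python
import urllib.request

# Hypothetical base URL; the real K-Lake endpoint layout is not documented here.
API_BASE = "https://k-lake.example.com"


def authed_request(path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying `token` as a bearer credential."""
    req = urllib.request.Request(API_BASE + path, method="GET")
    # Assumes PATs are sent the same way as OIDC bearer tokens.
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    return req
```

Passing the result to `urllib.request.urlopen(req)` would perform the call. Because a PAT is tied to the user who minted it, the request acts with that user's permissions.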

What you'll see running

A typical K-Lake deployment includes:

  • API — serves the UI and answers REST requests
  • Web UI — the operator console
  • Workers — stateless pods that crawl sources and write metadata
  • Database — a managed database used as both the metadata store and the work queue
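Using one database as both metadata store and work queue usually means workers claim jobs with a guarded update, so two workers never take the same job. This page does not name the database or its schema. The sketch below uses SQLite as a stand-in, and the `crawl_jobs` table and its columns are hypothetical. On PostgreSQL, the same idea is commonly expressed with `SELECT ... FOR UPDATE SKIP LOCKED`.

```python
import sqlite3
from typing import Optional

# Hypothetical queue table; K-Lake's real schema is not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS crawl_jobs (
    id      INTEGER PRIMARY KEY,
    source  TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'pending',  -- pending | running | done
    worker  TEXT
)
"""


def enqueue(conn: sqlite3.Connection, source: str) -> int:
    """Add a pending crawl job and return its id."""
    with conn:
        cur = conn.execute("INSERT INTO crawl_jobs (source) VALUES (?)", (source,))
    return cur.lastrowid


def claim(conn: sqlite3.Connection, worker: str) -> Optional[tuple[int, str]]:
    """Try to claim the oldest pending job; return (id, source), or None if none was claimed."""
    with conn:
        row = conn.execute(
            "SELECT id, source FROM crawl_jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard means only one worker's UPDATE can succeed for a given job.
        updated = conn.execute(
            "UPDATE crawl_jobs SET status = 'running', worker = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker, row[0]),
        ).rowcount
    return (row[0], row[1]) if updated == 1 else None
```

Because the queue lives in the same database as the metadata, workers can stay stateless. Any pod can pick up the next job, and job state survives pod restarts.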

You manage sources, users, and tokens through any of the three interfaces — UI, CLI, or API. They all act on the same data.

Next steps

  • Quick start — sign in, add your first source, watch it index
  • Sources — protocol-specific configuration
  • Sizing guide — plan resources for your fleet size