Sizing guide
KDBL Context Lake (K-Lake) scales horizontally. Workers are stateless: when you need more crawl throughput, add worker replicas. The database sets the ceiling on how fast metadata can be persisted, so size it for your peak ingest rate, not your steady state.
This page gives starting points. Tune from there based on the metrics in Telemetry.
Defaults
The shipped manifests set conservative requests with headroom in the limits.
Worker
| | Request | Limit |
|---|---|---|
| CPU | 1 core | 4 cores |
| Memory | 512 Mi | 2 Gi |
Each worker handles many in-flight crawl tasks concurrently. Increase replicas to increase throughput; the workers coordinate through the work queue and will not duplicate work.
API
| | Request | Limit |
|---|---|---|
| CPU | 100 m | 1 core |
| Memory | 128 Mi | 512 Mi |
The API is mostly thin — it reads and writes the database on behalf of the UI and CLI. Two replicas is a sensible default for availability; scale up only if you push it with heavy programmatic API traffic.
UI
| | Request | Limit |
|---|---|---|
| CPU | 50 m | 250 m |
| Memory | 32 Mi | 128 Mi |
The UI is static assets served by a lightweight web server. One or two replicas is enough.
Database
K-Lake persists everything in a managed database. Recommended starting points:
- CPU: 2 cores, scale up under heavy ingest
- Memory: at least 8 Gi for ingests above a few million files
- Disk: provision for the eventual file count — figure tens of bytes per file record, plus indexes
A connection pooler is recommended in front of the database if you run more than a handful of workers.
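The disk guidance above can be turned into a back-of-the-envelope calculation. The sketch below is only illustrative: the function name and the default values for `bytes_per_record`, `index_overhead`, and `growth_factor` are assumptions chosen for the example, not measured K-Lake figures. Replace them with numbers observed on your own database.

```python
def estimate_disk_bytes(file_count, bytes_per_record=64,
                        index_overhead=1.0, growth_factor=1.5):
    """Rough disk estimate for file metadata.

    All defaults are placeholder assumptions, not K-Lake measurements:
      bytes_per_record: average size of one file record ("tens of bytes")
      index_overhead:   index size as a fraction of table size
      growth_factor:    multiplier for future growth and database overhead
    """
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    table_bytes = file_count * bytes_per_record
    index_bytes = table_bytes * index_overhead
    return int((table_bytes + index_bytes) * growth_factor)


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    estimate = estimate_disk_bytes(500_000_000)
    print(f"{to_gib(estimate):.1f} GiB for 500 million file records")
```

Because the estimate is linear in the file count, the cheapest way to calibrate it is to ingest a known number of files and compare the database's reported table and index sizes against the formula.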
When to scale up
| Symptom | Action |
|---|---|
| Queue depth (`kdbl_queue_depth{state="pending"}`) sustained high | Add worker replicas |
| Worker CPU at limit, queue still growing | Bump worker CPU limit, then add replicas |
| Database CPU pinned, queue stable | Scale the database vertically |
| API responses slow under heavy CLI/API use | Add API replicas |
| OOMKills on workers during very wide directory listings | Bump worker memory limit |
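The table above reads naturally as a small decision function. The following is a minimal sketch, assuming you have already reduced your metrics to boolean observations; the `Snapshot` class and its field names are hypothetical and not part of K-Lake. How you decide that a queue is "sustained high" or a CPU is "pinned" is left to your own alerting thresholds.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical summary of observed symptoms (not a K-Lake type)."""
    queue_pending_high: bool = False   # kdbl_queue_depth{state="pending"} sustained high
    queue_growing: bool = False        # pending depth is still increasing
    worker_cpu_at_limit: bool = False
    db_cpu_pinned: bool = False
    api_slow: bool = False
    worker_oomkilled: bool = False


def recommend(s):
    """Map symptoms to the actions listed in the table above."""
    actions = []
    if s.worker_oomkilled:
        actions.append("Bump worker memory limit")
    if s.worker_cpu_at_limit and s.queue_growing:
        actions.append("Bump worker CPU limit, then add replicas")
    elif s.queue_pending_high:
        actions.append("Add worker replicas")
    if s.db_cpu_pinned and not s.queue_growing:
        actions.append("Scale the database vertically")
    if s.api_slow:
        actions.append("Add API replicas")
    return actions


if __name__ == "__main__":
    print(recommend(Snapshot(queue_pending_high=True)))
```

The CPU-at-limit case is checked before the plain queue-depth case so that a CPU-starved worker pool gets a larger limit first rather than only more replicas.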
Source-level tuning
Two source-level toggles affect throughput. Both are exposed via the UI, the CLI (source bulk-ingest, source meta-caps), and the API.
- Bulk ingest — defaults to on. Optimizes the write path for first-time crawls and large catch-up runs. Leave on unless you know you're doing many small incremental updates and have benchmarked the alternative.
- Metadata caps — controls which optional enrichments (S3 tags, NTFS / NFSv4 ACLs, xattrs) are gathered. Enabling more enrichment costs more crawl time and storage. Start narrow, widen as needed.
Estimating headroom
A worker pod can sustain steady-state ingest from a single source at network-bound rates for most object stores and NAS protocols. Real throughput depends heavily on:
- Object size distribution: many small files are slower to crawl than fewer large ones
- Source latency: same-region S3 is much faster than a remote SMB share
- Whether metadata enrichment is enabled
Plan for capacity using a representative source: run a crawl on it, watch the metrics, and extrapolate.
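The extrapolation step can be sketched as simple arithmetic. This example assumes throughput scales roughly linearly with worker count, which only holds while the database and the source are not the bottleneck. The function name, its parameters, and the 30% default headroom are illustrative assumptions, not K-Lake recommendations.

```python
import math


def workers_needed(measured_files_per_sec, measured_workers,
                   total_files, target_hours, headroom=0.3):
    """Estimate worker replicas needed to finish a crawl in target_hours.

    Assumes linear scaling from a representative test crawl; headroom is
    an extra safety margin (0.3 = 30%), chosen arbitrarily for this sketch.
    """
    if min(measured_files_per_sec, measured_workers, target_hours) <= 0:
        raise ValueError("measurements and target_hours must be positive")
    if total_files < 0 or headroom < 0:
        raise ValueError("total_files and headroom must be non-negative")
    per_worker_rate = measured_files_per_sec / measured_workers
    required_rate = total_files / (target_hours * 3600)
    return math.ceil(required_rate * (1 + headroom) / per_worker_rate)


if __name__ == "__main__":
    # Test crawl: 2 workers sustained 2000 files/s in total.
    # Goal: 100 million files in 4 hours.
    print(workers_needed(2000, 2, 100_000_000, 4))
```

Treat the result as a starting replica count, then confirm against queue depth and database CPU during a real run, since a pinned database will flatten the curve well before the worker count suggests.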