
Security overview

This page is written for security, risk, and compliance teams evaluating KDBL Context Lake (K-Lake). It explains the trust model, how isolation and access control are enforced, and the controls that protect data in transit, at rest, and in use.

At a glance

  • Self-hosted in your Kubernetes cluster — data never leaves your boundary, and the product does not phone home.
  • Tenant isolation and per-file access control are enforced in the database with native row-level security (RLS), not in application code.
  • Memory-safe core written in Rust, eliminating entire classes of memory-corruption vulnerabilities.
  • Identity-first: every caller is a named principal authenticated via your own identity provider (OIDC / OAuth 2.1).
  • Auditable: every operation is logged with the principal, tenant, and resource for observability and forensics.
  • Continuously scanned: a rolling daily security report publishes vulnerability findings across all shipped container images.

Security architecture

K-Lake runs entirely inside your infrastructure. Its components are the API, the web console, a fleet of stateless workers, an optional MCP server, and a data layer made up of a managed PostgreSQL database and a work queue. Your identity provider and your content sources remain under your control: K-Lake has read-only access to sources and federates sign-in to your IdP.

flowchart TB
    subgraph client["Client zone"]
        U["Operators and AI assistants"]
    end

    subgraph cluster["Your Kubernetes cluster — primary trust boundary"]
        direction TB
        IN["Ingress / TLS termination"]
        API["K-Lake API"]
        WC["Web console"]
        MCP["MCP server<br/>(OAuth 2.1)"]
        W["Workers<br/>(stateless)"]
        subgraph data["Data layer"]
            DB[("Managed database<br/>PostgreSQL · RLS")]
            Q[["Work queue<br/>(task references only)"]]
        end
    end

    subgraph ext["External — customer-controlled"]
        IDP["Identity provider<br/>OIDC / OAuth / SAML"]
        SRC[("Content sources<br/>S3 / SMB / NFS")]
    end

    U -->|HTTPS| IN
    IN --> API
    IN --> WC
    IN --> MCP
    API --> DB
    MCP --> DB
    W --> DB
    API --> Q
    W --> Q
    W -->|read-only| SRC
    API -. "federated sign-in" .-> IDP
    MCP -. "token validation" .-> IDP

Trust boundaries. The cluster is the primary trust boundary. All ingress is TLS-terminated. The database is the policy decision and enforcement point — no component can read tenant or file data without satisfying the database's row-level security policies. The work queue holds only task references (source ids, paths), never file content or credentials.
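As an illustration of the "task references only" rule, here is a minimal sketch of a queue message that can carry identifiers but nothing else. The `TaskRef` type and its field names are hypothetical, not K-Lake's actual message schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TaskRef:
    """Hypothetical queue message: identifiers only, no content or secrets."""
    source_id: str
    path: str


def encode_task(task):
    # Serialise only the reference fields. The worker resolves credentials
    # and file bytes itself, so neither travels through the queue.
    return json.dumps(asdict(task), sort_keys=True)


def decode_task(payload):
    # TaskRef(**fields) raises TypeError on any unexpected key, so a producer
    # cannot attach extra fields (for example a password) to a message.
    return TaskRef(**json.loads(payload))


message = encode_task(TaskRef("src-42", "/finance/q3.xlsx"))
print(message)
```

Keeping the message type closed means a compromised or inspected queue reveals which paths are scheduled for work, but not what the files contain or how to reach them.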


Improvements on prior approaches

K-Lake was designed to close gaps that are common in conventional indexing and enterprise-search products.

| Concern | Conventional approach | K-Lake approach |
|---|---|---|
| Access control | Filtering applied in application code; a single missed WHERE clause leaks data | Row-level security in the database; enforced on every query, fail-closed by default |
| Multi-tenancy | Shared tables with app-side tenant filters | Tenant isolation enforced by RLS in the database |
| Memory safety | C/C++ connectors prone to buffer overflows and use-after-free | Rust: memory-safe by construction, no garbage collector |
| Identity | Bolt-on auth or shared service accounts | Native per-user OIDC / OAuth 2.1, federated to your IdP |
| AI access | Bulk export of data to a model | MCP with per-user trimming and citations; data stays in your cluster |
| Deployment | SaaS; data leaves your boundary | Self-hosted in your cluster; air-gap capable; no phone-home |

Memory-safe foundation (Rust)

The core services and all source connectors are written in Rust. This is a deliberate security decision:

  • Memory safety by construction. In safe Rust, the ownership model rules out use-after-free and data races at compile time, and bounds-checked indexing stops buffer overflows. These are the vulnerability classes behind a large share of CVEs in systems software.
  • No garbage collector, so performance is predictable and there is no GC-pause attack surface; resource cleanup is deterministic.
  • Strong typing and exhaustive error handling reduce the logic errors that lead to unsafe states.

The result is a small, statically analysable codebase with a smaller exploitable surface than memory-unsafe alternatives.


Multi-tenant isolation

Every source, user, token, and indexed file belongs to exactly one tenant. Isolation is not a matter of careful application coding — it is enforced by the database itself.

flowchart TB
    T1["Tenant A request<br/>(authenticated principal)"]
    T2["Tenant B request<br/>(authenticated principal)"]
    RLS{{"Row-level security<br/>enforced in PostgreSQL"}}
    T1 --> RLS
    T2 --> RLS
    RLS -->|"tenant = A"| DA[("Tenant A rows only")]
    RLS -->|"tenant = B"| DBB[("Tenant B rows only")]
    RLS -. "cross-tenant access" .-> X["Denied by default"]

Each request runs in a database session bound to its authenticated tenant and principal. RLS policies attached to every table restrict visibility to that tenant's rows. A bug in application code cannot widen this boundary, because the boundary lives below the application, in the database.
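To make the session binding concrete, the sketch below shows one common way to express it in PostgreSQL: a policy that compares each row's tenant to a per-transaction setting. The table, column, and setting names (`files`, `tenant_id`, `app.tenant_id`, `app.principal`) are assumptions for illustration, not K-Lake's actual schema. The SQL is held as text and is not executed here. The `%s` placeholders follow the style used by common Python PostgreSQL drivers.

```python
# Hypothetical DDL: table, column, and setting names are illustrative only.
# current_setting(name, true) returns NULL when the setting is absent, so an
# unbound session matches no rows rather than raising or matching everything.
TENANT_POLICY_DDL = """
ALTER TABLE files ENABLE ROW LEVEL SECURITY;
ALTER TABLE files FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON files
    USING (tenant_id = current_setting('app.tenant_id', true));
"""


def bind_session(tenant_id, principal):
    """Return parameterised statements to run at the start of each transaction."""
    if not tenant_id or not principal:
        raise ValueError("refusing to open an unbound session")
    # set_config(name, value, true) scopes the value to the current
    # transaction, so it cannot leak into the next request on a pooled
    # connection.
    return [
        ("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,)),
        ("SELECT set_config('app.principal', %s, true)", (principal,)),
    ]


for statement, params in bind_session("tenant-a", "alice@example.com"):
    print(statement, params)
```

`FORCE ROW LEVEL SECURITY` is included because PostgreSQL otherwise exempts the table owner from its own policies.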


Authorization & security trimming

Beyond tenant isolation, K-Lake enforces per-file access control. A caller sees a file only if the file's source-native permissions (NTFS / NFSv4 ACLs, POSIX ownership, S3 policy) grant their identity access. These native principals are correlated to your IdP identities by directory enrichment, then enforced — again — by row-level security at the database layer.

sequenceDiagram
    participant C as "Client (user / AI)"
    participant API as "K-Lake API / MCP"
    participant IDP as "Identity provider"
    participant DB as "Database (RLS)"

    C->>API: Request + OIDC bearer token (or PAT)
    API->>IDP: Validate token, resolve identity
    IDP-->>API: Verified principal + group memberships
    API->>DB: Query in a session bound to tenant + principal
    Note over DB: RLS policies filter by<br/>tenant AND per-file ACL grants
    DB-->>API: Only rows the principal is authorized to see
    API-->>C: Authorized results (every access audited)

Because trimming is enforced where the data lives, the same guarantee applies uniformly across the web console, CLI, REST API, and AI access over MCP. There is no "admin" code path that bypasses it. See Per-file security trimming and Directory enrichment for detail.
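The predicate described above (tenant match AND a per-file ACL grant) can be modelled in a few lines. This is only a model of the rule for readers who want to reason about it. In K-Lake the check runs as row-level security inside the database, not in application code. The type names and the identity strings are hypothetical, and the model ignores details such as explicit deny entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    tenant: str
    identities: frozenset  # user id plus group ids resolved from the IdP


@dataclass(frozen=True)
class FileRow:
    tenant: str
    path: str
    allowed: frozenset  # identities granted read access by the source ACL


def trim(rows, principal):
    """Keep rows in the caller's tenant whose ACL grants overlap their identities."""
    return [
        row for row in rows
        if row.tenant == principal.tenant and row.allowed & principal.identities
    ]


rows = [
    FileRow("a", "/hr/salaries.xlsx", frozenset({"grp:hr"})),
    FileRow("a", "/eng/design.md", frozenset({"grp:eng", "user:alice"})),
    FileRow("b", "/eng/roadmap.md", frozenset({"grp:eng"})),
]
alice = Principal("a", frozenset({"user:alice", "grp:eng"}))
visible = [row.path for row in trim(rows, alice)]
print(visible)  # ['/eng/design.md']
```

The third row shows why both conditions matter: the group name matches, but the file belongs to another tenant, so it stays hidden.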

Why database-layer enforcement is stronger

Application-layer filtering must be applied correctly in every code path that touches data; one oversight leaks records. Database-layer RLS inverts this: access is denied by default and granted only by an explicit policy, so new code paths inherit the controls automatically and fail closed.
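The fail-closed behaviour can be illustrated with a toy in-memory model: a table with no policy returns nothing, and rows become visible only once a policy grants them. `PolicyStore` is a hypothetical teaching aid that mirrors how PostgreSQL applies a default-deny policy when row-level security is enabled and no policy exists. It is not part of K-Lake.

```python
class PolicyStore:
    """Toy model of default-deny: a table with no policy exposes no rows."""

    def __init__(self):
        self._tables = {}    # table name -> list of row dicts
        self._policies = {}  # table name -> predicate(row, session) -> bool

    def create_table(self, name, rows):
        self._tables[name] = list(rows)

    def create_policy(self, name, predicate):
        self._policies[name] = predicate

    def select(self, name, session):
        policy = self._policies.get(name)
        if policy is None:
            return []  # fail closed: nothing is visible until a policy grants it
        return [row for row in self._tables[name] if policy(row, session)]


store = PolicyStore()
store.create_table("files", [
    {"tenant": "a", "path": "/x"},
    {"tenant": "b", "path": "/y"},
])
session = {"tenant": "a"}

before = store.select("files", session)
store.create_policy("files", lambda row, s: row["tenant"] == s["tenant"])
after = store.select("files", session)
print(before, after)
```

Every caller goes through `select`, so a new code path cannot forget the filter. The worst outcome of a missing policy is an empty result, not a leak.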


Identity & access management

  • Federated sign-in. Users authenticate through your own identity provider via OIDC / OAuth 2.1 (and SAML where applicable). K-Lake does not become a parallel identity store; it consumes verified identities from providers such as Microsoft Entra ID, Okta, Google, or Keycloak.
  • Personal access tokens (PATs). For programmatic and CLI use, scoped PATs are minted per user, shown once, and revocable.
  • AI access over MCP. The MCP server is a full OAuth 2.1 authorization server (per the relevant RFCs for protected resources and audience restriction). AI assistants connect as the individual user, inherit that user's per-file trimming, and receive a citation for every fact — so answers are grounded and verifiable, and no data is exported to the model provider in bulk.

See MCP server and Connect your AI.
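One way to obtain the "shown once, scoped, revocable" properties of a personal access token is to store only a hash of the secret. The sketch below uses the Python standard library. The `TokenStore` class, the token layout, and the scope names are assumptions for illustration and do not describe K-Lake's token format.

```python
import hashlib
import hmac
import secrets


class TokenStore:
    """Hypothetical PAT store that keeps only a hash of each token."""

    def __init__(self):
        self._records = {}  # token id -> (sha256 hex digest, owner, scopes)

    def mint(self, owner, scopes):
        token_id = secrets.token_hex(8)
        secret = secrets.token_urlsafe(32)
        digest = hashlib.sha256(secret.encode()).hexdigest()
        self._records[token_id] = (digest, owner, frozenset(scopes))
        # The plaintext is returned once and never stored.
        return f"{token_id}.{secret}"

    def verify(self, token, scope):
        token_id, _, secret = token.partition(".")
        record = self._records.get(token_id)
        if record is None:
            return None
        digest, owner, scopes = record
        candidate = hashlib.sha256(secret.encode()).hexdigest()
        # Constant-time comparison avoids leaking digest prefixes via timing.
        if hmac.compare_digest(candidate, digest) and scope in scopes:
            return owner
        return None

    def revoke(self, token):
        self._records.pop(token.partition(".")[0], None)


tokens = TokenStore()
pat = tokens.mint("alice", {"search:read"})
allowed = tokens.verify(pat, "search:read")
out_of_scope = tokens.verify(pat, "sources:write")
tokens.revoke(pat)
after_revoke = tokens.verify(pat, "search:read")
print(allowed, out_of_scope, after_revoke)
```

Because only the digest is kept, a copy of the token table does not yield usable tokens, and revocation is a single record deletion.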


Encryption

| Layer | Control |
|---|---|
| In transit | TLS for all client-to-service traffic; TLS for service-to-database connections. |
| At rest (secrets) | Source credentials, IdP client secrets, and tokens are encrypted with authenticated encryption (AEAD) before they reach the database, using a master key held outside the API process. |
| At rest (datastore) | Deployable on encrypted volumes for full at-rest encryption of the managed database and its backups. |
| Key handling | Encryption keys are supplied via Kubernetes Secrets or an external secret manager; certain operations require the master key to be held by the worker rather than the API, narrowing exposure. |

Credentials are never written to logs or shell history (secrets are read from stdin or the environment, never command-line flags).
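The stdin-or-environment rule can be sketched as a small helper. Command-line flags are avoided because arguments are visible in process listings and shell history. The function name `read_secret` and the variable name `EXAMPLE_SOURCE_PASSWORD` are hypothetical and are not part of any K-Lake CLI.

```python
import io
import os
import sys


def read_secret(env_var, stream=None, environ=None):
    """Return a secret from the environment, else from the first line of a stream."""
    environ = os.environ if environ is None else environ
    value = environ.get(env_var)
    if value:
        return value
    stream = sys.stdin if stream is None else stream
    line = stream.readline().rstrip("\r\n")
    if not line:
        raise ValueError(f"no secret supplied via {env_var} or stdin")
    return line


# Simulated stdin and an empty environment, so the example is deterministic.
secret = read_secret("EXAMPLE_SOURCE_PASSWORD", io.StringIO("s3cr3t\n"), environ={})
print(len(secret))  # log the length only, never the secret itself
```

The helper takes no secret as an argument, so there is nothing for a caller to pass on a command line by mistake.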


Auditability & observability

  • Audit trail. Every privileged operation — sign-in, source change, crawl, search, file open, token mint, and AI query — is recorded with the acting principal, tenant, and resource, enabling forensic review and access reporting.
  • Structured logs & metrics. Services emit structured logs and metrics in standard formats, so they integrate with your existing SIEM and monitoring stack (e.g. Prometheus-compatible metrics, log shippers to your aggregator). See Telemetry.
  • Verifiable retrieval. Every search result and AI citation links back to the original file, re-checking the caller's permissions at open time — so an auditor can confirm exactly where any answer came from.
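An audit record of the kind described above can be sketched as a single JSON line carrying the principal, tenant, and resource. The field names and the `file.open` action string are assumptions for illustration. K-Lake's actual log schema is documented under Telemetry.

```python
import json
from datetime import datetime, timezone


def audit_event(action, principal, tenant, resource, when=None):
    """Build one JSON line describing who did what to which resource."""
    when = when or datetime.now(timezone.utc)
    record = {
        "ts": when.isoformat(),
        "action": action,
        "principal": principal,
        "tenant": tenant,
        "resource": resource,
    }
    # Sorted keys keep the line stable for diffing and log parsing.
    return json.dumps(record, sort_keys=True)


line = audit_event(
    "file.open",
    "alice@example.com",
    "tenant-a",
    "src-42:/finance/q3.xlsx",
    when=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(line)
```

One self-describing JSON object per line is a format most log shippers and SIEM pipelines can ingest without custom parsing.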

Supply-chain & vulnerability management

  • Daily security report. K-Lake publishes a rolling daily security report with automated vulnerability scan results across every shipped container image.
  • Signed images. Release images are cryptographically signed so you can verify provenance and integrity before deployment.
  • Minimal images. Containers ship only what each service needs, reducing the attack surface and patch burden.

Deployment & data residency

  • Runs in your cluster. K-Lake deploys into your own Kubernetes environment; all content and metadata stay within your boundary.
  • Air-gap capable. K-Lake can run with no internet egress — including fully on-prem AI retrieval over MCP — so regulated and disconnected environments are first-class. See the air-gapped demo.
  • Offline licensing. Licensing is enforced offline; there is no license-server call-home.

Data protection & compliance

KDBL Consulting is registered with the UK Information Commissioner's Office (ICO) under registration ICO:000014563614. Because K-Lake is self-hosted in your own infrastructure, your content and metadata remain under your control and within your data-residency boundary — KDBL does not process or receive your indexed data.

Reporting a vulnerability

If you believe you have found a security issue, please email security@kdbl.co.uk or contact your KDBL representative. Please do not disclose publicly until we have confirmed a fix.