Architecture

Introduction

The platform is enterprise-grade and Kubernetes-based, enabling organizations to build, deploy, and manage applications consistently across hybrid and multi-cloud environments. It integrates core Kubernetes capabilities with enhanced management, observability, and security services, offering a unified control plane and flexible workload clusters.

The architecture follows a hub-and-spoke model, consisting of a global cluster and multiple workload clusters. This design provides centralized governance while allowing independent workload execution and scalability.

For canonical definitions of platform-wide terms such as global cluster, workload cluster, and cluster plugin, see Glossary.

Core Architectural Components

Global Cluster

The global cluster serves as the centralized management and control hub of the platform. It provides platform-wide services such as authentication, policy management, cluster lifecycle operations, and observability, and it hosts multi-cluster management and other cross-cluster functionality.

Key components include:

  • Gateway: Acts as the main entry point to the platform. It manages API requests from the UI, CLI (kubectl), and automation tools, routing them to the appropriate backend services.
  • Authentication and Authorization (Auth): Integrates with external Identity Providers (IdPs) to provide Single Sign-On (SSO) and RBAC-based access control.
  • Web Console: Provides a web-based interface for the platform. It communicates with platform APIs through the gateway.
  • Cluster Management: Handles the registration, provisioning, and lifecycle management of workload clusters.
  • Services
  • Operator Lifecycle Manager (OLM) and Cluster Plugins: Manage the installation, updates, and lifecycle of operators and cluster extensions.
  • Internal Image Registry: Offers an out-of-the-box integrated container image repository with role-based access.
  • Observability: Provides centralized logging, metrics, and tracing for both the global and workload clusters.
  • Cluster Proxy: Enables secure communication between the global cluster and workload clusters.

Workload Cluster

Workload clusters are Kubernetes-based environments managed by the global cluster. Each workload cluster runs isolated application workloads and inherits governance and configuration from the central control plane.

External Integrations

  • Identity Provider (IdP): Supports federated authentication via standard protocols (OIDC, SAML) for unified user management.
  • API and CLI Access: Users can interact with the platform through RESTful APIs, the web console, or command-line tools such as kubectl and ac.
  • Load Balancer (VIP/DNS/SLB): Provides high availability and traffic distribution to the gateway and ingress endpoints of the global and workload clusters.

Scalability and High Availability

The platform is designed for horizontal scalability and high availability:

  • Each component can be deployed redundantly to eliminate single points of failure.
  • The global cluster supports managing dozens to hundreds of workload clusters.
  • Workload clusters can scale independently according to workload demand.
  • The use of VIP/DNS/Ingress ensures seamless routing and failover.

Functional Perspective

The platform's complete functionality consists of Core and extensions. Extensions are built on two technical stacks: Operator and Cluster Plugin.

  • Core

    The minimal deliverable unit of the platform, providing core capabilities such as cluster management, container orchestration, projects, and user administration.

    • Meets the highest security standards
    • Delivers maximum stability
    • Offers the longest support lifecycle
  • Extensions

    Extensions in both the Operator and Cluster Plugin stacks can be classified into:

    • Aligned – Lifecycle strategy consisting of multiple maintenance streams, aligned with the platform's releases.
    • Agnostic – Lifecycle strategy consisting of multiple maintenance streams, released independently from the platform.

    For more details about extensions, see Extend.

Technical Perspective

Platform Component Runtime

  • All platform components run as containers within a Kubernetes management cluster (the global cluster)

High Availability Architecture

  • The global cluster typically consists of at least three control plane nodes and multiple worker nodes
  • High availability of etcd is central to cluster HA; see Key Component High Availability Mechanisms for details
  • Load balancing can be provided by an external load balancer or a self-built VIP inside the cluster

Request Routing

  • Client requests first pass through the load balancer or self-built VIP
  • Requests are forwarded to ALB (the platform's default Kubernetes Ingress Gateway) running on designated ingress nodes (or control-plane nodes if configured)
  • ALB routes traffic to the target component pods according to configured rules
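The last step above, rule-based routing, can be pictured as a longest-prefix match over a rule table. The following is a minimal sketch only: the rule table, the backend names, and the `route` function are hypothetical illustrations and do not reflect ALB's actual rule format or matching engine.

```python
from typing import Optional

# Hypothetical rule table: path prefix -> backend component.
# These names are illustrative and are not ALB's real configuration.
RULES = {
    "/": "web-console",
    "/api": "gateway",
    "/api/auth": "auth",
}


def route(path: str, rules: dict[str, str] = RULES) -> Optional[str]:
    """Return the backend for the longest matching path prefix, or None."""
    matches = [
        prefix
        for prefix in rules
        # Match the prefix exactly, or as a whole leading path segment.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    return rules[max(matches, key=len)]
```

For example, `route("/api/auth/login")` selects the `auth` backend because `/api/auth` is the most specific matching prefix, while `route("/apix")` falls back to the catch-all `/` rule.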

Replica Strategy

  • Core components run with at least two replicas
  • Key components (such as registry, MinIO, ALB) run with three replicas

Fault Tolerance & Self-healing

  • Achieved through cooperation between kubelet, kube-controller-manager, kube-scheduler, kube-proxy, ALB, and other components
  • Includes health checks, failover, and traffic redirection

Data Storage & Recovery

  • Control-plane configuration and platform state are stored in etcd as Kubernetes resources
  • In catastrophic failures, recovery can be performed from etcd snapshots

Primary / Standby Disaster Recovery

  • Two separate global clusters: Primary Cluster and Standby Cluster
  • The disaster recovery mechanism is based on real-time synchronization of etcd data from the Primary Cluster to the Standby Cluster.
  • If the Primary Cluster becomes unavailable due to a failure, services can quickly switch to the Standby Cluster.

Key Component High Availability Mechanisms

etcd

  • Deployed on three (or five) control plane nodes
  • Uses the Raft protocol for leader election and data replication
  • Three-node deployments tolerate up to one node failure; five-node deployments tolerate up to two
  • Supports local and remote S3 snapshot backups
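The failure tolerance figures above follow from Raft's majority quorum rule: a cluster of n members needs floor(n/2) + 1 members to agree, so it can lose the remainder. A minimal sketch of that arithmetic (the function names are illustrative and not part of any etcd API):

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given member count."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can fail while a majority remains."""
    return members - quorum(members)


for size in (3, 4, 5):
    print(size, quorum(size), fault_tolerance(size))
```

A four-member cluster tolerates only one failure, the same as a three-member cluster, which is why odd member counts such as three or five are the usual choice.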

Monitoring Components

  • Prometheus: Multiple instances, deduplication with Thanos Query, and cross-region redundancy
  • VictoriaMetrics: Cluster mode with distributed VMStorage, VMInsert, and VMSelect components
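To illustrate what query-time deduplication across multiple Prometheus instances means, the sketch below merges series that differ only in a replica label. It is a deliberate simplification under stated assumptions: the label name `replica` and the "first replica wins per timestamp" rule are chosen for illustration, and Thanos Query's real deduplication algorithm is more involved.

```python
def deduplicate(series, replica_label="replica"):
    """Merge series that are identical except for the replica label.

    `series` is a list of (labels, samples) pairs, where labels is a dict
    and samples is a list of (timestamp, value) tuples.
    """
    merged = {}
    for labels, samples in series:
        # Identity of a series once the replica label is dropped.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        bucket = merged.setdefault(key, {})
        for timestamp, value in samples:
            # Keep the first value seen for each timestamp.
            bucket.setdefault(timestamp, value)
    return {key: sorted(bucket.items()) for key, bucket in merged.items()}


replicas = [
    ({"__name__": "up", "job": "api", "replica": "a"}, [(1, 1.0), (2, 1.0)]),
    ({"__name__": "up", "job": "api", "replica": "b"}, [(2, 1.0), (3, 1.0)]),
]
result = deduplicate(replicas)
```

Here the two replicas each miss one scrape, yet the merged series covers timestamps 1 through 3, which is the practical benefit of running redundant instances.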

Logging Components

  • Nevermore collects logs and audit data
  • Kafka / Elasticsearch / Razor / Lanaya are deployed in distributed and multi-replica modes

Networking Components (CNI)

  • Kube-OVN / Calico / Flannel: Achieve HA via stateless DaemonSets or triple-replica control plane components

ALB

  • Operator deployed with three replicas, leader election enabled
  • Instance-level health checks and load balancing

Self-built VIP

  • High-availability virtual IP based on Keepalived
  • Supports heartbeat detection and active-standby failover
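Active-standby failover of a virtual IP can be sketched as a priority election among healthy nodes, which is the general idea behind VRRP as implemented by Keepalived. The node names and priorities below are hypothetical, and the sketch ignores tie-breaking, preemption settings, and advertisement timers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    priority: int
    healthy: bool = True


def vip_holder(nodes: list[Node]) -> Optional[str]:
    """Return the name of the healthy node with the highest priority."""
    alive = [node for node in nodes if node.healthy]
    if not alive:
        return None
    return max(alive, key=lambda node: node.priority).name


# Hypothetical three-node setup.
nodes = [Node("cp-1", 150), Node("cp-2", 100), Node("cp-3", 50)]
before = vip_holder(nodes)   # cp-1 holds the VIP
nodes[0].healthy = False     # heartbeat from cp-1 is lost
after = vip_holder(nodes)    # the VIP moves to cp-2
```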

Harbor

  • ALB-based load balancing
  • PostgreSQL with Patroni HA
  • Redis Sentinel mode
  • Stateless services deployed in multiple replicas

Registry and MinIO

  • Registry deployed with three replicas
  • MinIO in distributed mode with erasure coding, data redundancy, and automatic recovery
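The trade-off behind erasure coding can be shown with generic arithmetic: a stripe split into k data shards and m parity shards can be read after losing up to m shards, at the cost of storing only k / (k + m) of raw capacity as usable data. The sketch below uses illustrative shard counts and does not describe the platform's actual MinIO configuration or MinIO's write-quorum rules.

```python
def erasure_profile(data_shards: int, parity_shards: int) -> dict:
    """Generic erasure-coding figures for one stripe."""
    total = data_shards + parity_shards
    return {
        "total_shards": total,
        # Any `parity_shards` shards can be lost and data is still readable.
        "tolerated_losses": parity_shards,
        # Share of raw capacity available for user data.
        "usable_fraction": data_shards / total,
    }


# Illustrative layout: 12 data shards and 4 parity shards per stripe.
profile = erasure_profile(12, 4)
```

With this illustrative layout, 75% of raw capacity is usable and up to four shards per stripe can be lost, compared with roughly 33% usable capacity for plain three-way replication.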