Kubernetes Deployment Guide
Target Audience: DevOps Engineers & Kubernetes Administrators
Purpose: Infrastructure prerequisites and sizing guidance for deploying Solace Agent Mesh (SAM) Enterprise in customer-managed Kubernetes environments.
1. Kubernetes Platform Support
SAM is designed to run on standard, CNCF-compliant Kubernetes clusters. While we adhere to open standards, our Quality Assurance (QA) validation focuses on the managed services of major cloud providers.
Supported Versions
We support the three (3) most recent minor versions of upstream Kubernetes.
- For Cloud Managed (EKS, AKS, GKE): We validate against the provider's default release channels.
- For On-Premise (OpenShift, Rancher, etc.): Compatibility is determined by the underlying Kubernetes API version, not the vendor's product version. Ensure your distribution's K8s version falls within the supported upstream window.
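The support window above can be checked mechanically. The following is a minimal sketch, assuming you supply the latest upstream minor version yourself (SAM does not publish it through any API that this guide describes). The function names and the sample version strings are hypothetical.

```python
import re


def minor_of(version):
    """Extract (major, minor) from strings such as 'v1.31.4' or '1.31.4-eks-1a2b3c'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def in_supported_window(cluster_version, latest_upstream, window=3):
    """Return True if the cluster's minor version is one of the `window`
    most recent upstream minor versions."""
    c_major, c_minor = minor_of(cluster_version)
    l_major, l_minor = minor_of(latest_upstream)
    if c_major != l_major:
        return False
    # A distance of 0 is the latest minor; window - 1 is the oldest supported one.
    return 0 <= l_minor - c_minor < window


if __name__ == "__main__":
    # Hypothetical versions, for illustration only.
    latest = "1.33.0"
    for reported in ("v1.32.2+rke2r1", "v1.29.8"):
        print(reported, in_supported_window(reported, latest))
```

For vendor distributions, pass the Kubernetes version reported by the API server (for example, the server version shown by `kubectl version`), not the vendor's product version.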
Distribution Support Matrix
| Category | Distributions | Support Level |
|---|---|---|
| Validated | AWS EKS, Azure AKS, Google GKE | Tier 1 Support. We explicitly validate SAM releases against these environments. |
| Compatible | Red Hat OpenShift, VMware Tanzu (TKG), SUSE Rancher (RKE2), Oracle Container Engine (OKE), Canonical Charmed Kubernetes, Upstream K8s (kubeadm) | Tier 2 Support. SAM is compatible with standard Kubernetes APIs. For distributions with proprietary security constraints (e.g., OpenShift SCCs, Tanzu PSPs), Solace support is limited to confirming API compatibility only. Solace does not provide setup, configuration, or troubleshooting assistance for customer-specific security policies or proprietary features; these remain the customer's responsibility. |
Constraints & Limitations
To prevent deployment failures, ensure your cluster meets the following constraints:
- Node Architecture: SAM requires Standard Worker Nodes backed by VMs or Bare Metal.
  - Not Supported: Serverless or Virtual Nodes (e.g., AWS Fargate, GKE Autopilot, Azure Virtual Nodes) are not supported due to local storage and networking limitations.
- Security Context:
  - SAM containers run as non-root users (UID 999) by default.
  - SAM does NOT require `privileged: true` capabilities or root access.
  - OpenShift Note: You may need to add the service account to the `nonroot` SCC if your cluster enforces `restricted-v2` by default.
- Monitoring:
  - SAM does NOT deploy DaemonSets for monitoring.
  - Observability/Monitoring is the customer's responsibility.
2. Compute Resource Guidance
SAM workloads utilize a microservices architecture. Resource requirements scale based on the number of concurrent Agents you intend to run.
Processor Architecture Support
SAM container images are built for multi-architecture support. You may provision nodes using either architecture based on your organization's standards:
- ARM64 (Recommended): Offers the best price/performance ratio (e.g., AWS Graviton, Azure Cobalt, Google Axion).
- x86_64 (Intel/AMD): Fully supported for standard deployments.
Recommended Node Sizing
For Production environments, we recommend using latest-generation General Purpose worker nodes, which balance CPU and memory at a 1:4 ratio (1 vCPU per 4 GB of RAM).
- Recommended Specification: 4 vCPU / 16 GB RAM
  - ARM Examples: AWS `m8g.xlarge`, Azure `Standard_D4ps_v6`, GCP `c4a-standard-4`
  - x86 Examples: AWS `m8i.xlarge`, Azure `Standard_D4s_v6`, GCP `n2-standard-4`
- Minimum Specification: 2 vCPU / 8 GB RAM (Note: smaller nodes will limit agent density).
  - ARM Examples: AWS `m8g.large`, Azure `Standard_D2ps_v6`, GCP `c4a-standard-2`
  - x86 Examples: AWS `m8i.large`, Azure `Standard_D2s_v6`, GCP `n2-standard-2`
Note: For AWS, Azure, and GCP, should any of these instance types be unavailable in your region of choice, we recommend choosing the next closest equivalent (e.g. m7g.large instead of m8g.large).
Component Resource Specifications
To assist with Quota planning and, if in use, Cluster Autoscaler configuration, the following table details the default Resource Requests and Limits for the mandatory core components.
Note: These values represent the application container only. If your environment injects sidecars (e.g., Istio, Dapr, Splunk), ensure you calculate additional overhead to prevent scheduling failures.
| Component | Description | CPU Request | CPU Limit | RAM Request | RAM Limit | QoS Class |
|---|---|---|---|---|---|---|
| Agent Mesh | Includes Core services, Orchestrator Agent, and Web UI Gateway. | 175m | 200m | 625 MiB | 1 GiB | Burstable |
| Deployer | Responsible for dynamically deploying SAM-managed Agents, Gateways, and mesh components. | 100m | 100m | 100 MiB | 100 MiB | Guaranteed |
| Agent | The runtime for a single Agent instance (scales horizontally). | 175m | 200m | 625 MiB | 768 MiB | Burstable |
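The QoS Class column follows from the standard Kubernetes rule: a pod is Guaranteed only when CPU and memory requests equal their limits, Burstable when something is set but the values differ, and BestEffort when nothing is set. The sketch below applies that rule to the table values for a single-container pod. The function name is hypothetical, and the multi-container case is deliberately left out.

```python
def qos_class(cpu_request, cpu_limit, mem_request, mem_limit):
    """Classify a single-container pod using the Kubernetes QoS rules.

    Values are plain numbers (millicores, MiB); None means "not set".
    """
    # Kubernetes defaults an unset request to the limit when a limit is present.
    if cpu_request is None:
        cpu_request = cpu_limit
    if mem_request is None:
        mem_request = mem_limit

    values = (cpu_request, cpu_limit, mem_request, mem_limit)
    if all(v is None for v in values):
        return "BestEffort"
    if None not in values and cpu_request == cpu_limit and mem_request == mem_limit:
        return "Guaranteed"
    return "Burstable"


# Defaults copied from the table above: (CPU req m, CPU limit m, RAM req MiB, RAM limit MiB).
COMPONENTS = {
    "Agent Mesh": (175, 200, 625, 1024),
    "Deployer": (100, 100, 100, 100),
    "Agent": (175, 200, 625, 768),
}

if __name__ == "__main__":
    for name, resources in COMPONENTS.items():
        print(f"{name}: {qos_class(*resources)}")
```

Because a pod is Guaranteed only if every container in it meets the rule, an injected sidecar without matching requests and limits would change the pod's class even though the application container's values are unchanged.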
Custom Mesh Components (Customer-Managed)
For Custom Agents or external components that are not managed/provisioned by the Deployer:
- Responsibility: The customer is responsible for defining the Deployment manifests and resource requirements.
- Sizing: We recommend starting with the `SAM Agent` baseline (175m / 625 MiB) and adjusting based on the specific logic or model inference requirements of your custom code.
Capacity Planning (Per Agent)
When sizing your cluster, budget the following reservations for each concurrent Solace Agent you plan to deploy:
- Memory Request: 625 MiB
- Memory Limit: 768 MiB
- CPU Request: 175m (0.175 vCPU)
- CPU Limit: 200m (0.2 vCPU)
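As a rough illustration of how these reservations translate into agent density, the sketch below divides a node's capacity by the per-Agent requests, since the scheduler places pods by requests rather than limits. It assumes the node's "GB" figure is GiB, and the reserved amounts are hypothetical placeholders for system daemons, the core components, and any sidecars. Substitute your cluster's actual allocatable values.

```python
AGENT_CPU_REQUEST_M = 175    # millicores, from the per-Agent budget above
AGENT_MEM_REQUEST_MIB = 625  # MiB, from the per-Agent budget above


def agents_per_node(node_vcpu, node_mem_gib, reserved_cpu_m=0, reserved_mem_mib=0):
    """Return how many Agent pods fit on one node, based on requests only.

    reserved_* are hypothetical deductions for everything else on the node.
    """
    cpu_free = node_vcpu * 1000 - reserved_cpu_m
    mem_free = node_mem_gib * 1024 - reserved_mem_mib
    if cpu_free <= 0 or mem_free <= 0:
        return 0
    # The scarcer of the two resources determines the density.
    return min(cpu_free // AGENT_CPU_REQUEST_M, mem_free // AGENT_MEM_REQUEST_MIB)


if __name__ == "__main__":
    print(agents_per_node(4, 16))             # recommended node, nothing reserved
    print(agents_per_node(2, 8))              # minimum node, nothing reserved
    print(agents_per_node(4, 16, 500, 2048))  # recommended node, hypothetical reservation
```

With these request values CPU is the limiting resource on both node sizes: a 4 vCPU node holds at most 22 Agents by requests and a 2 vCPU node at most 11, before any reservation is deducted.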
3. Persistence Layer Strategy
SAM requires a relational database (PostgreSQL) and an object store (S3-compatible) to function.
A. Production Deployments (Mandatory)
For production environments, you must provide your own managed external persistence services. Solace does not support running stateful databases inside the SAM cluster for production traffic.
- Database: PostgreSQL 17+ (e.g., AWS RDS, Azure Database for PostgreSQL, Cloud SQL).
- Object Store: S3-Compatible API (e.g., AWS S3, Azure Blob, Google Cloud Storage).
- Configuration: Refer to the Session Storage and Artifact Storage documentation to configure connection strings and secrets for your installation.
- Authentication: Standard Username/Password authentication via Kubernetes Secret is supported.
B. Dev / POC Deployments (Optional Starter Layer)
For convenience, the SAM Helm Quickstart chart includes an optional "Starter Persistence Layer" (Containerized PostgreSQL + SeaweedFS).
- Use Case: Strictly for Evaluation, Development, and Proof of Concept (POC).
- Support Policy: Unsupported. Solace provides these components "as-is" for quick startup. We do not provide patches, backups, or data recovery support for embedded persistence pods.
- Data Persistence: If the pods restart, data is preserved only if a valid StorageClass is configured.
Starter Layer Resource Requirements:
| Component | Description | CPU Request | CPU Limit | RAM Request | RAM Limit | Recommended Volume Size | QoS Class |
|---|---|---|---|---|---|---|---|
| Postgres (Starter) | Embedded database for configuration state (Dev/POC only). | 175m | 175m | 625 MiB | 625 MiB | 30 GiB | Guaranteed |
| SeaweedFS (Starter) | Embedded S3-compatible object storage for artifacts (Dev/POC only). | 175m | 175m | 625 MiB | 625 MiB | 50 GiB | Guaranteed |
Storage Class Recommendations (Starter Layer Only):
If you choose to use the Starter Persistence Layer for development, performance is heavily dependent on the underlying disk I/O. Using slow standard disks (HDD) will cause Agent timeouts.
Warning:
- Default StorageClasses often have `reclaimPolicy: Delete`. If you're using SAM Helm Quickstart, uninstalling the Helm release will permanently delete your Dev data. If data persistence is required across re-installs, please configure a StorageClass with `reclaimPolicy: Retain`.
- For managed Kubernetes clusters (e.g., EKS, AKS, GKE), use a `StorageClass` with `volumeBindingMode: WaitForFirstConsumer` and ensure the underlying disk is single-zone. This avoids initial scheduling and later re-scheduling failures due to cross-zone volumes.
We recommend using SSD-backed Storage Classes:
| Provider | Recommended StorageClass | Underlying Tech (Disk Type) |
|---|---|---|
| AWS EKS | Any storage class using the gp3 disk type. | EBS General Purpose SSD. EBS volumes are implicitly zoned. |
| Azure AKS | Any storage class that uses zoned SSDs. | Azure Zoned Premium SSD (Premium_LRS). |
| Google GKE | Variable, depends on chosen instance type. | Variable, support depends on instance type. Search for "Supported disk types" in https://docs.cloud.google.com/compute/docs/general-purpose-machines. Examples: `hyperdisk-balanced`, `pd-ssd`. |
| Generic | Any CSI driver supporting SSD | NVMe / SSD |
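To make the StorageClass guidance above concrete, here is a minimal sketch that builds a StorageClass manifest with `reclaimPolicy: Retain` and `volumeBindingMode: WaitForFirstConsumer` and prints it as JSON, which `kubectl apply -f -` accepts. The example targets the AWS EBS CSI driver with `gp3` volumes. The class name `sam-starter-gp3` is hypothetical, and how the Quickstart chart is pointed at a StorageClass is not covered here.

```python
import json


def storage_class_manifest(name, provisioner, parameters):
    """Build a StorageClass that retains volumes and binds them only after scheduling."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": parameters,
        # Keep the underlying volume when the PVC (or the Helm release) is deleted.
        "reclaimPolicy": "Retain",
        # Delay provisioning until a pod is scheduled, so the volume is created
        # in the same zone as the node that will mount it.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


if __name__ == "__main__":
    manifest = storage_class_manifest(
        name="sam-starter-gp3",         # hypothetical name
        provisioner="ebs.csi.aws.com",  # AWS EBS CSI driver
        parameters={"type": "gp3"},
    )
    print(json.dumps(manifest, indent=2))
```

Note that with `Retain`, released volumes are not cleaned up automatically and must be deleted by hand once the data is no longer needed.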
Node Pool Topology Recommendations (Starter Layer Only):
In AKS, EKS, and GKE, if nodes are available in more than one availability zone for a region, one node pool (e.g. node group, or provider-specific equivalent) must be provisioned for each availability zone. The simplest approach with the starter layer is to provision node instances for SAM deployments in one availability zone only to avoid this complexity. Official recommendations from cloud providers are as follows:
- AKS: https://learn.microsoft.com/en-us/azure/aks/cluster-autoscaler?tabs=azure-cli#re-enable-the-cluster-autoscaler-on-a-node-pool
- EKS: https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html#managed-node-group-concepts
- GKE: We recommend following the above pattern for simplicity and consistency.
We recommend a similar approach for other cloud providers as applicable. This does not apply when using external persistence solutions (e.g. managed Postgres and S3-compatible storage) as all SAM workloads will be stateless.
5. Network Connectivity & Prerequisites
SAM operates as a connected application mesh. To ensure proper functionality, your network environment must allow specific inbound and outbound traffic flows.
A. Inbound Traffic (Web Gateway)
SAM provisions a Web Gateway service to handle incoming API traffic and UI access.
- Ingress Controller: An Ingress Controller (e.g., NGINX, ALB) must be present in the cluster to route traffic to this Gateway.
- TLS Termination: Production deployments should terminate TLS at the Ingress layer. You must supply your TLS Certificate and Key as a standard Kubernetes Secret and reference it in your Helm `values.yaml`.
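For reference, a standard TLS Secret has type `kubernetes.io/tls` and carries the base64-encoded PEM data under the keys `tls.crt` and `tls.key`. The sketch below builds such a manifest as JSON. The secret name and namespace are hypothetical, and the `values.yaml` key that references the Secret is chart-specific and not shown.

```python
import base64
import json


def tls_secret_manifest(name, namespace, cert_pem, key_pem):
    """Build a kubernetes.io/tls Secret from PEM-encoded certificate and key bytes."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/tls",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            # Values under Secret "data" must be base64-encoded.
            "tls.crt": base64.b64encode(cert_pem).decode("ascii"),
            "tls.key": base64.b64encode(key_pem).decode("ascii"),
        },
    }


if __name__ == "__main__":
    # Placeholder bytes; in practice, read your real PEM files from disk.
    manifest = tls_secret_manifest(
        name="sam-web-tls",  # hypothetical name
        namespace="sam",     # hypothetical namespace
        cert_pem=b"-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n",
        key_pem=b"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    )
    print(json.dumps(manifest, indent=2))
```

`kubectl create secret tls` produces the same kind of object directly from certificate and key files, which is usually the simpler route outside of automation.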
B. Outbound Platform Access
The core SAM platform requires outbound connectivity to specific infrastructure services.
- Container Registry:
  - Direct Access: Outbound access to `gcr.io/gcp-maas-prod`. Requires a Pull Secret obtained from the Solace Cloud Console.
  - Private Mirror (Air-Gapped): If using a private registry (e.g., Artifactory), you must mirror images from `gcr.io` and configure `global.imageRegistry`.
- Solace Event Broker Access:
  - Solace Cloud Event Broker Service: Allow connectivity to `*.messaging.solace.cloud` or your specific Solace Cloud region CNAMEs.
  - Self-Hosted Brokers: Allow traffic to the SMF (55555) or SMF+TLS (55443) ports of your on-premise appliances/software brokers.
- LLM Providers:
  - The core platform (and Agents) requires access to your configured Model Providers (e.g., `api.openai.com`, `your-azure-endpoint.openai.azure.com`).
- Identity Provider (IdP) Access:
  - The SAM Control Plane requires outbound network connectivity to your organization's IdP (e.g., Microsoft Entra ID, AWS Cognito, or any SAML/OIDC-compliant provider) for authentication and authorization.
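Before installing, it can help to confirm that the outbound paths listed above are actually open from inside the cluster network. The following is a minimal sketch of a TCP reachability check. It only verifies that a connection can be opened, not TLS, authentication, or proxy behavior, and the helper names are hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def report(endpoints, timeout=3.0):
    """Check each (label, host, port) tuple and return {label: reachable}."""
    return {label: can_connect(host, port, timeout) for label, host, port in endpoints}
```

A possible invocation, run from a pod or node in the target cluster, is `report([("registry", "gcr.io", 443), ("llm", "api.openai.com", 443), ("broker", "broker.example.internal", 55443)])`, where the broker hostname is a placeholder for your own broker address.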
C. Application & Mesh Component Connectivity
Beyond the core platform, the specific Agents and Gateways you deploy will require their own network paths.
- Agent Integrations: If you deploy Agents designed to interact with external enterprise systems (e.g., Salesforce, Jira, Snowflake, Oracle DB), you must ensure the Kubernetes worker nodes have network reachability to these target services.
- Gateway Exposure: If you deploy additional Mesh Gateways for specific domains or protocols, ensure your Ingress configuration allows for the necessary inbound routes, ports, and protocols.
D. Corporate Proxy Configuration
For environments with strict egress filtering, SAM supports routing outbound traffic through a corporate HTTP/HTTPS Proxy.
E. Tooling & Guides
- Installation Tooling:
  - Helm v3 is the recommended installer.
  - Alternative: You may use `helm template` to render manifests for direct `kubectl` application or integration with GitOps tools (ArgoCD, Flux).
- Documentation: Please refer to the SAM Kubernetes Deployment Guide for detailed configuration steps regarding the Helm chart, secrets, proxies, and network rules.