Kubernetes with Helm

This guide covers deploying Mermin to a Kubernetes cluster using Helm, the recommended method for production deployments.

Prerequisites

Before you begin, ensure you have:

  • Kubernetes cluster: Version 1.20 or newer, with kubectl configured

  • Helm: Version 3.x installed (installation guide)

  • Cluster permissions: Ability to create ClusterRole, ClusterRoleBinding, and DaemonSets

  • OTLP endpoint: An OpenTelemetry Collector or compatible backend to receive flows

Installation

Step 1: Add the Helm Repository

If installing from a local clone of the Mermin repository, skip this step and use the local chart path instead.

# Add the Mermin Helm repository (when available)
helm repo add mermin https://elastiflow.github.io/mermin
helm repo update

Step 2: Create a Configuration File

Create an HCL configuration file for Mermin. Start with this minimal production configuration:

Replace http://otel-collector:4317 with your actual OTLP collector endpoint.

Step 3: Deploy with Helm

Install Mermin using the Helm chart:

The --wait flag ensures Helm waits for all pods to be ready before returning.
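A minimal sketch of the install step, not the chart's documented command. The release name `mermin`, the namespace `mermin`, the chart reference `mermin/mermin`, the file name `mermin.hcl`, and the value key `config.content` are all assumptions; check the chart's values before relying on them. The `run` helper prints each command and only executes it when `APPLY=1` is set.

```shell
# Print commands by default; execute them only when APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

# Assumed names: adjust to your environment.
RELEASE=mermin
NAMESPACE=mermin
CHART=mermin/mermin   # assumed chart reference; use a local chart path when installing from a clone

# --set-file loads the HCL file into the (assumed) config.content value.
# --wait blocks until the pods report ready.
run helm install "$RELEASE" "$CHART" \
  --namespace "$NAMESPACE" \
  --create-namespace \
  --set-file config.content=mermin.hcl \
  --wait
```

Run it once as-is to review the printed command, then again with `APPLY=1` to perform the install.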

Step 4: Verify the Deployment

Check that Mermin pods are running:

You should see one pod per node:

Check the logs:
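The pod and log checks above might look like the following sketch. The namespace `mermin` and the label selector `app.kubernetes.io/name=mermin` are assumptions based on common Helm conventions; substitute the labels your release actually uses. Commands are printed unless `APPLY=1` is set.

```shell
# Print commands by default; execute them only when APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

NAMESPACE=mermin                          # assumed namespace
SELECTOR=app.kubernetes.io/name=mermin    # assumed pod label

# The DaemonSet's DESIRED and READY counts should both equal the node count.
run kubectl get daemonset -n "$NAMESPACE"

# -o wide adds the NODE column, so you can confirm one pod per node.
run kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" -o wide

# Show the most recent log lines from every matching pod.
run kubectl logs -n "$NAMESPACE" -l "$SELECTOR" --tail=20
```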

Verify health endpoints:

Both should return ok.

Configuration via values.yaml

Alternatively, you can configure Mermin using Helm values. Create a values.yaml file:

Deploy with values file:
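As a sketch of this step, assuming a `values.yaml` in the current directory, a release and namespace both named `mermin`, and the chart reference `mermin/mermin` (all assumptions), the values file is passed with Helm's `-f` flag. `helm get values` then shows what the release actually received. Commands are printed unless `APPLY=1` is set.

```shell
# Print commands by default; execute them only when APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

RELEASE=mermin        # assumed release name
NAMESPACE=mermin      # assumed namespace
CHART=mermin/mermin   # assumed chart reference

# Install using the values file.
run helm install "$RELEASE" "$CHART" \
  --namespace "$NAMESPACE" \
  --create-namespace \
  -f values.yaml \
  --wait

# Print the user-supplied values stored with the release.
run helm get values "$RELEASE" --namespace "$NAMESPACE"
```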

Configuration via HCL File

For complex configurations, using a dedicated HCL file is cleaner:

The HCL file takes precedence over inline configuration in values.yaml.
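A hedged sketch of combining both inputs in one command. In Helm generally, values given with `--set` or `--set-file` override values loaded with `-f`; whether that is the mechanism behind the precedence described above is an assumption, as are the value key `config.content`, the file names, and the release, namespace, and chart names. Commands are printed unless `APPLY=1` is set.

```shell
# Print commands by default; execute them only when APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

RELEASE=mermin        # assumed release name
NAMESPACE=mermin      # assumed namespace
CHART=mermin/mermin   # assumed chart reference

# values.yaml carries deployment settings; mermin.hcl carries the Mermin
# configuration and is loaded into the (assumed) config.content value.
run helm upgrade --install "$RELEASE" "$CHART" \
  --namespace "$NAMESPACE" \
  -f values.yaml \
  --set-file config.content=mermin.hcl
```

`helm upgrade --install` installs the release if it does not exist yet and upgrades it otherwise, so the same command works for first deployment and later changes.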

DaemonSet Deployment Pattern

Mermin is deployed as a DaemonSet, which means:

  • Automatic Node Coverage: Every node gets a Mermin pod

  • Node Addition: New nodes automatically get Mermin pods

  • Node Removal: Pods are removed when nodes are removed from the cluster

  • Rolling Updates: Updates happen one node at a time (configurable)

The DaemonSet spec includes:

This limits disruption during updates: at most one node's Mermin pod is down at any time.
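To see the update strategy a running cluster actually uses, a sketch like the following can help. It assumes the DaemonSet is named `mermin` in the namespace `mermin`, which may differ for your release. Commands are printed unless `APPLY=1` is set.

```shell
# Print commands by default; execute them only when APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

NAMESPACE=mermin   # assumed namespace
DAEMONSET=mermin   # assumed DaemonSet name

# Print the update strategy (type and rollingUpdate.maxUnavailable).
run kubectl get daemonset "$DAEMONSET" -n "$NAMESPACE" \
  -o jsonpath='{.spec.updateStrategy}'

# Follow an in-progress rolling update until it completes.
run kubectl rollout status "daemonset/$DAEMONSET" -n "$NAMESPACE"
```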

Resource Configuration

Set appropriate resource limits based on your traffic:

Low Traffic (< 1,000 flows/second):

Medium Traffic (1,000-10,000 flows/second):

High Traffic (> 10,000 flows/second):

Monitor actual usage via the metrics endpoint and adjust accordingly.
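As a complementary check to Mermin's own metrics, `kubectl top` reports the CPU and memory each pod currently consumes. This sketch assumes the cluster runs metrics-server (required by `kubectl top`), plus the namespace `mermin` and the label selector `app.kubernetes.io/name=mermin`, both of which are assumptions. Commands are printed unless `APPLY=1` is set.

```shell
# Print commands by default; execute them only when APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

NAMESPACE=mermin                          # assumed namespace
SELECTOR=app.kubernetes.io/name=mermin    # assumed pod label

# Per-container CPU and memory usage; compare against the configured limits.
run kubectl top pods -n "$NAMESPACE" -l "$SELECTOR" --containers
```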

Upgrading Mermin

Upgrade Helm Chart and Application
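A minimal sketch of upgrading both the chart and the application, assuming the repository alias `mermin` from Step 1, a release and namespace both named `mermin`, the chart reference `mermin/mermin`, and a `values.yaml` holding your settings (the chart reference and file name are assumptions). Commands are printed unless `APPLY=1` is set.

```shell
# Print commands by default; execute them only when APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

RELEASE=mermin        # assumed release name
NAMESPACE=mermin      # assumed namespace
CHART=mermin/mermin   # assumed chart reference

# Refresh the local chart index so the latest chart version is visible.
run helm repo update

# Upgrade to the newest chart version, reapplying your values file.
run helm upgrade "$RELEASE" "$CHART" \
  --namespace "$NAMESPACE" \
  -f values.yaml \
  --wait
```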

Upgrade Only Configuration

To update just the configuration without changing the version:

With config.restartOnConfigChange: true, pods will restart automatically with the new configuration.
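A configuration-only update might be sketched as follows. The value key `config.content`, the file name `mermin.hcl`, and the release, namespace, and chart names are assumptions. `--reuse-values` keeps the values from the last release and applies only the override given here. Commands are printed unless `APPLY=1` is set.

```shell
# Print commands by default; execute them only when APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

RELEASE=mermin        # assumed release name
NAMESPACE=mermin      # assumed namespace
CHART=mermin/mermin   # assumed chart reference

# Show the currently deployed chart version; add --version <that version>
# to the upgrade below if you need to pin the chart explicitly.
run helm list --namespace "$NAMESPACE"

# Reuse the existing values and replace only the configuration file.
run helm upgrade "$RELEASE" "$CHART" \
  --namespace "$NAMESPACE" \
  --reuse-values \
  --set-file config.content=mermin.hcl
```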

Rollback

If an upgrade causes issues, roll back to the previous release:
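A short sketch of the rollback, assuming a release and namespace both named `mermin`. `helm history` lists the revisions, and `helm rollback` without a revision number returns to the previous one. Commands are printed unless `APPLY=1` is set.

```shell
# Print commands by default; execute them only when APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

RELEASE=mermin     # assumed release name
NAMESPACE=mermin   # assumed namespace

# List revisions with their status and chart version.
run helm history "$RELEASE" --namespace "$NAMESPACE"

# Roll back to the previous revision; append a revision number to pick another.
run helm rollback "$RELEASE" --namespace "$NAMESPACE"
```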

Uninstalling Mermin

To remove Mermin from your cluster:

This removes all Mermin resources except:

  • Custom resource definitions (if any)

  • Persistent volumes (if any)

  • Namespace (if created by you)

To fully clean up:
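The uninstall and cleanup steps above can be sketched as follows, assuming a release named `mermin` in a dedicated namespace `mermin` (both assumptions). Deleting a namespace removes everything in it, so skip that command if the namespace is shared. Commands are printed unless `APPLY=1` is set.

```shell
# Print commands by default; execute them only when APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

RELEASE=mermin     # assumed release name
NAMESPACE=mermin   # assumed dedicated namespace

# Remove the release and the resources Helm created for it.
run helm uninstall "$RELEASE" --namespace "$NAMESPACE"

# Remove the namespace itself, only if it was created just for Mermin.
run kubectl delete namespace "$NAMESPACE"
```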

Advanced Configuration

Custom Image Repository

Use a private registry:

Node Affinity

Deploy only to specific nodes:
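One way to express this is a small values override, sketched below. It assumes the chart passes a top-level `affinity` value through to the pod spec, which is a common convention but not confirmed here; the file name `values-affinity.yaml` is hypothetical. The affinity structure itself is the standard Kubernetes node affinity schema, and `kubernetes.io/os` is a well-known node label.

```shell
# Write a values override that restricts scheduling to Linux nodes.
cat > values-affinity.yaml <<'EOF'
# Assumption: the chart forwards `affinity` to the DaemonSet pod spec.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/os
              operator: In
              values:
                - linux
EOF
```

Pass the file with `-f values-affinity.yaml` on `helm install` or `helm upgrade`, and replace the label key and values with the ones that identify your target nodes.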

Priority Class

Set pod priority:
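A hedged sketch of a priority override, assuming the chart exposes a `priorityClassName` value (an assumption) and using the hypothetical file name `values-priority.yaml`. `system-node-critical` is a built-in Kubernetes priority class; some clusters restrict its use outside `kube-system`, so a custom class may be needed instead.

```shell
# Write a values override that assigns a priority class to the Mermin pods.
cat > values-priority.yaml <<'EOF'
# Assumption: the chart forwards `priorityClassName` to the pod spec.
priorityClassName: system-node-critical
EOF
```

Pass the file with `-f values-priority.yaml` on `helm install` or `helm upgrade`.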

Host PID Namespace

Enable process enrichment (requires hostPidEnrichment: true):

This allows Mermin to map network flows to specific processes on the host.

Troubleshooting

Pods Not Starting

Check events:
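A sketch of the event checks, assuming the namespace `mermin` and the label selector `app.kubernetes.io/name=mermin` (both assumptions). Commands are printed unless `APPLY=1` is set.

```shell
# Print commands by default; execute them only when APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

NAMESPACE=mermin                          # assumed namespace
SELECTOR=app.kubernetes.io/name=mermin    # assumed pod label

# Per-pod details, including the Events section at the end of each pod.
run kubectl describe pods -n "$NAMESPACE" -l "$SELECTOR"

# All events in the namespace, oldest first.
run kubectl get events -n "$NAMESPACE" --sort-by=.lastTimestamp
```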

Common issues:

  • Insufficient privileges: Ensure privileged: true is set

  • Image pull errors: Check imagePullSecrets and registry access

  • Resource limits: Ensure nodes have sufficient CPU/memory

No Flow Traces

Check logs for errors:
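A sketch of a broader log check across all pods, assuming the namespace `mermin` and the label selector `app.kubernetes.io/name=mermin` (both assumptions). Commands are printed unless `APPLY=1` is set.

```shell
# Print commands by default; execute them only when APPLY=1 is set.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

NAMESPACE=mermin                          # assumed namespace
SELECTOR=app.kubernetes.io/name=mermin    # assumed pod label

# Last 15 minutes of logs from every matching pod, up to 200 lines each.
# --prefix labels each line with the pod it came from.
run kubectl logs -n "$NAMESPACE" -l "$SELECTOR" \
  --since=15m --tail=200 --prefix
```

With `APPLY=1`, the output can be piped through `grep -i error` to narrow it to error lines.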

Common issues:

  • No matching interfaces: Check discovery.instrument.interfaces configuration

  • eBPF load failure: Ensure kernel version >= 4.18 with eBPF support

  • OTLP connection failure: Verify collector endpoint and network policies

  • TCX pin warnings (kernel >= 6.6): See TCX Mode and BPF Filesystem for mounting /sys/fs/bpf

High Resource Usage

Monitor metrics:

Adjust configuration:

  • Increase flow timeouts to reduce flow table size

  • Decrease batch frequency to reduce CPU usage

  • Add flow filters to reduce processed flows

See the Troubleshooting Guide for more solutions.
