Kubernetes with Helm

This guide covers deploying Mermin to a Kubernetes cluster using Helm, the recommended method for production deployments.

Prerequisites

Before you begin, ensure you have:

  • Kubernetes cluster: Version 1.20 or newer, with kubectl configured

  • Helm: Version 3.x installed (see the official Helm installation guide)

  • Cluster permissions: Ability to create ClusterRole, ClusterRoleBinding, and DaemonSet resources (a quick check follows this list)

  • OTLP endpoint: An OpenTelemetry Collector or compatible backend to receive flows

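A quick way to verify the required permissions before installing is kubectl auth can-i (the resource names below are the standard Kubernetes ones):

# Verify permissions for the cluster-scoped RBAC objects and the DaemonSet
kubectl auth can-i create clusterroles
kubectl auth can-i create clusterrolebindings
kubectl auth can-i create daemonsets
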
Installation

Step 1: Add the Helm Repository

If installing from a local clone of the Mermin repository, skip this step and use the local chart path instead.

# Add the Mermin Helm repository (when available)
helm repo add mermin https://elastiflow.github.io/mermin
helm repo update

Step 2: Create a Configuration File

Create an HCL configuration file for Mermin. Start with this minimal production configuration:

# mermin-config.hcl

# Logging level
log_level = "info"

# Network interfaces to monitor
discovery "instrument" {
  interfaces = ["eth*", "ens*"]
}

# Configure Kubernetes informer and resources to watch
discovery "informer" "k8s" {
  # K8s API connection configuration
  informers_sync_timeout = "30s"   # Sync timeout for initial load

  selectors = [
    { kind = "Service" },
    { kind = "Endpoint" },
    { kind = "EndpointSlice" },
    { kind = "Pod" },
    { kind = "ReplicaSet" },
    { kind = "Deployment" },
    { kind = "DaemonSet" },
    { kind = "StatefulSet" },
    { kind = "Job" },
    { kind = "CronJob" },
  ]

  # Owner reference walking configuration
  owner_relations = {
    max_depth = 5
    include_kinds = []  # Empty = include all
    exclude_kinds = []
  }
}

# Flow attributes configuration (source)
attributes "source" "k8s" {
  extract {
    metadata = [
      "[*].metadata.name",
      "[*].metadata.namespace",
      "[*].metadata.uid",
    ]
  }

  association {
    pod = {
      sources = [
        {
          from = "flow", name = "source.ip",
          to = ["status.podIP", "status.podIPs[*]"]
        }
      ]
    }
    service = {
      sources = [
        {
          from = "flow", name = "source.ip",
          to = ["spec.clusterIP", "spec.clusterIPs[*]", "spec.externalIPs[*]"]
        },
        { from = "flow", name = "source.port", to = ["spec.ports[*].port"] }
      ]
    }
  }
}

# Flow attributes configuration (destination)
attributes "destination" "k8s" {
  extract {
    metadata = [
      "[*].metadata.name",
      "[*].metadata.namespace",
      "[*].metadata.uid",
    ]
  }

  association {
    pod = {
      sources = [
        {
          from = "flow", name = "destination.ip",
          to = ["status.podIP", "status.podIPs[*]"]
        }
      ]
    }
    service = {
      sources = [
        {
          from = "flow", name = "destination.ip",
          to = ["spec.clusterIP", "spec.clusterIPs[*]", "spec.externalIPs[*]"]
        },
        { from = "flow", name = "destination.port", to = ["spec.ports[*].port"] }
      ]
    }
  }
}

# Flow span configuration
span {
  max_record_interval = "60s"
  generic_timeout = "30s"
  icmp_timeout = "10s"
  tcp_timeout = "20s"
  tcp_fin_timeout = "5s"
  tcp_rst_timeout = "5s"
  udp_timeout = "60s"
  community_id_seed = 0
}

# OTLP exporter configuration
export "traces" {
  otlp = {
    endpoint = "http://otel-collector:4317"
    protocol = "grpc"
    timeout = "10s"
    max_batch_size = 512
    max_batch_interval = "5s"
    max_queue_size = 32768
    max_concurrent_exports = 1
    max_export_timeout = "10s"
  }
}

# API server configuration (health checks)
api {
  enabled = true
  listen_address = "0.0.0.0"
  port = 8080
}

# Metrics server configuration (Prometheus)
metrics {
  enabled = true
  listen_address = "0.0.0.0"
  port = 10250
}

Replace http://otel-collector:4317 with your actual OTLP collector endpoint.
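
If the collector runs in a different namespace, use its fully qualified in-cluster Service DNS name, for example (assuming a Service named otel-collector in an observability namespace):

export "traces" {
  otlp = {
    endpoint = "http://otel-collector.observability.svc.cluster.local:4317"
    protocol = "grpc"
  }
}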

Step 3: Deploy with Helm

Install Mermin using the Helm chart:

# Using remote chart (when available)
helm install mermin mermin/mermin \
  --set-file config.content=mermin-config.hcl \
  --wait \
  --timeout 10m

# Or using local chart from repository
helm install mermin ./charts/mermin \
  --set-file config.content=mermin-config.hcl \
  --wait \
  --timeout 10m

The --wait flag ensures Helm waits for all pods to be ready before returning.

Step 4: Verify the Deployment

Check that Mermin pods are running:

kubectl get pods -l app.kubernetes.io/name=mermin

You should see one pod per node:

NAME           READY   STATUS    RESTARTS   AGE
mermin-abc123  1/1     Running   0          2m
mermin-def456  1/1     Running   0          2m
mermin-ghi789  1/1     Running   0          2m

Check the logs:

kubectl logs -l app.kubernetes.io/name=mermin --tail=50

Verify health endpoints:

POD=$(kubectl get pod -l app.kubernetes.io/name=mermin -o jsonpath='{.items[0].metadata.name}')
kubectl exec $POD -- wget -q -O- http://localhost:8080/livez
kubectl exec $POD -- wget -q -O- http://localhost:8080/readyz

Both should return ok.
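
You can also confirm that the DaemonSet has scheduled a pod on every eligible node (the label matches the one used above):

kubectl get daemonset -l app.kubernetes.io/name=mermin

DESIRED, CURRENT, and READY should all equal the number of nodes Mermin is expected to run on.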

Configuration via values.yaml

Alternatively, you can configure Mermin using Helm values. Create a values.yaml file:

# values.yaml

image:
  repository: ghcr.io/elastiflow/mermin
  tag: "latest"
  pullPolicy: IfNotPresent

# Resource limits
resources:
  limits:
    cpu: 1
    memory: 512Mi
  requests:
    cpu: 500m
    memory: 256Mi

# Configuration
config:
  # Restart pods when config changes
  restartOnConfigChange: true

  # Enable host PID namespace for process enrichment
  hostPidEnrichment: true

  # Inline HCL configuration
  content: |
    log_level = "info"

    discovery "instrument" {
      interfaces = ["eth*", "ens*"]
    }

    export "traces" {
      otlp = {
        endpoint = "http://otel-collector:4317"
        protocol = "grpc"
      }
    }

# Tolerations for scheduling
tolerations:
  - effect: NoSchedule
    operator: Exists

# Node selector
nodeSelector: {}
  # Example: Only deploy to specific nodes
  # node-role.kubernetes.io/worker: "true"

# Pod annotations
podAnnotations: {}
  # Example: Prometheus scraping
  # prometheus.io/scrape: "true"
  # prometheus.io/port: "10250"

# Service account
serviceAccount:
  create: true
  annotations: {}
  name: ""

Deploy with values file:

helm install mermin mermin/mermin -f values.yaml --wait

Configuration via HCL File

For complex configurations, using a dedicated HCL file is cleaner:

helm install mermin mermin/mermin \
  --set-file config.content=mermin-config.hcl \
  -f values.yaml \
  --wait

The HCL file takes precedence over inline configuration in values.yaml.
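
To confirm what a release is actually running with, inspect the stored values and the rendered manifests:

# Show the values supplied to the release
helm get values mermin

# Show all rendered manifests for the release
helm get manifest mermin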

DaemonSet Deployment Pattern

Mermin is deployed as a DaemonSet, which means:

  • Automatic Node Coverage: Every node gets a Mermin pod

  • Node Addition: New nodes automatically get Mermin pods

  • Node Removal: Pods are removed when nodes are drained

  • Rolling Updates: Updates happen one node at a time (configurable)

The DaemonSet spec includes:

updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1  # Update one node at a time
    maxSurge: 0

This keeps disruption minimal during updates: at most one node's Mermin pod is unavailable at a time, while coverage on all other nodes continues uninterrupted.
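
During an upgrade you can watch the rollout progress node by node (assuming the DaemonSet is named mermin, matching the release name used in this guide):

kubectl rollout status daemonset/mermin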

Resource Configuration

Set appropriate resource limits based on your traffic:

Low Traffic (< 1,000 flows/second):

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi

Medium Traffic (1,000-10,000 flows/second):

resources:
  requests:
    cpu: 500m
    memory: 256Mi
  limits:
    cpu: 1
    memory: 512Mi

High Traffic (> 10,000 flows/second):

resources:
  requests:
    cpu: 1
    memory: 512Mi
  limits:
    cpu: 2
    memory: 1Gi

Monitor actual usage via the metrics endpoint and adjust these values accordingly.
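
A quick way to see actual per-pod consumption is kubectl top (requires the metrics-server add-on in the cluster):

kubectl top pods -l app.kubernetes.io/name=mermin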

Upgrading Mermin

Upgrade Helm Chart and Application

# Update Helm repository
helm repo update

# Upgrade to latest version
helm upgrade mermin mermin/mermin \
  --set-file config.content=mermin-config.hcl \
  --wait \
  --timeout 10m

# Or upgrade from local chart
helm upgrade mermin ./charts/mermin \
  --set-file config.content=mermin-config.hcl \
  --wait

Upgrade Only Configuration

To update just the configuration without changing the version:

helm upgrade mermin mermin/mermin \
  --reuse-values \
  --set-file config.content=mermin-config.hcl

With config.restartOnConfigChange: true, pods will restart automatically with the new configuration.
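
If restartOnConfigChange is disabled, trigger a rolling restart manually after changing the configuration (assuming the DaemonSet is named mermin):

kubectl rollout restart daemonset/mermin
kubectl rollout status daemonset/mermin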

Rollback

If an upgrade causes issues, roll back to the previous release:

# View release history
helm history mermin

# Rollback to previous version
helm rollback mermin

# Rollback to specific revision
helm rollback mermin 3

Uninstalling Mermin

To remove Mermin from your cluster:

helm uninstall mermin

This removes all Mermin resources except:

  • Custom resource definitions (if any)

  • Persistent volumes (if any)

  • The namespace (if you created it separately)

To fully clean up:

# Remove any leftover resources
kubectl delete clusterrole mermin
kubectl delete clusterrolebinding mermin
kubectl delete serviceaccount mermin -n default

Advanced Configuration

Custom Image Repository

Use a private registry:

image:
  repository: my-registry.com/mermin
  tag: "v1.0.0"
  pullPolicy: Always

imagePullSecrets:
  - name: my-registry-secret
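
The referenced pull secret must already exist in the release namespace. One way to create it (the registry URL and credentials below are placeholders):

kubectl create secret docker-registry my-registry-secret \
  --docker-server=my-registry.com \
  --docker-username=<username> \
  --docker-password=<password>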

Node Affinity

Deploy only to specific nodes:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-role.kubernetes.io/worker
              operator: In
              values:
                - "true"

Priority Class

Set a pod priority class so Mermin is less likely to be evicted under node resource pressure:

priorityClassName: system-node-critical

Host PID Namespace

Enable the host PID namespace so Mermin can enrich flows with host process information:

config:
  hostPidEnrichment: true

This allows Mermin to map network flows to specific processes on the host.
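
In terms of the rendered DaemonSet, this corresponds to sharing the host PID namespace. A simplified sketch of the relevant pod spec fields (the actual chart output may differ):

spec:
  template:
    spec:
      hostPID: true            # share the node's PID namespace for process enrichment
      containers:
        - name: mermin
          securityContext:
            privileged: true   # see Troubleshooting: Pods Not Starting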

Troubleshooting

Pods Not Starting

Check events:

kubectl describe pod <pod-name>

Common issues:

  • Insufficient privileges: Ensure privileged: true is set (see the check after this list)

  • Image pull errors: Check imagePullSecrets and registry access

  • Resource limits: Ensure nodes have sufficient CPU/memory
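
To check whether the container is actually running privileged, inspect the rendered DaemonSet (assuming it is named mermin):

kubectl get daemonset mermin \
  -o jsonpath='{.spec.template.spec.containers[0].securityContext}'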

No Flow Traces

Check logs for errors:

kubectl logs <pod-name> | grep -i error

Common issues:

  • No matching interfaces: Check discovery.instrument.interfaces configuration

  • eBPF load failure: Ensure the node kernel is version 4.18 or newer with eBPF support (a per-node check follows this list)

  • OTLP connection failure: Verify collector endpoint and network policies

  • TCX pin warnings (kernel >= 6.6): See TCX Mode and BPF Filesystem for mounting /sys/fs/bpf
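
The kernel version of every node can be read from the node status, which quickly rules the kernel requirement in or out:

kubectl get nodes -o custom-columns='NODE:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion'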

High Resource Usage

Monitor metrics:

kubectl port-forward <pod-name> 10250:10250
curl http://localhost:10250/metrics

Adjust configuration:

  • Increase flow timeouts so long-lived flows are expired and re-exported less often

  • Export batches less frequently (a larger max_batch_interval) to reduce per-export CPU overhead

  • Add flow filters to reduce the number of flows processed
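
For example, the span and export options shown earlier can be tuned like this (values are illustrative starting points, not recommendations; only the adjusted options are shown):

span {
  max_record_interval = "120s"   # record long-lived flows less often
  udp_timeout = "120s"           # fewer expirations and re-exports for chatty UDP
}

export "traces" {
  otlp = {
    max_batch_size = 1024        # larger batches...
    max_batch_interval = "10s"   # ...exported less often
  }
}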

See the Troubleshooting Guide for more solutions.
