
Advanced Scenarios

This guide covers advanced Mermin deployment scenarios including custom CNI configurations, multi-cluster deployments, high-availability setups, and performance tuning for high-throughput environments.

Custom CNI Configurations

Different Container Network Interfaces (CNIs) create different network interface patterns. Mermin must be configured to monitor the correct interfaces.

Cilium

Cilium uses cilium_* interfaces for pod networking:

discovery "instrument" {
  # Capture both physical and Cilium interfaces
  interfaces = ["eth*", "ens*", "cilium_*"]
}

Considerations:

  • Cilium's eBPF datapath is separate from Mermin's monitoring

  • Monitor physical interfaces for inter-node traffic

  • Monitor cilium_* for intra-node pod-to-pod traffic

  • May see duplicate flows for traffic that crosses nodes

Cilium-specific configuration:

discovery "instrument" {
  # Physical interfaces for inter-node traffic
  interfaces = ["eth*", "ens*"]

  # Add Cilium interfaces only if you need intra-node visibility
  # interfaces = ["eth*", "ens*", "cilium_*"]
}

# Cilium uses its own NetworkPolicies
discovery "informer" "k8s" {
  selectors = [
    { kind = "CiliumNetworkPolicy" },
    { kind = "Pod" },
    { kind = "Service" },
    # ... other resources
  ]
}

Calico

Calico uses cali* interfaces for pod networking:
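
A sketch following the structure of the Cilium example above; adjust the physical-interface globs to your hosts:

discovery "instrument" {
  # Physical interfaces for inter-node traffic, plus Calico veth interfaces
  interfaces = ["eth*", "ens*", "cali*"]
}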

Considerations:

  • Calico interface names follow the caliXXXXXXXXXXX pattern (the cali prefix plus a workload-specific hash)

  • Monitor physical interfaces for most traffic

  • Add cali* for intra-node pod-to-pod visibility

  • Be aware of potential flow duplication

Flannel

Flannel attaches pods to the cni0 bridge; with the default VXLAN backend, inter-node traffic traverses the flannel.1 interface:
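
A sketch assuming the default VXLAN backend; other backends create different interface names:

discovery "instrument" {
  # Physical interfaces, the Flannel pod bridge, and the VXLAN overlay
  interfaces = ["eth*", "ens*", "cni0", "flannel.1"]
}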

Weave Net

Weave Net attaches pods to a single bridge interface named weave:
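
A sketch using the Weave Net default bridge name:

discovery "instrument" {
  # Physical interfaces plus the Weave bridge
  interfaces = ["eth*", "ens*", "weave"]
}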

Canal (Flannel + Calico)

Canal combines Flannel for networking and Calico for policies:
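
A sketch combining the Flannel and Calico patterns above; verify which interfaces your Canal installation actually creates:

discovery "instrument" {
  # Flannel carries inter-node traffic; Calico veths attach pods
  interfaces = ["eth*", "ens*", "flannel.1", "cali*"]
}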

Multi-Cluster Deployments

For observability across multiple Kubernetes clusters:

Strategy 1: Cluster-Specific OTLP Endpoints

Deploy Mermin in each cluster with cluster-specific configuration:

Cluster 1 (us-west):
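
A sketch; the export "otlp" block, the resource_attributes map, and the endpoint are assumptions to check against your Mermin version's configuration reference:

export "otlp" {
  endpoint = "otel-collector.observability.svc.cluster.local:4317"

  # Tag every span with its source cluster (attribute name is illustrative)
  resource_attributes = {
    "k8s.cluster.name" = "us-west"
  }
}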

Cluster 2 (eu-west):
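
The same sketch with only the cluster identifier changed:

export "otlp" {
  endpoint = "otel-collector.observability.svc.cluster.local:4317"

  resource_attributes = {
    "k8s.cluster.name" = "eu-west"
  }
}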

Strategy 2: Central OTLP Collector

All clusters send to a central collector:
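
Under the same assumptions as Strategy 1, each cluster's exporter points at one externally reachable endpoint (the hostname is an example):

export "otlp" {
  # Shared central collector instead of a per-cluster one
  endpoint = "otlp.observability.example.com:4317"
}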

Strategy 3: Hierarchical Collectors

Regional collectors aggregate to central collector:

Each cluster points to its regional collector, which aggregates and forwards to central.
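
A minimal regional OpenTelemetry Collector configuration illustrating the forwarding tier; the central endpoint is an example:

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlp:
    endpoint: otlp-central.observability.example.com:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]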

High-Availability Configurations

OTLP Collector Redundancy

Configure multiple OTLP endpoints for failover if your Mermin version supports listing more than one; otherwise, handle redundancy at the collector layer.

For true HA, deploy multiple OpenTelemetry Collectors behind a load balancer:
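
A standard Kubernetes Service in front of a multi-replica collector Deployment achieves this; names and labels are examples:

apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: observability
spec:
  selector:
    app: otel-collector   # must match the collector Deployment's pod labels
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317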

Mermin Agent Resilience

Mermin agents are resilient by design:

  • DaemonSet: Automatically restarts failed pods

  • Node-local: Failure of one agent doesn't affect others

  • Stateless: No data loss on restart (flows are regenerated)

  • Queue-based: Buffers flows during temporary collector outages

Configure aggressive restart policy:
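
A DaemonSet pod-template excerpt; the liveness probe path and port are assumptions to match to your Mermin image:

spec:
  template:
    spec:
      restartPolicy: Always   # the only policy DaemonSets allow
      containers:
        - name: mermin
          livenessProbe:
            httpGet:
              path: /healthz   # assumed health endpoint
              port: 8080       # assumed port
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 3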

Resource Tuning for High-Throughput Environments

High-Traffic Configuration

For environments with very high network traffic (> 10,000 flows/second), such as public ingress nodes or edge deployments:
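
A sketch, assuming the pipeline.* parameters from the Tuning Guidelines below live in a pipeline block; the values are illustrative starting points, not benchmarks:

pipeline {
  worker_count                   = 8       # more parallel event processing
  ebpf_ringbuf_worker_capacity   = 65536   # larger per-worker buffer
  flow_producer_channel_capacity = 32768
  k8s_decorator_threads          = 4       # faster K8s decoration
  k8s_decorator_channel_capacity = 32768
}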

Resource allocation:
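
Illustrative values for a high-traffic node; size them against observed usage:

resources:
  requests:
    cpu: "2"
    memory: 2Gi
  limits:
    cpu: "4"
    memory: 4Gi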

Low-Latency Configuration

For environments requiring low export latency:
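
A sketch, assuming an export "otlp" block (the block name is an assumption); the parameter names come from the Tuning Guidelines below, with values chosen to keep batches small and flush quickly:

export "otlp" {
  max_queue_size     = 1024   # small queue: spans wait less before export
  max_export_timeout = "5s"   # fail fast instead of holding batches
}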

Memory-Constrained Environments

For nodes with limited memory:
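
A sketch under the same assumption as the high-traffic example (pipeline.* parameters in a pipeline block); smaller buffers save memory at the cost of drop resistance under bursts:

pipeline {
  worker_count                   = 2
  ebpf_ringbuf_worker_capacity   = 8192
  flow_producer_channel_capacity = 4096
  k8s_decorator_channel_capacity = 4096
}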

Resource limits:
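
Illustrative values for a constrained node:

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi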

Network Interface Selection Strategies

Inter-Node Traffic Only (Default)

Capture only traffic crossing node boundaries:
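
Following the instrument pattern from the CNI examples above:

discovery "instrument" {
  # Physical NICs only: traffic that crosses the node boundary
  interfaces = ["eth*", "ens*"]
}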

Advantages:

  • No flow duplication

  • Lower resource usage

  • Clearer network topology

Limitations:

  • Misses pod-to-pod traffic on same node

  • Misses loopback traffic

Complete Visibility (All Traffic)

Capture all traffic including intra-node:
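
A sketch; which veth-style patterns apply depends on your CNI (see Custom CNI Configurations above):

discovery "instrument" {
  # Physical NICs plus CNI-created pod interfaces
  interfaces = ["eth*", "ens*", "veth*", "cali*", "cilium_*"]
}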

Advantages:

  • Complete network visibility

  • Captures all pod-to-pod traffic

Limitations:

  • Flow duplication for inter-node traffic

  • Higher resource usage

  • Requires deduplication in backend

Selective Monitoring

Monitor specific interface patterns:
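
For example, watching only a known uplink (interface names are host-specific):

discovery "instrument" {
  # Only the bonded uplink carrying ingress traffic
  interfaces = ["bond0"]
}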

Dynamic Interface Discovery

Use glob patterns that adapt to host configuration:
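
Globs covering the common kernel naming schemes let one configuration work across heterogeneous hosts:

discovery "instrument" {
  # eth* (legacy), ens*/enp* (systemd predictable names), bond* (bonded links)
  interfaces = ["eth*", "ens*", "enp*", "bond*"]
}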

Performance Monitoring and Tuning

Metrics to Monitor

Expose Mermin metrics to Prometheus:
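
The standard Prometheus scrape annotations on the DaemonSet pod template; the port and path are assumptions to match to your deployment:

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"     # assumed metrics port
    prometheus.io/path: "/metrics"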

See Application Metrics for complete metrics documentation and Prometheus query examples.

Key metrics to monitor:

  • mermin_flow_spans_created_total - Total flow spans created

  • mermin_packets_total - Total packets processed

  • mermin_flow_events_total{status="dropped_backpressure"} - Events dropped due to overload

  • mermin_export_flow_spans_total{exporter_type="otlp",status="error"} - OTLP export failures

  • mermin_flow_spans_active_total - Current number of active flows

Tuning Guidelines

If you see packet drops:

The appropriate fix depends on where drops occur in the pipeline:

  1. Worker queue drops (eBPF events dropped before reaching workers):

    • Increase pipeline.ebpf_ringbuf_worker_capacity (per-worker buffer)

    • Increase pipeline.worker_count (more parallel processing)

    • Add more CPU resources

  2. Flow span channel drops (drops between workers and K8s decorator):

    • Increase pipeline.flow_producer_channel_capacity

    • Increase pipeline.k8s_decorator_threads (faster decoration)

  3. Decorated span channel drops (drops between decorator and exporter):

    • Increase pipeline.k8s_decorator_channel_capacity

    • Optimize exporter configuration (larger batches, more concurrent exports)

  4. General recommendations:

    • Reduce monitored interfaces if drops persist

    • Check metrics to identify the specific bottleneck stage

If you see high memory usage:

  1. Decrease flow timeouts

  2. Increase export frequency

  3. Add flow filters to reduce processed flows

  4. Add more memory resources

If you see export errors:

  1. Check collector connectivity

  2. Increase max_queue_size

  3. Increase max_export_timeout

  4. Check collector capacity

Security Hardening

Network Policies

Restrict Mermin's network access:
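
An egress-only sketch; adjust the namespace, labels, and API server port to your cluster:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mermin
  namespace: mermin              # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: mermin                # assumed pod label
  policyTypes:
    - Egress
  egress:
    - ports:                     # DNS lookups
        - port: 53
          protocol: UDP
    - ports:                     # Kubernetes API server
        - port: 443
          protocol: TCP
    - to:                        # OTLP collector
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: observability
      ports:
        - port: 4317
          protocol: TCP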

Pod Security Standards

Apply Pod Security Standards:
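
Label the namespace with the privileged level (see the note below):

apiVersion: v1
kind: Namespace
metadata:
  name: mermin   # assumed namespace
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/enforce-version: latest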

Note: Mermin requires the privileged level because its eBPF instrumentation needs elevated kernel capabilities.

Secrets Management

Use Kubernetes secrets for sensitive configuration:
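
For example, an OTLP auth token (the Secret and key names are illustrative):

apiVersion: v1
kind: Secret
metadata:
  name: mermin-otlp-credentials
  namespace: mermin   # assumed namespace
type: Opaque
stringData:
  OTLP_AUTH_TOKEN: "<token>"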

Mount secrets in pods:
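
A container excerpt exposing the secret as an environment variable:

env:
  - name: OTLP_AUTH_TOKEN
    valueFrom:
      secretKeyRef:
        name: mermin-otlp-credentials
        key: OTLP_AUTH_TOKEN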

Reference in HCL:
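
A hypothetical sketch: it assumes Mermin's HCL exposes an env() function and that the export "otlp" block accepts a headers map; verify both against your version's configuration reference:

export "otlp" {
  # env() and the headers map are assumptions, not confirmed Mermin syntax
  headers = {
    authorization = "Bearer ${env("OTLP_AUTH_TOKEN")}"
  }
}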

Next Steps
