
Troubleshooting Overview

Diagnose and resolve common issues when deploying and operating Mermin.

Quick Diagnostic Checklist

Start with these quick checks to identify issues:

  1. Pod Status: Check if pods are running with kubectl get pods -n mermin

  2. Pod Logs: Review logs using kubectl logs -l app.kubernetes.io/name=mermin -n mermin

  3. Configuration: Verify your HCL syntax and configuration values

  4. Connectivity: Test network access to your OTLP endpoints

  5. Permissions: Confirm RBAC roles and Linux capabilities are properly set

  6. eBPF Support: Verify your kernel version supports eBPF
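The pod status, log, and kernel checks can be run together from a workstation with cluster access. A minimal sketch, assuming the mermin namespace and label selector shown in the checklist; the quick_check function name and the --tail=50 limit are illustrative choices, and the kernel version is read from the Kubernetes node status rather than from Mermin itself.

```shell
# Namespace and selector come from the checklist; override if yours differ.
NAMESPACE="${NAMESPACE:-mermin}"
SELECTOR="app.kubernetes.io/name=mermin"

quick_check() {
  # 1. Pod status, restart counts, and node placement.
  kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" -o wide

  # 2. Recent log lines from every matching pod, prefixed with the pod name.
  kubectl logs -n "$NAMESPACE" -l "$SELECTOR" --tail=50 --prefix

  # 6. Kernel version reported by each node.
  kubectl get nodes \
    -o custom-columns=NODE:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion
}
```

Call quick_check after sourcing the snippet. The configuration, connectivity, and permission checks depend on your environment and are not covered here.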

Common Issue Categories

Troubleshooting guides are organized into three categories:

The deployment issues guide covers pod startup failures, permission errors, CNI conflicts, and TC/TCX priority configuration. Use it when Mermin fails to start or crashes.


Symptoms:

  • Pods stuck in Pending, CrashLoopBackOff, or Error states

  • eBPF programs that fail to load

  • Permission or capability errors

  • TC priority conflicts with your CNI plugin

  • Flow gaps after pod restarts

The eBPF errors guide helps you diagnose verifier failures, program loading errors, and kernel compatibility issues.

Symptoms:

  • Verifier instruction limit exceeded errors

  • Invalid memory access errors

  • Kernel version incompatibilities

  • BTF (BPF Type Format) support issues

The interface visibility guide explains which traffic is visible at different network layers and how to configure interface monitoring correctly. Use it when expected traffic is missing.

Note: If a configured interface is missing, Mermin logs a warning but continues monitoring other valid interfaces.

Symptoms:

  • Missing or incomplete traffic capture

  • Partial flow visibility

  • CNI-specific interface configuration questions

  • Questions about tunnel encapsulation behavior

Diagnostic Commands

Use these commands to gather information and diagnose issues:

View Pod Logs

Check what Mermin is reporting:
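A few log commands that tend to be useful here, sketched with standard kubectl flags and the namespace and label selector from the checklist. The function names are illustrative, and the pod name passed as the first argument is a placeholder you supply.

```shell
NAMESPACE="${NAMESPACE:-mermin}"
SELECTOR="app.kubernetes.io/name=mermin"

# Last 100 lines from all Mermin pods, each line prefixed with its pod name.
mermin_logs() {
  kubectl logs -n "$NAMESPACE" -l "$SELECTOR" --tail=100 --prefix
}

# Logs from the previous container instance of one pod (useful after a crash).
mermin_previous_logs() {
  kubectl logs -n "$NAMESPACE" "$1" --previous
}

# Stream logs from one pod, starting with the last 10 minutes.
mermin_follow_logs() {
  kubectl logs -n "$NAMESPACE" "$1" --since=10m -f
}
```

For a pod in CrashLoopBackOff, mermin_previous_logs is usually the most informative, because the current container may not have logged anything yet.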

Enable Debug Logging

For more detailed output, enable debug-level logging in your configuration.

Health Check Endpoints

When the API server is enabled, use its health check endpoints to check Mermin's health status.

Metrics Monitoring

Mermin exposes Prometheus metrics that help you identify performance issues and verify that the agent is operating correctly.

See the Internal Metrics guide for complete metrics documentation and Prometheus query examples.

Key metrics to monitor include:

  • mermin_flow_spans_created_total - Total flow spans created

  • mermin_packets_total - Total packets processed

  • mermin_export_flow_spans_total{exporter_type="otlp",status="error"} - OTLP export failures (investigate if increasing)

  • mermin_export_flow_spans_total{exporter_type="stdout",status="error"} - Stdout export failures (investigate if increasing)
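To tell whether the export error counters are increasing, it helps to reduce a metrics scrape to a single number and compare it across two scrapes. A minimal sketch, assuming the scrape is in Prometheus text format without trailing timestamps, so that the sample value is the last field on each line. The sum_export_errors helper is a hypothetical name, and how you obtain the scrape (port and path) depends on your configuration and is not shown.

```shell
# Reads Prometheus text-format metrics on stdin and prints the sum of all
# mermin_export_flow_spans_total samples labelled status="error".
# Assumes no trailing timestamps, so the value is the last field.
sum_export_errors() {
  awk '
    /^mermin_export_flow_spans_total\{/ && /status="error"/ { sum += $NF }
    END { printf "%d\n", sum }
  '
}

# Example: sum_export_errors < scrape.txt
```

Run it against two scrapes taken a minute or so apart. A growing total suggests export failures worth investigating; a constant total means no new failures occurred in that interval.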

Diagnosing Flow Span Drops

When flow spans are dropped, inspect internal metrics to identify the bottleneck stage:

  • Worker queue drops: The kernel is producing events faster than userspace can consume them. Increase pipeline.ebpf_ringbuf_worker_capacity or pipeline.worker_count.

  • Flow span channel drops: The enrichment stage is lagging. Increase pipeline.flow_producer.flow_span_queue_capacity or add CPU resources (the decorator runs as a cooperative task on the main runtime; see Worker threads).

  • Decorated span channel drops: There is backpressure from the export stage. Increase pipeline.k8s_decorator_channel_capacity or optimize your OTLP exporter settings.

If tuning does not resolve the issue, reduce the number of monitored interfaces or increase the CPU limits allocated to the agent.
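The mapping from drop stage to tuning knob can be captured in a small lookup helper for runbooks. This is only a sketch: the stage labels (worker_queue, flow_span_channel, decorated_span_channel) and the suggest_tuning function name are hypothetical shorthand, while the configuration keys are the ones listed above.

```shell
# Prints the tuning suggestion for a given drop stage.
# Stage labels are illustrative shorthand, not Mermin metric names.
suggest_tuning() {
  case "$1" in
    worker_queue)
      echo "increase pipeline.ebpf_ringbuf_worker_capacity or pipeline.worker_count" ;;
    flow_span_channel)
      echo "increase pipeline.flow_producer.flow_span_queue_capacity or add CPU resources" ;;
    decorated_span_channel)
      echo "increase pipeline.k8s_decorator_channel_capacity or optimize OTLP exporter settings" ;;
    *)
      echo "unknown stage: $1" >&2
      return 1 ;;
  esac
}
```

For example, suggest_tuning flow_span_channel prints the queue capacity and CPU suggestion for a lagging enrichment stage.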

Test eBPF Capabilities

Use the diagnose bpf subcommand to validate eBPF support and test attach/detach operations. It checks:

  • Required Linux capabilities

  • eBPF program loading and attach/detach operations

  • BPF filesystem writeability

  • Kernel version compatibility
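Some of these prerequisites can also be spot-checked by hand on a node, independently of the subcommand. A rough sketch using standard Linux paths: /sys/kernel/btf/vmlinux is where kernels built with BTF expose type information, and /sys/fs/bpf is the conventional BPF filesystem mount point. The check_ebpf_prereqs name is illustrative, and these checks are far less thorough than an actual load and attach test.

```shell
check_ebpf_prereqs() {
  # Kernel release of the node this runs on.
  echo "kernel: $(uname -r)"

  # Kernels built with CONFIG_DEBUG_INFO_BTF expose BTF type data here.
  if [ -r /sys/kernel/btf/vmlinux ]; then
    echo "btf: available"
  else
    echo "btf: missing"
  fi

  # A bpf filesystem mounted at /sys/fs/bpf and writable by the current user.
  if grep -qs ' /sys/fs/bpf bpf ' /proc/mounts && [ -w /sys/fs/bpf ]; then
    echo "bpffs: writable"
  else
    echo "bpffs: not mounted or not writable"
  fi
}
```

Run check_ebpf_prereqs as root on the node (or in a privileged debug container that shares the host's /sys), since an unprivileged shell may report the BPF filesystem as not writable even when Mermin itself could use it.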

For detailed usage, interpreting results, and troubleshooting failures, see Deployment Issues: Test eBPF Attach/Detach Operations.

Getting Help


Next Steps

  1. Fine-Tune Your Configuration: Optimize for your environment

  2. Set Up Monitoring: Track performance and health
