Troubleshooting Overview
Diagnose and resolve common issues when deploying and operating Mermin.
Quick Diagnostic Checklist
Start with these quick checks to identify issues:
Pod Status: Check that pods are running with kubectl get pods -n mermin
Pod Logs: Review logs using kubectl logs -l app.kubernetes.io/name=mermin -n mermin
Configuration: Verify your HCL syntax and configuration values
Connectivity: Test network access to your OTLP endpoints
Permissions: Confirm RBAC roles and Linux capabilities are properly set
eBPF Support: Verify your kernel version supports eBPF
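The kernel check can be run across every node at once. The sketch below reads each node's kernel version from the Kubernetes API and compares it against the 5.14 minimum cited under Common Issue Categories. The function names kernel_at_least and check_node_kernels are illustrative and not part of Mermin.

```shell
#!/bin/sh
# Illustrative helpers; the function names are not part of Mermin.

# kernel_at_least VERSION MAJOR MINOR
# Succeeds when VERSION (for example "5.15.0-91-generic") is >= MAJOR.MINOR.
kernel_at_least() (
  version="$1"
  major="${version%%.*}"        # text before the first dot
  minor="${version#*.}"         # text after the first dot
  minor="${minor%%[!0-9]*}"     # keep only the leading digits
  [ "$major" -gt "$2" ] 2>/dev/null && exit 0
  [ "$major" -eq "$2" ] 2>/dev/null && [ "$minor" -ge "$3" ] 2>/dev/null
)

# Print one line per node, flagging kernels older than 5.14.
check_node_kernels() {
  kubectl get nodes --no-headers \
    -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion |
  while read -r node version; do
    [ -n "$node" ] || continue
    if kernel_at_least "$version" 5 14; then
      echo "OK $node $version"
    else
      echo "TOO_OLD $node $version"
    fi
  done
}
```

Source the file and call check_node_kernels against your cluster. A kernel version alone does not prove that BTF or the required capabilities are available, so treat an OK line as a first pass only.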
Common Issue Categories
Troubleshooting guides are organized into three categories:
Deployment Issues: Covers pod startup failures, permission errors, CNI conflicts, and TC/TCX priority configuration for cases where Mermin fails to start or crashes.
eBPF load failures prevent startup. Verify your kernel version (5.14+) and confirm eBPF capabilities are enabled. For quick diagnosis, see the Quick Reference Table in Common eBPF Errors.
Symptoms:
Pods stuck in Pending, CrashLoopBackOff, or Error states
eBPF programs that fail to load
Permission or capability errors
TC priority conflicts with your CNI plugin
Flow gaps after pod restarts
Common eBPF Errors: Diagnoses verifier failures, program loading errors, and kernel compatibility issues.
Symptoms:
Verifier instruction limit exceeded errors
Invalid memory access errors
Kernel version incompatibilities
BTF (BPF Type Format) support issues
Interface Visibility: Explains traffic visibility at different network layers and how to configure interface monitoring correctly when expected traffic is missing.
Note: If a configured interface is missing, Mermin logs a warning but continues monitoring other valid interfaces.
Symptoms:
Missing or incomplete traffic capture
Partial flow visibility
CNI-specific interface configuration questions
Understanding tunnel encapsulation behavior
Diagnostic Commands
Use these commands to gather information and diagnose issues:
View Pod Logs
Check what Mermin is reporting:
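A minimal sketch of the log commands, built from the namespace and label selector used in the Quick Diagnostic Checklist. The mermin_logs wrapper name is hypothetical; the flags shown are standard kubectl logs options.

```shell
#!/bin/sh
# Hypothetical wrapper; namespace and selector are taken from the checklist.
NAMESPACE="${NAMESPACE:-mermin}"
SELECTOR="app.kubernetes.io/name=mermin"

# mermin_logs [KUBECTL_LOGS_FLAGS...]
mermin_logs() {
  kubectl logs -n "$NAMESPACE" -l "$SELECTOR" "$@"
}

# Examples (run against a live cluster):
#   mermin_logs --tail=100                 # last 100 lines per pod
#   mermin_logs --previous                 # logs from the previous container instance
#   mermin_logs -f --max-log-requests=20   # stream from up to 20 pods at once
```

When a label selector is used, kubectl logs prints only the last 10 lines per pod unless --tail is set, and it follows at most 5 pods concurrently unless --max-log-requests is raised.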
Enable Debug Logging
Enable debug mode in your configuration for more detailed log output.
Health Check Endpoints
With the API server enabled, check Mermin's health status:
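This page does not list the API port or the health paths, so the sketch below takes them as arguments instead of assuming values. probe_health is a hypothetical helper that prints the HTTP status code returned for each path; substitute the port and paths from your own Mermin API configuration.

```shell
#!/bin/sh
# probe_health BASE_URL PATH...
# Hypothetical helper: prints "<path> <http status>" for each path.
probe_health() (
  base="$1"
  shift
  for path in "$@"; do
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$base$path")"
    echo "$path $code"
  done
)

# Example, with placeholders for values this page does not specify:
#   kubectl port-forward -n mermin <pod-name> <local-port>:<api-port> &
#   probe_health "http://localhost:<local-port>" <liveness-path> <readiness-path>
```

A 2xx status means the endpoint answered successfully. A status of 000 means curl received no HTTP response, which usually points at the port-forward or at the API server not listening.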
Metrics Monitoring
Mermin exposes Prometheus metrics that help you identify performance issues and verify that the agent is operating correctly.
See the Internal Metrics guide for complete metrics documentation and Prometheus query examples.
Key metrics to monitor include:
mermin_flow_spans_created_total - Total flow spans created
mermin_packets_total - Total packets processed
mermin_export_flow_spans_total{exporter_type="otlp",status="error"} - OTLP export failures (investigate if increasing)
mermin_export_flow_spans_total{exporter_type="stdout",status="error"} - Stdout export failures (investigate if increasing)
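To check whether export failures are increasing, it can help to reduce a metrics scrape to a single number and compare it across two scrapes. The sketch below reads Prometheus text-format output on standard input and sums every mermin_export_flow_spans_total sample labeled status="error". The function name export_error_total is hypothetical, and how you fetch the scrape (port and path) depends on your deployment.

```shell
#!/bin/sh
# export_error_total: read Prometheus text exposition on stdin and print the
# sum of all mermin_export_flow_spans_total samples with status="error".
# Hypothetical helper name; the metric and labels come from the list above.
export_error_total() {
  awk '
    /^mermin_export_flow_spans_total[{]/ && /status="error"/ {
      line = $0
      sub(/^[^}]*[}][ \t]*/, "", line)   # drop the metric name and label set
      split(line, parts, " ")            # parts[1] is the sample value
      sum += parts[1]
    }
    END { printf "%.0f\n", sum }
  '
}

# Example (port is a placeholder; /metrics is the usual Prometheus path and
# is an assumption here):
#   curl -s "http://localhost:<metrics-port>/metrics" | export_error_total
```

Run it twice a minute or so apart. A larger second value means exports are still failing, while an unchanged value means the failures are historical.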
Diagnosing Flow Span Drops
When flow spans are dropped, inspect internal metrics to identify the bottleneck stage:
Worker queue drops: The kernel is producing events faster than userspace can consume them. Increase pipeline.ebpf_ringbuf_worker_capacity or pipeline.worker_count.
Flow span channel drops: The enrichment stage is lagging. Increase pipeline.flow_producer.flow_span_queue_capacity or add CPU resources (the decorator runs as a cooperative task on the main runtime; see Worker threads).
Decorated span channel drops: There is backpressure from the export stage. Increase pipeline.k8s_decorator_channel_capacity or optimize your OTLP exporter settings.
If tuning does not resolve the issue, reduce the number of monitored interfaces or increase the CPU limits allocated to the agent.
Test eBPF Capabilities
Use the diagnose bpf subcommand to validate eBPF support and test attach/detach operations:
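One way to run the subcommand is through kubectl exec in a running Mermin pod. This is a sketch under assumptions: it presumes the container's binary is named mermin and is on the PATH, which this page does not confirm, and mermin_diagnose_bpf is a hypothetical wrapper name. The Deployment Issues guide remains the reference for the documented invocation.

```shell
#!/bin/sh
NAMESPACE="${NAMESPACE:-mermin}"

# mermin_diagnose_bpf POD_NAME
# Assumption: the container image provides a `mermin` binary on its PATH.
mermin_diagnose_bpf() {
  kubectl exec -n "$NAMESPACE" "$1" -- mermin diagnose bpf
}

# Example:
#   kubectl get pods -n mermin -l app.kubernetes.io/name=mermin -o name
#   mermin_diagnose_bpf <pod-name>
```

kubectl exec only works while the container is running, so this approach does not help for pods that crash immediately on startup.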
This validates:
Required Linux capabilities
eBPF program loading and attach/detach operations
BPF filesystem writeability
Kernel version compatibility
For detailed usage, interpreting results, and troubleshooting failures, see Deployment Issues: Test eBPF Attach/Detach Operations.
Getting Help
Search Existing Issues: Check if someone else encountered the same problem
GitHub Discussions: Ask questions and discuss best practices
When creating an issue, include:
Mermin version and Kubernetes version
Your CNI plugin (e.g., Calico, Cilium, Flannel)
Complete error logs from affected pods
Your configuration (with sensitive values removed)
Steps to reproduce the issue
Next Steps
Fine-Tune Your Configuration: Optimize for your environment
Set Up Monitoring: Track performance and health
Diagnose eBPF Errors: Detailed verifier error solutions
Resolve Deployment Issues: Pod startup and permission problems
Understand Interface Visibility: Why traffic might not appear