
Architecture

This page explains how Mermin works, its architecture, and the flow of data from network packets to Flow Traces in your observability backend.

What are Flow Traces?

Flow Traces are OpenTelemetry traces assembled from multiple Flow Trace Spans; each trace represents a long-lived connection. Flow Trace Spans are OpenTelemetry trace spans that represent individual network flows with NetFlow-like semantics. Unlike traditional NetFlow or IPFIX:

  • OpenTelemetry Native: Flow Traces are OTLP trace spans, not proprietary flow protocols

  • Bidirectional: A single span represents both directions of a flow

  • Rich Metadata: Includes Kubernetes context (pods, services, deployments, labels)

  • Standardized Format: Works with any OTLP-compatible observability platform

Mermin generates Flow Trace Spans by capturing network packets, aggregating them into flows, decorating with Kubernetes metadata, and exporting as OpenTelemetry spans.

network packet → Mermin → flow span (network flow) → flow trace (network connection)

High-Level Architecture

Mermin is deployed as a DaemonSet in Kubernetes, with one agent instance running on each node in your cluster. Each agent independently captures and processes network traffic from its host node.

┌─────────────────────────────────────────────┐
│             Kubernetes Cluster              │
│                                             │
│  ┌──────────────┐         ┌──────────────┐  │
│  │    Node 1    │         │    Node 2    │  │
│  │              │         │              │  │
│  │  ┌────────┐  │         │  ┌────────┐  │  │
│  │  │ Mermin │  │         │  │ Mermin │  │  │
│  │  │ Agent  │  │         │  │ Agent  │  │  │
│  │  └───┬────┘  │         │  └───┬────┘  │  │
│  │      │ eBPF  │         │      │ eBPF  │  │
│  │      ↓       │         │      ↓       │  │
│  │  [Network]   │         │  [Network]   │  │
│  │  [Packets]   │         │  [Packets]   │  │
│  └──────────────┘         └──────────────┘  │
│         │                        │          │
│         └────────────┬───────────┘          │
│                      │ OTLP                 │
└──────────────────────┼──────────────────────┘

              ┌─────────────────┐
              │ OpenTelemetry   │
              │   Collector     │
              └────────┬────────┘

        ┌──────────────┼──────────────┐
        ↓              ↓              ↓
   ┌────────┐    ┌─────────┐    ┌────────┐
   │Elastic │    │ Grafana │    │ Jaeger │
   │ Stack  │    │  Tempo  │    │        │
   └────────┘    └─────────┘    └────────┘

Components

The components below make up the data pipeline; the data-flow documentation describes the pipeline in more detail.

eBPF Programs

Mermin uses eBPF (extended Berkeley Packet Filter) programs loaded into the Linux kernel to capture network packets with minimal overhead. These programs (see the consumer sketch after this list):

  • Attach to network interfaces specified in your configuration

  • Capture packets at the TC (Traffic Control) layer

  • Aggregate packet data into flow statistics within the FLOW_STATS eBPF HashMap

  • Notify userspace of new flows via the FLOW_EVENTS ring buffer

  • Track listening ports (servers) in the LISTENING_PORTS eBPF HashMap for client/server direction inference

  • For encapsulated or tunneled packets, send inner packet headers to userspace via FLOW_EVENTS for decoding
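
To make the kernel/userspace split concrete, here is a minimal sketch of a userspace consumer, assuming the cilium/ebpf Go library; the pin path and the flowEvent layout are illustrative, not Mermin's actual definitions:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"log"
	"net/netip"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/ringbuf"
)

// flowEvent is an illustrative event layout; Mermin's actual struct differs.
type flowEvent struct {
	SrcIP   [16]byte // IPv4-mapped or IPv6 address
	DstIP   [16]byte
	SrcPort uint16
	DstPort uint16
	Proto   uint8
	_       [3]byte // padding
}

func main() {
	// Open the ring buffer map pinned by the eBPF program (path is illustrative).
	events, err := ebpf.LoadPinnedMap("/sys/fs/bpf/mermin_flow_events_map_v1", nil)
	if err != nil {
		log.Fatalf("open pinned map: %v", err)
	}
	defer events.Close()

	rd, err := ringbuf.NewReader(events)
	if err != nil {
		log.Fatalf("create ring buffer reader: %v", err)
	}
	defer rd.Close()

	for {
		rec, err := rd.Read() // blocks until the kernel publishes a new-flow event
		if err != nil {
			log.Fatalf("read event: %v", err)
		}

		var ev flowEvent
		if err := binary.Read(bytes.NewReader(rec.RawSample), binary.LittleEndian, &ev); err != nil {
			log.Printf("decode event: %v", err)
			continue
		}
		src := netip.AddrFrom16(ev.SrcIP).Unmap()
		dst := netip.AddrFrom16(ev.DstIP).Unmap()
		log.Printf("new flow: proto=%d %v:%d -> %v:%d", ev.Proto, src, ev.SrcPort, dst, ev.DstPort)
	}
}
```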

eBPF provides several advantages:
  • High Performance: Executes directly in the kernel, avoiding context switches

  • Low Overhead: Processes only necessary packet headers, not full payloads

  • Safety: Verified by the kernel to ensure it cannot crash or hang the system

  • No Kernel Modules: No need to compile or load custom kernel modules

Flow Span Generation Engine

The userspace Mermin agent receives packets from eBPF and aggregates them into network flow trace spans (see the flow-key sketch after this list):

  • Bidirectional Flow Spans: Groups packets by 5-tuple (source IP/port, dest IP/port, protocol)

  • State Tracking: Maintains connection state for TCP (SYN, FIN, RST flags)

  • Timeout Management: Expires inactive flows based on configurable timeouts

  • Protocol Parsing: Deep packet inspection for tunneling protocols (VXLAN, Geneve, WireGuard)

  • Community ID: Generates standard Community ID hashes for flow correlation
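
The bidirectional grouping relies on normalizing the 5-tuple so that both directions of a connection map to the same key. Below is a minimal sketch of that normalization; the type and function names are illustrative, not Mermin's internals:

```go
package main

import (
	"fmt"
	"net/netip"
)

// flowKey identifies a bidirectional flow. Field names are illustrative.
type flowKey struct {
	AddrA, AddrB netip.AddrPort
	Proto        uint8
}

// newFlowKey orders the two endpoints canonically so that packets from
// either direction of the same connection map to the same key.
func newFlowKey(src, dst netip.AddrPort, proto uint8) flowKey {
	if src.Addr().Compare(dst.Addr()) > 0 ||
		(src.Addr() == dst.Addr() && src.Port() > dst.Port()) {
		src, dst = dst, src
	}
	return flowKey{AddrA: src, AddrB: dst, Proto: proto}
}

func main() {
	a := netip.MustParseAddrPort("10.0.0.5:443")
	b := netip.MustParseAddrPort("10.0.0.9:51234")

	// Both directions of the connection yield the same bidirectional key.
	fmt.Println(newFlowKey(a, b, 6) == newFlowKey(b, a, 6)) // true
}
```

Community ID hashing applies a similar canonical ordering of endpoints, which is why flows correlate across tools regardless of the direction in which they were observed.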

A Flow Trace Span includes (see the record sketch after this list):

  • Source and destination IP addresses and ports

  • Network protocol (TCP, UDP, ICMP, etc.)

  • Packet and byte counters (bidirectional)

  • TCP flags and connection state

  • Flow start and end timestamps

  • Community ID hash
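
As a rough illustration, such a record could be represented like the struct below; the field names are hypothetical and Mermin's actual types and attribute names may differ:

```go
package flows

import "time"

// FlowRecord is an illustrative representation of the data carried by a
// Flow Trace Span; Mermin's internal types may differ.
type FlowRecord struct {
	SrcAddr, DstAddr string // source and destination IP addresses
	SrcPort, DstPort uint16
	Protocol         string // "tcp", "udp", "icmp", ...

	// Bidirectional counters: forward (src->dst) and reverse (dst->src).
	FwdPackets, RevPackets uint64
	FwdBytes, RevBytes     uint64

	TCPFlags  uint8     // cumulative TCP flags observed (SYN, FIN, RST, ...)
	ConnState string    // e.g. "established", "closed"
	Start     time.Time // flow start timestamp
	End       time.Time // flow end timestamp

	CommunityID string // standard Community ID hash for cross-tool correlation
}
```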

State Persistence

Mermin preserves flow state across pod restarts through eBPF map pinning, ensuring continuous visibility without data loss:

  • Map Pinning: The FLOW_STATS, FLOW_EVENTS, and LISTENING_PORTS maps are pinned under /sys/fs/bpf/ when that path is mounted and writable (refer to the security-considerations document)

  • Schema Versioning: Maps use versioned paths (e.g., mermin_flow_stats_map_v1) to prevent incompatible format reuse across upgrades

  • State Continuity: Flow statistics and listening port data persist across Mermin restarts, eliminating visibility gaps during rolling updates

  • Format Validation: Pinned maps are reused only if their schema version and format match the current version

  • Graceful Degradation: If pinning fails, Mermin continues with unpinned maps (logged as a warning); the load-or-create pattern is sketched after this list

  • Upgrade Safety: When struct layouts change, EBPF_MAP_SCHEMA_VERSION is incremented so that new versioned maps are created instead of reusing incompatible ones
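
A minimal sketch of the load-or-create-then-pin pattern described above, assuming the cilium/ebpf Go library; the pin path, map spec, and error handling are illustrative rather than Mermin's actual code:

```go
package main

import (
	"errors"
	"log"
	"os"

	"github.com/cilium/ebpf"
)

// loadOrCreatePinnedMap reuses a previously pinned map if one exists at the
// versioned path, and otherwise creates and pins a fresh one.
func loadOrCreatePinnedMap(path string, spec *ebpf.MapSpec) (*ebpf.Map, error) {
	m, err := ebpf.LoadPinnedMap(path, nil)
	if err == nil {
		return m, nil // state from the previous run survives the restart
	}
	if !errors.Is(err, os.ErrNotExist) {
		return nil, err
	}

	// No pinned map from a previous run: create a new one and pin it.
	m, err = ebpf.NewMap(spec)
	if err != nil {
		return nil, err
	}
	if pinErr := m.Pin(path); pinErr != nil {
		// Graceful degradation: keep using the unpinned map.
		log.Printf("warning: could not pin map at %s: %v", path, pinErr)
	}
	return m, nil
}

func main() {
	spec := &ebpf.MapSpec{
		Name:       "flow_stats_v1",
		Type:       ebpf.Hash,
		KeySize:    16,
		ValueSize:  64,
		MaxEntries: 100_000,
	}
	// Versioned pin path so incompatible layouts from older versions are not reused.
	m, err := loadOrCreatePinnedMap("/sys/fs/bpf/mermin_flow_stats_map_v1", spec)
	if err != nil {
		log.Fatalf("flow stats map: %v", err)
	}
	defer m.Close()
}
```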

This ensures:

  • No flow data loss during pod restarts or rolling updates

  • Existing flows continue to accumulate statistics across restarts

  • Listening port information is preserved for accurate direction inference

  • Safe upgrades without corrupt data reuse

  • Easy rollbacks (old map versions remain available)

Kubernetes Integration

Mermin deeply integrates with Kubernetes to decorate flows with contextual metadata:

Informers

Mermin uses Kubernetes informers (watch APIs) to maintain an in-memory cache of cluster resources:

  • Pods, Services, Deployments, ReplicaSets, StatefulSets, DaemonSets

  • Jobs, CronJobs, NetworkPolicies

  • Endpoints, EndpointSlices, Ingresses, Gateways

This cache is continuously updated as resources change, ensuring metadata is always current.
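
A minimal sketch of this pattern using client-go shared informers, with an IP index so flow endpoints can be resolved from the in-memory cache; the index name and example address are illustrative:

```go
package main

import (
	"log"
	"os"
	"os/signal"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// In-cluster config, as used by a DaemonSet pod.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Shared informers keep an in-memory cache that is updated by watch events.
	factory := informers.NewSharedInformerFactory(client, 0)
	podInformer := factory.Core().V1().Pods().Informer()

	// Index pods by IP so flows can be attributed with a single cache lookup.
	if err := podInformer.AddIndexers(cache.Indexers{
		"podIP": func(obj interface{}) ([]string, error) {
			pod := obj.(*corev1.Pod)
			var ips []string
			for _, ip := range pod.Status.PodIPs {
				ips = append(ips, ip.IP)
			}
			return ips, nil
		},
	}); err != nil {
		log.Fatal(err)
	}

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Attribute an address observed in a flow (address is illustrative).
	pods, err := podInformer.GetIndexer().ByIndex("podIP", "10.244.1.17")
	if err == nil && len(pods) > 0 {
		p := pods[0].(*corev1.Pod)
		log.Printf("flow endpoint belongs to pod %s/%s", p.Namespace, p.Name)
	}

	// Block until interrupted so the cache keeps receiving updates.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, os.Interrupt)
	<-sig
}
```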

Flow Attribution

For each network flow, Mermin:

  1. Identifies Pods: Matches source/destination IPs to pod IPs

  2. Extracts Metadata: Retrieves pod name, namespace, labels, annotations

  3. Walks Owner References: Follows ownerReferences from Pod, for example Pod → ReplicaSet → Deployment

  4. Selector Matching: Finds Services and NetworkPolicies whose selectors match the pod's labels

  5. Decorates Traces: Attaches all relevant metadata to the Flow Trace Span

This provides full context for each network flow, enabling powerful filtering and analysis.
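
A minimal sketch of the owner-reference walk in step 3, using direct API calls for brevity (a production agent would resolve these from its informer caches); the helper names, namespace, and pod name are illustrative:

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// controllerOf returns the controlling owner reference of an object, if any.
func controllerOf(refs []metav1.OwnerReference) *metav1.OwnerReference {
	for i := range refs {
		if refs[i].Controller != nil && *refs[i].Controller {
			return &refs[i]
		}
	}
	return nil
}

// workloadFor walks ownerReferences upward (e.g. Pod -> ReplicaSet -> Deployment)
// and returns the top-level workload kind and name.
func workloadFor(ctx context.Context, client kubernetes.Interface, namespace, podName string) (kind, name string, err error) {
	pod, err := client.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		return "", "", err
	}

	owner := controllerOf(pod.OwnerReferences)
	if owner == nil {
		return "Pod", pod.Name, nil // standalone pod
	}
	if owner.Kind != "ReplicaSet" {
		return owner.Kind, owner.Name, nil // e.g. StatefulSet, DaemonSet, Job
	}

	rs, err := client.AppsV1().ReplicaSets(namespace).Get(ctx, owner.Name, metav1.GetOptions{})
	if err != nil {
		return "", "", err
	}
	if rsOwner := controllerOf(rs.OwnerReferences); rsOwner != nil {
		return rsOwner.Kind, rsOwner.Name, nil // typically the Deployment
	}
	return "ReplicaSet", rs.Name, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	kind, name, err := workloadFor(context.Background(), client, "default", "example-pod")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("pod is owned by %s/%s\n", kind, name)
}
```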

OTLP Exporter

Mermin exports flows as Flow Traces using the OpenTelemetry Protocol (OTLP):

  • Flow Traces as Spans: Each network flow becomes an OpenTelemetry trace span

  • Standard Protocol: OTLP is an industry-standard telemetry protocol (OTel docs, vendors)

  • Flexible Transport: Supports both gRPC and HTTP protocols

  • Batching: Aggregates multiple Flow Trace Spans before sending to reduce network overhead

  • Backpressure Handling: Queues Flow Traces if the backend is unavailable

  • Authentication: Supports Basic Auth, TLS client certificates

  • Secure Transport: TLS encryption with custom CA certificate support

Flow Traces are exported as OTLP trace spans, allowing them to be processed by any OTLP-compatible backend without requiring NetFlow collectors.
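
A minimal sketch of this export path using the OpenTelemetry Go SDK and the OTLP/gRPC exporter with a batch span processor; the endpoint, batching values, and attribute names are illustrative, not Mermin's actual configuration or attribute schema:

```go
package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/trace"
)

func main() {
	ctx := context.Background()

	// OTLP/gRPC exporter pointed at a collector; endpoint and TLS settings are
	// illustrative and would normally come from configuration.
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("otel-collector.observability:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatalf("create OTLP exporter: %v", err)
	}

	// The batch span processor groups spans before export to reduce network overhead.
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter,
			sdktrace.WithMaxExportBatchSize(512),
			sdktrace.WithBatchTimeout(5*time.Second),
		),
	)
	defer tp.Shutdown(ctx)

	// Emit one Flow Trace Span with illustrative attribute names.
	tracer := tp.Tracer("mermin-sketch")
	start := time.Now().Add(-30 * time.Second)
	_, span := tracer.Start(ctx, "network.flow",
		trace.WithTimestamp(start),
		trace.WithAttributes(
			attribute.String("source.address", "10.244.1.17"),
			attribute.Int("source.port", 51234),
			attribute.String("destination.address", "10.96.0.10"),
			attribute.Int("destination.port", 53),
			attribute.String("network.transport", "udp"),
			attribute.Int64("network.io.bytes_sent", 128),
			attribute.Int64("network.io.bytes_received", 512),
		),
	)
	span.End(trace.WithTimestamp(time.Now()))
}
```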

Performance Characteristics

Resource Usage

Mermin is designed to be efficient in production environments:

  • CPU: Typically 0.1-0.5 cores (100-500 millicores) per agent; varies with traffic volume

  • Memory: Base usage ~100-200 MB, grows with flow table size

  • Network: Outbound OTLP traffic depends on flow rate and batching settings

  • Kernel: eBPF programs have minimal impact (< 1% CPU overhead)

Scalability

  • Flow Rate: Can handle 10,000+ flows/second per agent on modern hardware

  • Packet Rate: Processes 100,000+ packets/second with minimal packet loss

  • Cluster Size: Scales linearly – each node runs its own independent agent

  • Flow Table Size: Configurable, defaults support ~100,000 concurrent flows

Tunability

Mermin provides extensive configuration for performance tuning under the pipeline block; refer to the pipeline documentation for details.

Failure Modes and Resilience

Agent Failure

If a Mermin agent crashes or is terminated:

  • Local Impact Only: Only flows from that node are affected

  • Kubernetes Restart: DaemonSet controller automatically restarts the pod

  • No Data Loss: Flow statistics persist across restarts when eBPF map pinning is available (see State Persistence); without pinning, in-flight flow state is lost and new flows are captured after restart

  • No Cluster Impact: Other nodes continue operating normally

Backend Unavailability

If the OTLP backend is unavailable:

  • Queuing: Flows are queued up to max_queue_size

  • Backpressure: If the queue fills, the oldest flows are dropped (not the newest)

  • Automatic Retry: Mermin retries failed exports with exponential backoff (see the sketch after this list)

  • Graceful Degradation: Agent continues capturing flows
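
A minimal sketch of the queue-and-retry behavior described above: a bounded queue that drops the oldest entries when full, and an export retry loop with exponential backoff and jitter. The type names, sizes, and timings are illustrative, not Mermin's configuration:

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// span stands in for an exported Flow Trace Span.
type span struct{ id int }

// boundedQueue drops the oldest entry when full, so the freshest flows are kept.
type boundedQueue struct {
	buf []span
	max int
}

func (q *boundedQueue) push(s span) {
	if len(q.buf) == q.max {
		q.buf = q.buf[1:] // drop the oldest queued flow
	}
	q.buf = append(q.buf, s)
}

// exportWithRetry retries a failed export with exponential backoff and jitter.
func exportWithRetry(send func() error, maxAttempts int) error {
	backoff := 500 * time.Millisecond
	for attempt := 1; ; attempt++ {
		err := send()
		if err == nil || attempt == maxAttempts {
			return err
		}
		sleep := backoff + time.Duration(rand.Int63n(int64(backoff/2)))
		fmt.Printf("export failed (%v), retrying in %v\n", err, sleep)
		time.Sleep(sleep)
		backoff *= 2
	}
}

func main() {
	q := &boundedQueue{max: 3}
	for i := 1; i <= 5; i++ {
		q.push(span{id: i})
	}
	fmt.Println("queued:", q.buf) // spans 3, 4, 5 remain; 1 and 2 were dropped

	attempts := 0
	_ = exportWithRetry(func() error {
		attempts++
		if attempts < 3 {
			return errors.New("backend unavailable")
		}
		return nil
	}, 5)
	fmt.Println("export succeeded after", attempts, "attempts")
}
```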

Comparison with Alternatives

vs. eBPF Observability Tools (Cilium Hubble, Pixie)

Mermin provides:

  • Flow-level granularity: Every individual network flow exported as a Flow Trace with full metadata

  • CNI Agnostic: Not tied to a specific CNI implementation (works with Cilium, Calico, Flannel, etc.)

  • Pure OTLP export to any OpenTelemetry-compatible backend

  • Lightweight, focused solely on network flow observability

  • No vendor lock-in or platform dependencies

  • Flexible backend choice (Elastic, Grafana, Jaeger, cloud providers)

  • Historical flow analysis and long-term storage in your observability backend

Cilium Hubble provides:

  • Aggregated network metrics (connection rates, error rates, latencies)

  • Deep integration with Cilium CNI and network policies

  • Service map visualization with Hubble UI

  • Layer 7 protocol visibility (HTTP, gRPC, Kafka, DNS)

  • Requires Cilium as the CNI

  • Limited historical data retention (ephemeral, in-memory)

Pixie provides:

  • Aggregated network metrics with short-term retention

  • Full application observability (traces, logs, metrics, profiling)

  • Auto-instrumentation for multiple languages

  • In-cluster data processing and querying

  • Requires Pixie platform deployment

  • Limited long-term storage (auto-deletes data after hours/days)

Key Insight: Mermin is the only tool that provides flow-level granularity - each individual network flow becomes a Flow Trace with complete metadata (source/dest pods, services, deployments, labels, packet/byte counts, TCP flags, etc.). Hubble and Pixie provide aggregated network metrics (requests/sec, error rates), which are useful for dashboards but don't give you the raw flow data needed for deep investigation, compliance, or security forensics.

Trade-off: Hubble and Pixie offer broader observability features (L7 protocols, application tracing) but with platform coupling and metric aggregation. Mermin prioritizes CNI/backend flexibility and flow-level detail, enabling long-term storage and granular analysis of every network connection.

vs. NetFlow/IPFIX Exporters

Mermin Flow Traces provide:

  • OpenTelemetry-native format (OTLP trace spans)

  • Kubernetes metadata: pods, services, deployments, labels, owner references

  • Modern observability backend integration (Tempo, Jaeger, Elastic, OpenSearch)

  • No specialized NetFlow collectors required

  • CNI Agnostic: Captures flows regardless of CNI implementation

  • Cloud-native architecture (DaemonSet, Helm charts)

Traditional NetFlow/IPFIX provides:

  • Established protocol with decades of tooling

  • Hardware switch/router support

  • Legacy network monitoring platform compatibility

  • SNMP integration for traditional network management

Trade-off: NetFlow/IPFIX is ideal for traditional network infrastructure. Mermin is purpose-built for cloud-native Kubernetes environments with modern observability stacks.

vs. Packet Capture Tools (tcpdump, Wireshark)

Mermin provides:

  • Continuous, automated flow capture without manual intervention

  • Bidirectional flow aggregation with packet/byte counters

  • Kubernetes metadata enrichment (pods, services, deployments)

  • Efficient OTLP export to any observability backend

  • Production-ready with minimal performance overhead

tcpdump/Wireshark provide:

  • Full packet payload capture for deep inspection

  • Interactive analysis and filtering (Wireshark GUI)

  • Protocol dissection for debugging specific issues

  • Manual, on-demand troubleshooting

Trade-off: Use Mermin for continuous observability; use packet capture tools for deep troubleshooting of specific issues.

vs. Service Mesh (Istio, Linkerd)

Note: These are fundamentally different tools for different jobs. Service meshes are for traffic management and security. Mermin is for network observability. They are complementary, not alternatives.

Mermin provides (Observability):

  • Network flow visibility across your entire cluster

  • Zero application changes or sidecar injection required

  • Captures all traffic: pod-to-pod, pod-to-external, host network, non-mesh workloads

  • CNI Agnostic: Works with any CNI (Cilium, Calico, Flannel, cloud-native CNIs)

  • Lower resource overhead (no sidecar per pod)

  • Network-layer (L3/L4) flow telemetry

Service Mesh provides (Traffic Management & Security):

  • Layer 7 (HTTP, gRPC) traffic control and policy enforcement

  • Traffic management (retries, timeouts, circuit breaking, canary deployments)

  • Mutual TLS encryption between services

  • Service-to-service authorization and authentication

  • Request routing and load balancing strategies

  • (Also includes L7 observability metrics as a side benefit)

Key Insight: You can run Mermin alongside a service mesh. Mermin observes network flows (L3/L4) across all workloads, while the service mesh manages application traffic (L7) for enrolled services. Many organizations use both together.
