Debugging eBPF

This guide covers how to inspect, debug, and optimize eBPF programs in Mermin. It includes tools and techniques for understanding program behavior, performance characteristics, and troubleshooting issues.


Debugging eBPF Programs with bpftool

This section shows how to use bpftool to inspect and debug the eBPF programs running in the cluster: what is loaded, how large each program is, and how it behaves at run time.

Prerequisites

To use bpftool for debugging, you'll need access to a container with bpftool installed. The mermin-builder image includes bpftool, so you can use it directly.

1. Build the containerized environment (if not already built)

docker build -t mermin-builder:latest --target builder .

2. Access the container with bpftool

docker run -it --privileged --mount type=bind,source=.,target=/app mermin-builder:latest /bin/bash

Basic eBPF Program Inspection

List all loaded eBPF programs

This shows all eBPF programs currently loaded in the kernel, including their IDs, types, names, and tags.
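A minimal sketch of the listing step, assuming bpftool is on the PATH inside the privileged container. The function name `list_bpf_programs` is only an illustrative wrapper, not part of Mermin:

```shell
# List every eBPF program currently loaded in the kernel.
# Needs root (or CAP_SYS_ADMIN), which the --privileged container provides.
list_bpf_programs() {
    bpftool prog show
}

# Usage: list_bpf_programs
```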

Find specific programs by name

This filters the list to show only programs with "mermin" in the name.
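The filter can be sketched as a plain `grep` over the listing. The wrapper name `find_programs_by_name` and its optional pattern argument are assumptions added for illustration:

```shell
# Keep only the lines of the program listing that contain the given pattern
# (default "mermin", case-insensitive). Program names appear on the header
# line, so a name match prints that header.
find_programs_by_name() {
    bpftool prog show | grep -i -- "${1:-mermin}"
}

# Usage: find_programs_by_name
```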

Get detailed information about a specific program

This provides comprehensive information including:

  • Program type and name

  • Load time and user ID

  • Translated bytecode size (xlated)

  • JIT-compiled size (jited)

  • Memory lock size (memlock)

  • Associated map IDs

  • BTF (BPF Type Format) ID
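The fields listed above come from querying a single program by ID. A sketch, using the example ID 167 that appears later in this guide; the wrapper names are illustrative:

```shell
# Show the full record for one program: type, name, load time, uid,
# xlated/jited/memlock sizes, map IDs and BTF ID.
show_program() {
    bpftool prog show id "$1"
}

# Same record as indented JSON, which is easier to consume from scripts.
show_program_json() {
    bpftool --pretty prog show id "$1"
}

# Usage: show_program 167
```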

Analyzing Program Instructions

Count the number of instructions in an eBPF program

One of the most useful metrics for an eBPF program is its instruction count, which affects both run time and how close the program is to the verifier's complexity limits.

What this command does:

  • bpftool prog dump xlated id 167: Dumps the translated bytecode for program ID 167

  • grep -E '^[0-9]+:': Filters to only show lines that start with numbers (the actual instructions)

  • wc -l: Counts the total number of instruction lines
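The pipeline described above can be sketched as a small function. ID 167 is the example ID from the text. The pattern here additionally allows leading whitespace, on the assumption that bpftool right-aligns the instruction index in its plain-text dump:

```shell
# Count the instructions in the translated (xlated) bytecode of one program.
count_instructions() {
    bpftool prog dump xlated id "$1" |     # dump the translated bytecode
        grep -E '^[[:space:]]*[0-9]+:' |   # keep only "N: (op) ..." lines
        wc -l                              # count them
}

# Usage: count_instructions 167
```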

Example output:

This shows that your mermin eBPF program contains 2,584 instructions.

Alternative methods for instruction counting

Method 1: Raw line count (includes comments and headers):

Method 2: Size-based estimation:

Method 3: View actual instructions (first 20 lines):
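The three alternatives can be sketched together as follows. The function names are illustrative, and Method 2 assumes the plain-text `xlated <N>B` field printed by `bpftool prog show`:

```shell
# Method 1: raw line count of the dump (also counts comment and header lines).
raw_line_count() {
    bpftool prog dump xlated id "$1" | wc -l
}

# Method 2: estimate from the xlated size; every eBPF instruction is 8 bytes.
estimate_from_size() {
    bpftool prog show id "$1" |
        sed -n 's/.*xlated \([0-9][0-9]*\)B.*/\1/p' |
        awk '{ print int($1 / 8) }'
}

# Method 3: print the first 20 lines of the dump for a quick look.
first_instructions() {
    bpftool prog dump xlated id "$1" | head -n 20
}

# Usage: estimate_from_size 167
```

The size-based figure can come out slightly higher than a line count, because a 64-bit immediate load occupies two 8-byte slots but is printed as one line.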

Advanced eBPF Analysis

Inspect eBPF maps
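One way to inspect the maps that belong to a program, sketched under the assumption that the plain-text output of `bpftool prog show` carries a `map_ids` field (a comma-separated list). The helper names are illustrative:

```shell
# Extract the map_ids field of a program, one map ID per line.
program_map_ids() {
    bpftool prog show id "$1" |
        sed -n 's/.*map_ids \([0-9,][0-9,]*\).*/\1/p' |
        tr ',' '\n'
}

# Show the metadata record of each map used by the program.
show_program_maps() {
    for map_id in $(program_map_ids "$1"); do
        bpftool map show id "$map_id"
    done
}

# Usage: show_program_maps 167
#        bpftool map dump id <MAP_ID>   # print the entries of a single map
```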

Check program verification details

Monitor program performance
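One possible approach, assuming the kernel exposes BPF run-time statistics through the `kernel.bpf_stats_enabled` sysctl: once enabled, `bpftool prog show` reports `run_time_ns` and `run_cnt` for each program, from which an average cost per invocation follows. The helper names are illustrative:

```shell
# Turn on kernel-side run-time accounting for eBPF programs.
# This adds a small overhead to every program run, so disable it afterwards.
enable_bpf_stats() {
    sysctl -w kernel.bpf_stats_enabled=1
}

# Average nanoseconds per invocation, from the run_time_ns and run_cnt fields.
average_run_ns() {
    bpftool prog show id "$1" | awk '
        { for (i = 1; i < NF; i++) {
              if ($i == "run_time_ns") total = $(i + 1)
              if ($i == "run_cnt")     runs  = $(i + 1)
          } }
        END { if (runs > 0) printf "%d\n", total / runs
              else print "no runs recorded" }'
}

# Usage: enable_bpf_stats && sleep 10 && average_run_ns 167
```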

Troubleshooting Common Issues

Program loading failures

If your eBPF program fails to load, check the verification log:

Instruction limit exceeded

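eBPF programs are subject to instruction limits: since Linux 5.2 the verifier processes at most 1 million instructions for a privileged program, while unprivileged programs and older kernels are limited to 4,096 instructions per program. If you hit this limit: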

Memory issues

Check memory usage and limits:
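A sketch of two checks that fit here, assuming the plain-text `memlock <N>B` field of `bpftool prog show`. The helper names are illustrative:

```shell
# Locked-memory limit of the current shell ("unlimited" or a number;
# bash reports it in KiB).
memlock_limit() {
    ulimit -l
}

# Bytes of memory charged to one program, taken from its memlock field.
program_memlock_bytes() {
    bpftool prog show id "$1" |
        sed -n 's/.*memlock \([0-9][0-9]*\)B.*/\1/p'
}

# Usage: memlock_limit; program_memlock_bytes 167
```

On kernels 5.11 and later, BPF memory is accounted through the memory cgroup rather than the locked-memory limit, so the `ulimit` value matters mainly on older kernels.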

Integration with Development Workflow

You can integrate bpftool analysis into your development process:

This command provides a comprehensive overview of all mermin programs and their instruction counts in a single execution.
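A sketch of such an overview, combining the name filter and the instruction count from earlier sections. The function name `mermin_overview` is illustrative, and the instruction pattern assumes bpftool's plain-text dump format:

```shell
# For every loaded program whose header line mentions "mermin", print the
# header followed by its xlated instruction count.
mermin_overview() {
    bpftool prog show |
        grep -iE '^[[:space:]]*[0-9]+:.*mermin' |
        while read -r header; do
            id=${header%%:*}    # text before the first ":" is the program ID
            count=$(bpftool prog dump xlated id "$id" < /dev/null |
                        grep -cE '^[[:space:]]*[0-9]+:')
            printf '%s\n  instructions: %s\n' "$header" "$count"
        done
}

# Usage: mermin_overview
```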


Measuring eBPF Stack Usage

eBPF programs have a strict 512-byte stack limit. When exceeded, you'll see errors like:

Critical Concept: Individual vs. Cumulative Stack Usage

  • Individual Function Stack: Maximum stack used by any single function

  • Cumulative Call Chain Stack: Total stack across all functions in a call chain

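The verifier failure above reports cumulative usage: the three frames use 144 + 328 + 0 = 472 bytes, and the verifier's total comes to 544 bytes once per-frame overhead is added.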

Quick Analysis

1. Prerequisites

2. Stack Analysis Scripts

The project includes three analysis scripts in the scripts/ directory:

scripts/check_stack_usage.sh - Quick health check (30 seconds)

  • Purpose: Fast individual function stack analysis for daily development and CI/CD

  • Thresholds: Critical >320 bytes, Warning >192 bytes (64-byte aligned)

  • Output: Simple pass/fail with color-coded status

  • Features: Forces fresh builds, detects build failures, prevents stale results

scripts/analyze_call_chain.sh - Call chain overview (45 seconds)

  • Purpose: Shows function calls and stack usage levels for initial investigation

  • Output: Function call instructions and sorted stack usage levels

  • Use When: Investigating verifier failures or understanding call patterns

  • Features: Forces fresh builds, shows binary timestamps, handles no-call scenarios

scripts/cumulative_stack_calculator.sh - Educational deep dive (2 minutes)

  • Purpose: Step-by-step educational breakdown of cumulative stack calculation

  • Output: Detailed hex-to-decimal conversions, scenarios, and insights

  • Use When: Learning how verifier calculates stack, training new developers

  • Features: Forces fresh builds, comprehensive error handling

3. Running the Analysis
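A sketch of running the three scripts in order of increasing detail. It assumes they are executable, take no arguments, and are started from the repository root; none of that is confirmed here:

```shell
# Run the three analysis scripts from the repository root, quickest first.
run_stack_analysis() {
    ./scripts/check_stack_usage.sh             # quick health check
    ./scripts/analyze_call_chain.sh            # call chain overview
    ./scripts/cumulative_stack_calculator.sh   # step-by-step breakdown
}

# Usage: cd <repository root> && run_stack_analysis
```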

Interpreting Results

Understanding check_stack_usage.sh Output

  • Below 192 bytes: Safe for most call chains

  • 192-320 bytes: Monitor call depth - might exceed 512 in deep chains

  • Above 320 bytes: High risk - will likely cause verifier failures

Understanding analyze_call_chain.sh Output

How to interpret:

  • Multiple calls: Shows potential call chain depth

  • High stack values: Look for values >192 bytes

  • Combined risk: Add largest values to estimate cumulative usage

Understanding Verifier Error Messages

Translation:

  • 3 calls: Call chain is Function A → Function B → Function C

  • 544 bytes: Total cumulative stack (144 + 328 + 0 = 472 bytes of frame data, plus about 72 bytes of overhead)

  • 144, 328, 0: Individual stack usage per function in the chain
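One reading of the overhead, offered as an assumption rather than a statement about every kernel: the verifier rounds each frame up to a multiple of 32 bytes and charges even an empty frame as one 32-byte slot. That reproduces the 544 bytes in this example, although the rounding granularity can differ between kernel versions. The function name `combined_stack` is illustrative:

```shell
# Sum the per-frame stack sizes of a call chain the way the verifier is
# assumed to: each frame counts as at least 1 byte, rounded up to 32 bytes.
combined_stack() {
    total=0
    for frame in "$@"; do
        if [ "$frame" -lt 1 ]; then
            frame=1
        fi
        total=$(( total + (frame + 31) / 32 * 32 ))
    done
    echo "$total"
}

combined_stack 144 328 0   # prints 544 (160 + 352 + 32)
```

Under this assumption the raw sum of 472 bytes would fit within 512, and it is the rounding that pushes the chain over the limit.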

Critical Thresholds (64-byte aligned)

  • 192 bytes: Warning threshold - monitor for deep call chains

  • 320 bytes: Critical threshold - high probability of overflow

  • 512 bytes: Hard eBPF limit - verifier will reject
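The thresholds can be expressed as a small classifier. This is an illustration only and not the logic of `check_stack_usage.sh`. The boundaries follow the "greater than" comparisons listed for that script, so exactly 192 or 320 bytes falls in the lower band:

```shell
# Map a single function's stack usage (in bytes) to a risk level.
classify_stack() {
    if [ "$1" -gt 512 ]; then
        echo "OVER LIMIT"   # above the hard eBPF stack limit
    elif [ "$1" -gt 320 ]; then
        echo "CRITICAL"     # high probability of overflow in a call chain
    elif [ "$1" -gt 192 ]; then
        echo "WARNING"      # watch the call depth
    else
        echo "OK"
    fi
}

classify_stack 328   # prints CRITICAL
```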

Quick Fixes

When you see high stack usage:

  1. Split Large Functions: Break functions that use more than 192 bytes of stack into smaller ones

  2. Eliminate Large Variables: Avoid big structs on the stack

  3. Use #[inline(always)]: For small helper functions

  4. Check Call Depth: Minimize function call chains

Advanced Analysis Commands

For deeper investigation:

CI/CD Integration

For CI/CD pipelines, use the quick health check:

For debugging failed CI builds, run locally:
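A sketch of a CI gate built on the quick health check. It assumes the script exits non-zero on a critical finding or build failure, which matches its described pass/fail output but is not confirmed here. The function name `ci_stack_check` is illustrative:

```shell
# CI step: run the quick check and fail the job if it reports a problem.
ci_stack_check() {
    if ./scripts/check_stack_usage.sh; then
        echo "stack check passed"
    else
        echo "stack check failed; run scripts/analyze_call_chain.sh locally" >&2
        return 1
    fi
}

# Usage (from the repository root): ci_stack_check || exit 1
```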

This approach gives you both quick diagnostics and deep analysis capabilities for eBPF stack issues.

