eBPF Tutorial: Energy Monitoring for Process-Level Power Analysis
Have you ever wondered how much energy your applications are consuming? As energy efficiency becomes increasingly critical in both data centers and edge devices, understanding power consumption at the process level is essential for optimization. In this tutorial, we'll build an eBPF-based energy monitoring tool that provides real-time insights into process-level power consumption with minimal overhead.
Introduction to Energy Monitoring and Power Analysis
Energy monitoring in computing systems has traditionally been challenging due to the lack of fine-grained measurement capabilities. While hardware counters like Intel RAPL (Running Average Power Limit) can measure total system or CPU package power, they don't tell you which processes are consuming that power. This is where software-based energy attribution comes into play.
When a process runs on a CPU, it consumes power proportional to its CPU time and the processor's power state. The challenge is accurately tracking this relationship in real-time without introducing significant overhead that would itself consume power and skew measurements. Traditional approaches using polling-based monitoring can miss short-lived processes and introduce measurement overhead that affects the very metrics being measured.
This is where eBPF shines! By hooking into the kernel's scheduler events, we can track process CPU time with nanosecond precision at every context switch. This gives us:
- Exact CPU time measurements for every process
- Zero sampling error for short-lived processes
- Minimal overhead compared to polling approaches
- Real-time energy attribution based on CPU time
- The ability to correlate energy usage with specific workloads
Understanding CPU Power Consumption
Before diving into the implementation, it's important to understand how CPU power consumption works. Modern processors consume power in several ways:
Dynamic Power Consumption
Dynamic power is consumed when transistors switch states during computation. It's proportional to:

- Frequency: Higher clock speeds mean more switching per second
- Voltage: Higher voltages require more energy per switch
- Activity: More instructions executed means more transistor switching
The relationship is approximately: P_dynamic = C × V² × f × α
Where C is capacitance, V is voltage, f is frequency, and α is the activity factor.
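To make the quadratic voltage term concrete, here's a toy calculation (the values are illustrative, not measured):

```c
#include <stdio.h>

// P_dynamic = C × V² × f × α, evaluated for a 20% voltage/frequency step
static double dynamic_power(double c, double v, double f, double alpha)
{
    return c * v * v * f * alpha;
}

int main(void)
{
    double base   = dynamic_power(1.0, 1.0, 1.0, 1.0);
    double scaled = dynamic_power(1.0, 0.8, 0.8, 1.0);
    printf("power ratio after scaling: %.3f\n", scaled / base); // 0.512
    return 0;
}
```

That near-halving of dynamic power from a 20% voltage and frequency reduction is exactly why DVFS is such an effective power lever.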
Static Power Consumption
Static (or leakage) power is consumed even when transistors aren't switching, due to current leakage through the transistors. This has become increasingly significant in modern processors with billions of transistors.
Power States and DVFS
Modern CPUs use Dynamic Voltage and Frequency Scaling (DVFS) to balance performance and power consumption. The processor can operate at different P-states (performance states) with varying frequency/voltage combinations, and enter C-states (idle states) when not actively computing.
Our energy monitoring approach estimates energy consumption by multiplying CPU time by a configured average power value. While this is a simplification (it doesn't account for frequency changes or idle states), it provides a useful approximation for comparing relative energy usage between processes. For example, a process that accumulates 200 ms of CPU time on a CPU modeled at 15 W is charged 15 W × 0.2 s = 3 J.
Comparing Traditional vs eBPF Energy Monitoring
To understand why eBPF is superior for energy monitoring, let's compare it with traditional approaches:
Traditional /proc-based Monitoring
Traditional energy monitoring tools typically work by periodically reading `/proc/stat` to sample CPU usage. Here's how our traditional monitor works:
```bash
# Read total CPU time (utime + stime, fields 14 and 15 of /proc/<pid>/stat,
# measured in clock ticks) for a process
cpu_time=$(awk '{print $14 + $15}' /proc/$pid/stat)

# Convert ticks to seconds (getconf CLK_TCK, typically 100) and estimate
# the energy used since the previous sample:
#   energy_joules = cpu_power_watts * (cpu_time - prev_cpu_time) / CLK_TCK
```
This approach has several limitations:

- Sampling Error: Processes that start and stop between samples are missed entirely
- Fixed Overhead: Each sample requires reading and parsing `/proc` files
- Limited Precision: Typical sampling intervals are 100ms or more
- Scalability Issues: Monitoring many processes requires reading many files
eBPF-based Monitoring
Our eBPF approach hooks directly into the kernel scheduler:
SEC("tp/sched/sched_switch")
int monitor_energy(struct trace_event_raw_sched_switch *ctx) {
u64 ts = bpf_ktime_get_ns();
// Track exact time when process stops running
u64 delta = ts - previous_timestamp;
update_runtime(prev_pid, delta);
}
The advantages are significant:
- Perfect Accuracy: Every context switch is captured
- Minimal Overhead: No polling or file parsing needed
- Nanosecond Precision: Exact CPU time measurements
- Scalable: Same overhead whether monitoring 1 or 1000 processes
Why eBPF for Energy Monitoring?
The landscape of energy monitoring has evolved significantly, as detailed in the comprehensive survey of eBPF energy projects. The key insights from that ecosystem are summarized below.
Current State of eBPF Energy Projects
The eBPF ecosystem for energy management is rapidly evolving across two main categories: mature telemetry solutions and emerging power control frameworks.
Energy Telemetry and Accounting (Production-Ready)
| Project | Capabilities | Implementation Approach | Status |
|---|---|---|---|
| Kepler | Container/pod energy attribution for Kubernetes | eBPF tracepoints + RAPL + performance counters | CNCF sandbox project, production deployments |
| Wattmeter | Per-process energy tracking | Context-switch eBPF programs reading RAPL MSRs | Research prototype (HotCarbon '24), <1μs overhead |
| DEEP-mon | Container power monitoring | In-kernel eBPF aggregation of scheduler events | Proven academic approach, avoids userspace overhead |
Power Control via eBPF (Research and Development)
The emerging power control landscape represents the next frontier in eBPF energy management. cpufreq_ext stands as the first upstream-bound eBPF implementation that can actually modify CPU frequency through a `bpf_struct_ops` interface, allowing frequency scaling policies to be written in eBPF rather than kernel C code.
Research prototypes include an eBPF CPU-Idle Governor that replaces traditional menu/TEO governors with eBPF hooks for dynamic idle state selection and idle injection. The conceptual BEAR (BPF Energy-Aware Runtime) framework aims to unify DVFS, idle, and thermal management under a single eBPF-based policy engine, though no public implementation exists yet.
Why Our Approach Matters
Our energy monitor fits into the telemetry category but with a unique focus on educational clarity and comparison with traditional methods. eBPF's event-driven architecture fundamentally differs from polling-based approaches by reacting to kernel events in real-time. When the scheduler switches processes, our code runs immediately, capturing the exact transition moment with nanosecond precision.
The in-kernel aggregation capability eliminates the overhead of sending every context switch event to userspace by maintaining per-CPU hash maps in the kernel. Only aggregated data or sampled events need to cross the kernel-user boundary, dramatically reducing monitoring overhead. Combined with eBPF's safety guarantees through program verification before loading, this creates a production-ready solution that can't crash the kernel or create infinite loops.
Perhaps most importantly, eBPF enables hot-pluggable analysis where you can attach and detach the energy monitor without restarting applications or rebooting the system. This capability enables ad-hoc analysis of production workloads, something impossible with traditional kernel modules or instrumentation approaches.
Real-World Impact
The practical benefits of eBPF energy monitoring are substantial across different deployment scenarios:
| Use Case | Traditional Approach | eBPF Approach | Benefit |
|---|---|---|---|
| Short-lived processes | Often missed entirely | Every microsecond tracked | 100% visibility |
| Container monitoring | High overhead per container | Shared kernel infrastructure | 10-100x less overhead |
| Production systems | Risky kernel modules | Verified safe programs | Zero crash risk |
| Dynamic workloads | Fixed sampling misses spikes | Event-driven captures all | Accurate spike detection |
When eBPF Energy Monitoring is Essential
eBPF energy monitoring becomes critical in scenarios where precision, low overhead, and real-time feedback are paramount.
| Deployment Scenario | Key Requirements | Why eBPF Excels |
|---|---|---|
| Battery-Powered Devices | Every millijoule matters, minimal monitoring overhead | Low overhead means monitoring doesn't impact battery life |
| Multi-Tenant Clouds | Accurate billing, power budget enforcement | Precise attribution enables fair energy accounting |
| Thermal Management | Real-time feedback in thermally constrained environments | Event-driven updates provide immediate thermal response |
| Sustainability Reporting | Audit-quality measurements for carbon footprint | Production-grade accuracy without traditional overhead |
| Performance/Watt Optimization | Measure impact of code changes with minimal perturbation | A/B testing capabilities with near-zero measurement bias |
These use cases share common requirements that traditional polling-based approaches struggle to meet: the need for accurate, low-overhead, real-time energy attribution that can operate reliably in production environments.
The ecosystem is rapidly maturing, with projects like Kepler already deployed in production Kubernetes clusters and cpufreq_ext heading toward mainline kernel inclusion. Our tutorial provides a foundation for understanding and building upon these advanced capabilities.
Architecture Overview
Our energy monitoring solution provides a comprehensive comparison framework with two distinct implementations. The eBPF Energy Monitor delivers high-performance monitoring through kernel hooks, while the Traditional Energy Monitor uses bash-based `/proc` sampling to represent conventional approaches. A Comparison Script enables direct evaluation of both methods under identical conditions.
The eBPF implementation architecture consists of three tightly integrated components:
Header File (energy_monitor.h)
Defines the shared data structure for kernel-user communication:
```c
struct energy_event {
    __u64 ts;          // Timestamp of context switch
    __u32 cpu;         // CPU core where process ran
    __u32 pid;         // Process ID
    __u64 runtime_ns;  // How long process ran (nanoseconds)
    char comm[16];     // Process name
};
```
eBPF Program (energy_monitor.bpf.c)
Implements the kernel-side logic with three key maps:
```c
// Track when each process started running
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);   // PID
    __type(value, u64); // Start timestamp
} time_lookup SEC(".maps");

// Accumulate total runtime per process
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);   // PID
    __type(value, u64); // Total runtime in microseconds
} runtime_lookup SEC(".maps");

// Send events to userspace
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} rb SEC(".maps");
```
User-Space Application (energy_monitor.c)
Processes events and calculates energy consumption based on configured CPU power.
Implementation Deep Dive
Let's explore the key parts of our eBPF energy monitor implementation:
Hooking into the Scheduler
The core of our monitor is the scheduler tracepoint that fires on every context switch:
SEC("tp/sched/sched_switch")
int monitor_energy(struct trace_event_raw_sched_switch *ctx)
{
u64 ts = bpf_ktime_get_ns();
u32 cpu = bpf_get_smp_processor_id();
u32 prev_pid = ctx->prev_pid;
u32 next_pid = ctx->next_pid;
// Calculate runtime for the process that just stopped
u64 *old_ts_ptr = bpf_map_lookup_elem(&time_lookup, &prev_pid);
if (old_ts_ptr) {
u64 delta = ts - *old_ts_ptr;
update_runtime(prev_pid, delta);
// Send event to userspace for real-time monitoring
struct energy_event *e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
if (e) {
e->ts = ts;
e->cpu = cpu;
e->pid = prev_pid;
e->runtime_ns = delta;
bpf_probe_read_kernel_str(e->comm, sizeof(e->comm), ctx->prev_comm);
bpf_ringbuf_submit(e, 0);
}
}
// Record when the next process starts running
bpf_map_update_elem(&time_lookup, &next_pid, &ts, BPF_ANY);
return 0;
}
This function captures the exact moment when the CPU switches from one process to another, allowing us to calculate precisely how long each process ran.
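One piece the listing doesn't show is the `update_runtime()` helper it calls. A plausible implementation, consistent with the `runtime_lookup` map defined earlier (the repository version may differ in detail), accumulates microseconds per PID:

```c
// Sketch: add a nanosecond delta, converted to microseconds, to the
// per-PID total in runtime_lookup (a per-CPU hash map, so no atomics needed)
static __always_inline void update_runtime(u32 pid, u64 delta_ns)
{
    u64 delta_us = div_u64_by_1000(delta_ns); // helper shown in the next section
    u64 *total = bpf_map_lookup_elem(&runtime_lookup, &pid);
    if (total)
        *total += delta_us;
    else
        bpf_map_update_elem(&runtime_lookup, &pid, &delta_us, BPF_ANY);
}
```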
Efficient Time Calculation
To minimize overhead in the kernel, we use an optimized division function to convert nanoseconds to microseconds:
```c
static inline u64 div_u64_by_1000(u64 n) {
    u64 q, r, t;
    t = (n >> 7) + (n >> 8) + (n >> 12);
    q = (n >> 1) + t + (n >> 15) + (t >> 11) + (t >> 14);
    q = q >> 9;
    r = n - q * 1000;
    return q + ((r + 24) >> 10);
}
```
This shift-and-add approximation avoids a 64-bit divide instruction on a path that runs at every context switch; eBPF programs also cannot use floating-point arithmetic, so integer math is the only option anyway.
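If you want to convince yourself of its accuracy, a small userspace harness (our addition, not part of the tutorial sources) can brute-force compare it against true integer division:

```c
#include <stdio.h>
#include <stdint.h>

// Same shift-based approximation as in the eBPF program
static uint64_t div_u64_by_1000(uint64_t n)
{
    uint64_t q, r, t;
    t = (n >> 7) + (n >> 8) + (n >> 12);
    q = (n >> 1) + t + (n >> 15) + (t >> 11) + (t >> 14);
    q = q >> 9;
    r = n - q * 1000;
    return q + ((r + 24) >> 10);
}

int main(void)
{
    for (uint64_t n = 0; n < 100000000ULL; n++) {
        if (div_u64_by_1000(n) != n / 1000) {
            printf("mismatch at %llu\n", (unsigned long long)n);
            return 1;
        }
    }
    puts("matches n/1000 for all tested values");
    return 0;
}
```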
Energy Calculation in Userspace
The userspace program receives runtime events and calculates energy consumption:
```c
static int handle_event(void *ctx, void *data, size_t data_sz)
{
    const struct energy_event *e = data;

    // Calculate energy in nanojoules:
    // Energy (J)  = Power (W) × Time (s)
    // Energy (nJ) = Power (W) × Time (ns)
    __u64 energy_nj = (__u64)(env.cpu_power_watts * e->runtime_ns);

    if (env.verbose) {
        printf("%-16s pid=%-6d cpu=%-2d runtime=%llu ns energy=%llu nJ\n",
               e->comm, e->pid, e->cpu, e->runtime_ns, energy_nj);
    }
    return 0;
}
```
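Hooking this callback up uses libbpf's standard ring buffer API. A minimal sketch, assuming the generated skeleton `skel` and an `exiting` flag set by a signal handler:

```c
// Wire handle_event into the ring buffer map defined in the eBPF program
struct ring_buffer *rb = ring_buffer__new(bpf_map__fd(skel->maps.rb),
                                          handle_event, NULL, NULL);
if (!rb)
    goto cleanup;

// Poll for events until interrupted; 100 ms timeout per iteration
while (!exiting) {
    int err = ring_buffer__poll(rb, 100);
    if (err < 0 && err != -EINTR)
        break;
}
ring_buffer__free(rb);
```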
Final Statistics
When the monitoring session ends, we aggregate the data from all CPU cores:
```c
static void print_stats(struct energy_monitor_bpf *skel)
{
    int num_cpus = libbpf_num_possible_cpus();
    __u64 *values = calloc(num_cpus, sizeof(__u64));
    __u32 key = 0, next_key;

    // Iterate through all processes recorded in the runtime map
    while (bpf_map_get_next_key(bpf_map__fd(skel->maps.runtime_lookup),
                                &key, &next_key) == 0) {
        // Sum values from all CPUs (per-CPU map: one slot per core)
        if (bpf_map_lookup_elem(bpf_map__fd(skel->maps.runtime_lookup),
                                &next_key, values) == 0) {
            __u64 runtime_us = 0;
            for (int i = 0; i < num_cpus; i++)
                runtime_us += values[i];

            // Energy (J) = Power (W) × Time (µs) / 1,000,000
            double runtime_ms = runtime_us / 1000.0;
            double energy_j = (env.cpu_power_watts * runtime_us) / 1000000.0;
            // comm is resolved elsewhere (e.g., cached from ring buffer events)
            printf("%-10d %-16s %-15.2f %-15.4f\n",
                   next_key, comm, runtime_ms, energy_j);
        }
        key = next_key; // advance the iterator
    }
    free(values);
}
```
Building and Running the Energy Monitor
Prerequisites
Before building, ensure you have:

- Linux kernel 5.4 or newer with BTF support
- libbpf development files
- clang and llvm for eBPF compilation
- Basic build tools (make, gcc)
Compilation
Build all components with the provided Makefile:

```bash
make
```

This creates:

- `energy_monitor`: The eBPF-based energy monitor
- `energy_monitor_traditional.sh`: The traditional polling-based monitor
- `compare_monitors.sh`: Script to compare both approaches
Running the eBPF Monitor
The eBPF monitor requires root privileges to attach to kernel tracepoints:
```bash
# Monitor all processes for 10 seconds with 15W CPU power
sudo ./energy_monitor -d 10 -p 15.0

# Monitor with verbose output
sudo ./energy_monitor -v -d 10

# Continuous monitoring (Ctrl+C to stop)
sudo ./energy_monitor
```
Example output:
```
Energy monitor started... Hit Ctrl-C to end.
CPU Power: 15.00 W
Running for 10 seconds

=== Energy Usage Summary ===
PID        COMM             Runtime (ms)    Energy (J)
---------- ---------------- --------------- ---------------
39716      firefox          541.73          8.1260
19845      node             67.71           1.0157
39719      vscode           63.15           0.9472
29712      chrome           13.34           0.2000
...

Total CPU time: 2781.52 ms
Total estimated energy: 41.7229 J
CPU power setting: 15.00 W
```
Running the Traditional Monitor
The traditional monitor uses `/proc` sampling and runs without special privileges:
```bash
# Monitor for 10 seconds with verbose output
./energy_monitor_traditional.sh -d 10 -v

# Adjust sampling interval (default 100ms)
./energy_monitor_traditional.sh -d 10 -i 0.05
```
Comparing Both Approaches
Use the comparison script to see the differences:
```bash
# Basic comparison
sudo ./compare_monitors.sh -d 10

# With a CPU workload
sudo ./compare_monitors.sh -d 10 -w "stress --cpu 2 --timeout 10"
```
Example comparison output:
```
Comparison Results
==================
Metric                    Traditional     eBPF
------------------------- --------------- ---------------
Total Energy (J)          1.050000        0.0288
Monitoring Time (s)       5.112031        4.500215
Samples/Events            50              Continuous

Performance Analysis:
- Traditional monitoring overhead: 13.00% compared to eBPF
- eBPF provides per-context-switch granularity
- Traditional samples at fixed intervals (100ms)
```
Understanding Energy Monitoring Trade-offs
While our energy monitor provides valuable insights, it's important to understand its limitations and trade-offs:
Accuracy Considerations
Our energy monitoring model employs a simplified approach using the formula: Energy = CPU_Power × CPU_Time. While this provides valuable comparative insights, it doesn't account for several dynamic factors that affect real power consumption.
Frequency scaling represents a significant limitation as modern CPUs change frequency dynamically based on workload and thermal conditions. Different idle states (C-states) also consume varying amounts of power, from near-zero in deep sleep to significant standby power in shallow idle states. Additionally, workload characteristics matter because some instructions (particularly vector operations and memory-intensive tasks) consume more power per cycle than simple arithmetic operations.
The model also overlooks shared resource consumption from cache, memory controllers, and I/O subsystems that contribute to total system power but aren't directly attributable to CPU execution time.
For production deployments requiring higher accuracy, enhancements would include reading hardware performance counters for actual power measurements, tracking frequency changes through DVFS events, modeling different instruction types based on performance counters, and incorporating memory and I/O activity metrics from the broader system.
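As a sketch of the DVFS-tracking idea, an additional eBPF program could record each CPU's last reported frequency from the `power:cpu_frequency` tracepoint. The context struct name below follows typical vmlinux.h output and should be verified against your kernel's BTF:

```c
// Last reported frequency (kHz) per CPU, updated on DVFS transitions
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1024);
    __type(key, u32);   // CPU id
    __type(value, u64); // frequency in kHz
} cpu_freq SEC(".maps");

SEC("tp/power/cpu_frequency")
int on_cpu_frequency(struct trace_event_raw_cpu *ctx)
{
    u32 cpu = ctx->cpu_id;
    u64 khz = ctx->state;
    bpf_map_update_elem(&cpu_freq, &cpu, &khz, BPF_ANY);
    return 0;
}
```

Runtime slices could then be weighted by the frequency in effect when they ran, tightening the constant-power assumption.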
When to Use Each Approach
The choice between eBPF and traditional monitoring depends on your specific requirements and constraints.
eBPF monitoring excels when you need accurate CPU time tracking, particularly for short-lived processes that traditional sampling might miss entirely. Its minimal measurement overhead makes it ideal for production environments where the monitoring tool itself shouldn't impact the workload being measured. eBPF is particularly valuable for comparative analysis between processes, where relative accuracy matters more than absolute precision.
Traditional monitoring remains appropriate when eBPF isn't available due to permission restrictions or older kernel versions lacking BTF support. It provides a simple, portable solution that requires no special privileges and works across different platforms. For monitoring long-running, stable workloads where approximate measurements are sufficient, traditional approaches offer adequate insight with simpler deployment requirements.
Practical Use Cases and Deployment Scenarios
Understanding when and how to deploy eBPF energy monitoring helps maximize its value. Here are real-world scenarios where it excels:
Data Center Energy Optimization
Modern data centers operate under strict power budgets and cooling constraints where eBPF energy monitoring provides critical operational capabilities. Workload placement becomes intelligent when schedulers understand the energy profile of different applications, enabling balanced power consumption across racks while avoiding thermal hot spots and maximizing overall efficiency.
During peak demand periods, power capping systems can leverage real-time energy attribution to identify and selectively throttle the most power-hungry processes without impacting critical services. This surgical approach maintains service levels while staying within power infrastructure limits.
For cloud providers, billing and chargeback accuracy drives customer behavior toward more efficient code. When customers can see the actual energy cost of their workloads, they have direct financial incentives to optimize their applications for energy efficiency.
Mobile and Edge Computing
Battery-powered devices present unique energy constraints where precise monitoring becomes essential for user experience and device longevity. App energy profiling empowers developers with exact energy consumption data during different operations, enabling targeted optimizations that can significantly extend battery life without sacrificing functionality.
Operating systems benefit from background task management intelligence, where historical energy consumption patterns inform decisions about which background tasks to allow or defer. This prevents energy-hungry background processes from draining batteries while maintaining essential services.
In devices without active cooling, thermal management becomes critical as energy monitoring helps predict thermal buildup before throttling occurs. By understanding energy patterns, the system can proactively manage workloads to maintain consistent performance within thermal limits.
Development and CI/CD Integration
Integrating energy monitoring into development workflows creates a continuous feedback loop that prevents efficiency regressions from reaching production. Energy regression testing becomes automated through CI/CD pipelines that flag code changes increasing energy consumption beyond predefined thresholds, treating energy efficiency as a first-class software quality metric.
Performance/watt optimization provides developers with visibility into the true cost of performance improvements. Some optimizations may increase speed while dramatically increasing energy consumption, and others may achieve better efficiency with minimal performance impact. This visibility enables informed architectural decisions that balance speed and efficiency based on actual workload requirements.
Green software metrics integration allows organizations to track and report energy efficiency as part of sustainability initiatives. Regular measurement provides concrete data for environmental impact reporting while creating accountability for software teams to consider energy efficiency in their development practices.
Research and Education
eBPF energy monitoring serves as a powerful research and educational tool that bridges the gap between theoretical understanding and practical system behavior. Algorithm comparison becomes rigorous when researchers can measure energy efficiency differences between approaches under production-realistic conditions, providing empirical data that complements theoretical complexity analysis.
System behavior analysis reveals complex interactions between different components from an energy perspective, uncovering optimization opportunities that aren't apparent when looking at performance metrics alone. These insights drive system design decisions that consider the total cost of ownership, including operational energy costs.
As a teaching tool, energy monitoring makes abstract concepts tangible by showing students the immediate energy impact of their code. When algorithmic complexity discussions are paired with real energy measurements, students develop intuition about the practical implications of their design choices beyond just computational efficiency.
Extending the Energy Monitor
The current implementation provides a solid foundation for building more sophisticated energy monitoring capabilities. Several enhancement directions offer significant value for different deployment scenarios.
| Extension Area | Implementation Approach | Value Proposition |
|---|---|---|
| Hardware Counter Integration | Read RAPL energy counters via the perf `power` PMU | Replace estimation with actual hardware measurements |
| Per-Core Power Modeling | Track core assignment and model P-core vs E-core differences | Accurate attribution on heterogeneous processors |
| Workload Classification | Classify CPU-intensive, memory-bound, I/O-bound, and idle patterns | Enable workload-specific power optimization |
| Container Runtime Integration | Aggregate energy by container/pod for Kubernetes environments | Cloud-native energy attribution and billing |
| Real-time Visualization | Web dashboard with live energy consumption graphs | Immediate feedback for energy optimization |
Hardware counter integration represents the most impactful enhancement, replacing our simplified estimation model with actual hardware measurements through RAPL (Running Average Power Limit) interfaces. Modern processors provide detailed energy counters that can be read via performance events, offering precise energy measurements down to individual CPU packages.
```c
// Read RAPL counters for actual energy measurements through the perf
// "power" PMU. Note: the PMU type is not a fixed uapi constant; it and
// the event config are discovered at runtime from sysfs:
//   /sys/bus/event_source/devices/power/type              -> attr.type
//   /sys/bus/event_source/devices/power/events/energy-pkg -> attr.config
struct perf_event_attr attr = {
    .type = rapl_pmu_type,     /* read from sysfs */
    .config = rapl_energy_pkg, /* parsed from the energy-pkg event string */
};
```
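Discovering those values at runtime is a short sysfs read. A minimal userspace sketch (error handling trimmed):

```c
#include <stdio.h>

// Read the dynamic PMU type for RAPL from sysfs. The energy-pkg config
// lives in /sys/bus/event_source/devices/power/events/energy-pkg as a
// string like "event=0x02" that must be parsed similarly.
static int rapl_pmu_type(void)
{
    int type = -1;
    FILE *f = fopen("/sys/bus/event_source/devices/power/type", "r");
    if (f) {
        if (fscanf(f, "%d", &type) != 1)
            type = -1;
        fclose(f);
    }
    return type;
}
```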
Per-core power modeling becomes essential on heterogeneous processors where performance cores and efficiency cores have dramatically different power characteristics. Tracking which core each process runs on enables accurate energy attribution:
```c
// Different cores may have different power characteristics
double core_power[MAX_CPUS] = {15.0, 15.0, 10.0, 10.0}; // P-cores vs E-cores
```
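Given the `energy_event` fields from earlier, attribution would then use the power value for the core each slice actually ran on rather than one global wattage:

```c
// e->cpu selects the per-core power model for this runtime slice
double energy_j = core_power[e->cpu] * (e->runtime_ns / 1e9);
```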
Workload classification enhances energy monitoring by recognizing different computational patterns and their associated energy costs:
```c
enum workload_type {
    WORKLOAD_CPU_INTENSIVE,
    WORKLOAD_MEMORY_BOUND,
    WORKLOAD_IO_BOUND,
    WORKLOAD_IDLE,
};
```
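A classifier on top of this enum is straightforward once per-process counters are available. The metric names and thresholds below are illustrative assumptions, since the tutorial's maps don't collect these yet:

```c
// Toy classifier over hypothetical per-process counters
enum workload_type classify(unsigned long long insns_per_1k_cycles,
                            unsigned long long misses_per_1k_insns,
                            unsigned long long io_wait_pct)
{
    if (io_wait_pct > 50)
        return WORKLOAD_IO_BOUND;      // mostly waiting on I/O
    if (misses_per_1k_insns > 20)
        return WORKLOAD_MEMORY_BOUND;  // cache-miss dominated
    if (insns_per_1k_cycles > 800)
        return WORKLOAD_CPU_INTENSIVE; // high IPC, compute bound
    return WORKLOAD_IDLE;
}
```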
Troubleshooting Common Issues
When deploying eBPF energy monitoring, you might encounter these common issues:
Permission Denied
If you see permission errors when running the eBPF monitor:
```bash
# Check whether unprivileged BPF is disabled (1 or 2 means disabled)
sudo sysctl kernel.unprivileged_bpf_disabled

# Allow unprivileged BPF for debugging (not recommended for production;
# a value of 2 is locked until reboot and cannot be changed)
sudo sysctl kernel.unprivileged_bpf_disabled=0
```
Missing BTF Information
If the kernel lacks BTF (BPF Type Format) data:
```bash
# Check for BTF support
ls /sys/kernel/btf/vmlinux

# On older kernels, you may need to generate BTF
# or use a kernel with CONFIG_DEBUG_INFO_BTF=y
```
High CPU Usage
If the monitor itself causes high CPU usage:
- Reduce the ring buffer size in the eBPF program
- Increase the batch size for reading events
- Filter events in the kernel to reduce volume
Missing Processes
If some processes aren't being tracked:
- Check if they're running in a different PID namespace
- Ensure the monitor starts before the processes
- Verify the hash map size is sufficient
Future Directions
The field of eBPF-based energy monitoring is rapidly evolving. Here are exciting developments on the horizon:
Integration with Hardware Accelerators
As GPUs, TPUs, and other accelerators become common, extending eBPF monitoring to track their energy consumption will provide complete system visibility.
Machine Learning for Power Prediction
Using eBPF-collected data to train models that predict future power consumption based on workload patterns, enabling proactive power management.
Standardization Efforts
Work is underway to standardize eBPF energy monitoring interfaces, making it easier to build portable tools that work across different platforms.
Carbon-Aware Computing
Combining energy monitoring with real-time carbon intensity data to automatically shift workloads to times and locations with cleaner energy.
References and Further Reading
To dive deeper into the topics covered in this tutorial:
Energy and Power Management
- Intel Running Average Power Limit (RAPL): https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/running-average-power-limit-energy-reporting.html
- Linux Power Management: https://www.kernel.org/doc/html/latest/admin-guide/pm/index.html
- ACPI Specification: https://uefi.org/specifications
Related Projects
- Kepler (Kubernetes Efficient Power Level Exporter): https://sustainable-computing.io/
- Scaphandre Power Measurement: https://github.com/hubblo-org/scaphandre
- PowerTOP: https://github.com/fenrus75/powertop
- cpufreq_ext eBPF Governor: https://lwn.net/Articles/991991/
- Wattmeter (HotCarbon '24): https://www.asafcidon.com/uploads/5/9/7/0/59701649/energy-aware-ebpf.pdf
Academic Papers
- "Energy-Aware Process Scheduling in Linux" (HotCarbon '24)
- "DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based infrastructures"
- "eBPF-based Energy-Aware Scheduling" research papers
The complete code for this tutorial is available at: https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/48-energy
For more eBPF tutorials and projects, visit: https://eunomia.dev/tutorials/