eBPF_primer

Introduction

In this post, we will look at bpftool, a Linux command-line utility for interacting with eBPF programs and maps. It lets you load and attach programs, manipulate maps, and retrieve information about eBPF objects.

I highly recommend reading this post first, which explains the details discussed here. This post is meant to be a cheat sheet rather than a detailed walkthrough, so that you can find most of the commands you need without searching all over the internet.

I also include sample output for the commands, so that readers who cannot run them can still get a sense of what to expect.

Let’s dig in.

The bpftool

The bpftool can be utilized in various ways. Here are some common use cases of it:

  1. Listing BPF Programs and Maps: View a list of loaded eBPF programs and maps on your system. It provides information such as program IDs, names, types, and associated maps.

  2. Loading and Unloading BPF Programs: Allows you to load eBPF programs into the kernel. You can specify the program type, BPF bytecode, and associated maps. It also provides options to unload and detach programs from the kernel.

  3. Inspecting BPF Objects: Provides detailed information about eBPF programs, maps, and other BPF objects. It allows you to retrieve attributes, statistics, and configuration parameters of these objects.

  4. Managing BPF Maps: Provides functionality to create, update, and delete eBPF maps. You can specify the map type, key size, value size, and other relevant parameters while creating or modifying a map.

  5. Debugging and Tracing: Offers features for debugging and tracing eBPF programs. It enables you to attach to BPF programs and monitor their execution, e.g. by printing debug output and tracing events.

For more information, you can always refer to bpftool’s documentation or run man bpftool in your terminal.

Installation

There are two ways to install the bpftool: the painful way and the proper way.

The Painful Way

bpftool is apt-installable and comes with the linux-tools-$(uname -r) and linux-cloud-tools-$(uname -r) packages, which are specific to the kernel you are running. After each kernel upgrade, however, you will see a warning like this:

> bpftool -V
WARNING: bpftool not found for kernel 5.15.0-88

  You may need to install the following packages for this specific kernel:
    linux-tools-5.15.0-88-generic
    linux-cloud-tools-5.15.0-88-generic

  You may also want to install one of the following packages to keep up to date:
    linux-tools-generic
    linux-cloud-tools-generic

So, you have to keep removing and installing the linux-tools-* and linux-cloud-tools-* packages to match each new kernel.

Another problem with this method is that it installs very old versions of the bpftool that lack a lot of features.
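If you stay on the packaged version anyway, the package names from the warning above can be derived from the kernel release string. The helper below is a minimal sketch; the function name kernel_tools_pkgs is made up for illustration, and it assumes that uname -r prints a release such as 5.15.0-88-generic.

```shell
# Print the kernel-specific package names mentioned in the warning above.
# $1: kernel release string (defaults to the running kernel).
# kernel_tools_pkgs is a hypothetical helper name.
kernel_tools_pkgs() {
    release="${1:-$(uname -r)}"
    printf 'linux-tools-%s linux-cloud-tools-%s\n' "$release" "$release"
}

# Usage after a kernel upgrade (not executed here):
#   sudo apt install $(kernel_tools_pkgs)
```

Calling it without an argument targets the kernel that is currently running.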

The Proper Way

The proper way of installing it is to compile it from source.

Please make sure your system already has make, libelf-dev, and build-essential packages so that the compilation succeeds. Otherwise, you can just sudo apt install them (or through other package managers).

Here is a Bash script that you can run as is, or execute line by line:

#!/bin/bash

# exit the script if there is an error
set -e

# remove the existing binary, if any, to be safe
if command -v bpftool >/dev/null 2>&1; then
    sudo rm -f "$(command -v bpftool)"
fi

# go to the tmp directory
cd /tmp

# clone the repository
git clone --recurse-submodules https://github.com/libbpf/bpftool.git

# head to the src directory
cd bpftool/src

# compile and install the bpftool as root since
# it needs to copy the binary to /usr/local/sbin/ directory
sudo make install

# make a symbolic link from <source> to <destination>
# so that all other programs can find the bpftool
# (-f replaces a link left over from a previous run)
sudo ln -sf /usr/local/sbin/bpftool /usr/sbin/bpftool
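Before running the script above, it can help to confirm that the command-line tools it relies on are present. The sketch below only checks commands on the PATH; the function name missing_tools and the tool list in the usage comment are assumptions, and library packages such as libelf-dev cannot be detected this way.

```shell
# Print every command from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage (the tool list is an assumption about what the build needs):
#   missing_tools git make gcc clang
```

An empty result means all listed commands were found.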

A Hurdle on Older Machines

Initially, running sudo make install on my machine would show this:

TheGrayNode.io~$ make install
...             libbfd: [ on  ]
...    clang-bpf-co-re: [ OFF ]
...               llvm: [ on  ]
...             libcap: [ on  ]

Since I needed the CO-RE feature and was sure that my machine comes with BTF debug information, I had to dig into the bpftool’s Makefile to see why it had decided to turn off this feature.

The reason was that my machine had come with Clang version 10. According to the internet, Clang 10 should be fine, but that was not true in my case.

So, I had to use a newer version (clang-12 in this case) and run it like:

cd bpftool/src/
sudo su
CLANG=clang-12 make install

Clang 12 is just what I had installed already. Now (Jan 2024), Clang 17 is the latest version. Feel free to use newer versions of it as you see fit.
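If several Clang versions are installed side by side, picking one can be scripted. This is a minimal sketch: first_available is a made-up helper, and the candidate list in the usage comment is only an example of versioned binary names that may or may not exist on your machine.

```shell
# Print the first command from the argument list that exists on PATH.
# Returns non-zero when none of the candidates is installed.
# first_available is a hypothetical helper name.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage as root, newest candidate first (not executed here):
#   CLANG="$(first_available clang-17 clang-15 clang-14 clang-12)" make install
```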

Hopefully, your run will be smoother than mine.

The bpftool in Action

Let’s go over a few examples to showcase how the bpftool can be leveraged for tc and XDP programs.

Specifically, we will go over:

  • Show commands
  • Load and attach commands
  • Map update commands
  • Unload and detach commands

I use flat as the program name to stay consistent with the eBPF program we developed in previous posts. However, flat uses tc and some examples below are XDP. This is just for demonstration purposes.

I assume that flat, or another eBPF program of yours, is already running.

Show Commands

Here are some of the common commands that display information about BPF programs.

Show eBPF trace logs

This is useful if we are hacking around or do not have a frontend (user space) program that displays the information.

sudo bpftool prog tracelog

eBPF tracing output is sent to the pseudofile /sys/kernel/debug/tracing/trace_pipe, so an alternative to using bpftool is to simply cat this file.

Sample output:

ping-277466  [000] d.s1. 184275.151823: bpf_trace_printk: Got a packet!
ping-277466  [000] d.s1. 184275.151849: bpf_trace_printk: Got a packet!
ping-277466  [000] d.s1. 184276.170139: bpf_trace_printk: Got a packet!
ping-277466  [000] d.s1. 184276.170164: bpf_trace_printk: Got a packet!

Learn more about the meaning of this output here.
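When only the printed messages matter, the task, CPU, and timestamp columns can be stripped. The filter below is a sketch that assumes every line of interest contains the bpf_trace_printk: marker shown in the sample output; trace_messages is a made-up name.

```shell
# Read trace lines on stdin and print only the text after "bpf_trace_printk: ".
# Lines without that marker are dropped.
# trace_messages is a hypothetical helper name.
trace_messages() {
    sed -n 's/^.*bpf_trace_printk: //p'
}

# Usage (not executed here):
#   sudo cat /sys/kernel/debug/tracing/trace_pipe | trace_messages
```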

Show all the eBPF programs currently loaded into the kernel

sudo bpftool prog list

Sample output:

35: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2024-01-04T18:10:27+0100  uid 0
xlated 64B  jited 54B  memlock 4096B

[...]

58: sched_cls  name flat  tag 416ec964222ebc17  gpl
loaded_at 2024-01-04T22:43:05+0100  uid 0
xlated 1008B  jited 602B  memlock 4096B  map_ids 3
btf_id 188
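The IDs in this listing are what most other commands need, so it is handy to look them up by name in scripts. The sketch below parses the plain-text format shown above; prog_ids_by_name is a made-up helper, and it assumes the layout of the first line of each entry stays as in the sample. bpftool can also emit JSON with -j, which is sturdier to parse if you have a JSON tool at hand.

```shell
# Read `bpftool prog list` text on stdin and print the IDs of all
# programs whose name equals $1.
# prog_ids_by_name is a hypothetical helper name.
prog_ids_by_name() {
    awk -v want="$1" '
        /^[0-9]+:/ {
            for (i = 2; i < NF; i++) {
                if ($i == "name" && $(i + 1) == want) {
                    id = $1
                    sub(/:$/, "", id)   # "58:" -> "58"
                    print id
                }
            }
        }'
}

# Usage (not executed here):
#   sudo bpftool prog list | prog_ids_by_name flat
```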

Show information about a specific eBPF program with id, name or tag

sudo bpftool prog show id 58
sudo bpftool prog show name flat
sudo bpftool prog show tag 416ec964222ebc17

All of the commands above produce the same output:

  58: sched_cls  name flat  tag 416ec964222ebc17  gpl
  loaded_at 2024-01-04T22:43:05+0100  uid 0
  xlated 1008B  jited 602B  memlock 4096B  map_ids 3
  btf_id 188

Show the eBPF programs attached to network interfaces

sudo bpftool net list

Sample output:

xdp:

tc:
eth0(2) clsact/ingress id 58
eth0(2) clsact/ingress id 58
eth0(2) clsact/egress id 58
eth0(2) clsact/egress id 58

flow_dissector:

netfilter:

Show all eBPF maps

sudo bpftool map list
# is the same as
sudo bpftool map show

Sample output:

622: array  name event_storage_m  flags 0x0
        key 4B  value 8392B  max_entries 512  memlock 4300800B
        pids wdavdaemon(625)

649: ringbuf  name pipe  flags 0x0
        key 0B  value 0B  max_entries 524288  memlock 0B
        pids cmd(862305)

716: array  name pid_iter.rodata  flags 0x480
        key 4B  value 4B  max_entries 1  memlock 4096B
        btf_id 270  frozen
        pids bpftool(867021)

Show information of an eBPF map

sudo bpftool map show id $MAP_ID

Sample output:

649: ringbuf  name pipe  flags 0x0
        key 0B  value 0B  max_entries 524288  memlock 0B
        pids cmd(862305)
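The key and value sizes in this output tell you how many bytes the lookup and update commands expect. The following sketch pulls them out of the text format shown above; map_sizes is a made-up helper name and it assumes the "key NB  value NB" layout from the sample.

```shell
# Read `bpftool map show` text on stdin and print "<key bytes> <value bytes>"
# for every map entry found.
# map_sizes is a hypothetical helper name.
map_sizes() {
    sed -n 's/^[[:space:]]*key \([0-9]*\)B  *value \([0-9]*\)B.*/\1 \2/p'
}

# Usage (not executed here):
#   sudo bpftool map show id "$MAP_ID" | map_sizes
```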

Dump contents of an eBPF map

sudo bpftool map dump name pipe
# or with the ID
sudo bpftool map dump id 123

Sample output:

key:
5c 00 00 00
value:
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
[...]
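The key in the dump is printed as raw bytes. If the key is an unsigned integer stored least significant byte first (an assumption that holds on little-endian machines such as x86), it can be turned back into a number with a small helper. le_hex_to_int is a made-up name.

```shell
# Convert a space-separated list of hex bytes, least significant first,
# into a decimal integer. $1: e.g. "5c 00 00 00".
# le_hex_to_int is a hypothetical helper name.
le_hex_to_int() {
    total=0
    shift_by=0
    for byte in $1; do
        total=$(( total + (0x$byte << shift_by) ))
        shift_by=$(( shift_by + 8 ))
    done
    printf '%s\n' "$total"
}

# Usage: le_hex_to_int "5c 00 00 00"
```

For the key in the sample output, 5c 00 00 00, this gives 92.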

Show the value that corresponds to a particular key in a map

Note that you have to specify each byte of the key individually, starting with the least significant (the key in this example is 4 bytes long). Bytes are read as decimal by default; prefix them with 0x if you prefer hex notation. The output is always shown in hex, which is why 59 appears as 3b in the example below.

sudo bpftool map lookup id $MAP_ID key 59 00 00 00

Sample output:

key:
3b 00 00 00
value:
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
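Typing the key bytes by hand gets tedious for larger numbers. The sketch below splits an integer into the byte list that the lookup command above expects, least significant byte first; le_bytes is a made-up helper and the width must match the key size of your map.

```shell
# Print VALUE as WIDTH decimal bytes, least significant first.
# $1: non-negative integer, $2: number of bytes (the map's key size).
# le_bytes is a hypothetical helper name.
le_bytes() {
    value=$1
    width=$2
    out=""
    i=0
    while [ "$i" -lt "$width" ]; do
        out="$out${out:+ }$(( value & 255 ))"
        value=$(( value >> 8 ))
        i=$(( i + 1 ))
    done
    printf '%s\n' "$out"
}

# Usage (not executed here; the unquoted substitution is intentional so
# that each byte becomes its own argument):
#   sudo bpftool map lookup id "$MAP_ID" key $(le_bytes 59 4)
```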

Show the bytecode instructions for eBPF programs

sudo bpftool prog dump xlated name flat

Sample output:

  int flat(struct __sk_buff * skb):
; int flat(struct __sk_buff* skb) {
  0: (bf) r6 = r1
  1: (b7) r7 = 0
; if (bpf_skb_pull_data(skb, 0) < 0) {
  2: (b7) r2 = 0
  3: (85) call bpf_skb_pull_data#9158080
; if (bpf_skb_pull_data(skb, 0) < 0) {
  4: (6d) if r7 s> r0 goto pc+119
; if (skb->pkt_type == PACKET_BROADCAST || skb->pkt_type == PACKET_MULTICAST) {
  5: (71) r1 = *(u8 *)(r6 +128)
  6: (54) w1 &= 7
; if (skb->pkt_type == PACKET_BROADCAST || skb->pkt_type == PACKET_MULTICAST) {
  7: (07) r1 += -1
  8: (67) r1 <<= 32
  9: (77) r1 >>= 32
10: (b7) r2 = 2
11: (2d) if r2 > r1 goto pc+112
; void* tail = (void*)(long)skb->data_end; // End of the packet data
12: (79) r2 = *(u64 *)(r6 +80)
; void* head = (void*)(long)skb->data;     // Start of the packet data
13: (79) r1 = *(u64 *)(r6 +200)
; if (head + sizeof(struct ethhdr) > tail) { // Not an Ethernet frame
14: (bf) r3 = r1
15: (07) r3 += 14
; if (head + sizeof(struct ethhdr) > tail) { // Not an Ethernet frame
16: (2d) if r3 > r2 goto pc+107
17: (b7) r3 = 0
; struct packet_t pkt = { 0 };
18: (7b) *(u64 *)(r10 -16) = r3
19: (7b) *(u64 *)(r10 -24) = r3
20: (7b) *(u64 *)(r10 -32) = r3
21: (7b) *(u64 *)(r10 -40) = r3
22: (7b) *(u64 *)(r10 -48) = r3
; switch (bpf_ntohs(eth->h_proto)) {
23: (71) r4 = *(u8 *)(r1 +12)
24: (71) r3 = *(u8 *)(r1 +13)
25: (67) r3 <<= 8
26: (4f) r3 |= r4
27: (dc) r3 = be16 r3
; switch (bpf_ntohs(eth->h_proto)) {
28: (15) if r3 == 0x86dd goto pc+19
29: (55) if r3 != 0x800 goto pc+94
; if (head + (*offset) > tail) { // If the next layer is not IP, let the packet pass
30: (bf) r3 = r1
31: (07) r3 += 34
; if (head + (*offset) > tail) { // If the next layer is not IP, let the packet pass
32: (2d) if r3 > r2 goto pc+91
; if (ip->protocol != IPPROTO_TCP && ip->protocol != IPPROTO_UDP) {
33: (71) r3 = *(u8 *)(r1 +23)
; if (ip->protocol != IPPROTO_TCP && ip->protocol != IPPROTO_UDP) {
34: (15) if r3 == 0x11 goto pc+1
35: (55) if r3 != 0x6 goto pc+88
;
36: (bf) r3 = r1
37: (07) r3 += 23
; pkt->src_ip.in6_u.u6_addr32[3] = ip->saddr;
38: (61) r4 = *(u32 *)(r1 +26)
; pkt->src_ip.in6_u.u6_addr32[3] = ip->saddr;
39: (63) *(u32 *)(r10 -36) = r4
; pkt->dst_ip.in6_u.u6_addr32[3] = ip->daddr;
40: (61) r4 = *(u32 *)(r1 +30)
41: (b7) r5 = 65535
; pkt->dst_ip.in6_u.u6_addr16[5] = 0xffff;
42: (6b) *(u16 *)(r10 -22) = r5
; pkt->src_ip.in6_u.u6_addr16[5] = 0xffff;
43: (6b) *(u16 *)(r10 -38) = r5
; pkt->dst_ip.in6_u.u6_addr32[3] = ip->daddr;
44: (63) *(u32 *)(r10 -20) = r4
45: (b7) r4 = 34
46: (b7) r5 = 22
47: (05) goto pc+30
; if (head + (*offset) > tail) {
48: (bf) r3 = r1
49: (07) r3 += 54
; if (head + (*offset) > tail) {
50: (2d) if r3 > r2 goto pc+73
; if (ipv6->nexthdr != IPPROTO_TCP && ipv6->nexthdr != IPPROTO_UDP) {
51: (71) r3 = *(u8 *)(r1 +20)
; if (ipv6->nexthdr != IPPROTO_TCP && ipv6->nexthdr != IPPROTO_UDP) {
52: (15) if r3 == 0x11 goto pc+1
53: (55) if r3 != 0x6 goto pc+70
;
54: (bf) r3 = r1
55: (07) r3 += 20
; pkt->src_ip = ipv6->saddr;
56: (61) r4 = *(u32 *)(r1 +34)
57: (67) r4 <<= 32
58: (61) r5 = *(u32 *)(r1 +30)
59: (4f) r4 |= r5
60: (7b) *(u64 *)(r10 -40) = r4
61: (61) r4 = *(u32 *)(r1 +26)
62: (67) r4 <<= 32
63: (61) r5 = *(u32 *)(r1 +22)
64: (4f) r4 |= r5
65: (7b) *(u64 *)(r10 -48) = r4
; pkt->dst_ip = ipv6->daddr;
66: (61) r4 = *(u32 *)(r1 +50)
67: (67) r4 <<= 32
68: (61) r5 = *(u32 *)(r1 +46)
69: (4f) r4 |= r5
70: (7b) *(u64 *)(r10 -24) = r4
71: (61) r4 = *(u32 *)(r1 +38)
72: (61) r5 = *(u32 *)(r1 +42)
73: (67) r5 <<= 32
74: (4f) r5 |= r4
75: (7b) *(u64 *)(r10 -32) = r5
76: (b7) r4 = 54
77: (b7) r5 = 21
;
78: (71) r3 = *(u8 *)(r3 +0)
79: (73) *(u8 *)(r10 -12) = r3
80: (bf) r0 = r1
81: (0f) r0 += r5
82: (71) r5 = *(u8 *)(r0 +0)
83: (73) *(u8 *)(r10 -11) = r5
; if (head + offset + sizeof(struct tcphdr) > tail || head + offset + sizeof(struct udphdr) > tail) {
84: (0f) r1 += r4
; if (head + offset + sizeof(struct tcphdr) > tail || head + offset + sizeof(struct udphdr) > tail) {
85: (bf) r4 = r1
86: (07) r4 += 20
; if (head + offset + sizeof(struct tcphdr) > tail || head + offset + sizeof(struct udphdr) > tail) {
87: (2d) if r4 > r2 goto pc+36
88: (bf) r4 = r1
89: (07) r4 += 8
90: (2d) if r4 > r2 goto pc+33
; switch (pkt->protocol) {
91: (15) if r3 == 0x11 goto pc+17
92: (55) if r3 != 0x6 goto pc+31
; if (tcp->syn) { // We have SYN or SYN/ACK
93: (69) r2 = *(u16 *)(r1 +12)
; if (tcp->syn) { // We have SYN or SYN/ACK
94: (57) r2 &= 512
; if (tcp->syn) { // We have SYN or SYN/ACK
95: (15) if r2 == 0x0 goto pc+13
; pkt->src_port = tcp->source;
96: (69) r2 = *(u16 *)(r1 +0)
; pkt->src_port = tcp->source;
97: (6b) *(u16 *)(r10 -16) = r2
; pkt->dst_port = tcp->dest;
98: (69) r2 = *(u16 *)(r1 +2)
; pkt->dst_port = tcp->dest;
99: (6b) *(u16 *)(r10 -14) = r2
; pkt->syn = tcp->syn;
100: (69) r2 = *(u16 *)(r1 +12)
; pkt->syn = tcp->syn;
101: (77) r2 >>= 9
102: (57) r2 &= 1
103: (73) *(u8 *)(r10 -10) = r2
; pkt->ack = tcp->ack;
104: (69) r1 = *(u16 *)(r1 +12)
; pkt->ack = tcp->ack;
105: (77) r1 >>= 12
106: (57) r1 &= 1
107: (73) *(u8 *)(r10 -9) = r1
108: (05) goto pc+4
; pkt->src_port = udp->source;
109: (69) r2 = *(u16 *)(r1 +0)
; pkt->src_port = udp->source;
110: (6b) *(u16 *)(r10 -16) = r2
; pkt->dst_port = udp->dest;
111: (69) r1 = *(u16 *)(r1 +2)
; pkt->dst_port = udp->dest;
112: (6b) *(u16 *)(r10 -14) = r1
;
113: (85) call bpf_ktime_get_ns#160896
114: (7b) *(u64 *)(r10 -8) = r0
115: (bf) r4 = r10
116: (07) r4 += -48
; if (bpf_perf_event_output(skb, &pipe, BPF_F_CURRENT_CPU, &pkt, sizeof(pkt)) < 0) {
117: (bf) r1 = r6
118: (18) r2 = map[id:3]
120: (18) r3 = 0xffffffff
122: (b7) r5 = 48
123: (85) call bpf_skb_event_output#9137616
; }
124: (b7) r0 = 0
125: (95) exit
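A dump this long is easier to skim with a few filters. As one example, the sketch below lists the kernel helpers a program calls, based on the "call name#number" lines in the output above; helper_calls is a made-up name and the pattern is an assumption drawn from this sample only.

```shell
# Read `bpftool prog dump xlated` text on stdin and print the unique
# helper function names that the program calls.
# helper_calls is a hypothetical helper name.
helper_calls() {
    sed -n 's/^.*) call \([A-Za-z0-9_]*\)#.*$/\1/p' | sort -u
}

# Usage (not executed here):
#   sudo bpftool prog dump xlated name flat | helper_calls
```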

TC Specific Show Commands

Here are the commands that are specific to tc.

Show qdiscs associated with an interface

sudo tc qdisc show dev eth0

Sample output:

qdisc mq 0: root
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc clsact ffff: parent ffff:fff1

Show qdisc filters attached to an interface

For ingress flows:

sudo tc filter show dev eth0 ingress

Sample output:

filter protocol ipv6 pref 49151 bpf chain 0
filter protocol ipv6 pref 49151 bpf chain 0 handle 0xffff0000 direct-action not_in_hw id 8168 tag 52643a7d6c14b2c8 jited
filter protocol ip pref 49152 bpf chain 0
filter protocol ip pref 49152 bpf chain 0 handle 0xffff0000 direct-action not_in_hw id 8168 tag 52643a7d6c14b2c8 jited

For egress flows:

sudo tc filter show dev eth0 egress

Sample output:

filter protocol ipv6 pref 49151 bpf chain 0
filter protocol ipv6 pref 49151 bpf chain 0 handle 0xffff0000 direct-action not_in_hw id 8168 tag 52643a7d6c14b2c8 jited
filter protocol ip pref 49152 bpf chain 0
filter protocol ip pref 49152 bpf chain 0 handle 0xffff0000 direct-action not_in_hw id 8168 tag 52643a7d6c14b2c8 jited
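The program ID in these lines is the link back to bpftool prog show. A sketch for extracting it from the filter listing follows; tc_filter_prog_ids is a made-up name and the pattern assumes the "id N tag ..." layout of the sample output.

```shell
# Read `tc filter show` text on stdin and print the unique eBPF program IDs.
# tc_filter_prog_ids is a hypothetical helper name.
tc_filter_prog_ids() {
    sed -n 's/^.* id \([0-9][0-9]*\) tag .*$/\1/p' | sort -u
}

# Usage (not executed here):
#   sudo tc filter show dev eth0 ingress | tc_filter_prog_ids
```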

XDP Specific Show Commands

Use the ip command to see the XDP programs that are attached to an interface:

ip a show dev eth0

The output is similar to:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric/id:135 qdisc mq state UP group default qlen 1000
    link/ether 00:00:00:00:c6:5f brd ff:ff:ff:ff:ff:ff
    inet 192.168.96.7/24 brd 192.168.96.63 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6245:bdff:fe87:c65f/64 scope link
       valid_lft forever preferred_lft forever

Notice that in the first line of the output, you see xdpgeneric/id:135 where 135 is the eBPF program ID.
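The same field can be picked out in a script. The sketch below prints the attach mode and program ID from the first line of the ip output; xdp_attachment is a made-up name, and the pattern assumes the "xdp.../id:N" form seen in the sample.

```shell
# Read `ip a show dev <iface>` text on stdin and print "<mode> <program id>"
# when an XDP program is attached; print nothing otherwise.
# xdp_attachment is a hypothetical helper name.
xdp_attachment() {
    sed -n 's/^.* \(xdp[a-z]*\)\/id:\([0-9][0-9]*\) .*$/\1 \2/p'
}

# Usage (not executed here):
#   ip a show dev eth0 | xdp_attachment
```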

Load and Attach eBPF programs

This section explains the commands that help you load and attach tc and XDP eBPF programs manually.

If you would like to follow along, this section explains the prerequisites.

Load and Attach eBPF TC programs

  1. Create a qdisc for an interface, eth0 in this case:

    sudo tc qdisc add dev eth0 clsact
    
  2. Attach the eBPF program to ingress, egress or both traffic directions:

    sudo tc filter add dev eth0 ingress bpf direct-action obj flat.o sec tc
    sudo tc filter add dev eth0 egress bpf direct-action obj flat.o sec tc
    

flat.o is the compiled kernel space program aka the BPF bytecode.

Simple as that.

TC Hardware Offload

Similar to XDP, tc programs can also be offloaded to the NIC (if the NIC supports hardware offload).

We need to install the ethtool in order to leverage hardware offload.

  1. Install ethtool:

    sudo apt install ethtool
    
  2. Check hardware offload support on an interface:

    ethtool -k eth0 | grep -i offload
    

    Sample output:

    tcp-segmentation-offload: on
    generic-segmentation-offload: on
    generic-receive-offload: on
    large-receive-offload: off
    rx-vlan-offload: on [fixed]
    tx-vlan-offload: on [fixed]
    l2-fwd-offload: off [fixed]
    hw-tc-offload: off [fixed]
    esp-hw-offload: off [fixed]
    esp-tx-csum-hw-offload: off [fixed]
    rx-udp_tunnel-port-offload: off [fixed]
    tls-hw-tx-offload: off [fixed]
    tls-hw-rx-offload: off [fixed]
    macsec-hw-offload: off [fixed]
    hsr-tag-ins-offload: off [fixed]
    hsr-tag-rm-offload: off [fixed]
    hsr-fwd-offload: off [fixed]
    hsr-dup-offload: off [fixed]
    

    If you see [fixed] next to a setting, its value cannot be changed, which usually means the NIC or its driver does not support that feature. In this case, hw-tc-offload is off and fixed on this NIC, so hardware offload cannot be utilized.

    If your NIC supports the feature, proceed with the next steps; otherwise, skip the hardware offload.

  3. Enable tc hardware offloading on your desired NIC:

    sudo ethtool -K eth0 hw-tc-offload on
    
  4. Create a qdisc:

    sudo tc qdisc add dev eth0 clsact
    
  5. Attach the eBPF program to ingress, egress or both traffic directions and offload to NIC using skip_sw:

    sudo tc filter add dev eth0 ingress bpf skip_sw direct-action obj flat.o sec tc
    sudo tc filter add dev eth0 egress bpf skip_sw direct-action obj flat.o sec tc
    

Note: By default, tc will try to offload filters to hardware if possible. skip_sw forces the offload by keeping the filter out of the software path, so the command fails if the NIC cannot take it, as described in the man page.
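The support check from step 2 above can be scripted so that a setup script bails out early on NICs without the feature. This is a sketch based on the ethtool output format shown in the sample; tc_offload_state is a made-up helper name.

```shell
# Read `ethtool -k <iface>` text on stdin and print the state of
# hw-tc-offload: "on", "off", or the state followed by "(fixed)" when
# it cannot be changed.
# tc_offload_state is a hypothetical helper name.
tc_offload_state() {
    awk '$1 == "hw-tc-offload:" {
        if ($3 == "[fixed]") print $2 " (fixed)"
        else print $2
    }'
}

# Usage (not executed here):
#   ethtool -k eth0 | tc_offload_state
```

With the sample output above, the result is off (fixed), which means the remaining offload steps can be skipped.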

Load and Attach eBPF XDP programs

  1. Load the eBPF bytecode into the kernel and pin it to the BPF filesystem so that it can be referenced as /sys/fs/bpf/flat:

    sudo bpftool prog load flat.o /sys/fs/bpf/flat
    
  2. At this point the program is not associated with any events that would trigger it. Attach the eBPF program to the eth0 network interface:

    sudo bpftool net attach xdp name flat dev eth0
    
  3. Alternatively, the ip command can load and attach our XDP eBPF program in a single step:

    sudo ip link set dev eth0 xdp obj flat.o sec xdp
    

Update Commands

Running eBPF programs are most commonly updated through maps, which basically act as a communication channel between eBPF programs and user space.

Maps can be updated by user space and kernel space programs as well as manually. Here is how you can update a map by hand:

sudo bpftool map update id $MAP_ID key 5 0 0 0 0 0 0 0 value 0 0 0 0 0 0 0 1

Keep in mind that the key and value bytes you provide are read as decimal by default and are displayed in hex.
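To preview how a list of decimal bytes will look in bpftool's hex output, a small converter is enough. The sketch below assumes plain decimal input without leading zeros, since the shell's printf would read those as octal; to_hex_bytes is a made-up helper name.

```shell
# Print each decimal byte argument as a two-digit hex byte.
# to_hex_bytes is a hypothetical helper name.
to_hex_bytes() {
    out=""
    for byte in "$@"; do
        out="$out${out:+ }$(printf '%02x' "$byte")"
    done
    printf '%s\n' "$out"
}

# Usage: to_hex_bytes 59 0 0 0
```

For the lookup key used earlier, 59 0 0 0, this prints 3b 00 00 00.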

Unload and Detach eBPF Programs

This section depicts the commands to unload and detach tc and XDP eBPF programs manually.

Unload and Detach eBPF TC programs

  1. Remove tc filters:

    sudo tc filter del dev eth0 ingress
    sudo tc filter del dev eth0 egress
    
  2. Delete the clsact qdisc:

    sudo tc qdisc del dev eth0 clsact
    

Note: Deleting a qdisc removes its filters as well, so we could just delete the clsact qdisc and skip the first step.

Unload and Detach eBPF XDP programs

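  1. Remove the pin from the BPF filesystem. The kernel unloads the program once nothing else references it, i.e. after step 2 if it is still attached: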

    sudo rm /sys/fs/bpf/flat
    
  2. Detach the eBPF program from the NIC:

    sudo bpftool net detach xdp dev eth0
    

Conclusion

In this post, we went through various ways to interact with eBPF programs using the bpftool. The intention was not to explain all the nitty-gritty details, but to provide a cheat sheet that makes our lives easier.

I hope it has been useful for you and thanks for reading.