An Introduction to the bpftool
Introduction
In this post, we will learn about a useful tool called the bpftool, which is a command-line utility in Linux for interacting with eBPF programs and maps. It allows you to perform various operations such as loading and attaching programs, manipulating maps, and retrieving information about eBPF objects.
I highly recommend reading this post first, since it explains the details behind what is discussed here. This post is intended to serve as a cheat sheet rather than a detailed guide, in the hope that you can find most of the commands you need without searching all over the internet.
I provide sample output for the commands in this post so that readers who cannot run them can still get a sense of what they do.
Let’s dig in.
The bpftool
The bpftool can be utilized in various ways. Here are some common use cases:
- Listing BPF Programs and Maps: View a list of loaded eBPF programs and maps on your system. It provides information such as program IDs, names, types, and associated maps.
- Loading and Unloading BPF Programs: Allows you to load eBPF programs into the kernel. You can specify the program type, BPF bytecode, and associated maps. It also provides options to unload and detach programs from the kernel.
- Inspecting BPF Objects: Provides detailed information about eBPF programs, maps, and other BPF objects. It allows you to retrieve attributes, statistics, and configuration parameters of these objects.
- Managing BPF Maps: Provides functionality to create, update, and delete eBPF maps. You can specify the map type, key size, value size, and other relevant parameters while creating or modifying a map.
- Debugging and Tracing: Offers features for debugging and tracing eBPF programs. It enables you to attach to BPF programs and monitor their execution, i.e. printing debug output and tracing events.
For more information, you can always refer to bpftool’s documentation or run man bpftool in your terminal.
Installation
There are two ways to install the bpftool: the painful way and the proper way.
The Painful Way
bpftool is apt-installable and comes with the linux-tools-$(uname -r) and linux-cloud-tools-$(uname -r) packages, which are specific to the kernel you are running. After each kernel upgrade, however, you will see these errors:
> bpftool -V
WARNING: bpftool not found for kernel 5.15.0-88
You may need to install the following packages for this specific kernel:
linux-tools-5.15.0-88-generic
linux-cloud-tools-5.15.0-88-generic
You may also want to install one of the following packages to keep up to date:
linux-tools-generic
linux-cloud-tools-generic
So, you have to keep removing and installing the linux-tools-* and linux-cloud-tools-* packages.
Another problem with this method is that it installs very old versions of the bpftool that lack a lot of features.
The Proper Way
The proper way of installing it is to compile it from source.
Please make sure your system already has the make, libelf-dev, and build-essential packages so that the compilation succeeds. Otherwise, you can just sudo apt install them (or use another package manager).
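Before starting the build, it can help to confirm that the command-line tools are actually on the PATH. The sketch below is only an illustration: missing_tools is a made-up helper name, and the tool list (git, make, gcc) is an assumption based on the packages mentioned above. Library packages such as libelf-dev are not covered, because they do not install a command.

```shell
# missing_tools is a hypothetical helper, not part of bpftool.
# It prints every command from its arguments that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool list; adjust it to what your build actually needs.
missing=$(missing_tools git make gcc)
if [ -n "$missing" ]; then
    echo "Install these first: $missing"
else
    echo "All build tools found."
fi
```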
Here is a Bash script that you can use or just run these line by line:
#!/bin/bash
# exit the script if there is an error
set -e
# remove the existing binary, if there is one, to be safe
# (a bare rm with no argument would fail and abort the script)
if command -v bpftool > /dev/null; then
    sudo rm -f "$(command -v bpftool)"
fi
# go to the tmp directory
cd /tmp
# clone the repository
git clone --recurse-submodules https://github.com/libbpf/bpftool.git
# head to the src directory
cd bpftool/src
# compile and install the bpftool as root since
# it needs to copy the binary to /usr/local/sbin/ directory
sudo make install
# make a symbolic link from <source> to <destination>
# so that all other programs can find the bpftool
# (-f replaces a link left over from a previous run)
sudo ln -sf /usr/local/sbin/bpftool /usr/sbin/bpftool
A Hurdle on Older Machines
Initially, running sudo make install on my machine would show this:
TheGrayNode.io~$ make install
... libbfd: [ on ]
... clang-bpf-co-re: [ OFF ]
... llvm: [ on ]
... libcap: [ on ]
Since I needed the CO-RE feature and was sure that my machine comes with BTF debug information, I had to dig into the bpftool’s Makefile to see why it had decided to turn off this feature.
The reason was that my machine had come with Clang version 10. According to the internet, Clang 10 should be fine, but that was not true in my case.
So, I had to use a newer version (clang-12 in this case) and run it like:
cd bpftool/src/
sudo su
CLANG=clang-12 make install
Clang 12 is just what I had installed already. Now (Jan 2024), Clang 17 is the latest version. Feel free to use newer versions of it as you see fit.
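To see which major version a given Clang binary reports before passing it to the build, a small filter over the version banner is enough. This is a sketch under assumptions: clang_major is a made-up name, and the sample banner line is an example of the usual "clang version X.Y.Z" format rather than output captured from my machine.

```shell
# clang_major is a hypothetical helper. It reads the output of
# `clang --version` on stdin and prints the major version number.
clang_major() {
    sed -n 's/.*clang version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Demonstration on an assumed sample banner line.
echo "clang version 10.0.0-4ubuntu1" | clang_major    # prints: 10

# On a real machine: clang-12 --version | clang_major
```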
Hopefully, your run will be smoother than mine.
The bpftool in Action
Let’s go over a few examples to showcase how the bpftool can be leveraged for tc and XDP programs.
Specifically, we will go over:
- Show commands
- Load and attach commands
- Map update commands
- Unload and detach commands
I use flat as the program name to stay consistent with the eBPF program we developed in previous posts. However, flat uses tc and some examples below are XDP. This is just for demonstration purposes.
I assume that flat, or another eBPF program of yours, is already running.
Show Commands
Here are some of the common commands that display information about BPF programs.
Show eBPF trace logs
This is useful if we are hacking around or do not have a frontend (user space) program that displays the information.
sudo bpftool prog tracelog
eBPF tracing output gets sent to a pseudofile at /sys/kernel/debug/tracing/trace_pipe, so an alternative to using bpftool is to simply cat this file.
Sample output:
ping-277466 [000] d.s1. 184275.151823: bpf_trace_printk: Got a packet!
ping-277466 [000] d.s1. 184275.151849: bpf_trace_printk: Got a packet!
ping-277466 [000] d.s1. 184276.170139: bpf_trace_printk: Got a packet!
ping-277466 [000] d.s1. 184276.170164: bpf_trace_printk: Got a packet!
Learn more about the meaning of this output here.
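When the trace pipe is busy, it can be handy to keep only the message part of each line. The filter below is a minimal sketch; trace_messages is a made-up name, and it assumes every line of interest contains the bpf_trace_printk: marker seen in the sample output.

```shell
# trace_messages is a hypothetical helper. It keeps only the text printed
# by bpf_trace_printk(), dropping the task, CPU, flags and timestamp columns.
trace_messages() {
    sed -n 's/^.*bpf_trace_printk: //p'
}

# Demonstration on a line copied from the sample output above.
echo 'ping-277466 [000] d.s1. 184275.151823: bpf_trace_printk: Got a packet!' | trace_messages
# prints: Got a packet!

# On a live system (needs root):
#   sudo cat /sys/kernel/debug/tracing/trace_pipe | trace_messages
```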
Show all the eBPF programs currently loaded into the kernel
sudo bpftool prog list
Sample output:
35: cgroup_skb tag 6deef7357e7b4530 gpl
loaded_at 2024-01-04T18:10:27+0100 uid 0
xlated 64B jited 54B memlock 4096B
[...]
58: sched_cls name flat tag 416ec964222ebc17 gpl
loaded_at 2024-01-04T22:43:05+0100 uid 0
xlated 1008B jited 602B memlock 4096B map_ids 3
btf_id 188
Show information about a specific eBPF program with id, name or tag
sudo bpftool prog show id 58
sudo bpftool prog show name flat
sudo bpftool prog show tag 416ec964222ebc17
All of the commands above produce the same output:
58: sched_cls name flat tag 416ec964222ebc17 gpl
loaded_at 2024-01-04T22:43:05+0100 uid 0
xlated 1008B jited 602B memlock 4096B map_ids 3
btf_id 188
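Since most commands accept an ID, it is often useful to turn a program name into its ID in a script. The following is a rough sketch that parses the plain-text listing; prog_ids_by_name is a made-up helper, and the parsing assumes the header-line layout shown in the sample output. bpftool also has a --json (-j) flag, which is likely the more robust route if a JSON parser is available.

```shell
# prog_ids_by_name is a hypothetical helper. It reads `bpftool prog list`
# text output on stdin and prints the ID of every program whose name
# matches the first argument.
prog_ids_by_name() {
    awk -v want="$1" '
        $1 ~ /^[0-9]+:$/ {
            for (i = 2; i < NF; i++) {
                if ($i == "name" && $(i + 1) == want) {
                    id = $1
                    sub(/:$/, "", id)
                    print id
                    break
                }
            }
        }'
}

# Demonstration on two header lines copied from the sample output.
printf '%s\n' \
    '35: cgroup_skb tag 6deef7357e7b4530 gpl' \
    '58: sched_cls name flat tag 416ec964222ebc17 gpl' | prog_ids_by_name flat
# prints: 58

# On a live system: sudo bpftool prog list | prog_ids_by_name flat
```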
Show all the network-related eBPF programs
sudo bpftool net list
Sample output:
xdp:
tc:
eth0(2) clsact/ingress id 58
eth0(2) clsact/ingress id 58
eth0(2) clsact/egress id 58
eth0(2) clsact/egress id 58
flow_dissector:
netfilter:
Show all eBPF maps
sudo bpftool map list
# is the same as
sudo bpftool map show
Sample output:
622: array name event_storage_m flags 0x0
key 4B value 8392B max_entries 512 memlock 4300800B
pids wdavdaemon(625)
649: ringbuf name pipe flags 0x0
key 0B value 0B max_entries 524288 memlock 0B
pids cmd(862305)
716: array name pid_iter.rodata flags 0x480
key 4B value 4B max_entries 1 memlock 4096B
btf_id 270 frozen
pids bpftool(867021)
Show information of an eBPF map
sudo bpftool map show id $MAP_ID
Sample output:
649: ringbuf name pipe flags 0x0
key 0B value 0B max_entries 524288 memlock 0B
pids cmd(862305)
Dump contents of an eBPF map
sudo bpftool map dump name pipe
# or with the ID
sudo bpftool map dump id 123
Sample output:
key:
5c 00 00 00
value:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[...]
Show the value that corresponds to a particular key in a map
Note that you have to specify each byte of the key individually (4 bytes for the map in this example), starting with the least significant. You can use hex notation if you prefer. If you use decimal, bpftool displays the bytes in hex, as depicted in the example below.
sudo bpftool map lookup id $MAP_ID key 59 00 00 00
Sample output:
key:
3b 00 00 00
value:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
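Typing the key byte by byte gets tedious for larger numbers, so the conversion can be scripted. The helper below is a sketch, not something bpftool provides: le_bytes is a made-up name, and it emits 0x-prefixed bytes, a form the key and value arguments accept.

```shell
# le_bytes is a hypothetical helper. It prints VALUE as WIDTH
# space-separated bytes, least significant byte first.
le_bytes() {
    value=$1
    width=$2
    out=""
    i=0
    while [ "$i" -lt "$width" ]; do
        byte=$(( (value >> (8 * i)) & 255 ))
        out="$out$(printf '0x%02x' "$byte") "
        i=$(( i + 1 ))
    done
    printf '%s\n' "${out% }"
}

le_bytes 59 4    # prints: 0x3b 0x00 0x00 0x00

# Possible use (word splitting of the unquoted substitution is intended):
#   sudo bpftool map lookup id $MAP_ID key $(le_bytes 59 4)
```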
Show the bytecode instructions for eBPF programs
sudo bpftool prog dump xlated name flat
Sample output:
int flat(struct __sk_buff * skb):
; int flat(struct __sk_buff* skb) {
0: (bf) r6 = r1
1: (b7) r7 = 0
; if (bpf_skb_pull_data(skb, 0) < 0) {
2: (b7) r2 = 0
3: (85) call bpf_skb_pull_data#9158080
; if (bpf_skb_pull_data(skb, 0) < 0) {
4: (6d) if r7 s> r0 goto pc+119
; if (skb->pkt_type == PACKET_BROADCAST || skb->pkt_type == PACKET_MULTICAST) {
5: (71) r1 = *(u8 *)(r6 +128)
6: (54) w1 &= 7
; if (skb->pkt_type == PACKET_BROADCAST || skb->pkt_type == PACKET_MULTICAST) {
7: (07) r1 += -1
8: (67) r1 <<= 32
9: (77) r1 >>= 32
10: (b7) r2 = 2
11: (2d) if r2 > r1 goto pc+112
; void* tail = (void*)(long)skb->data_end; // End of the packet data
12: (79) r2 = *(u64 *)(r6 +80)
; void* head = (void*)(long)skb->data; // Start of the packet data
13: (79) r1 = *(u64 *)(r6 +200)
; if (head + sizeof(struct ethhdr) > tail) { // Not an Ethernet frame
14: (bf) r3 = r1
15: (07) r3 += 14
; if (head + sizeof(struct ethhdr) > tail) { // Not an Ethernet frame
16: (2d) if r3 > r2 goto pc+107
17: (b7) r3 = 0
; struct packet_t pkt = { 0 };
18: (7b) *(u64 *)(r10 -16) = r3
19: (7b) *(u64 *)(r10 -24) = r3
20: (7b) *(u64 *)(r10 -32) = r3
21: (7b) *(u64 *)(r10 -40) = r3
22: (7b) *(u64 *)(r10 -48) = r3
; switch (bpf_ntohs(eth->h_proto)) {
23: (71) r4 = *(u8 *)(r1 +12)
24: (71) r3 = *(u8 *)(r1 +13)
25: (67) r3 <<= 8
26: (4f) r3 |= r4
27: (dc) r3 = be16 r3
; switch (bpf_ntohs(eth->h_proto)) {
28: (15) if r3 == 0x86dd goto pc+19
29: (55) if r3 != 0x800 goto pc+94
; if (head + (*offset) > tail) { // If the next layer is not IP, let the packet pass
30: (bf) r3 = r1
31: (07) r3 += 34
; if (head + (*offset) > tail) { // If the next layer is not IP, let the packet pass
32: (2d) if r3 > r2 goto pc+91
; if (ip->protocol != IPPROTO_TCP && ip->protocol != IPPROTO_UDP) {
33: (71) r3 = *(u8 *)(r1 +23)
; if (ip->protocol != IPPROTO_TCP && ip->protocol != IPPROTO_UDP) {
34: (15) if r3 == 0x11 goto pc+1
35: (55) if r3 != 0x6 goto pc+88
;
36: (bf) r3 = r1
37: (07) r3 += 23
; pkt->src_ip.in6_u.u6_addr32[3] = ip->saddr;
38: (61) r4 = *(u32 *)(r1 +26)
; pkt->src_ip.in6_u.u6_addr32[3] = ip->saddr;
39: (63) *(u32 *)(r10 -36) = r4
; pkt->dst_ip.in6_u.u6_addr32[3] = ip->daddr;
40: (61) r4 = *(u32 *)(r1 +30)
41: (b7) r5 = 65535
; pkt->dst_ip.in6_u.u6_addr16[5] = 0xffff;
42: (6b) *(u16 *)(r10 -22) = r5
; pkt->src_ip.in6_u.u6_addr16[5] = 0xffff;
43: (6b) *(u16 *)(r10 -38) = r5
; pkt->dst_ip.in6_u.u6_addr32[3] = ip->daddr;
44: (63) *(u32 *)(r10 -20) = r4
45: (b7) r4 = 34
46: (b7) r5 = 22
47: (05) goto pc+30
; if (head + (*offset) > tail) {
48: (bf) r3 = r1
49: (07) r3 += 54
; if (head + (*offset) > tail) {
50: (2d) if r3 > r2 goto pc+73
; if (ipv6->nexthdr != IPPROTO_TCP && ipv6->nexthdr != IPPROTO_UDP) {
51: (71) r3 = *(u8 *)(r1 +20)
; if (ipv6->nexthdr != IPPROTO_TCP && ipv6->nexthdr != IPPROTO_UDP) {
52: (15) if r3 == 0x11 goto pc+1
53: (55) if r3 != 0x6 goto pc+70
;
54: (bf) r3 = r1
55: (07) r3 += 20
; pkt->src_ip = ipv6->saddr;
56: (61) r4 = *(u32 *)(r1 +34)
57: (67) r4 <<= 32
58: (61) r5 = *(u32 *)(r1 +30)
59: (4f) r4 |= r5
60: (7b) *(u64 *)(r10 -40) = r4
61: (61) r4 = *(u32 *)(r1 +26)
62: (67) r4 <<= 32
63: (61) r5 = *(u32 *)(r1 +22)
64: (4f) r4 |= r5
65: (7b) *(u64 *)(r10 -48) = r4
; pkt->dst_ip = ipv6->daddr;
66: (61) r4 = *(u32 *)(r1 +50)
67: (67) r4 <<= 32
68: (61) r5 = *(u32 *)(r1 +46)
69: (4f) r4 |= r5
70: (7b) *(u64 *)(r10 -24) = r4
71: (61) r4 = *(u32 *)(r1 +38)
72: (61) r5 = *(u32 *)(r1 +42)
73: (67) r5 <<= 32
74: (4f) r5 |= r4
75: (7b) *(u64 *)(r10 -32) = r5
76: (b7) r4 = 54
77: (b7) r5 = 21
;
78: (71) r3 = *(u8 *)(r3 +0)
79: (73) *(u8 *)(r10 -12) = r3
80: (bf) r0 = r1
81: (0f) r0 += r5
82: (71) r5 = *(u8 *)(r0 +0)
83: (73) *(u8 *)(r10 -11) = r5
; if (head + offset + sizeof(struct tcphdr) > tail || head + offset + sizeof(struct udphdr) > tail) {
84: (0f) r1 += r4
; if (head + offset + sizeof(struct tcphdr) > tail || head + offset + sizeof(struct udphdr) > tail) {
85: (bf) r4 = r1
86: (07) r4 += 20
; if (head + offset + sizeof(struct tcphdr) > tail || head + offset + sizeof(struct udphdr) > tail) {
87: (2d) if r4 > r2 goto pc+36
88: (bf) r4 = r1
89: (07) r4 += 8
90: (2d) if r4 > r2 goto pc+33
; switch (pkt->protocol) {
91: (15) if r3 == 0x11 goto pc+17
92: (55) if r3 != 0x6 goto pc+31
; if (tcp->syn) { // We have SYN or SYN/ACK
93: (69) r2 = *(u16 *)(r1 +12)
; if (tcp->syn) { // We have SYN or SYN/ACK
94: (57) r2 &= 512
; if (tcp->syn) { // We have SYN or SYN/ACK
95: (15) if r2 == 0x0 goto pc+13
; pkt->src_port = tcp->source;
96: (69) r2 = *(u16 *)(r1 +0)
; pkt->src_port = tcp->source;
97: (6b) *(u16 *)(r10 -16) = r2
; pkt->dst_port = tcp->dest;
98: (69) r2 = *(u16 *)(r1 +2)
; pkt->dst_port = tcp->dest;
99: (6b) *(u16 *)(r10 -14) = r2
; pkt->syn = tcp->syn;
100: (69) r2 = *(u16 *)(r1 +12)
; pkt->syn = tcp->syn;
101: (77) r2 >>= 9
102: (57) r2 &= 1
103: (73) *(u8 *)(r10 -10) = r2
; pkt->ack = tcp->ack;
104: (69) r1 = *(u16 *)(r1 +12)
; pkt->ack = tcp->ack;
105: (77) r1 >>= 12
106: (57) r1 &= 1
107: (73) *(u8 *)(r10 -9) = r1
108: (05) goto pc+4
; pkt->src_port = udp->source;
109: (69) r2 = *(u16 *)(r1 +0)
; pkt->src_port = udp->source;
110: (6b) *(u16 *)(r10 -16) = r2
; pkt->dst_port = udp->dest;
111: (69) r1 = *(u16 *)(r1 +2)
; pkt->dst_port = udp->dest;
112: (6b) *(u16 *)(r10 -14) = r1
;
113: (85) call bpf_ktime_get_ns#160896
114: (7b) *(u64 *)(r10 -8) = r0
115: (bf) r4 = r10
116: (07) r4 += -48
; if (bpf_perf_event_output(skb, &pipe, BPF_F_CURRENT_CPU, &pkt, sizeof(pkt)) < 0) {
117: (bf) r1 = r6
118: (18) r2 = map[id:3]
120: (18) r3 = 0xffffffff
122: (b7) r5 = 48
123: (85) call bpf_skb_event_output#9137616
; }
124: (b7) r0 = 0
125: (95) exit
TC Specific Show Commands
Here are the commands that are specific to tc.
Show qdiscs associated with an interface
sudo tc qdisc show dev eth0
Sample output:
qdisc mq 0: root
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc clsact ffff: parent ffff:fff1
Show qdisc filters attached to an interface
For ingress flows:
sudo tc filter show dev eth0 ingress
Sample output:
filter protocol ipv6 pref 49151 bpf chain 0
filter protocol ipv6 pref 49151 bpf chain 0 handle 0xffff0000 direct-action not_in_hw id 8168 tag 52643a7d6c14b2c8 jited
filter protocol ip pref 49152 bpf chain 0
filter protocol ip pref 49152 bpf chain 0 handle 0xffff0000 direct-action not_in_hw id 8168 tag 52643a7d6c14b2c8 jited
For egress flows:
sudo tc filter show dev eth0 egress
Sample output:
filter protocol ipv6 pref 49151 bpf chain 0
filter protocol ipv6 pref 49151 bpf chain 0 handle 0xffff0000 direct-action not_in_hw id 8168 tag 52643a7d6c14b2c8 jited
filter protocol ip pref 49152 bpf chain 0
filter protocol ip pref 49152 bpf chain 0 handle 0xffff0000 direct-action not_in_hw id 8168 tag 52643a7d6c14b2c8 jited
XDP Specific Show Commands
Use the ip command to see the XDP programs that are attached to an interface:
ip a show dev eth0
The output is similar to:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric/id:135 qdisc mq state UP group default qlen 1000
link/ether 00:00:00:00:c6:5f brd ff:ff:ff:ff:ff:ff
inet 192.168.96.7/24 brd 192.168.96.63 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::6245:bdff:fe87:c65f/64 scope link
valid_lft forever preferred_lft forever
Notice that in the first line of the output, you see xdpgeneric/id:135 where 135 is the eBPF program ID.
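That ID can be pulled out of the ip output and handed to bpftool. The filter below is a sketch: xdp_prog_id is a made-up name, and it assumes the attachment is printed in the xdp.../id:N form shown above, which may differ for other ip subcommands or versions.

```shell
# xdp_prog_id is a hypothetical helper. It reads `ip a show dev <iface>`
# output on stdin and prints the XDP program ID, if one is attached.
xdp_prog_id() {
    sed -n 's/.*xdp[a-z]*\/id:\([0-9][0-9]*\).*/\1/p'
}

# Demonstration on the first line of the sample output above.
echo '2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric/id:135 qdisc mq state UP group default qlen 1000' | xdp_prog_id
# prints: 135

# On a live system:
#   sudo bpftool prog show id "$(ip a show dev eth0 | xdp_prog_id)"
```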
Load and Attach eBPF programs
This section explains the commands that help you load and attach tc and XDP eBPF programs manually.
If you would like to follow along, this section explains the prerequisites.
Load and Attach eBPF TC programs
- Create a qdisc for an interface, eth0 in this case:
  sudo tc qdisc add dev eth0 clsact
- Attach the eBPF program to ingress, egress or both traffic directions:
  sudo tc filter add dev eth0 ingress bpf direct-action obj flat.o sec tc
  sudo tc filter add dev eth0 egress bpf direct-action obj flat.o sec tc
  flat.o is the compiled kernel space program aka the BPF bytecode.
Simple as that.
TC Hardware Offload
Similar to XDP, tc programs can also be offloaded to the NIC (if the NIC supports hardware offload).
We need to install the ethtool in order to leverage hardware offload.
- Install ethtool:
  sudo apt install ethtool
- Check hardware offload support on an interface:
  ethtool -k eth0 | grep -i offload
  Sample output:
  tcp-segmentation-offload: on
  generic-segmentation-offload: on
  generic-receive-offload: on
  large-receive-offload: off
  rx-vlan-offload: on [fixed]
  tx-vlan-offload: on [fixed]
  l2-fwd-offload: off [fixed]
  hw-tc-offload: off [fixed]
  esp-hw-offload: off [fixed]
  esp-tx-csum-hw-offload: off [fixed]
  rx-udp_tunnel-port-offload: off [fixed]
  tls-hw-tx-offload: off [fixed]
  tls-hw-rx-offload: off [fixed]
  macsec-hw-offload: off [fixed]
  hsr-tag-ins-offload: off [fixed]
  hsr-tag-rm-offload: off [fixed]
  hsr-fwd-offload: off [fixed]
  hsr-dup-offload: off [fixed]
  If you see [fixed] next to a setting, it means the NIC driver does not allow that setting to be changed. In this case, hw-tc-offload is off and fixed on this NIC, so hardware offload cannot be utilized.
  If your NIC supports the feature, proceed with the next steps. Otherwise, you can skip the hardware offload.
- Enable tc hardware offloading on your desired NIC:
  sudo ethtool -K eth0 hw-tc-offload on
- Create a qdisc:
  sudo tc qdisc add dev eth0 clsact
- Attach the eBPF program to ingress, egress or both traffic directions and offload to NIC using skip_sw:
  sudo tc filter add dev eth0 ingress bpf skip_sw direct-action obj flat.o sec tc
  sudo tc filter add dev eth0 egress bpf skip_sw direct-action obj flat.o sec tc
Note: By default tc will try to offload filters to hardware if possible but skip_sw forces the offload as described in the man page.
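The support check from the steps above can also be scripted, so that a setup script stops early on NICs where the setting is fixed. This is a sketch under assumptions: tc_offload_togglable is a made-up name, and it relies on the hw-tc-offload line format shown in the sample ethtool output.

```shell
# tc_offload_togglable is a hypothetical helper. It reads `ethtool -k`
# output on stdin and succeeds (exit status 0) only when a hw-tc-offload
# line is present and not marked [fixed].
tc_offload_togglable() {
    awk '$1 == "hw-tc-offload:" { found = 1; fixed = ($0 ~ /\[fixed\]/) }
         END { exit ((found && !fixed) ? 0 : 1) }'
}

# Demonstration on the line from the sample output above.
if printf 'hw-tc-offload: off [fixed]\n' | tc_offload_togglable; then
    echo "hw-tc-offload can be switched on"
else
    echo "hw-tc-offload is fixed or not reported; skip the offload steps"
fi

# On a live system: ethtool -k eth0 | tc_offload_togglable
```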
Load and Attach eBPF XDP programs
- Load the eBPF byte code into the kernel and pin it to the BPF filesystem so that it may be identified as /sys/fs/bpf/flat:
  sudo bpftool prog load flat.o /sys/fs/bpf/flat
- At this point the program is not associated with any events that would trigger it. Attach the eBPF program to the eth0 network interface:
  sudo bpftool net attach xdp name flat dev eth0
- Alternatively, we can use the ip command to load and attach our XDP eBPF program:
  sudo ip link set dev eth0 xdp obj flat.o sec xdp
Update Commands
The most common way to update running eBPF programs is through maps, which are basically a communication channel between these programs and user space.
Maps can be updated by user space and kernel space programs as well as manually. Here is how you can update a map by hand:
sudo bpftool map update id $MAP_ID key 5 0 0 0 0 0 0 0 value 0 0 0 0 0 0 0 1
Keep in mind that the keys and values you provide will be displayed in hex.
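Because the bytes are given least significant first, it is easy to write a different number than intended. The sketch below decodes a byte list back into an integer; le_to_int is a made-up name, and it assumes a little-endian machine, a key or value that is a single unsigned integer, and a result below 2^63.

```shell
# le_to_int is a hypothetical helper. It interprets its arguments as
# bytes, least significant first, and prints the resulting integer.
le_to_int() {
    total=0
    shift_bits=0
    for byte in "$@"; do
        total=$(( total + (byte << shift_bits) ))
        shift_bits=$(( shift_bits + 8 ))
    done
    printf '%s\n' "$total"
}

le_to_int 5 0 0 0 0 0 0 0    # key from the command above, prints: 5
le_to_int 0 0 0 0 0 0 0 1    # value from the command above, prints: 72057594037927936
```

Under those assumptions, the value in the command above reads as 2^56 rather than 1. To store 1, the byte list would be 1 0 0 0 0 0 0 0.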
Unload and Detach eBPF Programs
This section depicts the commands to unload and detach tc and XDP eBPF programs manually.
Unload and Detach eBPF TC programs
- Remove tc filters:
  sudo tc filter del dev eth0 ingress
  sudo tc filter del dev eth0 egress
- Delete the clsact qdisc:
  sudo tc qdisc del dev eth0 clsact
Note: Deleting a qdisc will remove its filters as well, so we could just delete the clsact qdisc and skip the first step.
Unload and Detach eBPF XDP programs
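- Unload the program by removing its pin from the BPF filesystem:
  sudo rm /sys/fs/bpf/flat
- Detach the eBPF program from the NIC:
  sudo bpftool net detach xdp dev eth0
The kernel only frees the program once both the pin and the attachment are gone, so both steps are needed.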
Conclusion
In this post, we went over various ways in which we can interact with eBPF programs by leveraging the bpftool. The intention of this post was not to explain all the nitty-gritty details but to act as a cheat sheet to ease our lives.
I hope it has been useful for you and thanks for reading.