BPF Selftests: Add Cgroup V1 Local Storage Tests
Hey guys! Today, we're diving deep into a super interesting patch that landed in the BPF (Berkeley Packet Filter) subsystem. This patch is all about adding selftests for cgroup v1 local storage. Now, if you're scratching your head thinking, "What's cgroup? What's BPF? And what's local storage in this context?" Don't worry, we'll break it down in a way that's easy to understand. So, grab your favorite beverage, and let's get started!
This article explores the technical details, background, and community discussion around a recent patch that enhances the testing coverage for BPF's cgroup local storage feature, specifically on the cgroup v1 architecture. We'll dissect the problem the patch solves, the technical concepts involved, the code changes implemented, and the overall impact on the kernel's robustness.
Patch Overview
What Problem Did This Patch Solve?
The core challenge revolves around BPF's cgroup local storage, a powerful feature that lets BPF programs directly associate data with specific cgroups. Think of it like attaching a sticky note to a cgroup, but instead of a paper note, it's data stored in the kernel. However, the existing automated tests (selftests) in the kernel primarily focused on the modern cgroup v2 architecture. This left a significant gap: the older, yet widely used, cgroup v1 architecture lacked proper testing for this functionality. Because BPF programs retrieve cgroup v1 and v2 objects in fundamentally different ways, the correct operation of cgroup local storage in cgroup v1 environments couldn't be guaranteed. This patch steps in to fill that void and ensure robustness across both cgroup versions.
In a nutshell, cgroup local storage allows BPF programs to store data directly within a cgroup. This is incredibly useful for tracking resource usage, applying policies, and much more. The issue was that the existing tests only covered cgroup v2, leaving cgroup v1, which is still widely used, untested. This patch cleverly bridges that gap, ensuring that cgroup local storage works flawlessly in both cgroup v1 and v2 environments.
What Were the Core Changes?
Instead of creating a whole new set of tests, the patch cleverly expands the existing selftest suite to include cgroup v1 support. It parameterizes the existing test code, making it adaptable to both cgroup versions. This approach demonstrates an efficient and maintainable solution. The key changes are:
- User-space test program: The patch introduces global flags, is_cgroup1 and target_hid (the cgroup v1 hierarchy ID), to inform the BPF program which cgroup version is being tested. Think of these flags as little switches that tell the BPF program, "Hey, we're testing cgroup v1 now!" (See the loader sketch after this list.)
- BPF program: Inside the BPF program, a conditional check is added. If is_cgroup1 is true, the program calls the dedicated helper function bpf_task_get_cgroup1() to get the cgroup v1 object. Otherwise, it uses the old logic to access the cgroup v2 object (task->cgroups->dfl_cgrp).
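To make the "switches" idea concrete, here is a minimal sketch of how a user-space loader can flip these globals through a libbpf skeleton before loading the program. The is_cgroup1 and target_hid globals come from this patch; the function name and surrounding details are illustrative, not the exact selftest code:

/* Hypothetical loader sketch: run the BPF program in cgroup v1 mode.
 * Assumes the usual libbpf skeleton generated from cgrp_ls_recursion.c. */
#include "cgrp_ls_recursion.skel.h"

static int run_cgroup1_variant(int hierarchy_id)
{
	struct cgrp_ls_recursion *skel;
	int err;

	skel = cgrp_ls_recursion__open();
	if (!skel)
		return -1;

	/* Flip the global "switches" before load; the BPF side reads
	 * these .bss variables at run time. */
	skel->bss->is_cgroup1 = true;
	skel->bss->target_hid = hierarchy_id;

	err = cgrp_ls_recursion__load(skel);
	if (!err)
		err = cgrp_ls_recursion__attach(skel);

	/* ... trigger the fentry hooks and assert on map contents ... */

	cgrp_ls_recursion__destroy(skel);
	return err;
}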
This ingenious method reuses the core test logic for both cgroup architectures, significantly improving code maintainability and test completeness. It's like having a single key that unlocks two different doors, saving you the hassle of carrying two separate keys!
Technical Background and Concepts
Before we dive deeper into the code, let's establish some foundational knowledge. We need to understand what cgroups are, how they differ in versions 1 and 2, and how BPF cgroup local storage works.
cgroup v1 vs cgroup v2
Okay, so what exactly are cgroups? Cgroups (Control Groups) are a Linux kernel feature that lets you limit, track, and isolate how groups of processes use resources. Imagine you have a bunch of containers running on your system, and you want to make sure one container doesn't hog all the CPU. Cgroups to the rescue!
- cgroup v1: This is the older implementation. It lets a task (like a process) belong to multiple cgroup hierarchies simultaneously. Think of it like a person being a member of multiple clubs, each with its own rules. While flexible, this can lead to complexity in management.
- cgroup v2: This is the modern, unified implementation. It enforces a single cgroup hierarchy, where all controllers (CPU, memory, I/O, etc.) are mounted on this unified tree. This simplifies cgroup management significantly. The data structures and access methods in the kernel differ considerably between v1 and v2.
The key takeaway here is that cgroup v1 and v2 are different beasts under the hood. They have different ways of organizing and accessing cgroup information, which is why this patch is so important.
BPF cgroup Local Storage
This is where things get really interesting! BPF cgroup local storage is a special BPF map type (BPF_MAP_TYPE_CGRP_STORAGE) that lets BPF programs associate data with a cgroup kernel object. It's like having a special storage locker attached to each cgroup. Compared to regular hash maps that use cgroup IDs as keys, this approach has a significant advantage: when a cgroup is destroyed, the kernel automatically reclaims its associated local storage, preventing memory leaks. This is an efficient and lifecycle-safe way to store cgroup data.
Think of it this way: if you used a regular hash map, you'd have to manually clean up the data when a cgroup is deleted. With cgroup local storage, the kernel handles the cleanup for you, making your BPF programs more robust and less prone to memory leaks.
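To ground this, here is a minimal, self-contained sketch of a BPF program that declares a BPF_MAP_TYPE_CGRP_STORAGE map and bumps a per-cgroup counter. It is modeled on patterns in the BPF selftests; the map and program names are illustrative:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
	__uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, long);
} syscall_cnt SEC(".maps");

SEC("tp_btf/sys_enter")
int BPF_PROG(count_syscalls)
{
	struct task_struct *task = bpf_get_current_task_btf();
	long *cnt;

	/* Get (or create) this cgroup's private slot; the kernel frees it
	 * automatically when the cgroup is destroyed. */
	cnt = bpf_cgrp_storage_get(&syscall_cnt, task->cgroups->dfl_cgrp, 0,
				   BPF_LOCAL_STORAGE_GET_F_CREATE);
	if (cnt)
		*cnt += 1;
	return 0;
}

char _license[] SEC("license") = "GPL";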
bpf_task_get_cgroup1()
This is a BPF helper function specifically designed for working in cgroup v1 environments. Because cgroup v1 allows multiple hierarchies, you can't just ask for "the cgroup" of a task. You need to specify which hierarchy you're interested in. This function takes a task pointer and a hierarchy ID (hierarchy_id) and returns the cgroup object pointer for that specific v1 hierarchy. Importantly, it increments the cgroup object's reference count, so you must call bpf_cgroup_release() to release the reference after use, or you'll end up with a resource leak!
Think of bpf_task_get_cgroup1() as a specialized tool for navigating the cgroup v1 landscape. It lets you pinpoint the exact cgroup you need within the complex hierarchy.
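In practice, the acquire/release pairing looks like this. This is a condensed sketch of the exact pattern the patch itself uses; the program name is illustrative:

int target_hid; /* hierarchy ID, set from user space */

struct cgroup *bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id) __ksym;
void bpf_cgroup_release(struct cgroup *cgrp) __ksym;

SEC("fentry/bpf_local_storage_lookup")
int BPF_PROG(touch_v1_cgroup)
{
	struct task_struct *task = bpf_get_current_task_btf();
	struct cgroup *cgrp;

	cgrp = bpf_task_get_cgroup1(task, target_hid);
	if (!cgrp)	/* e.g. the hierarchy is not mounted */
		return 0;

	/* ... use cgrp, e.g. with bpf_cgrp_storage_get() ... */

	bpf_cgroup_release(cgrp);	/* drop the reference taken above */
	return 0;
}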
RCU (Read-Copy-Update)
RCU is a kernel synchronization mechanism optimized for scenarios where reads are much more frequent than writes. In BPF, certain kernel data (like task->cgroups) is RCU-protected. Non-sleepable BPF programs (the majority of program types) implicitly run inside an RCU read-side critical section, so they can access this data directly. Sleepable BPF programs, however, may be scheduled out mid-execution, so no implicit RCU protection can be assumed. To safely access such data in sleepable programs, you must explicitly enter the RCU read-side critical section (using bpf_rcu_read_lock() and bpf_rcu_read_unlock()) to ensure the data structure isn't freed while you're using it.
RCU is like a lock that lets many readers proceed without ever blocking: a writer publishes an updated copy of the data, and the old copy is freed only after all pre-existing readers are finished with it. This makes RCU extremely efficient for read-heavy workloads, which are common in BPF programs.
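For sleepable programs, the pattern looks roughly like this. This is a sketch modeled on the sleepable cgroup-storage selftests, not part of this patch, and the LSM hook is just one example of a sleepable attach point:

void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;

SEC("lsm.s/bpf")	/* the ".s" suffix marks the program as sleepable */
int BPF_PROG(sleepable_peek, int cmd, union bpf_attr *attr, unsigned int size)
{
	struct task_struct *task = bpf_get_current_task_btf();
	struct cgroup *cgrp;

	bpf_rcu_read_lock();	/* pin task->cgroups for the duration */
	cgrp = task->cgroups->dfl_cgrp;
	/* ... read per-cgroup state while the read lock is held ... */
	bpf_rcu_read_unlock();

	return 0;
}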
In-depth Code Review & Analysis
Let's get our hands dirty and dive into the code! We'll examine the key changes and discuss their implications.
Review Results and Comments
The main action happens in the tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c file. Here's a breakdown of the relevant code diff:
--- a/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
+++ b/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
@@ -21,50 +21,100 @@ struct {
 	__type(value, long);
 } map_b SEC(".maps");
 
+int target_hid = 0;
+bool is_cgroup1 = 0;
+
+struct cgroup *bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id) __ksym;
+void bpf_cgroup_release(struct cgroup *cgrp) __ksym;
+
+static void __on_lookup(struct cgroup *cgrp)
+{
+	bpf_cgrp_storage_delete(&map_a, cgrp);
+	bpf_cgrp_storage_delete(&map_b, cgrp);
+}
+
 SEC("fentry/bpf_local_storage_lookup")
 int BPF_PROG(on_lookup)
 {
 	struct task_struct *task = bpf_get_current_task_btf();
+	struct cgroup *cgrp;
+
+	if (is_cgroup1) {
+		cgrp = bpf_task_get_cgroup1(task, target_hid);
+		if (!cgrp)
+			return 0;
 
-	bpf_cgrp_storage_delete(&map_a, task->cgroups->dfl_cgrp);
-	bpf_cgrp_storage_delete(&map_b, task->cgroups->dfl_cgrp);
+		__on_lookup(cgrp);
+		bpf_cgroup_release(cgrp);
+		return 0;
+	}
+
+	__on_lookup(task->cgroups->dfl_cgrp);
 	return 0;
 }
 
-SEC("fentry/bpf_local_storage_update")
-int BPF_PROG(on_update)
+static void __on_update(struct cgroup *cgrp)
 {
-	struct task_struct *task = bpf_get_current_task_btf();
 	long *ptr;
 
-	ptr = bpf_cgrp_storage_get(&map_a, task->cgroups->dfl_cgrp, 0,
-				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	ptr = bpf_cgrp_storage_get(&map_a, cgrp, 0, BPF_LOCAL_STORAGE_GET_F_CREATE);
 	if (ptr)
 		*ptr += 1;
 
-	ptr = bpf_cgrp_storage_get(&map_b, task->cgroups->dfl_cgrp, 0,
-				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	ptr = bpf_cgrp_storage_get(&map_b, cgrp, 0, BPF_LOCAL_STORAGE_GET_F_CREATE);
 	if (ptr)
 		*ptr += 1;
 }
 
+SEC("fentry/bpf_local_storage_update")
+int BPF_PROG(on_update)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	struct cgroup *cgrp;
+
+	if (is_cgroup1) {
+		cgrp = bpf_task_get_cgroup1(task, target_hid);
+		if (!cgrp)
+			return 0;
+
+		__on_update(cgrp);
+		bpf_cgroup_release(cgrp);
+		return 0;
+	}
+
+	__on_update(task->cgroups->dfl_cgrp);
+	return 0;
+}
Code Interpretation
This diff shows the core idea of parameterizing the BPF program. The key changes are:
- Before: Functions like on_lookup and on_update directly accessed the default cgroup v2 object using task->cgroups->dfl_cgrp. The logic was hardcoded.
- After:
  - The is_cgroup1 and target_hid global variables are introduced. Their values are set by the user-space test loader.
  - The cgroup v1-specific ksyms, bpf_task_get_cgroup1 and bpf_cgroup_release, are declared.
  - The original core operations (like bpf_cgrp_storage_delete and bpf_cgrp_storage_get) are encapsulated into static helper functions, __on_lookup and __on_update. These helpers take a generic struct cgroup * pointer.
  - The original BPF entry functions (on_lookup, on_update) now act as dispatchers. They check the is_cgroup1 flag. For cgroup v1, they call bpf_task_get_cgroup1 to get the cgroup pointer, call the helper function, and then must call bpf_cgroup_release to release the reference. For cgroup v2, they pass task->cgroups->dfl_cgrp directly to the helper function.
This modification allows the same BPF code to run in both cgroup v1 and v2 modes, depending on the parameters passed from user space. This significantly improves code reuse!
Checklist Review & Assessment
Let's go through a checklist to evaluate the code changes:
- Logic & Functional Correctness: The code logic is sound. The patch correctly distinguishes between the cgroup v1 and v2 processing paths. In the cgroup v1 path, it performs the necessary NULL check on the cgrp pointer returned by bpf_task_get_cgroup1 and correctly releases the reference with bpf_cgroup_release(cgrp) after use. This covers both successful and failed cgroup retrieval, preventing null pointer dereferences and resource leaks. Functionally, it achieves the intended goal.
- Coding Style & Readability: The coding style is good and follows kernel BPF programming conventions. Abstracting the core logic into static functions like __on_lookup and __on_update makes the main entry functions (on_lookup and on_update) very clear – they are responsible only for obtaining the cgroup object. This separation enhances code structure and readability.
- Potential Risk Assessment: No obvious performance, security, or concurrency risks are identified. This patch only touches the kernel's selftest code and doesn't affect production kernel paths. Within the BPF program, the use of bpf_task_get_cgroup1() is standard and safe, with correct reference-count handling, which is the primary risk point for this function.
- Architecture & Maintainability: The architectural design is excellent. Extending the tests through parameterization rather than copy-pasting code is a prime example of improving maintainability. If the cgroup local storage test logic needs to change in the future, only the __on_... helper functions need to be modified, and the change automatically applies to both the cgroup v1 and v2 test cases. This significantly reduces future maintenance costs and the risk of introducing inconsistencies.
- Nit-picking: In the BPF program cgrp_ls_recursion.c, the newly introduced helpers __on_lookup and __on_update are declared static void. For short file-local helpers, a common C convention is static inline void (BPF code also frequently uses __always_inline), which suggests to the compiler that the function should be inlined, avoiding call overhead. Modern compilers usually do this automatically, but spelling out inline makes the author's intent explicit – see the snippet after this list.
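For illustration, the suggested form would be a one-word change to the helper from the diff above:

/* 'inline' (or __always_inline, common in BPF code) makes the
 * inlining intent explicit; the body is unchanged. */
static inline void __on_lookup(struct cgroup *cgrp)
{
	bpf_cgrp_storage_delete(&map_a, cgrp);
	bpf_cgrp_storage_delete(&map_b, cgrp);
}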
Community Review & Discussion
Let's take a peek at what the kernel development community had to say about this patch.
Thread 1: Initiated by Yafang Shao - Patch Application and Minor Correction
- Yafang Shao at Wed, 6 Dec 2023 11:53:26 +0000:
  - Key Point: Submitted this patch to add selftests for the BPF cgroup local storage feature, specifically targeting cgroup v1, to improve test coverage.
  - Tags: Signed-off-by: Yafang Shao <laoar.shao@gmail.com>, Acked-by: Tejun Heo <tj@kernel.org>, Acked-by: Yonghong Song <yonghong.song@linux.dev>
- Martin KaFai Lau (replying to Yafang Shao) at Fri, 8 Dec 2023 17:12:54 -0800:
  - Key Point: Informed the author that the patch had been accepted and merged, with one minor modification made before merging: the static keyword was added to the newly added cgrp2_local_storage and cgrp1_local_storage functions in the user-space test program, because they are only used within the current file.
  - Quote: "Applied with adding 'static' to the above. Thanks!"
This discussion highlights the collaborative nature of kernel development. A minor coding style improvement was suggested and implemented during the review process.
Conclusion
This patch is a fantastic example of how to improve code quality and test coverage in a maintainable way. By parameterizing the existing selftests, the patch adds crucial testing for cgroup v1 local storage without duplicating code. The code is well-structured, easy to understand, and addresses a significant gap in the BPF selftest suite. The community discussion also shows the importance of collaboration and attention to detail in kernel development. So, next time you're working on a project, remember the lessons learned from this patch: keep your code modular, reuse existing logic whenever possible, and always strive for thorough testing! Keep hacking, guys!