Having analyzed the LLVM IR that Clang produces for programs instrumented for source-based coverage, we now have a better understanding of what it takes to bring it to eBPF programs running in the Linux kernel. We know what to get rid of entirely, what to do differently, and what to patch so that the result is loadable and usable by the BPF VM in the kernel.
The plan is simple: get Clang to instrument the LLVM intermediate representation of a BPF program for source-based code coverage, then patch that representation so that it yields a valid BPF ELF. Which transformations does it need?
First of all, we are lucky: we don't have to touch the actual BPF instructions, namely the counter increments. We can keep them the way they are. This is a huge win because LLVM keeps tracking the global state of the registers for us, which saves a lot of work.
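To make the counter increments concrete, here is a minimal sketch in plain C of what an instrumented function conceptually does. The names `classify` and `profc_classify` are hypothetical stand-ins (Clang emits its own `__profc_`-prefixed symbols and does this at the IR level), so read it as an illustration of the idea rather than as actual compiler output.

```c
#include <stdint.h>

/* Hypothetical stand-in for a per-function counter array:
 * one 64-bit slot for each code region of the function. */
uint64_t profc_classify[2];

int classify(int x)
{
    profc_classify[0]++;        /* region 0: function entry */
    if (x > 0) {
        profc_classify[1]++;    /* region 1: body of the `if` */
        return 1;
    }
    return 0;
}
```

Each increment is an ordinary load, add, and store on global data, which is why these instructions can stay untouched.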
We do, however, have to strip all the profile initialization code that Clang creates: things like __llvm_profile_init, when present, are no good for the BPF VM in the kernel.
We also want to ensure that the global variables, whether constant or not, have the right visibility (i.e., dso_local) and linkage, so that they show up in the libbpf skeletons if we plan to use them.
The global structs that we need for generating the profraw files, namely the __profd_ variables, are transformed into separate global variables, one for each field.
For example, this is what I did for the __profd_* variables, which are originally structs with 7 fields. Other global structs, like the __covrec_ ones, can simply be stripped from the BPF ELF that is meant to be loaded in the kernel.
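The struct-to-scalars transformation can be sketched in plain C as follows. The field names, their types, and the `profd_classify` prefix are assumptions made for illustration only (the text above only states that the struct has 7 fields), and the real transformation happens on LLVM IR rather than on C source.

```c
#include <stdint.h>

/* Assumed 7-field layout of a profile data record; names are illustrative. */
struct profd {
    uint64_t name_ref;
    uint64_t func_hash;
    uint64_t counter_ptr;
    uint64_t function_ptr;
    uint64_t values;
    uint32_t num_counters;
    uint16_t num_value_sites[2];
};

/* After the transformation: one standalone global per field (hypothetical names
 * and values), each of which can be exposed as a plain global variable. */
uint64_t profd_classify_name_ref = 0x1122334455667788ULL;
uint64_t profd_classify_func_hash = 0x18ULL;
uint64_t profd_classify_counter_ptr = 0;
uint64_t profd_classify_function_ptr = 0;
uint64_t profd_classify_values = 0;
uint32_t profd_classify_num_counters = 2;
uint16_t profd_classify_num_value_sites[2] = {0, 0};

/* Userspace-side helper (hypothetical): reassemble the record from the
 * flattened globals, e.g. before writing it out to a profraw file. */
struct profd profd_classify_rebuild(void)
{
    struct profd d = {
        .name_ref = profd_classify_name_ref,
        .func_hash = profd_classify_func_hash,
        .counter_ptr = profd_classify_counter_ptr,
        .function_ptr = profd_classify_function_ptr,
        .values = profd_classify_values,
        .num_counters = profd_classify_num_counters,
        .num_value_sites = { profd_classify_num_value_sites[0],
                             profd_classify_num_value_sites[1] },
    };
    return d;
}
```

Splitting the record this way loses nothing: the consumer can put the fields back together in their original order whenever it needs the struct again.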
However, the report generation phase (i.e., bpfcov out) needs them to know at which line and column a code region or a branch starts. For this reason, I gave the LLVM pass an option (enabled with the strip-initializers-only flag) that keeps them, so that we can later create a BPF ELF meant only for this phase and not for loading.
This BPF ELF will have .bpf.obj as an extension, rather than
Finally, we know that libbpf supports (on recent Linux kernels) eBPF global variables, which are simply eBPF maps with a single value, and we plan to use them. But, as already mentioned, it does not accept or recognize the ELF sections that the Clang instrumentation injects into the intermediate representation.
So we need our LLVM pass to change them to custom eBPF sections. Custom eBPF sections take the form .data.* and are meant to contain static and/or global data. We can change the section of the counters to be
.data.profc. The section of the
.rodata.profn, and so on. You can find all this logic summarized in these bits of code.
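As a rough illustration of the end result, the following plain C sketch places globals into sections with those names using the compiler's `section` attribute. The `SEC` macro is defined locally here, the variable names and contents are hypothetical, and the sketch assumes an ELF target compiled with GCC or Clang; the actual pass rewrites the section of existing globals in the IR instead.

```c
#include <stdint.h>

/* Local helper macro (an assumption of this sketch): pin a global to a named
 * ELF section and keep it even if nothing references it. */
#define SEC(name) __attribute__((section(name), used))

/* Writable counters live in a custom .data.* section. */
SEC(".data.profc") uint64_t profc_counters[2];

/* Read-only data lives in a custom .rodata.* section (contents are made up). */
SEC(".rodata.profn") const char profn_names[] = "classify";

/* Bump one counter, as an instrumented program would. */
void hit_region(unsigned int region)
{
    if (region < 2)
        profc_counters[region]++;
}
```

Inspecting the resulting object file with a tool such as `readelf -S` should list both custom sections alongside the standard ones.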
So, assuming the following dummy eBPF program: