The ptrace(2) man page shows how one can subscribe to events such as PTRACE_EVENT_FORK or PTRACE_EVENT_VFORK, which happen when a process calls fork()/vfork(). At such an event one can then call ptrace(PTRACE_GETEVENTMSG,…) to retrieve the pid of the new child.
This has worked fairly well, although I noticed that in my most extreme cases, e.g. build-gcc with a million processes, a small fraction (~0.1%) of the events might not retrieve a valid process id for the new child.
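As a rough illustration of the calls described above, here is a minimal sketch of how a tracer might recognize a fork/vfork event stop from a waitpid() status and then ask for the new child's pid. The helper names ptrace_event_of and new_child_pid are made up for this example and are not taken from wspy. It assumes Linux with glibc, and that the tracee was already set up with PTRACE_O_TRACEFORK / PTRACE_O_TRACEVFORK.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: return the PTRACE_EVENT_* code carried by a waitpid()
 * status, or 0 if the status is not a ptrace event stop. For event stops the
 * kernel reports status >> 8 == (SIGTRAP | (event << 8)). */
static int ptrace_event_of(int status)
{
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return 0;
    return (status >> 16) & 0xffff;
}

/* Hypothetical helper: ask the kernel for the pid of the child created at a
 * fork/vfork event stop. Returns -1 if the status is not such an event or
 * the ptrace request itself fails. */
static long new_child_pid(pid_t tracee, int status)
{
    int event = ptrace_event_of(status);
    unsigned long msg = 0;

    if (event != PTRACE_EVENT_FORK && event != PTRACE_EVENT_VFORK)
        return -1;
    if (ptrace(PTRACE_GETEVENTMSG, tracee, NULL, &msg) == -1)
        return -1;
    return (long)msg;
}
```

In a tracer loop this would be called as new_child_pid(pid, status) right after waitpid(-1, &status, __WALL) returns pid. A return of -1 marks the kind of event where no valid child pid came back, so it can be counted or handled separately.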
That may not sound like a huge amount, but unfortunately when the maximum process id is 32768 this means the process ids get reused >30x in the workload. A missed event could mean a particular process gets confused with the previous generation of the same pid, hence building a slightly incorrect tree. Worse yet, occasionally the trees I built were cyclic graphs, and my tools might hang without measures to avoid the cycles.
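One simple way to keep a tool from hanging on such a cyclic "tree" is to bound the walk up the parent links. The sketch below is not how wspy does it. It assumes a hypothetical flat layout where parent[i] holds the index of node i's parent and -1 marks a root, and it relies on the fact that a real tree with n nodes can never need n or more hops to reach a root.

```c
#include <stddef.h>

/* Walk from `node` up to its root and return the number of hops.
 * Returns -1 if the walk runs into a cycle or an out-of-range link.
 * `parent` is an assumed flat layout: parent[i] is the index of node i's
 * parent, or -1 if node i is a root. */
static int depth_or_cycle(const int *parent, size_t n, int node)
{
    size_t hops = 0;

    if (node < 0 || (size_t)node >= n)
        return -1;
    while (parent[node] >= 0) {
        if (++hops >= n)                 /* more hops than a tree allows */
            return -1;
        if ((size_t)parent[node] >= n)   /* dangling parent link */
            return -1;
        node = parent[node];
    }
    return (int)hops;
}
```

For example, with parent = {-1, 0, 1} node 2 is two hops from the root, while parent = {1, 2, 0} has no root at all and every walk is reported as a cycle instead of looping forever.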
I initially tried adding some robustness by also looking in /proc/[pid]/stat at process exit. That worked slightly better, though there were still a few processes that got missed and showed up as “orphans”. My best hypothesis is that, through a quirk of scheduling, the kernel might just be missing the process information at exactly the time I go looking for it, so I would miss a parent/child link. This also seems to correlate with the problem being slightly worse on my faster Ryzen machine with more cores.
An advantage of having multiple “processtree-engine” implementations in the same wspy program is that I can experiment with more than one implementation to see how well each works in building the trees for the build-gcc workload.
I’ve run this experiment with three possible choices:
- The original “ptrace” engine described above
- My revised “ptrace2” engine, enhanced to get the ppid not at the PTRACE_EVENT_* stage but instead by looking in /proc/[pid]/stat at the point a new process arrives with a SIGSTOP event, and to look up the “comm” field at process exit.
- The original “ftrace” engine that uses kernel tracing for fork/exec/exit events.
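The /proc/[pid]/stat lookup used by the ptrace2 engine can be sketched roughly as follows. This is not the wspy code, and the function names parse_stat_line and read_proc_stat are made up for illustration. The line starts with "pid (comm) state ppid …", and since comm may itself contain spaces or parentheses, the sketch looks for the last ')' rather than splitting on whitespace.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Extract comm and ppid from one /proc/[pid]/stat line.
 * Returns 0 on success, -1 if the line does not look like a stat line. */
static int parse_stat_line(const char *line, char *comm, size_t comm_len,
                           int *ppid)
{
    const char *open = strchr(line, '(');
    const char *close = strrchr(line, ')');   /* comm may contain ')' */
    char state;
    size_t len;

    if (!open || !close || close < open || comm_len == 0)
        return -1;
    len = (size_t)(close - open - 1);
    if (len >= comm_len)                      /* truncate to fit the buffer */
        len = comm_len - 1;
    memcpy(comm, open + 1, len);
    comm[len] = '\0';
    /* The two fields after comm are the state character and the ppid. */
    if (sscanf(close + 1, " %c %d", &state, ppid) != 2)
        return -1;
    return 0;
}

/* Read /proc/<pid>/stat and parse it. Returns -1 if the file cannot be
 * read, e.g. because the process has already gone away. */
static int read_proc_stat(pid_t pid, char *comm, size_t comm_len, int *ppid)
{
    char path[64], line[1024];
    char *ok;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? parse_stat_line(line, comm, comm_len, ppid) : -1;
}
```

A -1 from read_proc_stat is the case where the file is unreadable at the moment of the lookup, which is one way a parent link or comm field could end up missing.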
Here are my results:
| engine | processes | orphans | missing comm |
|---|---|---|---|
| ptrace | 1,065,044 | 129,494 | 276,979 |
| ptrace2 | 1,062,318 | 0 | 4,582 |
| ftrace | 1,065,564 | 0 | 2 |
Orphans are cases where obvious parts of the tree aren’t connected because some parent relationship was missed. Missing comm are cases where printing the command (comm) field results in no entry.
ftrace is clearly giving me the most overall processes and avoiding building incorrect trees due to missed parent relationships. The drawbacks of ftrace are that I get less information (e.g. no utime and stime) and that I don’t have hooks to add performance counters or similar “per process” information. So if my primary goal is to show a process hierarchy without this additional data, ftrace seems to work best.
ptrace shows problems with orphans, likely because some missed inheritances cause the tree relationships to be lost. There is also a fairly large problem with missed “comm”; I need to check that this isn’t a memory/file descriptor leak or something similar. Overall, while ptrace still seems to do OK with simpler programs, e.g. <1000 processes, it does seem to get confused with the larger build-gcc type programs.
ptrace2 seems to have addressed the missing child relationships by initiating tracing when SIGSTOP happens. There is a slight concern here about the few thousand fewer processes, which can be looked into further by running both ftrace and ptrace2 at the same time and reconciling the missed nodes. ptrace2 does have the advantage of providing some of the hooks I need for analysis.
Going forward, I'll keep all three tracers but put further enhancements into ptrace2 only, and eventually deprecate ptrace at points where keeping it makes the code more complex (right now there is slight complexity in my back end processing program, but nothing particularly difficult).