wspy – instrumentation: ftrace and ptrace
While the names sound similar, ptrace and ftrace are two rather different methods of implementing process instrumentation.
A previous tool used an interface based on the ptrace(2) system call. This system call is used by debuggers, strace(1) and similar tools. It allows a tracer process to examine and control a tracee process, and to modify its state, such as memory and registers. The tracer can ask to be notified of a set of events, including fork(2), exec(2) and exit(2). With the follow-fork options set (PTRACE_O_TRACEFORK and its vfork/clone siblings), the tracer is attached to each new child automatically, which makes it possible to build a tree of processes.
My previous tool did just that, using ptrace(2) to essentially follow forks of all children and then construct a process tree. This worked reasonably well with a few limitations/issues:
- For security reasons, one can’t trace through a setuid exec: the setuid bit is effectively ignored while the process is being traced, so the program runs without its elevated privileges.
- I had an unsubstantiated sense that receiving all these events might carry a higher overhead.
- Signals had to be carefully managed. When one process signals another, the tracer is notified of the signal first and can then choose whether to pass it along to the tracee. I had a nasty bug in this area that I couldn’t quite resolve. It involved a FIFO between a producer and a consumer, where some sort of notification was used to control the flow through the pipe. The symptom was that my tracing mechanism would sometimes hang when the pipe was full. I thought it was likely related to missed signals, but never managed to sort it out.
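To make the signal-handling point concrete, here is a minimal Python sketch of the decision a ptrace-based tracer has to make each time waitpid(2) returns. It is not the code of the earlier tool; the function name `classify_stop` and its return layout are made up for illustration. The bit layout of the status word and the `PTRACE_EVENT_*` numbers follow ptrace(2) and `<linux/ptrace.h>`. The key detail is the third element of the result: the signal number the tracer should hand back when it resumes the tracee. Returning 0 there silently drops the signal, which is one way a tracee can end up waiting forever.

```python
import signal

# Event codes from <linux/ptrace.h>; the names are just labels for this sketch.
PTRACE_EVENT_NAMES = {
    1: "fork",
    2: "vfork",
    3: "clone",
    4: "exec",
    5: "vfork_done",
    6: "exit",
}


def classify_stop(status):
    """Classify a raw waitpid() status reported for a traced process.

    Returns (kind, detail, signal_to_forward). signal_to_forward is the
    value a tracer would pass as the data argument of PTRACE_CONT:
    0 suppresses the signal, non-zero delivers it to the tracee.
    """
    if (status & 0xFF) != 0x7F:
        # Not a stop: the process is gone, nothing to resume.
        term_sig = status & 0x7F
        if term_sig == 0:
            return ("exited", (status >> 8) & 0xFF, 0)
        return ("killed", term_sig, 0)

    stop_sig = (status >> 8) & 0xFF
    event = status >> 16
    if stop_sig == signal.SIGTRAP and event in PTRACE_EVENT_NAMES:
        # A ptrace event stop (fork/exec/exit ...): no signal to deliver.
        return ("event", PTRACE_EVENT_NAMES[event], 0)
    if stop_sig == signal.SIGTRAP:
        # Simplification: treat a bare SIGTRAP as tracing noise and drop it.
        return ("trap", None, 0)
    # An ordinary signal sent to the tracee: it must be forwarded, or the
    # tracee never sees it. (A real tracer would also need to suppress the
    # initial SIGSTOP of a newly attached child; this sketch does not.)
    return ("signal", stop_sig, stop_sig)
```

In a real tracer loop the third value would be passed straight to the PTRACE_CONT request for the stopped pid.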
When I created wspy, I initially tried a different approach using ftrace. Ftrace is a kernel function tracing mechanism that can be used to instrument and debug the kernel. There is a whole variety of different event types one can trace. The ones of initial interest were in the scheduler where one could trace fork/exec/exit events, but there might be others later.
One controls ftrace by changing values in /sys/kernel/debug/tracing and then one can read the events coming from the “trace_pipe” file. The implementation of wspy used this interface and it worked reasonably well, though I’ve noticed a few tradeoffs vs. my earlier tool.
- Permissions for kernel tracing are generally set for the root user. I added a “--uid” option so I could run the workload under the uid of a mere mortal, but it was sometimes a pain to remember to run wspy as root.
- The events are printed to the trace log without an (easy) explicit hook that lets me intervene. The clearest example is exit(2): my previous tool could gather information from /proc/[pid] of the exiting process to capture its end state before allowing the process to continue. Similarly, my performance counter measurements could more easily be keyed by pid (rather than by core), since I could take a final round of measurements at exit.
- I had a slightly different set of implementation issues, such as making sure I properly handled failed events (e.g. a fork that spawns too fast). These are still to be sorted out.
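As a rough illustration of the ftrace interface described above, the following Python sketch enables the scheduler's fork/exec/exit events and parses lines read from trace_pipe. It is not wspy's implementation. The function names are invented for this example, the regular expression assumes the default human-readable line layout (which varies somewhat between kernel versions), and the `key=value` splitting will mis-handle command names that contain spaces. Writing to these files normally requires root, which is the permission tradeoff noted in the list.

```python
import re

# Assumed tracefs location, as given in the text.
TRACING = "/sys/kernel/debug/tracing"
SCHED_EVENTS = ("sched_process_fork", "sched_process_exec", "sched_process_exit")

# Default line layout (assumed):
#   <task>-<pid>  [<cpu>] <flags> <timestamp>: <event>: key=value ...
LINE_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?:\S+\s+)?(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<fields>.*)$"
)


def enable_sched_events(root=TRACING):
    """Turn on the three scheduler events and start tracing (needs root)."""
    for name in SCHED_EVENTS:
        with open(f"{root}/events/sched/{name}/enable", "w") as f:
            f.write("1")
    with open(f"{root}/tracing_on", "w") as f:
        f.write("1")


def parse_trace_line(line):
    """Return a dict for one trace line, or None if it is not an event."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Simplification: values are assumed to contain no whitespace.
    fields = dict(re.findall(r"(\w+)=(\S+)", m.group("fields")))
    return {
        "task": m.group("task"),
        "pid": int(m.group("pid")),
        "cpu": int(m.group("cpu")),
        "event": m.group("event"),
        "fields": fields,
    }


def follow_events(root=TRACING):
    """Yield parsed events from trace_pipe; blocks until events arrive."""
    with open(f"{root}/trace_pipe") as pipe:
        for line in pipe:
            event = parse_trace_line(line)
            if event is not None:
                yield event
```

Note that reading trace_pipe only ever reports events after the fact: nothing in this loop can pause the process that generated them, which is the missing hook described in the second item of the list.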
Overall, I like having the ftrace-based process tree mapping in wspy, although the tradeoffs above tell me there isn’t a 100% clear winner in all situations. As a result, I’ll likely implement a ptrace(2) alternative in wspy as well. I will update this post as that work is completed, along with any other comparisons (e.g. overhead) I make.
I have now added basic support to create a process tree from ptrace(2) events that follow a child process.
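Whichever mechanism delivers the events, the tree itself can be built the same way. The sketch below is a hypothetical, engine-agnostic structure in Python, not wspy's own data structure: the class name `ProcessTree` and its methods are invented here. It records parent/child links on fork events, remembers which pids have exited, and renders the tree as indented lines.

```python
from collections import defaultdict


class ProcessTree:
    """Process tree assembled from fork and exit events (illustrative only)."""

    def __init__(self, root_pid):
        self.root = root_pid
        self.children = defaultdict(list)  # parent pid -> child pids, in fork order
        self.parent = {}                   # child pid -> parent pid
        self.exited = set()                # pids for which an exit event was seen

    def on_fork(self, parent_pid, child_pid):
        self.children[parent_pid].append(child_pid)
        self.parent[child_pid] = parent_pid

    def on_exit(self, pid):
        self.exited.add(pid)

    def render(self, pid=None, depth=0):
        """Return the subtree rooted at pid as a list of indented lines."""
        pid = self.root if pid is None else pid
        lines = ["  " * depth + str(pid)]
        for child in self.children.get(pid, []):
            lines.extend(self.render(child, depth + 1))
        return lines
```

For example, after `on_fork(100, 101)`, `on_fork(100, 102)` and `on_fork(101, 103)` on a tree rooted at pid 100, `render()` returns the lines `100`, `  101`, `    103`, `  102`.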
Along the way, I’ve added an option, --processtree-engine, which takes one of two values, ftrace or ptrace, to pick between these two mechanisms.
At this point, the ptrace engine reports less information than the ftrace engine, for example the elapsed times of processes, since the ptrace events don’t carry this directly. However, I can potentially gather quite a bit of other information by reading the /proc/[pid]/* files at process exit.
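As an example of what could be collected at exit, here is a minimal Python sketch that parses /proc/[pid]/stat. It is an illustration only; the function names and the choice of fields are assumptions, not what wspy collects. The field positions follow proc(5): state is field 3, ppid field 4, utime and stime fields 14 and 15, and starttime field 22. The command name is located by the last closing parenthesis, because it may itself contain spaces or parentheses.

```python
def parse_stat(text):
    """Extract a few fields from the contents of /proc/[pid]/stat."""
    lparen = text.index("(")
    rparen = text.rindex(")")          # comm may contain ')' so use the last one
    rest = text[rparen + 1:].split()   # rest[0] is field 3 (state)
    return {
        "pid": int(text[:lparen]),
        "comm": text[lparen + 1:rparen],
        "state": rest[0],
        "ppid": int(rest[1]),
        "utime_ticks": int(rest[11]),      # field 14: user-mode CPU time
        "stime_ticks": int(rest[12]),      # field 15: kernel-mode CPU time
        "starttime_ticks": int(rest[19]),  # field 22: start time after boot
    }


def read_stat(pid):
    """Read and parse /proc/<pid>/stat for a process that still exists."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat(f.read())
```

The tick values are in units of `os.sysconf("SC_CLK_TCK")` per second. Since starttime is measured from boot, comparing it with the system uptime at the moment of the exit event is one possible way to estimate a process's elapsed time.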