wspy – process tracing, lost nodes and potential refactoring
One of the tougher bugs I’ve seen pop up now and then is an “orphaned” node in my process tree. When I collect trees of processes, very occasionally a tree node gets dropped on the floor and shows up as an orphan when it clearly should have been attached.
The problem seems to come up more with very large data sets, e.g. a build-gcc run with >1,000,000 processes. I was fortunate to recently see it pop up in a few of ~600 c-ray runs on Ryzen. The symptoms are as if one of a few things might be happening:
- Somewhere, race conditions are causing me to miss events, particularly exit(2) events. Without these, the processes never get closed out with finish times, and their counters and closing statistics are never dumped.
- Perhaps I am getting the events, but my tree building and accounting have subtle bugs that introduce problems, particularly when pid numbers are reused.
When the problem first appeared with build-gcc (where pids wrap around more than 30 times), I was more suspicious of the second cause. More recently I’ve leaned toward the first, particularly since it showed up in a small c-ray example on a fast processor.
This makes it interesting to figure out how to best debug things. I’ve dumped quite a few trace files of underlying event points and gone through them to look for patterns. Haven’t found anything there yet.
However, the recent restructuring I did, to dump process information to a file and have a later program reconstruct the trees, made me realize I could also collect the data with an alternate implementation. I am going to add these as alternate values of a “--processtree-engine” option and then run more than one of them to diagnose things.
One can also describe this as a progression of four implementations over five steps:
- My first implementation used the /sys/kernel/debug/tracing interface directly to dump kernel events for the scheduler, particularly fork/exec/exit. From these I built an in-memory version of the process tree and dumped it at the end. This became --processtree-engine ftrace
- My second implementation used ptrace(2) to follow processes through fork/exec/exit events, built an in-memory version of the process tree and dumped it at the end. This became --processtree-engine ptrace
- My third step was to dump not just a hierarchical process tree, but also the flat per-process data, one line per process. From this information, a separate program can reconstruct the tree hierarchy
- My third implementation, under consideration, is to have wspy instead invoke an external command. The command I will have it invoke is “trace-cmd” to record a trace file. This is useful data by itself. It is also a way of debugging my ptrace implementation, to see if there is anything unique that might have caused dropped events. So it is both useful functionality that extends the tool and a useful cross-check for debugging
- My fourth implementation, also under consideration, is a rewrite of the ptrace engine that stops building an in-memory tree. Instead it retains only the information needed to dump a trace log file and focuses on doing that quickly, relying on the backend display program to reconstruct and dump the tree.
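As a rough illustration of the "dump flat records, rebuild the tree later" idea, here is a minimal Python sketch of a backend that reconstructs the hierarchy. The record layout (pid, ppid, start time, command name) and the names `Proc` and `build_forest` are assumptions made for this example, not the actual wspy file format or code. It copes with pid reuse by attaching each process to the most recently started process that used the parent's pid, and it reports anything whose parent was never seen as a root, which is how an orphan would show up.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    pid: int
    ppid: int
    start: float                      # fork time (hypothetical field)
    comm: str
    children: list = field(default_factory=list)


def build_forest(records):
    """Rebuild process trees from flat (pid, ppid, start, comm) tuples."""
    by_pid = {}   # pid -> every Proc that used this pid, in start order
    roots = []    # processes whose parent was never seen (includes orphans)
    for pid, ppid, start, comm in sorted(records, key=lambda r: r[2]):
        node = Proc(pid, ppid, start, comm)
        earlier = by_pid.get(ppid, [])
        if earlier:
            # With pid reuse, the parent is the latest earlier user of ppid.
            earlier[-1].children.append(node)
        else:
            roots.append(node)
        by_pid.setdefault(pid, []).append(node)
    return roots
```

Sorting by start time is what makes the "latest earlier user of ppid" rule work; a real implementation would probably also check finish times so that a child is never attached to a process that had already exited.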
The net result will be a wspy with the following options:
--processtree or --no-processtree - turn process tracing on/off
--processtree-engine ftrace - trace /sys/kernel/debug/ events and dump to processtree.txt
--processtree-engine tracecmd - invoke "trace-cmd -e sched" and dump to processtree.dat
--processtree-engine ptrace - trace ptrace(2) events and dump to processtree.{txt,csv}
--processtree-engine ptrace2 - trace ptrace(2) events and dump to processtree.csv
--processtree-engine none
These can be mixed and matched with the following two restrictions: (1) the “ftrace” and “tracecmd” options are incompatible, so picking one will turn off the other, and (2) the “ptrace” and “ptrace2” options are likewise incompatible, so picking one will turn off the other.
To turn off the processtree-engine and start over, one can pick the none option.
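The selection rules can be sketched in a few lines of Python. This is only an illustration of the behavior described here, not wspy's option parser; `resolve_engines` and `EXCLUSIVE_GROUPS` are hypothetical names, and the sketch assumes that a later option overrides an earlier conflicting one.

```python
# Engines that cannot run together; names taken from the option list.
EXCLUSIVE_GROUPS = [{"ftrace", "tracecmd"}, {"ptrace", "ptrace2"}]


def resolve_engines(choices):
    """Apply --processtree-engine values in order; return the active set."""
    active = set()
    for choice in choices:
        if choice == "none":
            active.clear()            # "none" starts over
            continue
        for group in EXCLUSIVE_GROUPS:
            if choice in group:
                active -= group       # turn off the conflicting engine
                break
        else:
            raise ValueError(f"unknown engine: {choice}")
        active.add(choice)
    return active
```

For instance, `resolve_engines(["ftrace", "ptrace"])` keeps both engines, while `resolve_engines(["ftrace", "tracecmd"])` leaves only `tracecmd` active.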
It will take a bit of time to do these rewrites and extensions, but hopefully the result is a more robust tool with additional features.
Followup #1
After implementing and debugging the ptrace2 driver, I have also found and fixed a few related problems:
- Ptrace issues: (1) It turns out that fairly rarely (~2500 times out of 1,000,000 processes in build-gcc, or ~1/4 of one percent), when a PTRACE_EVENT_FORK event is delivered, the pid of the child isn’t available, although I will get a SIGSTOP event for the new child, and (2) at least as rarely, a PTRACE_EVENT_EXIT event isn’t delivered, although I will get a notice when the process has exited. This doesn’t cause a big issue for a smaller benchmark, but once the PID numbers wrap around in my 1,000,000 process version of build-gcc, things get screwed up.
- Parsing issues in the /proc/[pid]/stat files, e.g. processes with spaces in their names such as “dconf worker”
- Better error recovery/notification, e.g. I had a file descriptor leak that ran out of file descriptors in the largest workloads
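The /proc/[pid]/stat parsing problem comes from splitting the line on whitespace: the command name sits in parentheses and may itself contain spaces (or parentheses). A common way around this, sketched below in Python rather than as wspy's actual code, is to locate the first "(" and the last ")" and split only what follows. The function name `parse_stat` is hypothetical.

```python
def parse_stat(line):
    """Split one /proc/[pid]/stat line into (pid, comm, remaining fields)."""
    open_paren = line.index("(")
    # The LAST ')' ends comm, so names with spaces or ')' stay intact.
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    rest = line[close_paren + 1:].split()   # rest[0] is state, rest[1] is ppid
    return pid, comm, rest
```

With a line such as `1234 (dconf worker) S 1 ...`, a naive whitespace split would shift every later field by one, whereas this approach keeps "dconf worker" as a single command name.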
I have fixed several of these and made wspy a bit more robust. More is still needed, but these fixes also make the tool work a little better on the larger workloads (i.e. those with very many processes).