Re: /proc pid number off-by-one? ... 2.6.18-028test003.1 [message #8286 is a reply to message #8236] |
Mon, 13 November 2006 17:19
John Kelly
Messages: 97 Registered: May 2006 Location: Palmetto State
Member
#!/bin/sh
SENDMAIL_CLIENT_ARGS="-L sendmail-client -Ac -qp30m"
msppid=/var/spool/clientmqueue/sm-client.pid
srvpid=/var/run/sendmail.pid
killproc -p $msppid -i $srvpid -TERM /usr/sbin/sendmail
startproc -p $msppid -i $srvpid /usr/sbin/sendmail $SENDMAIL_CLIENT_ARGS
The script above is a reduced test case. The problem happens on the last line, the startproc call. It looks like some kind of race, because it happens only some of the time.
I tried running startproc under strace, but that seems to avoid the race. However, after running the test script above many times, each time followed immediately by "ps ax", I was able to see what the problem is (shown below). There is a zombie with the PID in question, and the PID of the sendmail process that is actually running is one higher. The zombie is hard to catch with "ps ax"; I only captured it once.
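The "run it many times, then look at ps ax" procedure could be automated along the following lines. This is only a sketch: the function name catch_zombie and the SNAPSHOT_CMD variable are made up for illustration, and the command to repeat (for example, the path to the test script) has to be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly and stop at the first
# process snapshot that contains a defunct (zombie) entry.
# Usage: catch_zombie COMMAND [MAX_TRIES]
# SNAPSHOT_CMD is an assumed override hook; it defaults to "ps ax".
catch_zombie() {
    cmd=$1
    max=${2:-100}
    i=1
    while [ "$i" -le "$max" ]; do
        # Run the command under test, discarding its output.
        sh -c "$cmd" >/dev/null 2>&1
        # Take the snapshot immediately afterwards, before the zombie is reaped.
        snapshot=$(sh -c "${SNAPSHOT_CMD:-ps ax}")
        if printf '%s\n' "$snapshot" | grep -q '<defunct>'; then
            printf 'zombie seen on attempt %s\n' "$i"
            printf '%s\n' "$snapshot" | grep '<defunct>'
            return 0
        fi
        i=$((i + 1))
    done
    printf 'no zombie seen in %s attempts\n' "$max"
    return 1
}
```

It would be called as, say, catch_zombie "sh ./testcase.sh" 500, where ./testcase.sh is a placeholder for wherever the test script is saved. Since the race is timing dependent, a run that finds nothing does not prove the problem is absent.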
This never happened until I started using the OpenVZ 2.6.18 kernel. I don't know whether it happens in any other VE; SUSE 9.1 is the only one I use enough to trigger the problem.
startproc: cannot stat /proc/1372/exe: Permission denied
  PID TTY      STAT   TIME COMMAND
    1 ?        Rs     0:00 init [3]
28095 ?        Ss     0:00 sendmail: accepting connections
28107 ?        Ss     0:00 /usr/sbin/sshd -o PidFile=/var/run/sshd.init.pid
28113 ?        Ss     0:00 /usr/sbin/xinetd
28119 ?        Ss     0:00 /usr/sbin/cron
28276 pts/1    Ss+    0:00 -bash
 1372 pts/0    Z      0:00 [sendmail] <defunct>
 1373 ?        Ss     0:00 sendmail: Queue control
 1374 ?        S      0:00 sendmail: running queue: /var/spool/clientmqueue
 1375 pts/0    R+     0:00 ps ax
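To check what state a given PID is in without relying on a well-timed "ps ax", its state can also be read straight from /proc. The following is a rough sketch, assuming a Linux /proc where /proc/PID/status contains a "State:" line; pid_state is a made-up helper name, not part of startproc or any init script.

```shell
#!/bin/sh
# Hypothetical helper: print the one-letter state of a PID (e.g. S, R, Z),
# or "gone" if /proc has no readable status file for it.
pid_state() {
    pid=$1
    if [ ! -r "/proc/$pid/status" ]; then
        echo "gone"
        return 1
    fi
    # The line looks like "State:  Z (zombie)"; keep only the letter.
    sed -n 's/^State:[[:space:]]*\([A-Za-z]\).*/\1/p' "/proc/$pid/status"
}
```

For example, pid_state "$(cat /var/spool/clientmqueue/sm-client.pid)" would print Z if the PID recorded in the pid file belongs to a zombie like the one in the listing above.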
[Updated on: Mon, 13 November 2006 20:00]