OpenVZ problem or not? [message #7489]
Mon, 16 October 2006 07:58
n00b_admin
Messages: 77 Registered: July 2006 Location: Romania
Member
Hi there,
This morning I detected a serious problem with the Apache server on multiple VPSs, and I don't know exactly where the problem is.
Before I used OpenVZ to build a VPS-based web hosting environment I was using shared hosting, and this problem didn't occur.
On several VPSs the Apache process died without any resource being exhausted. The Apache error_log says:
[Mon Oct 16 00:39:27 2006] [warn] child process 17451 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17452 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17453 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17454 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17455 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17471 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17472 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17473 still did not exit, sending a SIGTERM
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
[Mon Oct 16 00:39:29 2006] [warn] child process 17451 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17452 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17453 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17454 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17455 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17471 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17472 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17473 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17451 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17452 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17453 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17454 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17455 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17471 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17472 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17473 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:33 2006] [error] child process 17451 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17452 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17453 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17454 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17455 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17471 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17472 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17473 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:34 2006] [notice] SIGHUP received. Attempting to restart
I've checked the user_beancounters file and I don't have any non-zero failcnt values.
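For anyone who wants to repeat the failcnt check, here is a minimal sketch. It assumes the usual /proc/user_beancounters layout, where failcnt is the last column and each container section starts with a "VEID:" field. The function name ubc_failures is made up for illustration, and the optional file argument is only there so the function can be tried on a saved copy.

```shell
#!/bin/bash
# Print "VEID resource failcnt" for every beancounter row whose failcnt
# (last column) is non-zero. Reads /proc/user_beancounters by default.
ubc_failures() {
    awk '
        # A line such as "101: kmemsize ..." starts a new container section.
        $1 ~ /^[0-9]+:$/  { veid = substr($1, 1, length($1) - 1); res = $2 }
        # Every other row starts directly with the resource name.
        $1 !~ /^[0-9]+:$/ { res = $1 }
        # The "Version:" and column-header lines have a non-integer last
        # field, so they are skipped here.
        NF >= 6 && $NF ~ /^[0-9]+$/ && $NF > 0 { print veid, res, $NF }
    ' "${1:-/proc/user_beancounters}"
}

# Possible use on the HN (as root):
#   ubc_failures
```

No output means no counter has ever hit its limit, which matches what I saw.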
The hosted site is a simple one: plain HTML, no PHP or any other dynamic page-generating language.
After some googling I only found answers relating this problem to a PHP extension, nothing else.
Maybe someone here has stumbled on this one before me and could give me some hints.
EDIT:
After some checking I discovered that ALL Apache servers have been dead since 8:36 this morning!
They all ended operations with the following message:
[Mon Oct 16 08:36:59 2006] [notice] caught SIGTERM, shutting down
I didn't shut them down myself. The VPSs were all up, but the Apache servers were all down, and all of their logs contain the messages posted above.
[Updated on: Tue, 28 November 2006 10:11]
Re: OpenVZ problem or not? [message #7497 is a reply to message #7489]
Mon, 16 October 2006 12:21
n00b_admin
Messages: 77 Registered: July 2006 Location: Romania
Member
I considered writing to their mailing list, but the fact that this happened on all the VPSs got me wondering whether it is an OpenVZ bug.
Apache is nearly up to date at version 2.2.2 (I know the latest is 2.2.3).
mod_security is version 1.9.4. Version 2 was released today, and I'll wait for the RPMs to show up in fedora-extras.
I can't test the setup on my HN because I use CentOS on it.
[Updated on: Mon, 16 October 2006 12:22]
Re: OpenVZ problem or not? [message #8602 is a reply to message #7489]
Tue, 28 November 2006 23:53
rickb
Messages: 368 Registered: October 2006
Senior Member
Running any services on the HN is ill-advised. The Red Hat init scripts don't expect many copies of a process to be running, and even though the VE processes run under VE context, the HN can still see them in its /proc. So if an init script kills processes in a lazy way (killall), your VEs will suffer.
However, the problem also exists for applications other than "services" such as apache, mysql, etc. For example, take crond in Red Hat/CentOS:
[root@gallium ~]# /etc/init.d/crond status
crond (pid 17077 8226 16661 22837 603 6829 19314 27367 14965 29437 9188 13289 9025 6810 6413 2284 18725 12985 4776 29353 24821 16585 6167 2093 28784 26881 8950 4117 15282 2945 2389 28970 25190 12507 9026) is running...
[root@gallium ~]#
Those are the cronds of the VEs. Here is the stop function from /etc/init.d/crond:
stop() {
    echo -n $"Stopping $prog: "
    if [ ! -e /var/lock/subsys/crond ]; then
        echo -n $"cannot stop crond: crond is not running."
        failure $"cannot stop crond: crond is not running."
        echo
        return 1;
    fi
    killproc crond
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/crond;
    return $RETVAL
}
killproc:
    pid=
    if [ -f /var/run/${base}.pid ]; then
        local line p
        read line < /var/run/${base}.pid
        for p in $line ; do
            [ -z "${p//[0-9]/}" -a -d "/proc/$p" ] && pid="$pid $p"
        done
    fi
    if [ -z "$pid" ]; then
        pid=`pidof -o $$ -o $PPID -o %PPID -x $1 || \
            pidof -o $$ -o $PPID -o %PPID -x $base`
    fi
So, if the pid file exists, there shouldn't be a problem. But if that first [if] can't find the pid file, killproc falls back to pidof, which on the HN returns every crond on the machine, and there's going to be trouble for all of your VE cronds.
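One way to avoid that fallback is to trust only the pid file and refuse to guess when it is missing. This is a rough sketch, not a patch for the Red Hat functions file. The helper name pids_from_pidfile is made up, and the second argument (a /proc replacement) exists only so the function can be tried against a scratch directory.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Red Hat init functions):
# print the PIDs listed in a pid file, and never fall back to pidof.
pids_from_pidfile() {
    local pidfile=$1 procdir=${2:-/proc} line= p pids=
    [ -f "$pidfile" ] || return 1
    # read fails on a file without a trailing newline but still fills $line
    read -r line < "$pidfile" || [ -n "$line" ] || return 1
    for p in $line; do
        case $p in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a number
        esac
        # keep only PIDs that still exist
        [ -d "$procdir/$p" ] && pids="$pids $p"
    done
    [ -n "$pids" ] || return 1
    echo $pids                         # unquoted on purpose: trims the leading space
}

# Possible use inside a stop() function:
#   pids=$(pids_from_pidfile /var/run/crond.pid) && kill $pids
```

If the pid file is gone, the function returns non-zero and nothing gets killed. You then have to clean up by hand, but the VE processes stay alive.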
[root@gallium ~]# pidof crond
17077 8226 16661 22837 603 6829 19314 27367 14965 29437 9188 13289 9025 6810 6413 2284 18725 12985 4776 29353 24821 16585 6167 2093 28784 26881 8950 4117 15282 2945 2389 28970 25190 12507 9026
[root@gallium ~]# vzpid 17077
Pid VPSID Name
17077 9889165 crond
[root@gallium ~]# vzpid 8226
Pid VPSID Name
8226 9889429 crond
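The vzpid output above can also be used to pick out only the PIDs that belong to the HN itself. A small sketch, assuming vzpid prints the three columns shown above and reports VPSID 0 for processes running on the HN (check this on your own node first). The function name host_only_pids is made up.

```shell
#!/bin/bash
# Read vzpid-style output ("Pid VPSID Name" header plus data rows) on
# stdin and print only the PIDs whose VPSID column is 0.
host_only_pids() {
    # Header lines are skipped because their first field is not a number.
    awk '$1 ~ /^[0-9]+$/ && $2 == 0 { print $1 }'
}

# Possible use on the HN, calling vzpid once per PID as in the session above:
#   for p in $(pidof crond); do vzpid "$p"; done | host_only_pids
```

Anything this prints is a crond of the HN. Everything else belongs to a VE and should be left alone.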
This is probably the reason all of your httpds got killed. Hope this helps.
Rick Blundell
-------------
Common Terms I post with: http://wiki.openvz.org/Category:Definitions
UBC. Learn it, love it, live it: http://wiki.openvz.org/Proc/user_beancounters
[Updated on: Tue, 28 November 2006 23:54]