OpenVZ Forum


OpenVZ problem or not ? [message #7489] Mon, 16 October 2006 07:58
n00b_admin
Messages: 77
Registered: July 2006
Location: Romania
Member
Hi there,

This morning I detected a serious problem with the apache server on multiple vps's and I don't know exactly where the problem is.

Before I used OpenVZ to build a vps-based webhosting environment I was using shared hosting, and this problem didn't occur.

On several vps's the apache process died without any resource being exhausted. The apache error_log says:

[Mon Oct 16 00:39:27 2006] [warn] child process 17451 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17452 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17453 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17454 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17455 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17471 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17472 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:27 2006] [warn] child process 17473 still did not exit, sending a SIGTERM
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***
[Mon Oct 16 00:39:29 2006] [warn] child process 17451 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17452 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17453 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17454 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17455 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17471 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17472 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:30 2006] [warn] child process 17473 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17451 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17452 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17453 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17454 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17455 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17471 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17472 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:31 2006] [warn] child process 17473 still did not exit, sending a SIGTERM
[Mon Oct 16 00:39:33 2006] [error] child process 17451 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17452 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17453 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17454 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17455 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17471 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17472 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:33 2006] [error] child process 17473 still did not exit, sending a SIGKILL
[Mon Oct 16 00:39:34 2006] [notice] SIGHUP received.  Attempting to restart


I've checked the user_beancounters file and I don't have any failcnt's.

The hosted site is a simple one, html based, with no php or other dynamic page-generating language.

After some googling I only found answers relating this problem to a php extension, but nothing else :(

Maybe someone here has stumbled on this one before me and could give me some hints :)

EDIT:

After some checking I discovered that ALL apache servers have been dead since 8:36 this morning !

They all ended operations with the following message:

[Mon Oct 16 08:36:59 2006] [notice] caught SIGTERM, shutting down


And I didn't shut them down. The vps's were all up, but the apache servers were all down, and all the logs contain the messages posted above :(

[Updated on: Tue, 28 November 2006 10:11]


Re: OpenVZ problem or not ? [message #7491 is a reply to message #7489] Mon, 16 October 2006 08:32
Valmont
Messages: 225
Registered: September 2005
Senior Member
Please check the resources in /proc/user_beancounters at the moment the errors begin.
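A minimal sketch of such a check, assuming the usual /proc/user_beancounters layout in which failcnt is the last column of every resource row. The function name ubc_failures is made up for this example, and the optional file argument exists only so the function can be tried on a saved copy:

```shell
# Print every beancounter row whose failcnt (last column) is non-zero.
# Reads /proc/user_beancounters unless another file is given as $1.
ubc_failures() {
    awk '
        # A row starting with "<id>:" opens a new container section.
        $1 ~ /^[0-9]+:$/  { ctid = $1; sub(/:$/, "", ctid); res = $2 }
        # Every other row carries the resource name in the first column.
        $1 !~ /^[0-9]+:$/ { res = $1 }
        # Data rows end in a numeric failcnt; the two header rows do not.
        NF >= 6 && $NF ~ /^[0-9]+$/ && $NF > 0 {
            printf "CT %s %s failcnt=%s\n", ctid, res, $NF
        }
    ' "${1:-/proc/user_beancounters}"
}
```

Run as root on the HN, `ubc_failures` prints nothing when no counter has ever failed, which matches what the original poster reported.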



*** glibc detected *** /usr/sbin/httpd: double free or corruption (out): 0x403a7158 ***

Looks like an error in httpd, or maybe in one of its dynamic libraries.
Re: OpenVZ problem or not ? [message #7492 is a reply to message #7489] Mon, 16 October 2006 10:07
n00b_admin
Messages: 77
Registered: July 2006
Location: Romania
Member
Well, I've said that user_beancounters doesn't report any problems.

The thing that puzzles me is that all apache servers on all vps's died more or less at the same time.

The only thing common to all is that all of them are using the same template (fedora core 5) and the same modules loaded. If some module is causing trouble it's the same problem in all of them.

The only module added by me that is not standard in apache is mod_security. But that one didn't cause problems in the past.
Re: OpenVZ problem or not ? [message #7493 is a reply to message #7492] Mon, 16 October 2006 10:22
Valmont
Messages: 225
Registered: September 2005
Senior Member
Maybe the version of apache or mod_security has changed (been updated?)

I think you should report it to the mod_sec developers. Or, before that, test apache+mod_sec on the HN (ve0).

[Updated on: Mon, 16 October 2006 10:23]


Re: OpenVZ problem or not ? [message #7497 is a reply to message #7489] Mon, 16 October 2006 12:21
n00b_admin
Messages: 77
Registered: July 2006
Location: Romania
Member
I considered writing to their mailing list, but the fact that it happened on all the vps's got me wondering whether this is an openvz bug.

Apache is up to date at version 2.2.2 (I know the latest is 2.2.3).

Mod_security is at version 1.9.4. Today they released version 2, and I'll wait for the rpm's to show up in fedora-extras.

I can't test the setup on my HN because I use centos on it ;)

[Updated on: Mon, 16 October 2006 12:22]


Re: OpenVZ problem or not ? [message #7498 is a reply to message #7497] Mon, 16 October 2006 12:31
Valmont
Messages: 225
Registered: September 2005
Senior Member
Then try to use strace (something like strace -o log -f -p <pids>) to track down the problem...
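One way to attach to all the httpd processes at once is sketched below. strace accepts several -p options, so the helper only turns a PID list into "-p PID" arguments. The helper name strace_pid_args and the output path /tmp/httpd.strace are made up for this example:

```shell
# Turn a list of PIDs into "-p PID" arguments for strace.
strace_pid_args() {
    for pid in "$@"; do
        printf -- '-p %s ' "$pid"
    done
}

# Usage (as root, inside the VPS whose apache misbehaves):
#   strace -f -o /tmp/httpd.strace $(strace_pid_args $(pidof httpd))
# -f follows forked children, -o writes the trace to a file.
```

The trace file then shows the last system calls and signals each child saw before it died, which can help tell a crash from an external kill.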
Re: OpenVZ problem or not ? [message #7543 is a reply to message #7489] Tue, 17 October 2006 11:34
dev
Messages: 1693
Registered: September 2005
Location: Moscow
Senior Member

glibc messages warn you about a corrupted memory heap.
This can be due to numerous reasons:
1. The application writes beyond the allocated memory ranges and thus corrupts some glibc data.
In theory all the apaches can crash this way if they use the same version of httpd and the same mod_'s. You can try to debug it by running apache under njamd or a similar memory debugger. It will point you to the broken code.

2. It can be hardware memory corruption.
Check dmesg and /var/log/messages for suspicious messages like this:
MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
Bank 2: 940040000000017a


See more details here:
http://wiki.openvz.org/Machine_check_exception
http://wiki.openvz.org/Hardware_testing
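A rough sketch of the log check from point 2, assuming machine-check reports contain either "MCE:" (as in the sample above) or the words "machine check". The function name mce_lines is made up for this example, and the pattern may need adjusting for other kernels:

```shell
# Print the lines of a log file that look like machine-check reports.
mce_lines() {
    grep -iE 'MCE:|machine check' "$1"
}

# Usage on the HN:
#   mce_lines /var/log/messages
#   dmesg > /tmp/dmesg.txt && mce_lines /tmp/dmesg.txt
```

No output only means nothing matching this pattern was logged; it does not prove the memory is healthy.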


Re: OpenVZ problem or not ? [message #7613 is a reply to message #7489] Thu, 19 October 2006 07:54
n00b_admin
Messages: 77
Registered: July 2006
Location: Romania
Member
Thanks for your answers !

I will update mod_security as soon as I get the new version.

I've checked /var/log/messages and /var/log/dmesg but I didn't find any problems or hardware error messages.

The weird thing is that it hasn't happened again since my first post.
Re: *CLOSED* OpenVZ problem or not ? [message #8566 is a reply to message #7489] Sun, 26 November 2006 16:32
dagr
Messages: 83
Registered: February 2006
Member
Had the same error with apache. Found out later that the php4 and php5 modules were both loaded in httpd.conf at the same time (left over from an old php compile). Beware :-)
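A quick way to spot that situation is sketched below, assuming the modules are loaded with the usual LoadModule php4_module / php5_module directives. The function name php_modules_loaded is made up for this example, and on Fedora the directives may live in conf.d/*.conf rather than httpd.conf itself:

```shell
# Print the active (uncommented) LoadModule lines for PHP in a config file.
php_modules_loaded() {
    grep -E '^[[:space:]]*LoadModule[[:space:]]+php[0-9]*_module' "$1"
}

# Usage:
#   php_modules_loaded /etc/httpd/conf/httpd.conf
# More than one line of output means two PHP modules are loaded together.
```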
Re: OpenVZ problem or not ? [message #8592 is a reply to message #7489] Tue, 28 November 2006 11:50
n00b_admin
Messages: 77
Registered: July 2006
Location: Romania
Member
Well, it seems it may be an openvz issue after all.

It's been a while now and the crash didn't occur again, but a weird thing happened.

I had an apache server installed on the HN to try a vps management php script, but I dropped the idea at that time. The server was still running and it was set to start with the HN. After some work I did on the HN that involved restarting the machine, I noticed the webserver had started on the HN and stopped it.

At first I didn't notice anything, but then the phones started ringing. All apache servers in the vps's were stopped !

I stumbled on this same issue with a qmail server left on the HN that interfered with another one installed in a vps.

As a rule of thumb it is best NOT to run any services on the HN that may be running in a vps !
Re: OpenVZ problem or not ? [message #8593 is a reply to message #8592] Tue, 28 November 2006 12:01
dev
Messages: 1693
Registered: September 2005
Location: Moscow
Senior Member

Maybe its scripts run something like:
# killall httpd
from VE0?
Sure, that command kills the httpd of every user, because VE0 sees the processes of all VEs.

Just don't run anything in VE0 except for ssh.
This is a very simple security rule for any virtualization system.


Re: OpenVZ problem or not ? [message #8602 is a reply to message #7489] Tue, 28 November 2006 23:53
rickb
Messages: 368
Registered: October 2006
Senior Member
Running any services on the HN is ill-advised. The redhat init scripts don't expect many copies of the same process to be running; even though those processes run in VE context, the HN can still see them in its /proc. So, if an initscript kills them in a lazy way (killall), your VEs will suffer.

However, the problem is not limited to the obvious "services" such as apache, mysql, etc. For example, take crond in redhat/centos:

[root@gallium ~]# /etc/init.d/crond status
crond (pid 17077 8226 16661 22837 603 6829 19314 27367 14965 29437 9188 13289 9025 6810 6413 2284 18725 12985 4776 29353 24821 16585 6167 2093 28784 26881 8950 4117 15282 2945 2389 28970 25190 12507 9026) is running...
[root@gallium ~]#

Those are the cronds of the VEs. Here is how /etc/init.d/crond stops cron:

stop() {
    echo -n $"Stopping $prog: "
    if [ ! -e /var/lock/subsys/crond ]; then
        echo -n $"cannot stop crond: crond is not running."
        failure $"cannot stop crond: crond is not running."
        echo
        return 1;
    fi
    killproc crond
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/crond;
    return $RETVAL
}


killproc:
    pid=
    # First choice: take the PIDs from the pid file, if there is one.
    if [ -f /var/run/${base}.pid ]; then
        local line p
        read line < /var/run/${base}.pid
        for p in $line ; do
            [ -z "${p//[0-9]/}" -a -d "/proc/$p" ] && pid="$pid $p"
        done
    fi
    # Fallback: look the process up by name with pidof.
    if [ -z "$pid" ]; then
        pid=`pidof -o $$ -o $PPID -o %PPID -x $1 || \
            pidof -o $$ -o $PPID -o %PPID -x $base`
    fi


So, if the pid file exists, there shouldn't be a problem. But if that first [if] can't find the pid file, killproc falls back to pidof, which on the HN also returns the crond PIDs of every VE, and there's gonna be trouble for all of your VE cronds.
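For comparison, here is a sketch of a stricter lookup that trusts only the pid file and never falls back to pidof. This is not the Red Hat code; the function name pids_from_pidfile is made up for this example:

```shell
# Print the live PIDs listed in a pid file. Return 1 and print nothing when
# the file is missing, empty, or lists no running process.
pids_from_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    # Accept a last line without a trailing newline, reject an empty file.
    read -r line < "$pidfile" || [ -n "$line" ] || return 1
    found=
    for p in $line; do
        case $p in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a number
        esac
        [ -d "/proc/$p" ] && found="$found $p"
    done
    [ -n "$found" ] || return 1
    echo $found
}
```

A stop routine built on this would refuse to kill anything when /var/run/crond.pid is missing, instead of killing every crond that the HN can see.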

[root@gallium ~]# pidof crond
17077 8226 16661 22837 603 6829 19314 27367 14965 29437 9188 13289 9025 6810 6413 2284 18725 12985 4776 29353 24821 16585 6167 2093 28784 26881 8950 4117 15282 2945 2389 28970 25190 12507 9026
[root@gallium ~]# vzpid 17077
Pid VPSID Name
17077 9889165 crond
[root@gallium ~]# vzpid 8226
Pid VPSID Name
8226 9889429 crond


This is probably the reason all of your httpd's got killed. Hope this helps.
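If something really has to be stopped on the HN, one cautious approach is to keep only the PIDs that vzpid attributes to the HN itself. The sketch below parses the "Pid VPSID Name" output shown above and assumes that HN processes are reported with VPSID 0. The function name host_only_pids is made up for this example:

```shell
# Read vzpid-style output on stdin and print only the PIDs whose VPSID is 0,
# i.e. the processes assumed to belong to the hardware node itself.
host_only_pids() {
    awk '$1 ~ /^[0-9]+$/ && $2 == 0 { print $1 }'
}

# Usage on the HN:
#   for p in $(pidof crond); do vzpid "$p"; done | host_only_pids
```

Only the PIDs printed by this filter would then be passed to kill, which leaves the cronds (or httpds) of the VEs alone.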

Rick Blundell


-------------
Common Terms I post with: http://wiki.openvz.org/Category:Definitions

UBC. Learn it, love it, live it: http://wiki.openvz.org/Proc/user_beancounters

[Updated on: Tue, 28 November 2006 23:54]

