OpenVZ Forum


Application is being killed by kernel with signal 11 [message #7956] Tue, 31 October 2006 16:44
iziz
Messages: 10
Registered: October 2006
Junior Member
Hello,

I run a PostgreSQL server inside an OpenVZ VE. From time to time it crashes, reporting that one of the backends exited abnormally (on signal 11) and possibly corrupted shared memory. This forces a restart of the whole server, which is highly undesirable in my setup.

I checked your wiki and adjusted the UBC parameters for the 4 GB of RAM on the physical machine. I set the beancounters to leave some resources free for future use while still being large enough, I thought, for PostgreSQL to run stably. Here is my /proc/user_beancounters file:

   uid  resource              held     maxheld     barrier       limit     failcnt
98076:  kmemsize          21901424    36869261    70451200    71000000           0
        lockedpages              0           0          64          64           0
        privvmpages         279568      299439     1560000     1572864           0
        shmpages            243748      243764      327680      327680           0
        dummy                    0           0           0           0           0
        numproc                 33          45         650         650           0
        physpages           266354      275590           0  2147483647           0
        vmguarpages              0           0      524288  2147483647           0
        oomguarpages        267917      277186      524288  2147483647           0
        numtcpsock              20          34         800         800           0
        numflock                 2           5         400         440           0
        numpty                   2           2          16          16           0
        numsiginfo               0           3         256         256           0
        tcpsndbuf             6684      204976     3194880     5242880           0
        tcprcvbuf                0       19900     3194880     5242880           0
        othersockbuf        122540      498476     1320960     4000000           0
        dgramrcvbuf              0      185856     1320960     1320960           0
        numothersock            81          90        1000        1000           0
        dcachesize          256004      327754     4915200    54067200           0
        numfile               1660        2781       12800       12800           0
        dummy                    0           0           0           0           0
        dummy                    0           0           0           0           0
        dummy                    0           0           0           0           0
        numiptent               19          19         128         128           0

So the questions are:

1. Why does the kernel kill PostgreSQL with SIGSEGV?
2. Why don't I see the fail counters increase?
3. How can I tell which resource limit PostgreSQL tried to exceed, given all the maxheld values? (This assumes I'm right that the kernel kills PostgreSQL for exceeding a resource limit.)
4. What's wrong with these UBC settings?
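
For questions 2 and 3, the table can also be checked mechanically. The following is a minimal sketch in Python, not an OpenVZ tool: the function names and the 70% threshold are assumptions made for illustration, and the sample text is a subset of the rows posted above. It lists every resource whose failcnt is non-zero or whose maxheld came close to its barrier.

```python
# Sketch: flag beancounter rows that failed or peaked near their barrier.
# parse_beancounters/flag_resources and the 0.7 threshold are hypothetical
# choices for this example, not part of any OpenVZ utility.

SAMPLE = """\
uid resource held maxheld barrier limit failcnt
98076: kmemsize 21901424 36869261 70451200 71000000 0
 privvmpages 279568 299439 1560000 1572864 0
 shmpages 243748 243764 327680 327680 0
 physpages 266354 275590 0 2147483647 0
"""


def parse_beancounters(text):
    """Parse /proc/user_beancounters-style text into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Drop a leading "uid:" column when present.
        if fields and fields[0].endswith(":"):
            fields = fields[1:]
        # Skip headers and anything that is not "name + 5 numbers".
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        held, maxheld, barrier, limit, failcnt = map(int, fields[1:])
        rows.append({
            "resource": fields[0],
            "held": held,
            "maxheld": maxheld,
            "barrier": barrier,
            "limit": limit,
            "failcnt": failcnt,
        })
    return rows


def flag_resources(rows, threshold=0.7):
    """Return names of resources that failed or peaked near the barrier."""
    flagged = []
    for row in rows:
        near_barrier = (
            row["barrier"] > 0
            and row["maxheld"] / row["barrier"] >= threshold
        )
        if row["failcnt"] > 0 or near_barrier:
            flagged.append(row["resource"])
    return flagged


if __name__ == "__main__":
    print(flag_resources(parse_beancounters(SAMPLE)))
```

To run it against a live system, read /proc/user_beancounters instead of SAMPLE. A flagged resource only shows where the peak usage sat relative to the barrier; with every failcnt at 0, as in the table above, no allocation was actually refused.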

I forgot to mention:

[root@pg1 ~]# uname -r
2.6.8-022stab078.21.ipsec.1-enterprise

Hope that someone can help me.
Re: Application is being killed by kernel with signal 11 [message #7972 is a reply to message #7956] Wed, 01 November 2006 11:28
glowfish
Messages: 2
Registered: November 2006
Junior Member
We have the same issue, but with Teamspeak running in our VPSes.

Since we applied the latest OpenVZ update yesterday, we cannot start a single Teamspeak server for our customers anymore.

Teamspeak uses an SQLite database.

This is what I get from strace:

strace ./server_linux
execve("./server_linux", ["./server_linux"], [/* 10 vars */]) = 0
+++ killed by SIGKILL +++


Daniel

[Updated on: Wed, 01 November 2006 11:30]


Re: Application is being killed by kernel with signal 11 [message #7974 is a reply to message #7972] Wed, 01 November 2006 14:27
Antarion
Messages: 17
Registered: December 2005
Location: Huerth
Junior Member
I've opened Bug #332 (http://bugzilla.openvz.org/show_bug.cgi?id=332), which covers the Teamspeak issues.

@Kir/Dev: I would very much appreciate a quick solution/workaround.

Many Thanks!

-Torsten
Re: Application is being killed by kernel with signal 11 [message #7975 is a reply to message #7972] Wed, 01 November 2006 15:44
dev
Messages: 1693
Registered: September 2005
Location: Moscow
Senior Member

Hey, this looks totally different!
The original report is about SIGSEGV (signal 11) on a 2.6.8 kernel,
while you report SIGKILL (signal 9) on 2.6.9.

Please don't mix them up :)
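
The two signals are easy to tell apart from the parent process. As a rough sketch in Python (the helper name describe_exit is made up for this example), a negative return code from the subprocess module means the child was terminated by that signal number:

```python
import signal


def describe_exit(returncode):
    """Return the signal name if a child died from a signal, else None.

    subprocess reports "killed by signal N" as a return code of -N.
    """
    if returncode < 0:
        return signal.Signals(-returncode).name
    return None


if __name__ == "__main__":
    # -11 and -9 are the two cases discussed in this thread (Linux numbering).
    for code in (-11, -9, 0):
        print(code, describe_exit(code))
```

On Linux, -11 maps to SIGSEGV (an invalid memory access, as in the PostgreSQL report) and -9 to SIGKILL (the process was killed from outside, as in the Teamspeak strace output).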


http://static.openvz.org/userbars/openvz-developer.png
Re: Application is being killed by kernel with signal 11 [message #7995 is a reply to message #7975] Thu, 02 November 2006 15:40
iziz
Messages: 10
Registered: October 2006
Junior Member
So what about SIGSEGV and my situation?

Are beancounters considered a reliable, stable system with no bugs known to the community? Or do they occasionally show hard-to-understand issues like mine?

Is it possible to turn on detailed logging to see what happened and why these kills occurred?

If I switch on core dumps, will they help clarify the situation?

Any help would be greatly appreciated.
Re: Application is being killed by kernel with signal 11 [message #7997 is a reply to message #7995] Thu, 02 November 2006 15:54
dev
Messages: 1693
Registered: September 2005
Location: Moscow
Senior Member

The kernel usually sends SIGSEGV when an application tries to access memory through an invalid pointer. The typical reasons for this are:
1. Memory corruption. Check your hardware according to http://wiki.openvz.org/Hardware_testing

2. Bugs in software. These can be caught by:
a) adding kernel messages at the point where SIGSEGV is sent.
For example, for the i386 arch, see arch/i386/mm/fault.c, function do_page_fault():

<skipped>
bad_area:
    up_read(&mm->mmap_sem);

bad_area_nosemaphore:
    /* User mode accesses just cause a SIGSEGV */
    if (error_code & 4) {
        /*
         * Valid to do another page fault here because this one came
         * from user space.
         */
        if (is_prefetch(regs, address, error_code))
            return;

        tsk->thread.cr2 = address;
        /* Kernel addresses are always protection faults */
        tsk->thread.error_code = error_code | (address >= TASK_SIZE);
        tsk->thread.trap_no = 14;
        info.si_signo = SIGSEGV;
        info.si_errno = 0;
        /* info.si_code has been set above */
        info.si_addr = (void __user *)address;
        force_sig_info(SIGSEGV, &info, tsk);
        return;
    }

SIGSEGV is sent above by force_sig_info(), so what you can do here is print the application's eip address (regs->eip) to see where the problem occurs.

b) debug application with gdb to see where it gets SIGSEGV.
c) save core dump from an application and analyze it later.


Re: Application is being killed by kernel with signal 11 [message #7998 is a reply to message #7997] Thu, 02 November 2006 15:58
iziz
Messages: 10
Registered: October 2006
Junior Member
OK, thanks for the detailed reply.

Could you please tell me how to turn core dumps on in OpenVZ? And where will I be able to find them?
Re: Application is being killed by kernel with signal 11 [message #7999 is a reply to message #7998] Thu, 02 November 2006 16:05
dev
Messages: 1693
Registered: September 2005
Location: Moscow
Senior Member

You are welcome!

AFAIK, with ulimit -c, as on any other distro.

By default, core dumps are disabled:
[dev@localhost vzctl]$ ulimit -a
core file size (blocks, -c) 0
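
The same limit can be raised from code when the server is launched by a wrapper rather than an interactive shell. This is only a sketch in Python using the standard resource module; the helper name enable_core_dumps is an assumption for this example, and it raises the soft limit only as far as the existing hard limit allows.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file limit to the hard limit.

    Equivalent in spirit to "ulimit -c" in a shell. The new limit applies
    to this process and to children it starts afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    print(enable_core_dumps())
```

Because the limit is inherited, it has to be set in whatever process starts PostgreSQL (for example its init script), not in an unrelated login shell. If the hard limit is itself 0, the soft limit stays at 0 and core dumps remain disabled until the hard limit is raised by root.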



Re: Application is being killed by kernel with signal 11 [message #8000 is a reply to message #7999] Thu, 02 November 2006 16:14
iziz
Messages: 10
Registered: October 2006
Junior Member
I have enabled core dumps on my system. Now I'll wait for PostgreSQL to be killed. I hope you'll help me with the core dump analysis :)
Re: Application is being killed by kernel with signal 11 [message #8001 is a reply to message #8000] Thu, 02 November 2006 16:34
dev
Messages: 1693
Registered: September 2005
Location: Moscow
Senior Member

If possible, I suggest checking the memory first. This can save a lot of time for both of us.

Re: Application is being killed by kernel with signal 11 [message #8002 is a reply to message #8001] Thu, 02 November 2006 16:51
iziz
Messages: 10
Registered: October 2006
Junior Member
I have experienced memory corruption several times in my life, but I do not see anything like it here. The faults are not random: they occur under heavy load and look very similar to each other. Moreover, PostgreSQL was also killed several times with signal 9 because kmemsize was exceeded; I managed to trace that. After I increased the kmemsize barrier, it seems to be OK.

So I suspect the beancounter settings rather than memory corruption. Anyway, the server is remote and in a production environment, so even if I schedule a memory check there, it cannot be carried out earlier than 2-3 weeks from now. And I expect to have the first core dump this week already :)

Also, it's a Dell server with extensive self-testing equipment that should have notified me of RAM problems.
Re: Application is being killed by kernel with signal 11 [message #9003 is a reply to message #7956] Wed, 13 December 2006 07:26
lcslouis
Messages: 26
Registered: September 2006
Junior Member
From what I have read, this problem has been fixed in the most recent 2.6.9 Virtuozzo kernel release. Is there any chance of the OpenVZ kernel being updated with the same fixes?
This is the Bugzilla entry: http://bugzilla.openvz.org/show_bug.cgi?id=332. How do I apply this fix, or when will the new kernel be available for download?

[Updated on: Wed, 13 December 2006 07:41]


Re: Application is being killed by kernel with signal 11 [message #9007 is a reply to message #9003] Wed, 13 December 2006 08:46
dev
Messages: 1693
Registered: September 2005
Location: Moscow
Senior Member

It is 99% likely that this is unrelated to your problem: the signal number is different. You can apply the patch yourself and check. The kernel will be published soon.


Re: Application is being killed by kernel with signal 11 [message #9008 is a reply to message #7956] Wed, 13 December 2006 08:50
lcslouis
Messages: 26
Registered: September 2006
Junior Member
Well, Teamspeak is giving me "process killed", and from what bug 332 says, that is exactly my problem. The thing is, I have no idea how to apply the fix. And how soon are we talking? I am already behind schedule as it is, and this kernel problem only puts me further behind. Not to mention I have a lot of angry people.
Re: Application is being killed by kernel with signal 11 [message #9010 is a reply to message #9008] Wed, 13 December 2006 08:55
dev
Messages: 1693
Registered: September 2005
Location: Moscow
Senior Member

Not a problem. I will upload it for you here:
http://download.openvz.org/~dev/023stab037.2/
It will take ~10-15 minutes...




[Updated on: Wed, 13 December 2006 08:58]


Re: Application is being killed by kernel with signal 11 [message #9011 is a reply to message #9008] Wed, 13 December 2006 08:59
dev
Messages: 1693
Registered: September 2005
Location: Moscow
Senior Member

Please, if you experience the problem and want it fixed sooner, report this in the bug. Otherwise we may not be in a big hurry if the original reporter has just applied the fix and is happy.


Re: Application is being killed by kernel with signal 11 [message #9012 is a reply to message #7956] Wed, 13 December 2006 09:30
lcslouis
Messages: 26
Registered: September 2006
Junior Member
OK, just one question: how do I upgrade the OpenVZ kernel by itself? It is trying to upgrade the system kernel, which is wrong. I originally used yum to install OpenVZ. Should I edit my boot config, go back to the old kernel and run the upgrade? Or how should I proceed? I haven't done an upgrade of this extent before and don't want to mess it up. I have no physical access to the box.
Re: Application is being killed by kernel with signal 11 [message #9013 is a reply to message #9012] Wed, 13 December 2006 09:36
dev
Messages: 1693
Registered: September 2005
Location: Moscow
Senior Member

You don't need to upgrade (i.e. run rpm -Uhv).
You need to install the kernel with rpm -ihv,
so that all previous kernels are left on the system.


Re: Application is being killed by kernel with signal 11 [message #9020 is a reply to message #7956] Wed, 13 December 2006 13:42
lcslouis
Messages: 26
Registered: September 2006
Junior Member
OK, it's installed and it fixes the problem. Teamspeak is running smoothly.