OpenVZ Forum: Support » Need help diagnosing OOM Killer report

Need help diagnosing OOM Killer report [message #20742]

Tue, 25 September 2007 20:10

rickb
Messages: 368
Registered: October 2006

Senior Member

Hello Devs, I need your help.

Kernel: ovz028stab039.1
CPU: 2x quad core 2.33 xeon
mem: 32GB
disk: Emulex Corporation LP9000 Fibre Channel, local sata disks

During a very long read from the Fibre Channel card with writes to the local disks (say 20+ GB, taking about 45 minutes, for example when backing up a VE), the OOM killer is triggered:

http://208.77.99.251/oom.txt

Can you help me understand what is happening?

Thanks!
Rick

-------------
Common Terms I post with: http://wiki.openvz.org/Category:Definitions

UBC. Learn it, love it, live it: http://wiki.openvz.org/Proc/user_beancounters

Re: Need help diagnosing OOM Killer report [message #20744 is a reply to message #20742]

Tue, 25 September 2007 20:23

rickb
Messages: 368
Registered: October 2006

Senior Member

I asked almost the same question in May (different server and kernel).

http://forum.openvz.org/index.php?t=msg&goto=13373&

I believe this is what triggers the OOM killer, as Vasily pointed out last time:

Normal free:2728kB min:3756kB low:4692kB high:5632kB active:2324kB inactive:2624kB present:901120kB pages_scanned:7828 all_unreclaimable? yes
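
To make the reading of that line explicit, here is a minimal Python sketch that pulls the kB fields out of the quoted zone line and compares `free` against the `min` watermark. The helper name `parse_zone_line` is made up for illustration, and the regular expression assumes the line format shown above; other kernels may print zone lines differently.

```python
import re

# The Normal zone line, copied from the OOM report quoted above.
ZONE_LINE = (
    "Normal free:2728kB min:3756kB low:4692kB high:5632kB "
    "active:2324kB inactive:2624kB present:901120kB "
    "pages_scanned:7828 all_unreclaimable? yes"
)

def parse_zone_line(line):
    """Return (zone_name, {field: kB}) for one zone line (hypothetical helper)."""
    name = line.split()[0]
    # Only "key:<number>kB" pairs are collected; pages_scanned has no unit and is skipped.
    fields = {key: int(value) for key, value in re.findall(r"(\w+):(\d+)kB", line)}
    return name, fields

name, kb = parse_zone_line(ZONE_LINE)
below_min = kb["free"] < kb["min"]
print(f"{name}: free={kb['free']}kB min={kb['min']}kB below_min={below_min}")
```

With the values above, `free` (2728 kB) is below `min` (3756 kB), and the zone is also flagged `all_unreclaimable`.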

I notice this slab is rather enormous:

buffer_head : size 372822016 objsize 52
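
As a rough worked check of how large that is, the sketch below does the arithmetic in Python. It assumes that `size` is in bytes and that the whole buffer_head slab lives in the Normal zone; neither is stated in the report itself, so treat the result as an estimate.

```python
SLAB_BYTES = 372822016       # "size" from the buffer_head line (assumed to be bytes)
OBJ_SIZE = 52                # "objsize" from the same line
NORMAL_PRESENT_KB = 901120   # "present" from the Normal zone line quoted above

slab_kb = SLAB_BYTES // 1024
objects = SLAB_BYTES // OBJ_SIZE          # approximate object count
share = slab_kb / NORMAL_PRESENT_KB       # fraction of the Normal zone

print(f"buffer_head: {slab_kb} kB, about {objects} objects, "
      f"{share:.0%} of the Normal zone")
```

Under those assumptions the slab alone accounts for roughly 40% of the Normal zone.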

--

Any pointers or insight into what could be causing this, or how I can avoid running out of free memory in the Normal zone, would be great. To reproduce the problem I reported today, I ran a long-running tar of many files.

Thanks!
Rick

[Updated on: Tue, 25 September 2007 20:32]

Re: Need help diagnosing OOM Killer report [message #20801 is a reply to message #20744]

Wed, 26 September 2007 12:17

vaverin
Messages: 708
Registered: September 2005

Senior Member

Rick,

could you please tell me which kernel you are using?

The OOM happens because disk activity consumes memory in the normal zone until it fills up.
The whole normal zone is ~800 MB, and buffer_head alone uses ~370 MB.
You can try calling sync in a loop (to force writes to disk, which should reduce the number of busy buffer_heads), but I'm not sure that would be enough to prevent the OOM kill.
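
One way to try the sync-in-a-loop idea is sketched below in Python. The function names, the 30-second interval, and the `job` callable (which stands in for the backup, for example a wrapper around the tar command) are all assumptions for illustration. As noted above, this may not be enough to prevent the OOM kill.

```python
import os
import threading

def sync_periodically(stop_event, interval_s=30.0, sync=os.sync):
    """Call sync() every interval_s seconds until stop_event is set.

    Returns the number of sync calls made.
    """
    calls = 0
    # Event.wait() returns False on timeout and True once the event is set.
    while not stop_event.wait(interval_s):
        sync()
        calls += 1
    return calls

def run_with_periodic_sync(job, interval_s=30.0, sync=os.sync):
    """Run job() while a background thread flushes dirty buffers periodically."""
    stop = threading.Event()
    worker = threading.Thread(
        target=sync_periodically, args=(stop, interval_s, sync)
    )
    worker.start()
    try:
        return job()
    finally:
        # Always stop the background thread, even if the job raises.
        stop.set()
        worker.join()
```

A caller would pass its own backup routine as `job`; the `sync` parameter exists only so the flush call can be swapped out when testing.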

The simplest way to work around this issue is to increase the size of the normal zone.
Your kernel has only ~800 MB available for the normal zone.
You can switch to our enterprise kernels, which have the 4 GB split option enabled (at least for the 2.6.9/rhel4 and 2.6.18/rhel5 kernels). The 4 GB split patch increases the size of the normal zone to up to 3.6 GB.

You can also switch to a 64-bit kernel; however, that requires reinstalling the host OS on your node. In general I would now recommend 64-bit kernels, even if you have only 32-bit VEs.
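
To see how much low (normal-zone) memory a node has before and after such a switch, one option is to read the `LowTotal` and `LowFree` fields from `/proc/meminfo`. The Python sketch below parses meminfo-style text; the helper names and the sample values are made up for illustration. These fields are generally present only on 32-bit highmem kernels, so the helper returns `None` when they are missing.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info

def lowmem_report(info):
    """Return (LowTotal, LowFree) in kB, or None when the fields are absent."""
    if "LowTotal" not in info or "LowFree" not in info:
        return None
    return info["LowTotal"], info["LowFree"]

# Hypothetical sample text, not taken from the reporter's machine.
SAMPLE = """MemTotal:     33000000 kB
LowTotal:       880000 kB
LowFree:          2700 kB
"""
print(lowmem_report(parse_meminfo(SAMPLE)))
```

On a live node the text would come from `open("/proc/meminfo").read()` instead of `SAMPLE`.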

thank you,
Vasily Averin

Re: Need help diagnosing OOM Killer report [message #20972 is a reply to message #20801]

Fri, 28 September 2007 18:16

rickb
Messages: 368
Registered: October 2006

Senior Member

As Vasily suggested via private chat, I patched 2.6.18 with the EL5 OpenVZ kernel release, and that solved the problem. It allows 4 GB of kernel memory space under 32-bit rather than 800 MB.

Thanks!
Rick
