OpenVZ Forum

Re: OOM didn't save the machine [message #35492 is a reply to message #35490] Mon, 30 March 2009 21:41
lazy
Messages: 16
Registered: January 2008
Junior Member
Thanks to SysRq I managed to reboot the machine without a crash. I didn't find anything interesting in /proc besides filp being eaten in slab, and this:

CPU 1, VCPU 3000:0
Modules linked in: vzethdev(U) vznetdev(U) simfs(U) vzrst(U) ip_nat(U) vzcpt(U) ip_conntrack(U) nfnetlink(U) ipip(U) tunnel4(U) tun(U) vzdquota(U) vzmon(U)
vzdev(U) xt_tcpudp(U) xt_length(U) ipt_ttl(U) xt_tcpmss(U) ipt_TCPMSS(U) iptable_mangle(U) iptable_filter(U) xt_multiport(U) xt_limit(U) ipt_tos(U)
ipt_REJECT(U) ip_tables(U) x_tables(U) button(U) dm_snapshot(U) dm_mirror(U) dm_mod(U) mptctl(U) loop(U) sg(U) sr_mod(U) cdrom(U) sd_mod(U) ehci_hcd(U) ata_piix(U)
libata(U) tg3(U) uhci_hcd(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) scsi_mod(U)
Pid: 10722, comm: httpd Tainted: P
^^^^^^^^^^^ this is the unkillable 100% cpu eating monster

2.6.18-92.1.18.el5.028stab060.2 #1 028stab060
RIP: 0060:[<ffffffff8004a9b7>] [<ffffffff8004a9b7>] unix_stream_sendmsg+0x194/0x3d7
RSP: 0000:ffff8101f250fb88 EFLAGS: 00000203
RAX: 0000000000000000 RBX: ffff8101f250fee8 RCX: 00000000000003b8
RDX: fffffffffffffef0 RSI: ffff81011e35b4c0 RDI: ffff81012fbae000
RBP: 0000000000000227 R08: ffff8101c476cb80 R09: 0000000000000286
R10: 000053fe5fafdbc6 R11: ffff8101f250fb38 R12: 0000000000000000
R13: ffff8101f34b50c0 R14: 0000000000000000 R15: ffff8101b17cbc80
FS: 0000000000000000(0000) GS:ffff81022f494b40(0033) knlGS:00000000b7bd06c0
CS: 0060 DS: 007b ES: 007b CR0: 000000008005003b
CR2: 00000000b6d19030 CR3: 00000001f2634000 CR4: 00000000000006e0

Call Trace:
<NMI> <<EOE>> [<ffffffff8001df4c>] __pollwait+0x0/0xe1
[<ffffffff80055a09>] sock_sendmsg+0xd4/0xec
[<ffffffff8003f92b>] memcpy_toiovec+0x36/0x66
[<ffffffff80064edf>] _spin_lock_bh+0x9/0x14
[<ffffffff800960cc>] autoremove_wake_function+0x0/0x2e
[<ffffffff801de53f>] cmsghdr_from_user_compat_to_kern+0x180/0x20b
[<ffffffff801ca06a>] sys_sendmsg+0x217/0x28a
[<ffffffff8000c801>] do_sync_read+0xc7/0x104
[<ffffffff80064edf>] _spin_lock_bh+0x9/0x14
[<ffffffff800960cc>] autoremove_wake_function+0x0/0x2e
[<ffffffff801ddd6c>] compat_sys_socketcall+0x159/0x172
[<ffffffff800615fa>] ia32_sysret+0x0/0xa

This was the unkillable process. This Apache process receives HTTP requests and passes the accepted sockets to other httpd processes using sendmsg. There may be an error in it: the msghdr could be wrong, or there might be a loop somewhere, but that shouldn't kill the machine.

FD_SETSIZE is raised to 2048, if it makes any difference.

I think I might be able to reproduce some of the flow from this trace; too bad I did SysRq-p only once :(
Basically it should be something like: poll(), accept a socket, then sendmsg the socket to another process through one of the sockets created by socketpair(PF_UNIX, SOCK_STREAM, 0, socks).
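For anyone trying to reproduce it, here is a minimal sketch of that flow. It is written in Python rather than taken from the actual Apache code (which isn't shown in this thread), and it assumes the socket is handed over as an SCM_RIGHTS control message, the usual way to pass a descriptor with sendmsg over a Unix socket. The names send_fd, recv_fd and demo are made up for illustration, and both ends of the socketpair live in one process here just to keep it short; in the real setup the receiving end would belong to another httpd process.

```python
import array
import select
import socket


def send_fd(channel, fd):
    """Pass one open file descriptor over a Unix stream socket (SCM_RIGHTS)."""
    fds = array.array("i", [fd])
    # At least one byte of ordinary data has to accompany the control message.
    channel.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(channel):
    """Receive one file descriptor sent by send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = channel.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor in message")


def demo():
    # Hypothetical stand-in for the dispatcher <-> worker channel.
    dispatcher_end, worker_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free port on loopback
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())

    # poll() until the listening socket has a pending connection, then accept it.
    poller = select.poll()
    poller.register(listener, select.POLLIN)
    if not poller.poll(5000):
        raise RuntimeError("no incoming connection")
    conn, _peer = listener.accept()

    # Hand the accepted socket to the "worker" and drop the dispatcher's copy.
    send_fd(dispatcher_end, conn.fileno())
    conn.close()

    # The worker rebuilds a socket object around the received descriptor.
    passed = socket.socket(fileno=recv_fd(worker_end))
    client.sendall(b"ping")
    data = passed.recv(4)

    for s in (passed, client, listener, dispatcher_end, worker_end):
        s.close()
    return data
```

Calling demo() pushes a few bytes from the client through the passed socket. Running the send_fd part in a tight loop against a worker that stops reading might be one way to get closer to the state in the trace, but that is a guess.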


The same code has been running without problems on non-OpenVZ kernels (2.6.22, 2.6.25, 2.6.27) for months.

Any pointers on how to debug it?
 