OpenVZ Forum


*SOLVED* Kernel Panics Often! HELP [message #11301] Sun, 18 March 2007 00:57
Vetrox
Messages: 1
Registered: March 2007
Junior Member
Hey guys,

I'm not the smartest tool in the shed with OpenVZ, but I have tried everything with this problem. I'm running a Pentium D 3.4 GHz with 2 GB RAM and a 160 GB hard drive, and I'm using HyperVM as well. My kernel panics often, sometimes three times a day, and reports lots of kernel bugs. I know for sure this isn't a hardware issue, so please tell me what could be to blame. I just installed the version of OpenVZ that comes with HyperVM. Also, the RAM fills to 99.9% with cache without using swap, and then the kernel panics.

If you guys need logs or any other relevant info, let me know.

Regards,
Joe

(I'm on CentOS 4.4 by the way)

[Updated on: Mon, 26 March 2007 08:48] by Moderator


Re: Kernel Panics Often! HELP [message #11302 is a reply to message #11301] Sun, 18 March 2007 19:35
madguy24
Messages: 6
Registered: March 2007
Junior Member
I'm the tech who tried to fix this issue. The server load rises after the kernel reports a bug; see the latest ones below. I did a custom compile of the kernel, version 2.6.18. To me it looks like faulty memory, but the server owner said he replaced all the hardware.

Mar 19 02:45:01 localhost kernel: BUG: warning at kernel/ub/ub_page_bc.c:322/pb_dup_ref() 

...repeated three times, followed by:

Mar 19 02:45:05 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000002
Mar 19 02:45:05 localhost kernel:  printing eip:
Mar 19 02:45:05 localhost kernel: c0143c41
Mar 19 02:45:05 localhost kernel: *pde = 00000000
Mar 19 02:45:05 localhost kernel: Oops: 0000 [#1]
Mar 19 02:45:05 localhost kernel: SMP
Mar 19 02:45:05 localhost kernel: Modules linked in: simfs vzethdev vzrst ip_nat vzcpt ip_conntrack vzdquota af_packet xt_tcpudp xt_length ipt_ttl xt_tcpmss ipt_TCPMSS iptable_mangle xt_multiport xt_limit ipt_tos ipt_REJECT iptable_filter ip_tables x_tables parport_pc lp parport autofs4 sunrpc vznetdev vzmon vzdev thermal processor fan button battery asus_acpi ac uhci_hcd ehci_hcd usbcore i2c_i801 i2c_core 8139too mii
Mar 19 02:45:05 localhost kernel: CPU:    1, VCPU: 149.0
Mar 19 02:45:05 localhost kernel: EIP:    0060:[<c0143c41>]    Not tainted VLI
Mar 19 02:45:05 localhost kernel: EFLAGS: 00010286   (2.6.18-028 #3)
Mar 19 02:45:05 localhost kernel: EIP is at ub_page_uncharge+0x31/0x90
Mar 19 02:45:05 localhost kernel: eax: fffffffe   ebx: c203c6b0   ecx: 00000000   edx: 00000001
Mar 19 02:45:05 localhost kernel: esi: f6515600   edi: c203c6b0   ebp: c0539a40   esp: f4a59dcc
Mar 19 02:45:05 localhost kernel: ds: 007b   es: 007b   ss: 0068
Mar 19 02:45:05 localhost kernel: Process dcpumon (pid: 22650, veid: 149, ti=f4a59000 task=f4969340 task.ti=f4a59000)
Mar 19 02:45:05 localhost kernel: Stack: f4a59f20 00000000 c203c6b0 c0539a20 00000000 c015e8f5 c203c6b0 00000000
Mar 19 02:45:05 localhost kernel:        00000000 000006e6 c05399e0 0000000c f4a59e1c 0000000e 0000000e c015f32c
Mar 19 02:45:05 localhost kernel:        c1bce7ac 00000000 c0161fc2 f4a59e1c 0000000e 00000000 c1d7d564 c1d33cec
Mar 19 02:45:05 localhost kernel:  Call Trace:
Mar 19 02:45:05 localhost kernel:  [<c015e8f5>] free_hot_cold_page+0xf5/0x1a0
Mar 19 02:45:05 localhost kernel:  [<c015f32c>] __pagevec_free+0x1c/0x30
Mar 19 02:45:05 localhost kernel:  [<c0161fc2>] release_pages+0x102/0x190
Mar 19 02:45:05 localhost kernel:  [<c0172287>] free_pages_and_swap_cache+0x77/0xa0
Mar 19 02:45:05 localhost kernel:  [<c016838f>] zap_pte_range+0x27f/0x340
Mar 19 02:45:05 localhost kernel:  [<c0168514>] unmap_page_range+0xc4/0x160
Mar 19 02:45:05 localhost kernel:  [<c0168685>] unmap_vmas+0xd5/0x200
Mar 19 02:45:05 localhost kernel:  [<c016de65>] exit_mmap+0x85/0x120
Mar 19 02:45:05 localhost kernel:  [<c0121a88>] mmput+0x38/0xc0
Mar 19 02:45:05 localhost kernel:  [<c012859e>] do_exit+0xfe/0x480
Mar 19 02:45:05 localhost kernel:  [<c0128986>] do_group_exit+0x36/0xa0
Mar 19 02:45:05 localhost kernel:  [<c01031c7>] syscall_call+0x7/0xb
Mar 19 02:45:05 localhost kernel: Code: 1c 89 7c 24 10 8b 7c 24 18 89 5c 24 08 89 74 24 0c 8b 77 20 85 f6 74 4f 89 e2 8b 86 30 05 00 00 81 e2 00 f0 ff ff 8b 52 10 f7 d0 <8b> 14 90 b8 01 00 00 00 d3 e0 29 42 20 81 3e 75 62 75 62 75 37
Mar 19 02:45:05 localhost kernel: EIP: [<c0143c41>] ub_page_uncharge+0x31/0x90 SS:ESP 0068:f4a59dcc
Mar 19 02:45:05 localhost kernel: Fixing recursive fault but reboot is needed!
Mar 19 02:45:05 localhost kernel: BUG: scheduling while atomic: dcpumon/0x00000001/22650
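One way to cross-check this oops is to recompute the faulting address from the register dump. The byte marked <8b> in the "Code:" line, together with the two bytes after it (8b 14 90), appears to decode as mov edx, [eax+edx*4] on 32-bit x86; that decoding is my own reading of the bytes, not something the kernel printed. The Python sketch below redoes that arithmetic with 32-bit wraparound using eax and edx from the dump; the helper names are hypothetical. If the reading is right, the "NULL pointer" at 00000002 is really 0xfffffffe + 4 wrapping around, and the f7 d0 (not eax) just before it suggests the word loaded into eax was 0x00000001, which would be more consistent with a corrupted value than with a plain NULL.

```python
# Sketch: recompute the faulting address of the instruction marked <8b>
# in the oops "Code:" line. ASSUMPTION: bytes 8b 14 90 decode as
#   mov edx, [eax + edx*4]   (ModRM 0x14 -> a SIB byte follows; SIB 0x90)
# Helper names are hypothetical, written only for this illustration.

MASK32 = 0xFFFFFFFF  # i386 address arithmetic wraps modulo 2**32


def decode_sib(sib):
    """Split an x86 SIB byte into (scale, index_reg, base_reg).

    Register numbers follow the x86 encoding: 0 = eax, 1 = ecx, 2 = edx, ...
    """
    scale = 1 << (sib >> 6)
    index_reg = (sib >> 3) & 0b111
    base_reg = sib & 0b111
    return scale, index_reg, base_reg


def effective_address(base, index, scale, disp=0):
    """Compute base + index*scale + disp truncated to 32 bits."""
    return (base + index * scale + disp) & MASK32


# Register values copied from the oops at ub_page_uncharge+0x31 (CPU 1).
eax = 0xFFFFFFFE
edx = 0x00000001

scale, index_reg, base_reg = decode_sib(0x90)
fault = effective_address(eax, edx, scale)

# "f7 d0" right before the faulting instruction is `not eax`, so the word
# originally loaded into eax would have been the bitwise complement of eax.
loaded_word = ~eax & MASK32

print(f"scale={scale} index_reg={index_reg} base_reg={base_reg}")
print(f"faulting address = {fault:08x}")
print(f"word loaded before 'not eax' = {loaded_word:08x}")
```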


============================================

The most common error is in rmap.c, which repeats very frequently, as below:

Mar 19 02:45:29 localhost kernel: ------------[ cut here ]------------
Mar 19 02:45:29 localhost kernel: kernel BUG at mm/rmap.c:529!
Mar 19 02:45:29 localhost kernel: invalid opcode: 0000 [#2]
Mar 19 02:45:29 localhost kernel: SMP
Mar 19 02:45:29 localhost kernel: Modules linked in: simfs vzethdev vzrst ip_nat vzcpt ip_conntrack vzdquota af_packet xt_tcpudp xt_length ipt_ttl xt_tcpmss ipt_TCPMSS iptable_mangle xt_multiport xt_limit ipt_tos ipt_REJECT iptable_filter ip_tables x_tables parport_pc lp parport autofs4 sunrpc vznetdev vzmon vzdev thermal processor fan button battery asus_acpi ac uhci_hcd ehci_hcd usbcore i2c_i801 i2c_core 8139too mii
Mar 19 02:45:29 localhost kernel: CPU:    1, VCPU: 120.1
Mar 19 02:45:29 localhost kernel: EIP:    0060:[<c016feb7>]    Not tainted VLI
Mar 19 02:45:29 localhost kernel: EFLAGS: 00010286   (2.6.18-028 #3)
Mar 19 02:45:29 localhost kernel: EIP is at page_remove_rmap+0x37/0x50
Mar 19 02:45:29 localhost kernel: eax: ffffffff   ebx: c1d1d6f0   ecx: c0003ea0   edx: c20b4bb4
Mar 19 02:45:29 localhost kernel: esi: c1d1d6f0   edi: fffa85a4   ebp: c20b4bb4   esp: f3d0ff10
Mar 19 02:45:29 localhost kernel: ds: 007b   es: 007b   ss: 0068
Mar 19 02:45:29 localhost kernel: Process exim (pid: 22837, veid: 120, ti=f3d0f000 task=f38446e0 task.ti=f3d0f000)
=================================


With the kernel bug above, the server load started rising and eventually all VPSes stopped responding. I was not able to run "vzctl stop veid", "vzctl exec 120 kill -9 22837", or even enter the VPS with "vzctl enter 120". Each command simply hung, and I finally had to press Ctrl+C to get the shell prompt back. By the way, that exim process raised the load to 100+ before I issued a reboot.

Please shed some light on this issue; I'm running out of ideas. I don't want to assume it is a kernel bug, but what other options do we have to try?
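Since the rmap.c report repeats so often, it can help to count how often each BUG site and faulting function shows up in the syslog, for example before and after swapping RAM or kernels. The Python sketch below does that with regular expressions written against the line formats quoted in this thread; the function name and the patterns are my own assumptions, and other oops formats would need extra patterns.

```python
import re
from collections import Counter

# ASSUMPTION: patterns modeled on the report lines quoted in this thread:
#   "kernel BUG at mm/rmap.c:529!"
#   "BUG: warning at kernel/ub/ub_page_bc.c:322/pb_dup_ref()"
#   "EIP is at page_remove_rmap+0x37/0x50"
BUG_AT = re.compile(r"kernel BUG at (\S+?:\d+)!")
BUG_WARN = re.compile(r"BUG: warning at (\S+?:\d+)")
EIP_AT = re.compile(r"EIP is at (\w+)\+0x")


def tally_bug_sites(lines):
    """Count BUG/warning source locations and faulting functions."""
    sites = Counter()
    for line in lines:
        for pattern in (BUG_AT, BUG_WARN):
            m = pattern.search(line)
            if m:
                sites[m.group(1)] += 1
        m = EIP_AT.search(line)
        if m:
            sites["EIP " + m.group(1)] += 1
    return sites


# Sample lines copied from the logs above.
sample = [
    "Mar 19 02:45:01 localhost kernel: BUG: warning at kernel/ub/ub_page_bc.c:322/pb_dup_ref()",
    "Mar 19 02:45:05 localhost kernel: EIP is at ub_page_uncharge+0x31/0x90",
    "Mar 19 02:45:29 localhost kernel: kernel BUG at mm/rmap.c:529!",
    "Mar 19 02:45:29 localhost kernel: EIP is at page_remove_rmap+0x37/0x50",
]
counts = tally_bug_sites(sample)
for site, n in counts.most_common():
    print(f"{n:6d}  {site}")
```

To run it on a real log, pass the lines of a file, e.g. tally_bug_sites(open("/var/log/messages", errors="replace")); /var/log/messages is assumed here to be where syslog writes kernel messages on this CentOS box.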
Re: Kernel Panics Often! HELP [message #11309 is a reply to message #11302] Mon, 19 March 2007 08:07
Vasily Tarasov
Messages: 1345
Registered: January 2006
Senior Member
Hello, we need the following information from you:

1) Which kernel version are you using? You've renamed the kernel to 2.6.18-028, so we don't know which kernel sources you used to compile it.
2) We need the .config file you've used to compile the kernel.


Thank you,
Vasily.
Re: Kernel Panics Often! HELP [message #11310 is a reply to message #11309] Mon, 19 March 2007 08:10
madguy24
Messages: 6
Registered: March 2007
Junior Member
I tried both of the available RPMs from the OpenVZ kernel repository.

For the source compilation, I used the kernel config http://download.openvz.org/kernel/devel/028test018.1/configs/kernel-2.6.18-i686-smp.config.ovz

and the patches at http://download.openvz.org/kernel/devel/028test018.1/patches/
Re: Kernel Panics Often! HELP [message #11311 is a reply to message #11310] Mon, 19 March 2007 08:30
Vasily Tarasov
Messages: 1345
Registered: January 2006
Senior Member
Hmm... It is very strange that you get so many bugs on such a widely used kernel. This makes me think the problem is in the hardware. Could you please test the hardware as described at http://wiki.openvz.org/Hardware_testing ? If no corruption turns up, we'll file a bug.

Thank you,
Vasily.
Re: Kernel Panics Often! HELP [message #11312 is a reply to message #11311] Mon, 19 March 2007 08:35
madguy24
Messages: 6
Registered: March 2007
Junior Member
Yeah, that's what I wonder too. I already ran memtester, but it crashed the server (which may itself be a clue), and the owner asked me not to run it again. I confirmed with him that he replaced the RAM a week ago, and had already replaced the entire server before that.

Running memtester on 1024 MB of the total 2048 MB gave me the following results. The mapcount went negative here, which made me suspect corrupt RAM; that's why I asked the owner about the RAM replacement, and he said he had done it.

The last line is what I got before it crashed.

memtester 1024M
memtester version 4.0.6 (32-bit)
Copyright (C) 2006 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 1024MB (1073741824 bytes)
got  1024MB (1073741824 bytes), trying mlock ...locked.
Loop 1:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : setting  34
Message from syslogd@localhost at Mon Mar 19 09:08:58 2007 ...
localhost kernel: Bad page state in process 'kswapd0'

Message from syslogd@localhost at Mon Mar 19 09:08:58 2007 ...
localhost kernel: page:c15bf5e4 flags:0x80000000 mapping:00000000 mapcount:-1 count:0

Message from syslogd@localhost at Mon Mar 19 09:08:58 2007 ...
localhost kernel: Trying to fix it up, but a reboot is needed

Message from syslogd@localhost at Mon Mar 19 09:08:58 2007 ...
localhost kernel: Backtrace:
ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok

Loop 2:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : setting  33
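The "Bad page state" line that interrupted memtester shows the same symptom described above: a negative mapcount, i.e. a page the kernel believes was unmapped more times than it was mapped. A small Python sketch for pulling those fields out of the syslog and flagging negative counters, so that later runs (for example after another memory test) can be compared, might look like this. The regular expression is written against the single line quoted above and is an assumption about the format of other such lines; the function names are hypothetical.

```python
import re

# ASSUMPTION: pattern modeled on the single line quoted in this thread:
#   page:c15bf5e4 flags:0x80000000 mapping:00000000 mapcount:-1 count:0
PAGE_STATE = re.compile(
    r"page:(?P<page>[0-9a-f]+)\s+flags:(?P<flags>0x[0-9a-f]+)\s+"
    r"mapping:(?P<mapping>[0-9a-f]+)\s+mapcount:(?P<mapcount>-?\d+)\s+"
    r"count:(?P<count>-?\d+)"
)


def parse_page_state(line):
    """Return the page-state fields as integers, or None if absent."""
    m = PAGE_STATE.search(line)
    if m is None:
        return None
    return {
        "page": int(m["page"], 16),
        "flags": int(m["flags"], 16),  # int() accepts the 0x prefix with base 16
        "mapping": int(m["mapping"], 16),
        "mapcount": int(m["mapcount"]),
        "count": int(m["count"]),
    }


def has_negative_counter(state):
    """True if mapcount or count went below zero."""
    return state["mapcount"] < 0 or state["count"] < 0


line = ("localhost kernel: page:c15bf5e4 flags:0x80000000 "
        "mapping:00000000 mapcount:-1 count:0")
state = parse_page_state(line)
print(state, has_negative_counter(state))
```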
Re: Kernel Panics Often! HELP [message #11313 is a reply to message #11312] Mon, 19 March 2007 08:47
Vasily Tarasov
Messages: 1345
Registered: January 2006
Senior Member
I think you should convince the owner to run memtest on the new node again. At the moment there is a lot of broken memory on the market, and it's quite possible that the second node is also faulty. Frankly speaking, every production node should be tested before an OS is installed on it.

Thank you,
Vasily
Re: Kernel Panics Often! HELP [message #11314 is a reply to message #11313] Mon, 19 March 2007 08:48
madguy24
Messages: 6
Registered: March 2007
Junior Member
Thanks, Vasily. I will try that and report back.