OpenVZ Forum


*HARDWARE* Node crashed with Machine Check Exception error [message #4397] Mon, 10 July 2006 04:35
ha77ab
Hi,

Our OpenVZ server crashed three times in one day with the errors below. Is this something to do with the OpenVZ kernel, or could it be caused by bad CPUs (we have two dual-core AMD Opteron 270s)?

We use kernel 2.6.8-022stab078.10-enterprise.


Jul  9 12:19:41 vz01 kernel: CPU 0: Machine Check Exception: 0000000000000004
Jul  9 12:19:41 vz01 kernel: Bank 4: b604a00100000813 at 000000007e4f2f68
Jul  9 12:19:41 vz01 kernel: Kernel panic: CPU context corrupt


Jul  9 05:44:28 vz01 kernel: CPU 0: Machine Check Exception: 0000000000000007
Jul  9 05:44:28 vz01 kernel: Unable to handle kernel paging request at virtual address 00f00094
Jul  9 05:44:28 vz01 kernel:  printing eip:
Jul  9 05:44:28 vz01 kernel: 023d7d02
Jul  9 05:44:28 vz01 kernel: *pde = 00000000
Jul  9 05:44:28 vz01 kernel: Oops: 0000 [#1]
Jul  9 05:44:28 vz01 kernel: Bank 4: b404a00000000a13 at 0000000055c8df68
Jul  9 05:44:28 vz01 kernel: Kernel panic: Unable to continue
Jul  9 05:44:28 vz01 kernel: SMP
Jul  9 05:44:28 vz01 kernel: Modules linked in: simfs vzdquota af_packet ip_nat_ftp ip_conntrack_ftp iptable_nat ipt_state ipt_length ipt_ttl ipt_tcpmss ipt_$
Jul  9 05:44:28 vz01 kernel: CPU:    3, VCPU: 101:3
Jul  9 05:44:28 vz01 kernel: EIP:    0060:[<023d7d02>]    Not tainted
Jul  9 05:44:28 vz01 kernel: EFLAGS: 00010282   (2.6.8-022stab078.10-enterprise)
Jul  9 05:44:30 vz01 kernel: EIP is at skb_drop_fraglist+0x22/0x50
Jul  9 05:44:30 vz01 kernel: eax: 57c8df60   ebx: 00f00000   ecx: 57c8df60   edx: 00f00000
Jul  9 05:44:30 vz01 kernel: esi: c0c0b0c0   edi: 0000003a   ebp: 09724354   esp: 875cddc8
Jul  9 05:44:30 vz01 kernel: ds: 007b   es: 007b   ss: 0068
Jul  9 05:44:30 vz01 kernel: Process spamd (pid: 2662, veid=101, threadinfo=875cc000 task=b7170d40)
Jul  9 05:44:31 vz01 kernel: Stack: 001b1b1f c0c0b0c0 023d7dea c0c0b0c0 c0c0b0c0 c0c0b0c0 023d7e20 c0c0b0c0
Jul  9 05:44:31 vz01 kernel:        0000003a 00000000 023d7ee1 c0c0b0c0 00000000 c0c0b0c0 0000003a 00000000
Jul  9 05:44:31 vz01 kernel:        02402e35 c0c0b0c0 00000000 875cdedc 0000003a 7690e600 c059f200 875cc000
Jul  9 05:44:31 vz01 kernel: Call Trace:
Jul  9 05:44:31 vz01 kernel:  [<023d7dea>] skb_release_data+0x9a/0xc0
Jul  9 05:44:31 vz01 kernel:  [<023d7e20>] kfree_skbmem+0x10/0x30
Jul  9 05:44:32 vz01 kernel:  [<023d7ee1>] __kfree_skb+0xa1/0x140
Jul  9 05:44:32 vz01 kernel:  [<02402e35>] tcp_recvmsg+0x715/0x890
Jul  9 05:44:32 vz01 kernel:  [<023d7942>] sock_common_recvmsg+0x52/0x70
Jul  9 05:44:32 vz01 kernel:  [<023d3cf0>] sock_aio_read+0x100/0x120
Jul  9 05:44:32 vz01 kernel:  [<02171dc0>] do_sync_read+0x80/0xc0
Jul  9 05:44:32 vz01 kernel:  [<0211bfbb>] vcpu_put+0x8b/0x110
Jul  9 05:44:32 vz01 kernel:  [<0211d61e>] finish_task_switch+0x3e/0x90
Jul  9 05:44:32 vz01 kernel:  [<02171e9a>] vfs_read+0x9a/0x160
Jul  9 05:44:32 vz01 kernel:  [<021721d1>] sys_read+0x51/0x80
Jul  9 05:44:32 vz01 kernel: Code: 8b 82 94 00 00 00 8b 1b 48 74 0e f0 ff 8a 94 00 00 00 0f 94


Jul  9 22:35:59 vz01 kernel: CPU 2: Machine Check Exception: 0000000000000004
Jul  9 22:35:59 vz01 kernel: Bank 4: b603200100000813 at 0000000104b27ef8
Jul  9 22:35:59 vz01 kernel: Kernel panic: CPU context corrupt



[Updated on: Mon, 10 July 2006 21:16] by Moderator


Re: Node crashed with Machine Check Exception error [message #4444 is a reply to message #4397] Mon, 10 July 2006 21:16
dev

Such messages are printed when the CPU reports a hardware problem it has detected, e.g. memory corruption or some other fault.
In other words, this is 100% a hardware problem with your server.
http://www.answers.com/topic/machine-check-exception

You can try checking your memory/CPU as described here:
http://wiki.openvz.org/Hardware_testing
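
For anyone who wants to read the raw values in such log lines, here is a minimal Python sketch that splits them into named flags. It assumes the standard x86 machine-check register layout: the number after "Machine Check Exception:" is taken to be the global MCG_STATUS value, and the number after "Bank N:" the per-bank MCi_STATUS value. The function names are made up for this example, and the vendor-specific fields (for instance the extended error bits of the Opteron's bank 4) are deliberately left undecoded.

```python
# Sketch only: decodes the architectural flag bits of x86 machine-check
# status values as they appear in the kernel log lines in this thread.
# Function and table names are hypothetical, chosen for this example.

# Low bits of the global MCG_STATUS register.
MCG_FLAGS = {
    0: "RIPV",  # restart instruction pointer valid
    1: "EIPV",  # error instruction pointer valid
    2: "MCIP",  # machine check in progress
}

# High bits of a per-bank MCi_STATUS register.
BANK_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register holds extra information
    58: "ADDRV",  # ADDR register holds a valid address
    57: "PCC",    # processor context corrupt
}


def set_flags(value, table):
    """Return the names of all flags set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def decode_bank_status(status):
    """Split a bank status value into its flags and low 16-bit error code."""
    return {
        "flags": set_flags(status, BANK_FLAGS),
        "error_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    # Values copied from the log lines in the first message.
    for mcg in (0x0000000000000004, 0x0000000000000007):
        print(f"MCG_STATUS {mcg:#018x}: {set_flags(mcg, MCG_FLAGS)}")
    for bank in (0xB604A00100000813, 0xB404A00000000A13, 0xB603200100000813):
        info = decode_bank_status(bank)
        print(f"Bank status {bank:#018x}: {info['flags']}, "
              f"error code {info['error_code']:#06x}")
```

With the values from the first message, both panics that say "CPU context corrupt" come with the PCC flag set in the bank status, while the 05:44 event has UC set without PCC. That lines up with the CPU itself flagging an uncorrected error, but the sketch does not say which component is at fault, so the hardware tests are still needed.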


Re: Node crashed with Machine Check Exception error [message #4574 is a reply to message #4444] Fri, 14 July 2006 19:05
ha77ab
Could this also be a hardware problem? It is the same kind of error, but without the "Machine Check" line:

Jul 14 12:09:12 vz01 kernel: Unable to handle kernel paging request at virtual address feafea2d
Jul 14 12:09:12 vz01 kernel:  printing eip:
Jul 14 12:09:12 vz01 kernel: ecee0966
Jul 14 12:09:12 vz01 kernel: *pde = 00000000
Jul 14 12:09:12 vz01 kernel: Oops: 0000 [#1]
Jul 14 12:09:12 vz01 kernel: SMP
Jul 14 12:09:12 vz01 kernel: Modules linked in: simfs vzdquota af_packet ip_nat_ftp ip_conntrack_ftp iptable$
Jul 14 12:09:12 vz01 kernel: CPU:    2, VCPU: 121:1
Jul 14 12:09:12 vz01 kernel: EIP:    0060:[<ecee0966>]    Not tainted
Jul 14 12:09:12 vz01 kernel: EFLAGS: 00010282   (2.6.8-022stab078-smp)
Jul 14 12:09:12 vz01 kernel: EIP is at vzquota_transfer_usage+0x66/0x150 [vzdquota]
Jul 14 12:09:12 vz01 kernel: eax: bbb8ec8c   ebx: 67715e6c   ecx: feafea15   edx: 00000000
Jul 14 12:09:12 vz01 kernel: esi: bbb8ec8c   edi: 00000000   ebp: 67715e68   esp: 67715e20
Jul 14 12:09:12 vz01 kernel: ds: 007b   es: 007b   ss: 0068
Jul 14 12:09:12 vz01 kernel: Process cp (pid: 2831, veid=121, threadinfo=67714000 task=9779a820)
Jul 14 12:09:12 vz01 kernel: Stack: b6cf7740 67715e6c 00000001 00001000 00000000 00001000 00000000 b6cf7740
Jul 14 12:09:12 vz01 kernel:        bbb8ec8c 00000003 67715e68 ecede636 bbb8ec8c 00000003 67715e68 00000003
Jul 14 12:09:12 vz01 kernel:        b6cf7740 00000000 b6cf7740 48dce240 48dce280 67715e74 67715e74 00000002
Jul 14 12:09:12 vz01 kernel: Call Trace:
Jul 14 12:09:12 vz01 kernel:  [<ecede636>] vzquota_inode_transfer_call+0x166/0x1c0 [vzdquota]
Jul 14 12:09:12 vz01 kernel:  [<ecee0a67>] vzquota_transfer+0x17/0x30 [vzdquota]
Jul 14 12:09:12 vz01 kernel:  [<021eb8f6>] ext3_setattr+0xc6/0x270
Jul 14 12:09:12 vz01 kernel:  [<0218fbfe>] notify_change+0x1fe/0x270
Jul 14 12:09:12 vz01 kernel:  [<0216f9e6>] chown_common+0xb6/0x100
Jul 14 12:09:12 vz01 kernel:  [<0218137c>] __user_walk+0x5c/0x80
Jul 14 12:09:12 vz01 kernel:  [<0216fa7f>] sys_chown+0x4f/0x70
Jul 14 12:09:12 vz01 kernel:  [<0216e15c>] get_user_size+0x3c/0x80
Jul 14 12:09:12 vz01 kernel:  [<0216f3df>] sys_utimes+0x3f/0x50
Re: Node crashed with Machine Check Exception error [message #4584 is a reply to message #4574] Sat, 15 July 2006 11:06
dev

AFAICS it is the same machine, right?
It could also be your hardware, though I will check for obvious errors.


Re: Node crashed with Machine Check Exception error [message #4590 is a reply to message #4584] Sat, 15 July 2006 15:35
ha77ab
Thank you,

Yes, that's the same machine. I also compiled the kernel myself but still got this error. The DC has also replaced all the RAM modules; now they have replaced them again, and I think they will swap the CPUs next time.
Re: Node crashed with Machine Check Exception error [message #4600 is a reply to message #4590] Mon, 17 July 2006 09:16
kir

I really suggest you run some hardware tests as described in http://wiki.openvz.org/Hardware_testing

Kir Kolyshkin