vzmond hanging during vzctl stop - ends in panic [message #14587]
Tue, 03 July 2007 02:14
jochum
Messages: 21 Registered: December 2006 Location: Naperville, IL, USA
Junior Member
Hi All:
I am occasionally seeing a problem when stopping a VPS. The symptoms are:
1) vzlist -a reports:
VEID NPROC STATUS IP_ADDR HOSTNAME
200 0 running lss-vps140.xx.xx.xx
2) running top on the host shows a process "vzmond/200" that is using 100% of the CPU and has run for over 17 minutes
3) it then ends in a kernel panic. I captured the screen to a .png file that I (hopefully) have attached; the EIP was
EIP: [<c05b8294>] __ip_route_output_key+0xf7/0x822 SS:ESP 0068:c5df3d90
other info:
uname -a
Linux lss-host40.ih.lucent.com 2.6.18-8.1.4.el5.028stab035.1PAE #1 SMP Sat Jun 9 02:27:12 MSD 2007 i686 athlon i386 GNU/Linux
I set up the host as Scientific Linux SL release 4.4 (a recompile of RedHat Enterprise 4.4, just like CentOS 4.4). I seemed to have the same problem when I set up the host as CentOS 5.0, but I didn't capture that kernel panic, so I am not sure whether it was in the same place.
The VPS is a CentOS 5.0 VPS.
Any other info needed that I can provide?
thanks,
Paul
[Updated on: Mon, 23 July 2007 07:34] by Moderator
Re: *NOT SOLVED* vzmond hanging during vzctl stop - ends in panic [message #15231 is a reply to message #15229]
Mon, 23 July 2007 12:10
jochum
Messages: 21 Registered: December 2006 Location: Naperville, IL, USA
Junior Member
Hi Den and Dave:
I am sorry, but I am not sure I understand the question. If you are asking whether I installed any other package in my build (other than patch diff-ve-nfsstop-b-20070704.patch), then the answer is no (i.e. only that patch was added). If you are asking whether the behavior of the kernel was different once I used that patch, then I believe the answer is also no (to me, the kernel panic looks like it came from the same area, but I am not very good at reading kernel panics, so I can't guarantee that).
Also, I uploaded the kernel panic in the previous response, but forgot to add the extension .png to the file.
thanks,
Paul
Re: *NOT SOLVED* vzmond hanging during vzctl stop - ends in panic [message #15359 is a reply to message #15348]
Fri, 27 July 2007 22:20
jochum
Messages: 21 Registered: December 2006 Location: Naperville, IL, USA
Junior Member
Hi Den:
Thanks for responding and your patience in working on this. I will try to answer your questions, and also attach a copy of a full kernel dump.
1) Can I try Kirill's latest version?
A) I downloaded and installed his version (ovzkernel-PAE-2.6.18-8.1.8.el5.028stab039.1.i686.rpm and ovzkernel-PAE-devel-2.6.18-8.1.8.el5.028stab039.1.i686.rpm), but it still failed.
2) I have tried this on 2 different VEs. Both of them are based on CentOS 5.0. One was the download from the OpenVZ website, the other came from my own conversion of CentOS4 to CentOS5.
3) Do I have the crashes on the same node, or arbitrary ones?
I have this configuration running on 6 nodes, and I can reproduce the failure on all 6 of them.
4) Is there any specific activity before the crash?
Nothing special. Basically, I start the host, and then use a script to start the VPS, perform a pwd of an automounted NFS filesystem, and then stop the VPS. I can tell it will fail when the VPS does not shut down completely (vzlist -a shows it as running but with 0 processes). Once in this state, it takes a while (anywhere from minutes to hours) before the kernel panics. During this interval, I receive messages on the console: (nfs) "server xxx not responding, still trying"
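The test cycle described above can be sketched roughly as follows. This is not the original script, which is not shown in the thread. The function name repro_once, the VZCTL and VZLIST override variables, and the use of ls to touch the NFS directory are all assumptions made for illustration. The sketch starts a container, lists a directory to trigger the automount, stops the container, and then checks vzlist -a for the "running with 0 processes" state.

```shell
#!/bin/sh
# Sketch of one start / NFS access / stop cycle. Not the original script.
# VZCTL and VZLIST are hypothetical override variables (handy for dry runs);
# by default the real vzctl and vzlist commands are used.

repro_once() {
    veid=$1      # container ID, e.g. 200
    nfs_dir=$2   # automounted NFS path inside the container (assumed)

    "${VZCTL:-vzctl}" start "$veid" || return 2

    # Listing the directory is assumed to be enough to trigger the automount.
    "${VZCTL:-vzctl}" exec "$veid" ls "$nfs_dir" >/dev/null 2>&1 || true

    "${VZCTL:-vzctl}" stop "$veid" || true

    # Stuck state: the container is still "running" but has 0 processes.
    if "${VZLIST:-vzlist}" -a |
        awk -v id="$veid" '$1 == id && $2 == "0" && $3 == "running" { hit = 1 }
                           END { exit !hit }'
    then
        echo "VE $veid looks stuck (running, 0 processes)"
        return 1
    fi
    return 0
}
```

A loop such as `while repro_once 200 /some/nfs/path; do :; done` would repeat the cycle until the stuck state shows up; the path here is a placeholder, not one taken from this setup.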
Here is a copy of the kernel dump. Note that this was run against Kirill's kernel, without the patch included. Next, I will work on building Kirill's kernel with the new patch.
thanks,
Paul
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000040
printing eip:
c05b85e3
*pde = 37532001
Oops: 0000 [#1]
SMP
last sysfs file:
Modules linked in: simfs(U) vzethdev(U) vzrst(U) ip_nat(U) vzcpt(U) ip_conntrack(U) nfnetlink(U) vzdquota(U) xt_tcpudp(U) xt_length(U) ipt_ttl(U) xt_tcpmss(U) ipt_TCPMSS(U) iptable_mangle(U) iptable_filter(U) xt_multiport(U) xt_limit(U) ipt_tos(U) ipt_REJECT(U) ip_tables(U) x_tables(U) autofs4(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) vznetdev(U) vzmon(U) vzdev(U) ipv6(U) cpufreq_ondemand(U) dm_mirror(U) dm_mod(U) video(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) sg(U) i2c_nforce2(U) i2c_core(U) k8_edac(U) pcspkr(U) ide_cd(U) edac_mc(U) e1000(U) serio_raw(U) forcedeth(U) cdrom(U) usb_storage(U) sata_nv(U) libata(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) sd_mod(U) scsi_mod(U) raid1(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
CPU: 2, VCPU: 0.3
EIP: 0060:[<c05b85e3>] Not tainted VLI
EFLAGS: 00010286 (2.6.18-8.1.8.el5.028stab039.1PAE #1)
EIP is at __ip_route_output_key+0xf7/0x822
eax: 00000000 ebx: 00000000 ecx: 00000000 edx: 00000000
esi: 00000000 edi: f7f4bea4 ebp: f63e09c0 esp: f7f4bd90
ds: 007b es: 007b ss: 0068
Process events/3 (pid: 13, veid: 0, ti=f7f4a000 task=f7f49990 task.ti=f7f4a000)
Stack: f7f4bef8 00000000 00000000 00000040 00000000 f765f0c0 f4c724b8 c05d40f4
f4c724b8 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
[<c05d40f4>] tcp_v4_send_check+0x76/0xbb
[<c05d4490>] tcp_v4_connect+0x102/0x5f8
[<c04d5d5c>] __next_cpu+0x12/0x21
[<c0418c91>] find_busiest_group+0x178/0x45c
[<c05faef7>] _spin_lock_bh+0x8/0x18
[<c05de6b8>] inet_stream_connect+0x7d/0x208
[<c05f96e7>] schedule+0xcc3/0xda1
[<f91b3bb7>] xs_tcp_connect_worker+0x221/0x2bd [sunrpc]
[<c04314d7>] run_workqueue+0x7f/0xbc
[<f91b3996>] xs_tcp_connect_worker+0x0/0x2bd [sunrpc]
[<c0431af1>] worker_thread+0xd9/0x10c
[<c0419818>] default_wake_function+0x0/0xc
[<c0431a18>] worker_thread+0x0/0x10c
[<c043464b>] kthread+0xc0/0xed
[<c043458b>] kthread+0x0/0xed
[<c05fdc77>] kernel_thread_helper+0x7/0x10
=======================
Code: 83 e6 1d e8 ea 13 f2 ff 8b 07 f7 db 8b 4f 0c 83 e3 fd 89 44 24 24 89 e0 25 00 e0 ff ff 8b 00 8b 80 d4 05 00 00 8b 80 e8 02 00 00 <8b> 40 40 89 4c 24 30 88 5c 24 39 c7 44 24 7c 00 00 00 00 89 44
EIP: [<c05b85e3>] __ip_route_output_key+0xf7/0x822 SS:ESP 0068:f7f4bd90
Kernel panic - not syncing: Fatal exception
Re: *NOT SOLVED* vzmond hanging during vzctl stop - ends in panic [message #15362 is a reply to message #15359]
Sat, 28 July 2007 02:53
jochum
Messages: 21 Registered: December 2006 Location: Naperville, IL, USA
Junior Member
Hi Den:
I rebuilt Kirill's version with the patch. The kernel panic looks the same (at least to me); I have included it below. I did not see anything unusual in the /var/log/vzctl.log file, and the only important thing I saw in /var/log/messages was the following 2 messages:
kernel: nfs: server lss-nfsa09 not responding, still trying
Jul 27 20:23:44 lss-host40 last message repeated 4 times
thanks,
Paul
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000040
printing eip:
c05b85e3
*pde = 00003001
Oops: 0000 [#1]
SMP
last sysfs file:
Modules linked in: simfs(U) vzethdev(U) vzrst(U) ip_nat(U) vzcpt(U) ip_conntrack(U) nfnetlink(U) vzdquota(U) xt_tcpudp(U) xt_length(U) ipt_ttl(U) xt_tcpmss(U) ipt_TCPMSS(U) iptable_mangle(U) iptable_filter(U) xt_multiport(U) xt_limit(U) ipt_tos(U) ipt_REJECT(U) ip_tables(U) x_tables(U) autofs4(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) vznetdev(U) vzmon(U) vzdev(U) ipv6(U) cpufreq_ondemand(U) dm_mirror(U) dm_mod(U) video(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) i2c_nforce2(U) sg(U) pcspkr(U) i2c_core(U) e1000(U) k8_edac(U) edac_mc(U) forcedeth(U) serio_raw(U) ide_cd(U) cdrom(U) usb_storage(U) sata_nv(U) libata(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) sd_mod(U) scsi_mod(U) raid1(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
CPU: 1, VCPU: 0.2
EIP: 0060:[<c05b85e3>] Not tainted VLI
EFLAGS: 00010286 (2.6.18-8.1.8.el5.028stab039.1PAE.prj #1)
EIP is at __ip_route_output_key+0xf7/0x822
eax: 00000000 ebx: 00000000 ecx: 00000000 edx: 00000000
esi: 00000000 edi: c5dedea4 ebp: f63019c0 esp: c5dedd90
ds: 007b es: 007b ss: 0068
Process events/2 (pid: 12, veid: 0, ti=c5dec000 task=f7e3a0d0 task.ti=c5dec000)
Stack: c5dedef8 f7615e9e 00000000 c05bf53c f7203000 f67032bc f67032bc f7203000
c059fed2 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
[<c05bf53c>] ip_finish_output+0x0/0x19a
[<c059fed2>] dev_hard_start_xmit+0x1b8/0x22a
[<c05c00dd>] ip_queue_xmit+0x3a5/0x3eb
[<c05d4490>] tcp_v4_connect+0x102/0x5f8
[<f8c79238>] scsi_end_request+0x9f/0xa9 [scsi_mod]
[<f8c7938b>] scsi_io_completion+0x149/0x2f3 [scsi_mod]
[<c05d40f4>] tcp_v4_send_check+0x76/0xbb
[<c05cec56>] tcp_transmit_skb+0x6a4/0x6d2
[<c042b0f9>] lock_timer_base+0x15/0x2f
[<c042b20d>] __mod_timer+0x9c/0xa6
[<c05faef7>] _spin_lock_bh+0x8/0x18
[<c05de6b8>] inet_stream_connect+0x7d/0x208
[<c05f96e7>] schedule+0xcc3/0xda1
[<f91bcbe8>] xs_tcp_connect_worker+0x221/0x2bd [sunrpc]
[<c04314d7>] run_workqueue+0x7f/0xbc
[<f91bc9c7>] xs_tcp_connect_worker+0x0/0x2bd [sunrpc]
[<c0431af1>] worker_thread+0xd9/0x10c
[<c0419818>] default_wake_function+0x0/0xc
[<c0431a18>] worker_thread+0x0/0x10c
[<c043464b>] kthread+0xc0/0xed
[<c043458b>] kthread+0x0/0xed
[<c05fdc77>] kernel_thread_helper+0x7/0x10
=======================
Code: 83 e6 1d e8 ea 13 f2 ff 8b 07 f7 db 8b 4f 0c 83 e3 fd 89 44 24 24 89 e0 25 00 e0 ff ff 8b 00 8b 80 d4 05 00 00 8b 80 e8 02 00 00 <8b> 40 40 89 4c 24 30 88 5c 24 39 c7 44 24 7c 00 00 00 00 89 44
EIP: [<c05b85e3>] __ip_route_output_key+0xf7/0x822 SS:ESP 0068:c5dedd90
Kernel panic - not syncing: Fatal exception
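To check whether two captured oopses fault at the same place without reading the whole dump, the "EIP is at" line can be pulled out of each one. The following is a minimal sketch, assuming each oops has been saved to a text file; the function name oops_fault_site is made up for illustration.

```shell
#!/bin/sh
# Print the faulting location (symbol+offset/size) from an oops read on stdin.
# Relies on the "EIP is at ..." line that i386 oopses like the ones in this
# thread contain; only the first such line is reported.
oops_fault_site() {
    sed -n 's/^[[:space:]]*EIP is at \([^ ]*\).*$/\1/p' | head -n 1
}
```

Running `oops_fault_site < saved-oops.txt` on each capture (the file name is a placeholder) and comparing the results shows whether the fault site moved. For the two dumps posted in this thread, the "EIP is at" line reads `__ip_route_output_key+0xf7/0x822` in both.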
Re: *NOT SOLVED* vzmond hanging during vzctl stop - ends in panic [message #15456 is a reply to message #15443]
Tue, 31 July 2007 22:48
jochum
Messages: 21 Registered: December 2006 Location: Naperville, IL, USA
Junior Member
Hi Den:
In my limited testing, I can't get the system to panic any longer, but it does seem to corrupt NFS when I stop a VPS now, which causes the system to hang.
I built ovzkernel-2.6.18-8.1.4.el5.028stab039.1.src.rpm, with the following patches:
diff-ve-opseminit-20070723.patch
diff-ve-nfsstop-b-20070704.patch
diff-ve-nfsstop-d-20070731.patch
I do not get a kernel dump, but in /var/log/messages, I received:
Jul 31 13:43:28 lss-host40 kernel: portmap: RPC call returned error 101
Jul 31 13:43:28 lss-host40 kernel: RPC: failed to contact portmap (errno -101).
thanks,
Paul
Re: *NOT SOLVED* vzmond hanging during vzctl stop - ends in panic [message #15490 is a reply to message #15461]
Wed, 01 August 2007 22:34
jochum
Messages: 21 Registered: December 2006 Location: Naperville, IL, USA
Junior Member
Hi Den:
Here is the output, hope it helps.
thanks,
Paul
(PS: I know I have a lot of processes running on the host, far more than normally needed. Someday I might even get a chance to clean some of them up.)
Attachment: alt-t-p
(Size: 102.36KB, Downloaded 402 times)
Re: vzmond hanging during vzctl stop - ends in panic [message #36498 is a reply to message #14587]
Wed, 24 June 2009 20:35
bbjwp
Messages: 1 Registered: June 2009
Junior Member
We're presently being affected by this bug on both:
* 2.6.18-92.1.18.el5.028stab060.8
* 2.6.18-128.1.1.el5.028stab062.3
We're using CentOS 5.3 templates with NFS mounted. Unmounting NFS before a shutdown does not resolve this for us: we still see NPROC listed as 0 and vzmond/7279 at 100% CPU.
We have other hosts where this isn't a problem, but on this particular setup, we've confirmed this issue across 6 machines.
Nothing shows up in /var/log/messages or dmesg.
Yesterday, the containers stopped after a while (20 minutes?). Today, they seem to just be hanging.