			| vzmond hanging during vzctl stop - ends in panic [message #14587] | Tue, 03 July 2007 02:14  |  
			| 
				
				
					|  jochum Messages: 21
 Registered: December 2006
 Location: Naperville, IL, USA
 | Junior Member |  |  |  
	| Hi All: 
 I occasionally see a problem when stopping a VPS. The symptoms are:
 
 1) vzlist -a reports:
 VEID      NPROC STATUS  IP_ADDR         HOSTNAME
 200          0 running                 lss-vps140.xx.xx.xx
 
 2) Running top on the host shows a process "vzmond/200" that is using 100% of the CPU and has been running for over 17 minutes.
 
 3) The host then ends up in a kernel panic. I captured the screen to a .png file that I have (hopefully) attached. The EIP was:
 EIP: [<c05b8294>] __ip_route_output_key+0xf7/0x822 SS:ESP 0068:c5df3d90
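 
 The first symptom (a VPS listed as running with 0 processes) can be checked for mechanically. What follows is only a minimal sketch, assuming the `vzlist -a` column layout shown above (VEID, NPROC, STATUS as the first three columns); the function name is made up for illustration.

```python
def find_stuck_containers(vzlist_output):
    """Return VEIDs that are listed as 'running' but have 0 processes."""
    stuck = []
    for line in vzlist_output.splitlines():
        fields = line.split()
        # Skip the header row, blank lines, and anything without a numeric VEID.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        veid, nproc, status = fields[0], fields[1], fields[2]
        if status == "running" and nproc == "0":
            stuck.append(int(veid))
    return stuck


# Sample taken from the vzlist output quoted in this post.
sample = """\
VEID      NPROC STATUS  IP_ADDR         HOSTNAME
200          0 running                 lss-vps140.xx.xx.xx
"""
print(find_stuck_containers(sample))
```

 A VPS that stopped cleanly, or one that is running with a non-zero process count, is not reported.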
 
 other info:
 uname -a
 Linux lss-host40.ih.lucent.com 2.6.18-8.1.4.el5.028stab035.1PAE #1 SMP Sat Jun 9 02:27:12 MSD 2007 i686 athlon i386 GNU/Linux
 
 I set up the host as Scientific Linux SL release 4.4 (a recompile of Red Hat Enterprise Linux 4.4, just like CentOS 4.4). I seemed to have the same problem when I set up the host as CentOS 5.0, but I didn't capture that kernel panic, so I am not sure whether it was in the same place.
 
 The VPS is a CentOS 5.0 VPS.
 
 Any other info needed that I can provide?
 
 thanks,
 Paul
 [Updated on: Mon, 23 July 2007 07:34] by Moderator |  
	| 
		
			| Re: *NOT SOLVED* vzmond hanging during vzctl stop - ends in panic [message #15231 is a reply to message #15229] | Mon, 23 July 2007 12:10   |  
			| 
				
				
					|  jochum Messages: 21
 Registered: December 2006
 Location: Naperville, IL, USA
 | Junior Member |  |  |  
	| Hi Den and Dave: 
 I am sorry, but I am not sure I understand the question. If you are asking whether I installed any other package in my build (other than the patch diff-ve-nfsstop-b-20070704.patch), then the answer is no: only that patch was added. If you are asking whether the behavior of the kernel was different once I used that patch, then I believe the answer is also no. To me, the kernel panic looks like it came from the same area, but I am not very good at reading kernel panics, so I can't guarantee that.
 
 Also, I uploaded the kernel panic in the previous response, but forgot to add the extension .png to the file.
 
 thanks,
 
 Paul
 |  
	| 
		
			| Re: *NOT SOLVED* vzmond hanging during vzctl stop - ends in panic [message #15359 is a reply to message #15348] | Fri, 27 July 2007 22:20   |  
			| 
				
				
					|  jochum Messages: 21
 Registered: December 2006
 Location: Naperville, IL, USA
 | Junior Member |  |  |  
	| Hi Den: 
 Thanks for responding and your patience in working on this.  I will try to answer your questions, and also attach a copy of a full kernel dump.
 
 1) Can I try Kirill's latest version?
 A) I downloaded and installed his version (ovzkernel-PAE-2.6.18-8.1.8.el5.028stab039.1.i686.rpm and ovzkernel-PAE-devel-2.6.18-8.1.8.el5.028stab039.1.i686.rpm), but it still failed.
 
 2) I have tried this on 2 different VEs. Both of them are based on CentOS 5.0. One of them was the download from the OpenVZ website, the other came from my own conversion of CentOS 4 to CentOS 5.
 
 3) Do I have the crashes on the same node, or arbitrary ones?
 I have this configuration running on 6 nodes, and I can reproduce the failure on all 6 of them.
 
 4) Is there any specific activity before the crash?
 Nothing special. Basically, I start the host, and then use a script that starts the VPS, performs a pwd of an automounted NFS filesystem, and then stops the VPS. I can tell it will fail when the VPS does not shut down completely (vzlist -a shows it as running but with 0 processes). Once the VPS is in this state, it takes a while (anywhere from minutes to hours) before the kernel panics. During this interval, I receive NFS messages on the console: "server xxx  not responding, still trying"
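 
 For anyone who wants to try the same cycle, here is a rough sketch of the loop described in answer 4. It is not the actual script: the function names, the NFS path argument, and the cycle limit are made up for illustration, and it assumes that `vzctl exec` hands the command string to a shell inside the VPS.

```python
import subprocess


def run_host(cmd):
    """Run a command on the hardware node and return its stdout as text."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def cycle_once(veid, nfs_path, run=run_host):
    """One start / touch-NFS / stop cycle. Returns True if the VPS stopped cleanly."""
    run(["vzctl", "start", str(veid)])
    # Trigger the automount from inside the VPS (assumes a shell runs this string).
    run(["vzctl", "exec", str(veid), "cd %s && pwd" % nfs_path])
    run(["vzctl", "stop", str(veid)])
    for line in run(["vzlist", "-a"]).splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == str(veid):
            # Failure signature: still "running" but with 0 processes.
            return not (fields[2] == "running" and fields[1] == "0")
    return True  # VPS not listed at all; treated as stopped


def reproduce(veid, nfs_path, max_cycles=100, run=run_host):
    """Repeat the cycle until the hang shows up. Returns the failing cycle number or None."""
    for n in range(1, max_cycles + 1):
        if not cycle_once(veid, nfs_path, run):
            return n
    return None
```

 Calling something like `reproduce(200, "/some/automounted/path")` as root on the host would run the loop; the path here is a placeholder, not the one from my setup. The `run` parameter exists only so the logic can be exercised without a real OpenVZ host.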
 
 
 Here is a copy of the kernel dump. Note that this was run against Kirill's kernel, without the included patch. Next I will work on building Kirill's kernel with the new patch.
 
 thanks,
 
 Paul
 
 BUG: unable to handle kernel NULL pointer dereference at virtual address 00000040
 printing eip:
 c05b85e3
 *pde = 37532001
 Oops: 0000 [#1]
 SMP
 last sysfs file:
 Modules linked in: simfs(U) vzethdev(U) vzrst(U) ip_nat(U) vzcpt(U) ip_conntrack(U) nfnetlink(U) vzdquota(U) xt_tcpudp(U) xt_length(U) ipt_ttl(U) xt_tcpmss(U) ipt_TCPMSS(U) iptable_mangle(U) iptable_filter(U) xt_multiport(U) xt_limit(U) ipt_tos(U) ipt_REJECT(U) ip_tables(U) x_tables(U) autofs4(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) vznetdev(U) vzmon(U) vzdev(U) ipv6(U) cpufreq_ondemand(U) dm_mirror(U) dm_mod(U) video(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) sg(U) i2c_nforce2(U) i2c_core(U) k8_edac(U) pcspkr(U) ide_cd(U) edac_mc(U) e1000(U) serio_raw(U) forcedeth(U) cdrom(U) usb_storage(U) sata_nv(U) libata(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) sd_mod(U) scsi_mod(U) raid1(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
 CPU:    2, VCPU: 0.3
 EIP:    0060:[<c05b85e3>]    Not tainted VLI
 EFLAGS: 00010286   (2.6.18-8.1.8.el5.028stab039.1PAE #1)
 EIP is at __ip_route_output_key+0xf7/0x822
 eax: 00000000   ebx: 00000000   ecx: 00000000   edx: 00000000
 esi: 00000000   edi: f7f4bea4   ebp: f63e09c0   esp: f7f4bd90
 ds: 007b   es: 007b   ss: 0068
 Process events/3 (pid: 13, veid: 0, ti=f7f4a000 task=f7f49990 task.ti=f7f4a000)
 Stack: f7f4bef8 00000000 00000000 00000040 00000000 f765f0c0 f4c724b8 c05d40f4
 f4c724b8 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 Call Trace:
 [<c05d40f4>] tcp_v4_send_check+0x76/0xbb
 [<c05d4490>] tcp_v4_connect+0x102/0x5f8
 [<c04d5d5c>] __next_cpu+0x12/0x21
 [<c0418c91>] find_busiest_group+0x178/0x45c
 [<c05faef7>] _spin_lock_bh+0x8/0x18
 [<c05de6b8>] inet_stream_connect+0x7d/0x208
 [<c05f96e7>] schedule+0xcc3/0xda1
 [<f91b3bb7>] xs_tcp_connect_worker+0x221/0x2bd [sunrpc]
 [<c04314d7>] run_workqueue+0x7f/0xbc
 [<f91b3996>] xs_tcp_connect_worker+0x0/0x2bd [sunrpc]
 [<c0431af1>] worker_thread+0xd9/0x10c
 [<c0419818>] default_wake_function+0x0/0xc
 [<c0431a18>] worker_thread+0x0/0x10c
 [<c043464b>] kthread+0xc0/0xed
 [<c043458b>] kthread+0x0/0xed
 [<c05fdc77>] kernel_thread_helper+0x7/0x10
 =======================
 Code: 83 e6 1d e8 ea 13 f2 ff 8b 07 f7 db 8b 4f 0c 83 e3 fd 89 44 24 24 89 e0 25 00 e0 ff ff 8b 00 8b 80 d4 05 00 00 8b 80 e8 02 00 00 <8b> 40 40 89 4c 24 30 88 5c 24 39 c7 44 24 7c 00 00 00 00 89 44
 EIP: [<c05b85e3>] __ip_route_output_key+0xf7/0x822 SS:ESP 0068:f7f4bd90
 Kernel panic - not syncing: Fatal exception
 
 |  
	| 
		
			| Re: *NOT SOLVED* vzmond hanging during vzctl stop - ends in panic [message #15362 is a reply to message #15359] | Sat, 28 July 2007 02:53   |  
			| 
				
				
					|  jochum Messages: 21
 Registered: December 2006
 Location: Naperville, IL, USA
 | Junior Member |  |  |  
	| Hi Den: 
 I rebuilt Kirill's version with the patch. The kernel panic looks the same (at least to me); I have included it below. I did not see anything unusual in the /var/log/vzctl.log file, and the only important thing I saw in /var/log/messages was the following 2 messages:
 
 kernel: nfs: server lss-nfsa09 not responding, still trying
 Jul 27 20:23:44 lss-host40 last message repeated 4 times
 
 thanks,
 
 Paul
 
 
 
 BUG: unable to handle kernel NULL pointer dereference at virtual address 00000040
 printing eip:
 c05b85e3
 *pde = 00003001
 Oops: 0000 [#1]
 SMP
 last sysfs file:
 Modules linked in: simfs(U) vzethdev(U) vzrst(U) ip_nat(U) vzcpt(U) ip_conntrack(U) nfnetlink(U) vzdquota(U) xt_tcpudp(U) xt_length(U) ipt_ttl(U) xt_tcpmss(U) ipt_TCPMSS(U) iptable_mangle(U) iptable_filter(U) xt_multiport(U) xt_limit(U) ipt_tos(U) ipt_REJECT(U) ip_tables(U) x_tables(U) autofs4(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) vznetdev(U) vzmon(U) vzdev(U) ipv6(U) cpufreq_ondemand(U) dm_mirror(U) dm_mod(U) video(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) i2c_nforce2(U) sg(U) pcspkr(U) i2c_core(U) e1000(U) k8_edac(U) edac_mc(U) forcedeth(U) serio_raw(U) ide_cd(U) cdrom(U) usb_storage(U) sata_nv(U) libata(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) sd_mod(U) scsi_mod(U) raid1(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
 CPU:    1, VCPU: 0.2
 EIP:    0060:[<c05b85e3>]    Not tainted VLI
 EFLAGS: 00010286   (2.6.18-8.1.8.el5.028stab039.1PAE.prj #1)
 EIP is at __ip_route_output_key+0xf7/0x822
 eax: 00000000   ebx: 00000000   ecx: 00000000   edx: 00000000
 esi: 00000000   edi: c5dedea4   ebp: f63019c0   esp: c5dedd90
 ds: 007b   es: 007b   ss: 0068
 Process events/2 (pid: 12, veid: 0, ti=c5dec000 task=f7e3a0d0 task.ti=c5dec000)
 Stack: c5dedef8 f7615e9e 00000000 c05bf53c f7203000 f67032bc f67032bc f7203000
 c059fed2 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 Call Trace:
 [<c05bf53c>] ip_finish_output+0x0/0x19a
 [<c059fed2>] dev_hard_start_xmit+0x1b8/0x22a
 [<c05c00dd>] ip_queue_xmit+0x3a5/0x3eb
 [<c05d4490>] tcp_v4_connect+0x102/0x5f8
 [<f8c79238>] scsi_end_request+0x9f/0xa9 [scsi_mod]
 [<f8c7938b>] scsi_io_completion+0x149/0x2f3 [scsi_mod]
 [<c05d40f4>] tcp_v4_send_check+0x76/0xbb
 [<c05cec56>] tcp_transmit_skb+0x6a4/0x6d2
 [<c042b0f9>] lock_timer_base+0x15/0x2f
 [<c042b20d>] __mod_timer+0x9c/0xa6
 [<c05faef7>] _spin_lock_bh+0x8/0x18
 [<c05de6b8>] inet_stream_connect+0x7d/0x208
 [<c05f96e7>] schedule+0xcc3/0xda1
 [<f91bcbe8>] xs_tcp_connect_worker+0x221/0x2bd [sunrpc]
 [<c04314d7>] run_workqueue+0x7f/0xbc
 [<f91bc9c7>] xs_tcp_connect_worker+0x0/0x2bd [sunrpc]
 [<c0431af1>] worker_thread+0xd9/0x10c
 [<c0419818>] default_wake_function+0x0/0xc
 [<c0431a18>] worker_thread+0x0/0x10c
 [<c043464b>] kthread+0xc0/0xed
 [<c043458b>] kthread+0x0/0xed
 [<c05fdc77>] kernel_thread_helper+0x7/0x10
 =======================
 Code: 83 e6 1d e8 ea 13 f2 ff 8b 07 f7 db 8b 4f 0c 83 e3 fd 89 44 24 24 89 e0 25 00 e0 ff ff 8b 00 8b 80 d4 05 00 00 8b 80 e8 02 00 00 <8b> 40 40 89 4c 24 30 88 5c 24 39 c7 44 24 7c 00 00 00 00 89 44
 EIP: [<c05b85e3>] __ip_route_output_key+0xf7/0x822 SS:ESP 0068:c5dedd90
 Kernel panic - not syncing: Fatal exception
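 
 One way to check that this dump really matches the previous one, rather than only eyeballing it, is to compare the faulting address and the "EIP is at" line from both. This is just a small sketch; the function name is made up, and the regular expressions assume the i386 oops layout seen in this thread.

```python
import re


def oops_signature(text):
    """Extract (faulting address, function, offset) from an i386 oops dump."""
    addr = re.search(r"at virtual address ([0-9a-f]+)", text)
    eip = re.search(r"EIP is at (\w+)\+(0x[0-9a-f]+)/0x[0-9a-f]+", text)
    return (addr.group(1) if addr else None,
            eip.group(1) if eip else None,
            eip.group(2) if eip else None)


# Lines copied from the dump without the patch (previous message).
stock = """\
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000040
EFLAGS: 00010286   (2.6.18-8.1.8.el5.028stab039.1PAE #1)
EIP is at __ip_route_output_key+0xf7/0x822
"""

# Lines copied from the dump with the patch (this message).
patched = """\
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000040
EFLAGS: 00010286   (2.6.18-8.1.8.el5.028stab039.1PAE.prj #1)
EIP is at __ip_route_output_key+0xf7/0x822
"""

print(oops_signature(stock))
print(oops_signature(stock) == oops_signature(patched))
```

 Both dumps fault at the same offset in __ip_route_output_key while reading address 0x40, even though the call traces above that point are not identical.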
 
 |  
	| 
		
			| Re: *NOT SOLVED* vzmond hanging during vzctl stop - ends in panic [message #15456 is a reply to message #15443] | Tue, 31 July 2007 22:48   |  
			| 
				
				
					|  jochum Messages: 21
 Registered: December 2006
 Location: Naperville, IL, USA
 | Junior Member |  |  |  
	| Hi Den: 
 In my limited testing, I can no longer get the system to panic, but stopping a VPS now seems to corrupt NFS, which causes the system to hang.
 
 I built ovzkernel-2.6.18-8.1.4.el5.028stab039.1.src.rpm with the following patches:
 diff-ve-opseminit-20070723.patch
 diff-ve-nfsstop-b-20070704.patch
 diff-ve-nfsstop-d-20070731.patch
 
 I do not get a kernel dump, but in /var/log/messages, I received:
 Jul 31 13:43:28 lss-host40 kernel: portmap: RPC call returned error 101
 Jul 31 13:43:28 lss-host40 kernel: RPC: failed to contact portmap (errno -101).
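 
 For reference, errno 101 on Linux is ENETUNREACH ("Network is unreachable"). The snippet below is only a small sketch for decoding such codes out of the log lines; the table is a hand-picked subset of Linux errno values and the function name is made up for illustration.

```python
import re

# Hand-picked subset of Linux errno values that show up in connection failures.
LINUX_ERRNO = {
    101: ("ENETUNREACH", "Network is unreachable"),
    110: ("ETIMEDOUT", "Connection timed out"),
    111: ("ECONNREFUSED", "Connection refused"),
    113: ("EHOSTUNREACH", "No route to host"),
}


def decode_rpc_errors(log_text):
    """Return (code, name, description) for each 'error N' / 'errno -N' in the log."""
    found = []
    for match in re.finditer(r"(?:error|errno) -?(\d+)", log_text):
        code = int(match.group(1))
        name, text = LINUX_ERRNO.get(code, ("?", "not in this table"))
        found.append((code, name, text))
    return found


log = """\
Jul 31 13:43:28 lss-host40 kernel: portmap: RPC call returned error 101
Jul 31 13:43:28 lss-host40 kernel: RPC: failed to contact portmap (errno -101).
"""
for entry in decode_rpc_errors(log):
    print(entry)
```

 So both messages report the same condition: at that moment the kernel's RPC code could not find a network route to the portmapper.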
 
 thanks,
 
 Paul
 |  
	| 
		
			| Re: *NOT SOLVED* vzmond hanging during vzctl stop - ends in panic [message #15490 is a reply to message #15461] | Wed, 01 August 2007 22:34   |  
			| 
				
				
					|  jochum Messages: 21
 Registered: December 2006
 Location: Naperville, IL, USA
 | Junior Member |  |  |  
	| Hi Den: 
 Here is the output, hope it helps.
 
 thanks,
 
 Paul
 
 (P.S. I know I have a lot of processes running on the host, many more than are normally needed. Someday I might even get a chance to clean some of them up.)
 
 
 
	
	 Attachment: alt-t-p (Size: 102.36KB, Downloaded 467 times)
 |  
	| 
		
			| Re: vzmond hanging during vzctl stop - ends in panic [message #36498 is a reply to message #14587] | Wed, 24 June 2009 20:35  |  
			| 
				
				
					|  bbjwp Messages: 1
 Registered: June 2009
 | Junior Member |  |  |  
	| We're presently being affected by this bug on both: 
 * 2.6.18-92.1.18.el5.028stab060.8
 * 2.6.18-128.1.1.el5.028stab062.3
 
 We're using CentOS 5.3 templates with NFS mounted. Unmounting NFS before a shutdown does not resolve this for us: we still see NPROC listed as 0 and 100% CPU for vzmond/7279.
 
 We have other hosts where this isn't a problem, but on this particular setup, we've confirmed this issue across 6 machines.
 
 Nothing shows up in /var/log/messages or dmesg.
 
 Yesterday, the containers stopped after a while (20 minutes?).  Today, they seem to just be hanging.
 |  