OpenVZ Forum


OpenVZ strange lockups and reboots [message #15756] Fri, 10 August 2007 17:52
newwave
Messages: 3
Registered: August 2007
Junior Member
I apologize for the long post, but I am trying to include all the relevant information.

I am having stability issues with my OpenVZ server: it randomly locks up or reboots. It seems to happen more often when the box is under heavy load (e.g. large rsyncs or scps, software installs, etc.). The logs are clean, and the box generally locks up more often than it reboots. When it locks up, nothing at all is written to the cron log, even though several cron jobs run every 5 minutes. The resource limits of the VEs do not appear to be exceeded (according to /proc/user_beancounters).

I am having difficulty narrowing it down to one process, config issue, or module. I suspect it may be a hardware problem, but I was wondering whether incorrectly set resource limits for the VEs could cause the hardware node itself to lock up. My understanding was that only the overloaded VE would lock up. Have I overallocated the hardware node? I followed all the equations here:
http://wiki.openvz.org/UBC_consistency_check
http://wiki.openvz.org/UBC_interdependencies_table
and vzcfgvalidate returns "Validation completed: success" for all the VEs.
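A limit being hit shows up as a non-zero failcnt in /proc/user_beancounters, which counts refused allocations per resource, so it can help to print only those rows instead of scanning the whole table. A minimal sketch, assuming the usual column layout (uid, resource, held, maxheld, barrier, limit, failcnt) where only the first row of each VE carries the "VEID:" prefix; the function name check_ubc_failcnt is hypothetical.

```shell
#!/bin/sh
# check_ubc_failcnt (hypothetical helper): print every beancounter row with a
# non-zero failcnt. Expects /proc/user_beancounters-formatted text on stdin.
# Assumed layout: "<VEID>:" (first row of each VE only), resource, held,
# maxheld, barrier, limit, failcnt.
check_ubc_failcnt() {
    awk '
        # The first row of each VE starts with "<VEID>:"; remember that ID.
        NF == 7 && $1 ~ /:$/ { ve = $1; sub(/:$/, "", ve) }
        # failcnt is always the last column and the resource name sits five
        # columns before it, whether or not the row has the VEID prefix.
        # The numeric check skips the "Version:" line and the column header.
        $NF ~ /^[0-9]+$/ && $NF > 0 {
            printf "VE %s %s failcnt=%s\n", ve, $(NF - 5), $NF
        }
    '
}
```

Run it as root with check_ubc_failcnt < /proc/user_beancounters. Since failcnt only grows while a VE is running, empty output suggests no limit has been hit since the VEs were started.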

The system is a dual-processor Xeon with Hyper-Threading (it shows 4 CPUs) and has 4 GB of RAM.

[sal@buzzsaw conf]$ cat /proc/cpuinfo
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 5
cpu MHz : 2392.157
cache size : 512 KB

[sal@buzzsaw conf]$ cat /proc/meminfo
MemTotal: 4018884 kB
MemFree: 2360636 kB

I am currently running this kernel:
Linux buzzsaw 2.6.18-ovz028stab039.1-smp #1 SMP Tue Jul 24 12:13:58 MSD 2007 i686 i686 i386 GNU/Linux
I have also tried:
vmlinuz-2.6.9-023stab032.1-smp and
vmlinuz-2.6.8-022stab078.14-smp

I am only running 5 VEs at this time. I have attached all the conf files in case anyone is able to help. Should I disable Hyper-Threading? Is my OpenVZ config overcommitted? Should I recompile the kernel from source rather than using the binary RPM? I am basically at a loss, and my system is very unstable.

Also, one quick question: should my 0.conf file be set to onboot="yes"?

Thanks very much,

Adam Salowitz
Attachment: vzconfs.txt (11.26 KB)
Re: OpenVZ strange lockups and reboots [message #15757 is a reply to message #15756] Fri, 10 August 2007 17:57
rickb
Messages: 368
Registered: October 2006
Senior Member
From my experience, the best way to troubleshoot this situation is to load the netconsole module and, when the server crashes, check the destination server for the oops. If there is no oops, it's likely a hardware problem: check for overheating or just start swapping components.

Rick


-------------
Common Terms I post with: http://wiki.openvz.org/Category:Definitions

UBC. Learn it, love it, live it: http://wiki.openvz.org/Proc/user_beancounters
Re: OpenVZ strange lockups and reboots [message #15760 is a reply to message #15757] Sat, 11 August 2007 03:35
newwave
Messages: 3
Registered: August 2007
Junior Member
Thanks for the idea. I have never used netconsole, but I have heard good things about it. I loaded it, but I am not sure it is working: I am not seeing any kernel messages from boot when I put the following in /etc/rc.d/rc.sysinit (on a test system):

echo -n netconsole
/sbin/modprobe netconsole netconsole=4444@192.168.0.200/eth0,6666@192.168.0.66/00:D0:B7:46:BC:DA

I have 192.168.0.66 listening with "netcat -l -p 6666 -u", and netconsole shows up in lsmod on 192.168.0.200.
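For reference, the kernel's netconsole documentation gives the option format as [src-port]@[src-ip]/[dev],[tgt-port]@tgt-ip/[tgt-macaddr], where the bracketed fields are optional and an omitted target MAC defaults to broadcast. A small sketch that assembles the string from its parts, so a typo in one field is easier to spot; the function name netconsole_param is hypothetical.

```shell
#!/bin/sh
# netconsole_param (hypothetical helper): build a netconsole= option string.
# Arguments: SRC_PORT SRC_IP DEV TGT_PORT TGT_IP [TGT_MAC]
# Empty arguments stay empty, matching the optional fields in
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@tgt-ip/[tgt-macaddr]
netconsole_param() {
    # The target IP is the only field that must be present.
    if [ "$#" -lt 5 ] || [ -z "$5" ]; then
        echo "usage: netconsole_param SRC_PORT SRC_IP DEV TGT_PORT TGT_IP [TGT_MAC]" >&2
        return 1
    fi
    # The "/" after the target IP is kept even when no MAC is given,
    # as in the documentation example "netconsole=@/,@10.0.0.2/".
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "${6:-}"
}
```

For example, /sbin/modprobe netconsole "$(netconsole_param 4444 192.168.0.200 eth0 6666 192.168.0.66 00:D0:B7:46:BC:DA)" reproduces the test-system command above.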

My test system is at home. The hardware node with the problem is actually over 2000 miles away, so swapping hardware is not really an option :-\ On the hardware node with the lockup issue, I just ran

sudo modprobe netconsole netconsole=4444@x.x.x.228/eth0,6666@x.x.x.128/00:0A:95:84:93:A2

and it shows up in lsmod, but I haven't seen any kernel messages yet. Now I just wait for another lockup.
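Since the module can be listed by lsmod without actually sending anything, it may be worth checking the kernel log right after loading it. As far as I know, netconsole prints "netconsole: network logging started" once it has set itself up, and its other log lines show what it made of the option string. A minimal sketch, assuming that exact message text; the function name netconsole_started is hypothetical.

```shell
#!/bin/sh
# netconsole_started (hypothetical helper): succeed if the kernel log text on
# stdin contains the line netconsole prints after a successful setup.
# Assumes the message text "netconsole: network logging started".
netconsole_started() {
    grep -q 'netconsole: network logging started'
}
```

For example, dmesg | netconsole_started && echo "netconsole is logging"; if it fails, dmesg | grep -i netconsole should show what the module reported instead.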

My question is: will netconsole work over the WAN? I don't have another box in that data center.

By the way, I found these pages after I posted; sorry I didn't see them first. I will try memtester one night, but I can't boot into Memtest86+:

http://wiki.openvz.org/Hardware_testing
http://wiki.openvz.org/Remote_console_setup

Thanks again. I will wait for another crash unless you can see something wrong with my modprobe setup.

Adam Salowitz



Re: OpenVZ strange lockups and reboots [message #15765 is a reply to message #15760] Sat, 11 August 2007 11:00
rickb
Messages: 368
Registered: October 2006
Senior Member
What I do to test netconsole is load and unload something harmless, like the floppy kernel module, and see whether the netconsole UDP packets reach the destination server. I remember having to set the destination to a server on the same LAN for it to work, but I am not sure this is required.

Rick



[Updated on: Sat, 11 August 2007 12:19]


Re: OpenVZ strange lockups and reboots [message #15810 is a reply to message #15765] Tue, 14 August 2007 11:02
khorenko
Messages: 533
Registered: January 2006
Location: Moscow, Russia
Senior Member
rickb wrote on Sat, 11 August 2007 15:00

...
For some reason I remember I had to set the destination to a server on the same lan for it to work, but I am not sure if this is required.


Well, it's not strictly *required*, but most often it is a necessary condition for getting netconsole working. This is because UDP delivery is not guaranteed, so some packets can simply be dropped, corrupting the logs; moreover, some providers just drop all external UDP traffic.

So to make netconsole more or less reliable, it makes sense to collect the logs on a node in the same LAN.

Hope this helps.

Konstantin.


If your problem is solved - please, report it!
It's even more important than reporting the problem itself...
Re: OpenVZ strange lockups and reboots [message #15815 is a reply to message #15810] Tue, 14 August 2007 14:31
newwave
Messages: 3
Registered: August 2007
Junior Member
As for netconsole over the WAN, would I just leave the MAC address off? It is surrounded by [] in the documentation.
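On the MAC question: as far as I understand it, netconsole bypasses the normal routing code and sends each frame straight to the target MAC, which defaults to broadcast when left off. When the log server is on another network, the frames therefore need the MAC address of the next-hop router rather than that of the remote host. A sketch that looks that MAC up in /proc/net/arp, assuming the gateway is already in the ARP cache (pinging it first takes care of that); the function name arp_mac_for is hypothetical.

```shell
#!/bin/sh
# arp_mac_for (hypothetical helper): print the hardware address recorded for
# the IP given as $1 in /proc/net/arp-formatted text on stdin. Columns:
# IP address, HW type, Flags, HW address, Mask, Device.
# Prints nothing if the IP is not in the table.
arp_mac_for() {
    # Skip the header line, then print column 4 of the first matching row.
    awk -v ip="$1" 'NR > 1 && $1 == ip { print $4; exit }'
}
```

For example: gw=$(ip route | awk '/^default/ { print $3; exit }'), then ping -c 1 "$gw" to populate the cache, then arp_mac_for "$gw" < /proc/net/arp, and use the result as the last field of the netconsole option. The UDP loss caveat from the previous reply still applies over the WAN.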

Thanks once again for your help. I haven't worked on this all weekend, but luckily that means my box was stable :)

I can load netconsole after boot using modprobe, and it shows up in /proc/modules, but I am not seeing any messages on my console server running netcat. I will try syslog next.

I know we are somewhat off topic. I am looking for netconsole and GRUB mailing lists and forums to take my issue there. I will post again if I get another crash and am able to capture the cause.

Thanks again, Adam

[root@blackout ~]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You do not have a /boot partition. This means that
# all kernel and initrd paths are relative to /, eg.
# root (hd0,0)
# kernel /boot/vmlinuz-version ro root=/dev/md0
# initrd /boot/initrd-version.img
#boot=/dev/hda1
default=0
timeout=5
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title openVZ1stHDD (2.6.18-8.1.8.el5.028stab039.1)
root (hd0,0)
# this kernel line is one line in the real file
kernel /boot/vmlinuz-2.6.18-8.1.8.el5.028stab039.1 ro root=/dev/md0 netconsole=4444@192.168.0.200/eth0,6666@192.168.0.66/00:D0:B7:46:BC:DA
initrd /boot/initrd-2.6.18-8.1.8.el5.028stab039.1.img