OpenVZ Crashing [message #38493]
Sat, 26 December 2009 13:51
andre
Messages: 36 Registered: January 2008
Member
Yesterday one of our servers started crashing every 3 hours. We thought it was a hardware issue, so we moved all VEs to another machine (completely different hardware, including the hard drives).
After the move, the new machine started crashing too. We couldn't get much information, just these messages before it crashes:
Route hash chain too long!
Adjust your secret_interval!
vzctl-3.0.23-1
vzctl-lib-3.0.23-1
ovzkernel-2.6.18-164.2.1.el5.028stab066.10
We were running older versions before (on the first server). We upgraded to these versions and the problem persists.
Thanks
Re: OpenVZ Crashing [message #38556 is a reply to message #38493]
Thu, 31 December 2009 15:17
TheWiseOne
Messages: 66 Registered: September 2005 Location: Pennsylvania
Member
We started noticing this exact same issue on our production Virtuozzo nodes (66.10 kernel). We have a KVM hooked up to the server to view the console. There is no oops/panic to catch, but we do see the two messages this user posted.
Matt Ayres
TekTonic
Re: OpenVZ Crashing [message #38559 is a reply to message #38493]
Fri, 01 January 2010 13:23
frog252
Messages: 2 Registered: April 2007
Junior Member
We have the exact same problem as well, running 2.6.18-164.2.1.el5.028stab066.10.
We had some "neighbor leakage" errors on the console initially, so as per the Parallels forum we added:
kernel.pid_max = 32768
net.ipv4.tcp_mem = 786432 1048576 1572864
into /etc/sysctl.conf
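For anyone sizing that tcp_mem line: the three fields are page counts (low, pressure, high), not bytes. A quick sketch of the conversion, assuming 4 KiB pages (the usual x86_64 page size; check yours with `getconf PAGESIZE`). The function name is just illustrative.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical on x86_64


def tcp_mem_bytes(setting, page_size=PAGE_SIZE):
    """Convert a 'low pressure high' tcp_mem string (in pages) to bytes."""
    return tuple(int(field) * page_size for field in setting.split())


if __name__ == "__main__":
    # Value taken from the sysctl.conf line quoted in this post.
    low, pressure, high = tcp_mem_bytes("786432 1048576 1572864")
    for label, value in (("low", low), ("pressure", pressure), ("high", high)):
        print(f"{label}: {value} bytes ({value / 2**30:.0f} GiB)")
```

Under that page-size assumption the three thresholds work out to 3, 4 and 6 GiB of TCP buffer memory.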
However, the machines are now crashing with this:
Quote:
Jan 1 01:00:01 vz1 kernel: Route hash chain too long!
Jan 1 01:00:01 vz1 kernel: Adjust your secret_interval!
Jan 1 01:05:28 vz1 kernel: irq 82: nobody cared (try booting with the "irqpoll" option)
Jan 1 01:05:28 vz1 kernel:
Jan 1 01:05:28 vz1 kernel: Call Trace:
Jan 1 01:05:28 vz1 kernel: <IRQ> [<ffffffff800c3cdf>] __report_bad_irq+0x30/0x7d
Jan 1 01:05:28 vz1 kernel: [<ffffffff800c3f12>] note_interrupt+0x1e6/0x227
Jan 1 01:05:28 vz1 kernel: [<ffffffff800c340e>] __do_IRQ+0xbd/0x103
Jan 1 01:05:28 vz1 kernel: [<ffffffff8006e253>] do_IRQ+0x13f/0x14d
Jan 1 01:05:28 vz1 kernel: [<ffffffff80060665>] ret_from_intr+0x0/0xa
Jan 1 01:05:28 vz1 kernel: [<ffffffff800660bc>] .text.lock.spinlock+0x2/0x30
Jan 1 01:05:28 vz1 kernel: [<ffffffff8023f6b3>] rt_check_expire+0xd6/0x21d
Jan 1 01:05:28 vz1 kernel: [<ffffffff8023f5dd>] rt_check_expire+0x0/0x21d
Jan 1 01:05:28 vz1 kernel: [<ffffffff8009729b>] run_timer_softirq+0x153/0x1e6
Jan 1 01:05:28 vz1 kernel: [<ffffffff80011d6f>] __do_softirq+0xfa/0x1d4
Jan 1 01:05:28 vz1 kernel: [<ffffffff8006134c>] call_softirq+0x1c/0x28
Jan 1 01:05:28 vz1 kernel: [<ffffffff8006e28d>] do_softirq+0x2c/0x85
Jan 1 01:05:28 vz1 kernel: [<ffffffff80190ec8>] acpi_processor_idle+0x0/0x3fc
Jan 1 01:05:28 vz1 kernel: [<ffffffff80060cde>] apic_timer_interrupt+0x66/0x6c
Jan 1 01:05:28 vz1 kernel: <EOI> [<ffffffff80191047>] acpi_processor_idle+0x17f/0x3fc
Jan 1 01:05:28 vz1 kernel: [<ffffffff80190f58>] acpi_processor_idle+0x90/0x3fc
Jan 1 01:05:28 vz1 kernel: [<ffffffff80190ec8>] acpi_processor_idle+0x0/0x3fc
Jan 1 01:05:28 vz1 kernel: [<ffffffff80190ec8>] acpi_processor_idle+0x0/0x3fc
Jan 1 01:05:28 vz1 kernel: [<ffffffff8004b576>] cpu_idle+0x77/0x96
Jan 1 01:05:28 vz1 kernel: [<ffffffff80407826>] start_kernel+0x240/0x245
Jan 1 01:05:28 vz1 kernel: [<ffffffff80407237>] _sinittext+0x237/0x23e
Jan 1 01:05:28 vz1 kernel:
Jan 1 01:05:28 vz1 kernel: handlers:
Jan 1 01:05:28 vz1 kernel: [<ffffffff881fc146>] (e1000_msix_other+0x0/0x9c [e1000e])
Jan 1 01:05:28 vz1 kernel: Disabling IRQ #82
Any ideas?
[Updated on: Fri, 01 January 2010 13:23]
Re: OpenVZ Crashing [message #38560 is a reply to message #38493]
Fri, 01 January 2010 15:23
TheWiseOne
Messages: 66 Registered: September 2005 Location: Pennsylvania
Member
I downgraded our node that was crashing every few hours to Virtuozzo 064.8 and it is fine now. I'd guess 66.7 would also work fine, as we only noticed these issues in 66.10.
Matt Ayres
TekTonic
Re: OpenVZ Crashing [message #38569 is a reply to message #38493]
Sun, 03 January 2010 19:29
Messages from syslog (right before crash):
Jan 3 06:48:56 vdsm5 kernel: unregister_netdevice: waiting for lo=d59ef000 to become free. Usage count = 6 ve=6899
Jan 3 06:49:36 vdsm5 last message repeated 4 times
Jan 3 06:49:36 vdsm5 kernel: unregister_netdevice: device d59ef000 marked to leak
Jan 3 06:49:36 vdsm5 kernel: free_netdev: device lo=d59ef000 leaked
Jan 3 06:49:36 vdsm5 kernel: neighbour leakage
Jan 3 07:16:28 vdsm5 kernel: Route hash chain too long!
Jan 3 07:16:28 vdsm5 kernel: Adjust your secret_interval!
vdsm5 kernel: unregister_netdevice: waiting for lo=c2e8e000 to become free. Usage count = 4 ve=7043
Jan 3 21:30:16 vdsm5 kernel: unregister_netdevice: waiting for lo=c2e8e000 to become free. Usage count = 4 ve=7043
Welcome to xfes.ru OpenVZ repository mirror
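Since several posters are comparing the same handful of kernel messages, here is a rough sketch for pulling them out of a syslog file so timelines can be compared across nodes. The pattern list only covers messages quoted in this thread, the regex assumes the classic "Mon D HH:MM:SS host kernel: ..." syslog layout, and the names (`PATTERNS`, `scan`) are made up for illustration.

```python
import re
from collections import Counter

# Kernel messages reported in this thread; illustrative, not exhaustive.
PATTERNS = {
    "route_hash": "Route hash chain too long!",
    "secret_interval": "Adjust your secret_interval!",
    "neighbour_leak": "neighbour leakage",
    "netdev_wait": "unregister_netdevice: waiting for",
}

# Assumed layout: "Jan 3 07:16:28 vdsm5 kernel: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+kernel:\s*(?P<msg>.*)$"
)


def scan(lines):
    """Return (counts, first_seen) for the known messages in syslog lines."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # e.g. "last message repeated N times" or truncated lines
        for key, needle in PATTERNS.items():
            if needle in match.group("msg"):
                counts[key] += 1
                first_seen.setdefault(key, match.group("ts"))
    return counts, first_seen


SAMPLE = """\
Jan 3 06:48:56 vdsm5 kernel: unregister_netdevice: waiting for lo=d59ef000 to become free. Usage count = 6 ve=6899
Jan 3 06:49:36 vdsm5 last message repeated 4 times
Jan 3 06:49:36 vdsm5 kernel: neighbour leakage
Jan 3 07:16:28 vdsm5 kernel: Route hash chain too long!
Jan 3 07:16:28 vdsm5 kernel: Adjust your secret_interval!
""".splitlines()

if __name__ == "__main__":
    counts, first_seen = scan(SAMPLE)
    for key in PATTERNS:
        print(key, counts[key], first_seen.get(key))
```

To run it against a real log, pass `open("/var/log/messages")` to `scan` instead of `SAMPLE` (the path is the usual CentOS default, adjust as needed).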
Re: OpenVZ Crashing [message #38737 is a reply to message #38559]
Fri, 22 January 2010 17:54
digitallinx
Messages: 3 Registered: October 2008
Junior Member
This is odd. We've been running 028stab066.10 on a single server for a week now without any issues, and 028stab066.7 for almost two months on more than 40 servers with no issues whatsoever.
I was about to reboot all servers into 028stab066.10 when I stumbled on this thread, so I'm having second thoughts.
Any input from the development team?
Thank you.
EDIT: Never mind, I just saw the bug report. I guess it's safest to wait for the patched pre-compiled version.
[Updated on: Fri, 22 January 2010 18:24]
Re: OpenVZ Crashing [message #40352 is a reply to message #40173]
Fri, 13 August 2010 16:01
MikeDVB
Messages: 12 Registered: April 2010
Junior Member
I've just experienced this as well, on a node that had been running for nearly 6 months with no issues. We are using Ksplice rebootless kernel updates as well, so perhaps an update that was applied triggered this issue.
CentOS Release 5.5
2.6.18-164.15.1.el5.028stab068.9
http://www.screen-shot.net/2010-08-13_1144.png
I'm reluctant to go to the latest OpenVZ kernel build, as the kernel seems to crash with a panic as soon as we switch to the Deadline I/O scheduler. I suppose we could go back to CFQ temporarily, though.
In the meantime I've reduced the route cache flush interval from 600 seconds to 300. This adds CPU load every time the route cache is fully rebuilt, but I'd rather have periodic extra CPU load than a total loss of responsiveness.
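For reference, a minimal sketch of how such a change could be scripted. It assumes the interval in question is the sysctl net.ipv4.route.secret_interval (the one named in the "Adjust your secret_interval!" message, default 600 seconds on these 2.6.18 kernels); the post does not name the key explicitly. The helper names are hypothetical, and `demo()` works on a temporary directory so it can be tried without root.

```python
import tempfile
from pathlib import Path

SYSCTL_ROOT = Path("/proc/sys")
# Assumption: this is the "route cache flush" knob the post refers to.
KEY = "net.ipv4.route.secret_interval"


def sysctl_path(name, root=SYSCTL_ROOT):
    """Map a dotted sysctl name to its file under root.

    Naive split: fine for this key, but not for names whose components
    contain dots (e.g. VLAN interface names).
    """
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root=SYSCTL_ROOT):
    return sysctl_path(name, root).read_text().strip()


def write_sysctl(name, value, root=SYSCTL_ROOT):
    sysctl_path(name, root).write_text(f"{value}\n")


def demo():
    """Exercise the helpers against a fake /proc/sys tree (no root needed)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = sysctl_path(KEY, tmp)
        path.parent.mkdir(parents=True)
        path.write_text("600\n")  # stand-in for the kernel default
        before = read_sysctl(KEY, tmp)
        write_sysctl(KEY, 300, tmp)
        after = read_sysctl(KEY, tmp)
    return before, after


if __name__ == "__main__":
    print(demo())
```

On a real node, `write_sysctl(KEY, 300)` as root would apply the change until the next reboot; to persist it, the same key would also go into /etc/sysctl.conf.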
Edit:
Almost immediately:
http://www.screen-shot.net/2010-08-13_1217.png
[root@boreas ~]# dmesg|grep '^IP route'
IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)
The server, however, has not become unresponsive...
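As a sanity check on that dmesg line, the numbers are self-consistent: "order: 10" means 2^10 contiguous pages, and dividing the byte count by the entry count gives the size of one hash bucket. A small sketch, assuming 4 KiB pages; reading the 8 bytes per bucket as a single 64-bit pointer is my interpretation, not something the log states.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical on x86_64

# Values copied from the dmesg line above.
ENTRIES = 524288
ORDER = 10
REPORTED_BYTES = 4194304


def table_bytes(order, page_size=PAGE_SIZE):
    """Size of an allocation of 2**order contiguous pages."""
    return (1 << order) * page_size


def bytes_per_bucket(total_bytes, entries):
    return total_bytes // entries


if __name__ == "__main__":
    print("allocation:", table_bytes(ORDER), "bytes")
    print("per bucket:", bytes_per_bucket(REPORTED_BYTES, ENTRIES), "bytes")
```

So the table has 524288 buckets of 8 bytes each; the "chain too long" warning is about how many routes pile up in a single bucket, not about the table size itself.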
[Updated on: Fri, 13 August 2010 16:19]
Re: OpenVZ Crashing [message #40355 is a reply to message #38493]
Fri, 13 August 2010 17:03
MikeDVB
Messages: 12 Registered: April 2010
Junior Member
I wonder if any of these patches caused this issue:
ofqqthgs Clear garbage data on the kernel stack when handling signals.
sq58o0s1 CVE-2009-4307: Divide-by-zero mounting an ext4 filesystem.
4k0w0ova CVE-2010-0727: Denial of Service in GFS2 locking.
j5fdodr5 Floating point state corruption after signal.
jgna6cb9 CVE-2010-1085: Divide-by-zero in Intel HDA driver.
2dmfdj4t CVE-2010-0307: Denial of service on amd64
e12qm3wp CVE-2010-1436: Privilege escalation in GFS2 server
fmge3aoa CVE-2010-1087: Oops when truncating a file in NFS
0a2yg60n CVE-2010-1187: Denial of service in TIPC
[Updated on: Fri, 13 August 2010 17:11]