OpenVZ Forum


OpenVZ Crashing [message #38493] Sat, 26 December 2009 13:51
andre
Messages: 36
Registered: January 2008
Member
Yesterday one of our servers started crashing every 3 hours. We thought it was a hardware issue, so we moved all VEs to another machine (completely different hardware, even the HDs).

After the move, the new hardware started crashing too. We couldn't get much information, just these messages before it crashes:

Route hash chain too long!
Adjust your secret_interval!

vzctl-3.0.23-1
vzctl-lib-3.0.23-1
ovzkernel-2.6.18-164.2.1.el5.028stab066.10

We were running older versions before (on the first server); we upgraded to these versions and the problem persists.

Thanks
Re: OpenVZ Crashing [message #38496 is a reply to message #38493] Sat, 26 December 2009 20:19
kir
Messages: 1645
Registered: August 2005
Location: Moscow, Russia
Senior Member

http://wiki.openvz.org/When_you_have_an_oops

Kir Kolyshkin
Re: OpenVZ Crashing [message #38556 is a reply to message #38493] Thu, 31 December 2009 15:17
TheWiseOne
Messages: 66
Registered: September 2005
Location: Pennsylvania
Member
We started noticing this EXACT same issue on our production Virtuozzo nodes (66.10 kernel). We have a KVM hooked up to the server to view the console; there is no oops/panic to catch, but we do see the two messages this user posted.

Matt Ayres
TekTonic
Re: OpenVZ Crashing [message #38559 is a reply to message #38493] Fri, 01 January 2010 13:23
frog252
Messages: 2
Registered: April 2007
Junior Member
We have the exact same problem as well, running 2.6.18-164.2.1.el5.028stab066.10

We had some "neighbor leakage" errors on the console initially, so as per the Parallels forum we added:

kernel.pid_max = 32768
net.ipv4.tcp_mem = 786432 1048576 1572864

into /etc/sysctl.conf

However now machines are crashing with this:

Quote:

Jan 1 01:00:01 vz1 kernel: Route hash chain too long!
Jan 1 01:00:01 vz1 kernel: Adjust your secret_interval!
Jan 1 01:05:28 vz1 kernel: irq 82: nobody cared (try booting with the "irqpoll" option)
Jan 1 01:05:28 vz1 kernel:
Jan 1 01:05:28 vz1 kernel: Call Trace:
Jan 1 01:05:28 vz1 kernel: <IRQ> [<ffffffff800c3cdf>] __report_bad_irq+0x30/0x7d
Jan 1 01:05:28 vz1 kernel: [<ffffffff800c3f12>] note_interrupt+0x1e6/0x227
Jan 1 01:05:28 vz1 kernel: [<ffffffff800c340e>] __do_IRQ+0xbd/0x103
Jan 1 01:05:28 vz1 kernel: [<ffffffff8006e253>] do_IRQ+0x13f/0x14d
Jan 1 01:05:28 vz1 kernel: [<ffffffff80060665>] ret_from_intr+0x0/0xa
Jan 1 01:05:28 vz1 kernel: [<ffffffff800660bc>] .text.lock.spinlock+0x2/0x30
Jan 1 01:05:28 vz1 kernel: [<ffffffff8023f6b3>] rt_check_expire+0xd6/0x21d
Jan 1 01:05:28 vz1 kernel: [<ffffffff8023f5dd>] rt_check_expire+0x0/0x21d
Jan 1 01:05:28 vz1 kernel: [<ffffffff8009729b>] run_timer_softirq+0x153/0x1e6
Jan 1 01:05:28 vz1 kernel: [<ffffffff80011d6f>] __do_softirq+0xfa/0x1d4
Jan 1 01:05:28 vz1 kernel: [<ffffffff8006134c>] call_softirq+0x1c/0x28
Jan 1 01:05:28 vz1 kernel: [<ffffffff8006e28d>] do_softirq+0x2c/0x85
Jan 1 01:05:28 vz1 kernel: [<ffffffff80190ec8>] acpi_processor_idle+0x0/0x3fc
Jan 1 01:05:28 vz1 kernel: [<ffffffff80060cde>] apic_timer_interrupt+0x66/0x6c
Jan 1 01:05:28 vz1 kernel: <EOI> [<ffffffff80191047>] acpi_processor_idle+0x17f/0x3fc
Jan 1 01:05:28 vz1 kernel: [<ffffffff80190f58>] acpi_processor_idle+0x90/0x3fc
Jan 1 01:05:28 vz1 kernel: [<ffffffff80190ec8>] acpi_processor_idle+0x0/0x3fc
Jan 1 01:05:28 vz1 kernel: [<ffffffff80190ec8>] acpi_processor_idle+0x0/0x3fc
Jan 1 01:05:28 vz1 kernel: [<ffffffff8004b576>] cpu_idle+0x77/0x96
Jan 1 01:05:28 vz1 kernel: [<ffffffff80407826>] start_kernel+0x240/0x245
Jan 1 01:05:28 vz1 kernel: [<ffffffff80407237>] _sinittext+0x237/0x23e
Jan 1 01:05:28 vz1 kernel:
Jan 1 01:05:28 vz1 kernel: handlers:
Jan 1 01:05:28 vz1 kernel: [<ffffffff881fc146>] (e1000_msix_other+0x0/0x9c [e1000e])
Jan 1 01:05:28 vz1 kernel: Disabling IRQ #82



Any ideas?

[Updated on: Fri, 01 January 2010 13:23]

Re: OpenVZ Crashing [message #38560 is a reply to message #38493] Fri, 01 January 2010 15:23
TheWiseOne
Messages: 66
Registered: September 2005
Location: Pennsylvania
Member
I downgraded our node that was crashing every few hours to Virtuozzo 064.8 and it is fine now. I'd guess 66.7 would also work fine, as we only noticed these issues in 66.10.

Matt Ayres
TekTonic
Re: OpenVZ Crashing [message #38568 is a reply to message #38493] Sun, 03 January 2010 19:26
AnVir
Messages: 20
Registered: October 2009
Location: Russia
Junior Member

66.7 and 66.10 crash in the same way. Please suggest a solution :(

Welcome to xfes.ru OpenVZ repository mirror
Re: OpenVZ Crashing [message #38569 is a reply to message #38493] Sun, 03 January 2010 19:29
AnVir
Messages: 20
Registered: October 2009
Location: Russia
Junior Member

Messages from syslog (right before crash):
Jan  3 06:48:56 vdsm5 kernel: unregister_netdevice: waiting for lo=d59ef000 to become free. Usage count = 6 ve=6899
Jan  3 06:49:36 vdsm5 last message repeated 4 times
Jan  3 06:49:36 vdsm5 kernel: unregister_netdevice: device d59ef000 marked to leak
Jan  3 06:49:36 vdsm5 kernel: free_netdev: device lo=d59ef000 leaked
Jan  3 06:49:36 vdsm5 kernel: neighbour leakage
Jan  3 07:16:28 vdsm5 kernel: Route hash chain too long!
Jan  3 07:16:28 vdsm5 kernel: Adjust your secret_interval!

vdsm5 kernel: unregister_netdevice: waiting for lo=c2e8e000 to become free. Usage count = 4 ve=7043
Jan 3 21:30:16 vdsm5 kernel: unregister_netdevice: waiting for lo=c2e8e000 to become free. Usage count = 4 ve=7043


Welcome to xfes.ru OpenVZ repository mirror
Re: OpenVZ Crashing [message #38576 is a reply to message #38569] Mon, 04 January 2010 17:20
andre
Messages: 36
Registered: January 2008
Member
Hello!

After posting on 26/12 we started trying to identify which VE was responsible for the crash. So we kept half of the VEs on one server and moved half to another. After that it stopped crashing.

Since no one had replied reporting the same event, I thought this could be a race condition we would never experience again. Now that I see you had the same problem, it is time to start worrying again :(

If you are desperate and can't find a solution, I suggest you move half of the VEs to other nodes and try to identify the one that is responsible (and keep that one isolated).
Re: OpenVZ Crashing [message #38582 is a reply to message #38493] Tue, 05 January 2010 09:58
rcmc_ronny
Messages: 2
Registered: January 2010
Location: Germany
Junior Member
Hi,

No, we have the same problems with 66.10 and reverted to 54.8, which is stable and does not show this issue.

I hope there will be a fix for this issue soon, and that the devs will post here :)

greetz Ronny
Re: OpenVZ Crashing [message #38620 is a reply to message #38493] Fri, 08 January 2010 08:42
BigLupu
Messages: 3
Registered: January 2010
Junior Member
http://phpsuxx.blogspot.com/2009/12/route-hash-chain-too-long-adjust-your.html

As a workaround downgrade to RHEL 5.3 based kernel or decrease your /proc/sys/net/ipv4/route/secret_interval value (default 600), but not too much.

But I hope that a permanent fix is on its way.
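
A minimal sketch of the runtime adjustment described above, assuming a 2.6.18-era kernel that still exposes this tunable. The helper name `set_secret_interval` and the value 300 are illustrative only; nothing in this thread establishes how low is "too low".

```shell
#!/bin/sh
# Tunable named in the post above. The variable can be pointed at another
# file for a dry run.
SECRET_INTERVAL_FILE="${SECRET_INTERVAL_FILE:-/proc/sys/net/ipv4/route/secret_interval}"

# set_secret_interval SECONDS  (hypothetical helper, not an OpenVZ tool)
# Prints the old value, writes the new one, then prints the value read back.
set_secret_interval() {
    new_value="$1"
    # Accept only a string of decimal digits.
    case "$new_value" in
        ''|*[!0-9]*)
            echo "usage: set_secret_interval SECONDS" >&2
            return 1
            ;;
    esac
    if [ "$new_value" -le 0 ]; then
        echo "refusing non-positive interval" >&2
        return 1
    fi
    echo "current: $(cat "$SECRET_INTERVAL_FILE")"
    echo "$new_value" > "$SECRET_INTERVAL_FILE" || return 1
    echo "new: $(cat "$SECRET_INTERVAL_FILE")"
}

# Example, as root on the hardware node (300 is only an illustrative value):
#   set_secret_interval 300
```

A value written this way is lost on reboot. To keep it, the equivalent line `net.ipv4.route.secret_interval = 300` could be added to /etc/sysctl.conf.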
Re: OpenVZ Crashing [message #38712 is a reply to message #38493] Mon, 18 January 2010 23:08
andre
Messages: 36
Registered: January 2008
Member
Problem started happening again.

1) We work with OpenVZ and could not find the suggested 54.8. Which version should we use to avoid this bug?

2) By how much should we decrease /proc/sys/net/ipv4/route/secret_interval? I don't know how much is "too much" :)

Running at this moment: 2.6.18-164.2.1.el5.028stab066.10
Re: OpenVZ Crashing [message #38713 is a reply to message #38493] Tue, 19 January 2010 08:07
rcmc_ronny
Messages: 2
Registered: January 2010
Location: Germany
Junior Member
Hi,

Sorry, I meant 64.8 (typo, sorry for that)
-> http://download.openvz.org/kernel/branches/rhel5-2.6.18/028stab064.8/

Ronny
Re: OpenVZ Crashing [message #38721 is a reply to message #38493] Wed, 20 January 2010 13:24
kir
Messages: 1645
Registered: August 2005
Location: Moscow, Russia
Senior Member

So basically there are 3 solutions to this:

1. Wait for 028stab067 kernel to be released
2. Downgrade to 028stab064.8
3. Rebuild the 028stab066.10 kernel, commenting out the faulty RH patch. You need to remove these two lines from the spec file:

Patch23770: linux-2.6-net-allow-for-on-demand-emergency-route-cache-flushing.patch
...and...
%patch23770 -p1


Also, this is the subject of bug #1409.
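
Option 3 can be sketched as a small shell helper, assuming the kernel source RPM is already unpacked and the spec file contains the two lines exactly as quoted. The name `remove_patch23770` is hypothetical. The lines are deleted rather than prefixed with `#`, since rpm can still expand macros such as `%patch` inside comment lines.

```shell
#!/bin/sh
# remove_patch23770 SPEC  (hypothetical helper)
# Deletes the Patch23770 declaration and its %patch23770 application line
# from the given spec file, keeping the original as SPEC.orig.
remove_patch23770() {
    spec="$1"
    if [ ! -f "$spec" ]; then
        echo "no such spec file: $spec" >&2
        return 1
    fi
    cp "$spec" "$spec.orig" || return 1
    # Match the exact patch number only, so e.g. Patch237701 is left alone.
    sed -e '/^Patch23770:/d' \
        -e '/^%patch23770[[:space:]]/d' \
        -e '/^%patch23770$/d' \
        "$spec.orig" > "$spec"
}

# Example (the path is a placeholder; use the spec file from the source RPM):
#   remove_patch23770 /path/to/kernel.spec
```

Comparing the result with `diff -u SPEC.orig SPEC` before rebuilding should show exactly two removed lines.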


Kir Kolyshkin
Re: OpenVZ Crashing [message #38737 is a reply to message #38559] Fri, 22 January 2010 17:54
digitallinx
Messages: 3
Registered: October 2008
Junior Member
This is odd. We've been running 028stab066.10 on a single server for a week now and never faced issues, and 028stab066.7 for almost two months on more than 40 servers with no issues whatsoever.
I was about to reboot all servers into 028stab066.10 when I stumbled upon this thread, so I'm having second thoughts.
Any input from the development team?

Thank you.

EDIT: Never mind, I just saw the bug report. I guess it's safest to just wait for the patched pre-compiled version.

[Updated on: Fri, 22 January 2010 18:24]

Re: OpenVZ Crashing [message #38808 is a reply to message #38493] Tue, 02 February 2010 23:54
andre
Messages: 36
Registered: January 2008
Member
Writing just to let you know that 028stab067.4 is available (28-Jan-2010 12:25); I just found out.

We should give it a try. :)
Re: OpenVZ Crashing [message #40173 is a reply to message #38493] Mon, 26 July 2010 08:10
vali.dragnuta
Messages: 6
Registered: December 2005
Location: Romania
Junior Member
I just had a server crash (more like a hang: it became unresponsive),
and the last error in the logs was
Route hash chain too long!
Adjust your secret_interval!

No oops, no kernel panic.
The kernel was the same as the OP's, so I fully upgraded to the latest ovzkernel (CentOS 5, 64-bit): 2.6.18-194.8.1.el5.028stab070.2.
Right after the reboot, I got more of the same error in the logs:
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
Route hash chain too long!
Adjust your secret_interval!
lo: 5 rebuilds is over limit, route caching disabled

As far as I understand, newer kernels should not have that issue.
What I do not understand, however, is whether "not having the issue" means a) "the system will not hang/crash" or b) "that route cache problem should not happen at all".

If it's a), it's good that the error will not cause another crash, but I am curious why that error appears, and why now. That server (and others I have) has worked under the same conditions for quite some time and we never saw that error before. What exactly triggers it?

If it's b), then either the latest kernels reintroduced the problem or the issue was not fixed at all. Is anybody aware of this, and what workarounds exist?
Re: OpenVZ Crashing [message #40352 is a reply to message #40173] Fri, 13 August 2010 16:01
MikeDVB
Messages: 12
Registered: April 2010
Junior Member
I've just experienced this as well on a node that had been running for nearly 6 months with no issues. We are using KSplice rebootless kernel updates as well so perhaps an update that was applied triggered this issue.

CentOS Release 5.5
2.6.18-164.15.1.el5.028stab068.9

http://www.screen-shot.net/2010-08-13_1144.png

I'm reluctant to go to the latest OpenVZ kernel build, as it seems the kernel crashes with a panic as soon as we switch to Deadline; however, I suppose we could go back to CFQ temporarily.

In the meantime I've reduced the route cache flush interval from 600 seconds to 300. While it creates additional CPU load every time the route cache is fully reloaded, I'd rather have periodic additional CPU load than a total loss of responsiveness.

Edit:

Almost immediately:
http://www.screen-shot.net/2010-08-13_1217.png

[root@boreas ~]# dmesg|grep '^IP route'
IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)

The server however has not become unresponsive...
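
The figures in the dmesg line quoted above can be cross-checked with a little shell arithmetic. This is only a sanity check and assumes 4 KiB pages, which is typical for x86_64 but not stated in the thread; the variable names are illustrative.

```shell
#!/bin/sh
# Values copied from the "IP route cache hash table entries" line.
entries=524288
order=10
bytes=4194304

# Assumption: 4 KiB pages.
page_size=4096

# "order: 10" means the table spans 2^10 contiguous pages.
table_bytes=$(( (1 << order) * page_size ))

# Size of one hash bucket head.
bucket_bytes=$(( bytes / entries ))

echo "table bytes: $table_bytes"        # 4194304, matching the log line
echo "bytes per bucket: $bucket_bytes"  # 8
```

Eight bytes per bucket is consistent with one pointer-sized chain head on a 64-bit kernel. The "chain too long" message concerns the length of an individual chain hanging off a bucket, not the size of the table itself.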

[Updated on: Fri, 13 August 2010 16:19]

Re: OpenVZ Crashing [message #40354 is a reply to message #40352] Fri, 13 August 2010 16:58
maratrus
Messages: 1495
Registered: August 2007
Location: Moscow
Senior Member
Hi,

According to bug #1409, it must be fixed in 028stab067.
So I would recommend that you either file a new bug report or reopen the existing one.
Re: OpenVZ Crashing [message #40355 is a reply to message #38493] Fri, 13 August 2010 17:03
MikeDVB
Messages: 12
Registered: April 2010
Junior Member
I wonder if any of these patches perhaps caused this issue:

ofqqthgs Clear garbage data on the kernel stack when handling signals.
sq58o0s1 CVE-2009-4307: Divide-by-zero mounting an ext4 filesystem.
4k0w0ova CVE-2010-0727: Denial of Service in GFS2 locking.
j5fdodr5 Floating point state corruption after signal.
jgna6cb9 CVE-2010-1085: Divide-by-zero in Intel HDA driver.
2dmfdj4t CVE-2010-0307: Denial of service on amd64
e12qm3wp CVE-2010-1436: Privilege escalation in GFS2 server
fmge3aoa CVE-2010-1087: Oops when truncating a file in NFS
0a2yg60n CVE-2010-1187: Denial of service in TIPC

[Updated on: Fri, 13 August 2010 17:11]

Re: OpenVZ Crashing [message #40356 is a reply to message #40355] Fri, 13 August 2010 17:14
maratrus
Messages: 1495
Registered: August 2007
Location: Moscow
Senior Member
Why are you asking this question?
Are you running a custom kernel?
As far as I understand CVE-2009-4307 and
CVE-2010-0727 are subject to RHEL
2.6.18-164.17.1 update.
https://rhn.redhat.com/errata/RHSA-2010-0380.html
Re: OpenVZ Crashing [message #40357 is a reply to message #40356] Fri, 13 August 2010 17:17
MikeDVB
Messages: 12
Registered: April 2010
Junior Member
I'm just trying to determine what is causing this Route Table issue, and we're running KSplice kernel updates, which install these patches into the kernel.

I suspect one of these patches is causing the issue, but I'll be honest: I'm not well versed enough in the world of kernels to determine that on my own beyond simple speculation.
Re: OpenVZ Crashing [message #40358 is a reply to message #40357] Fri, 13 August 2010 17:29
maratrus
Messages: 1495
Registered: August 2007
Location: Moscow
Senior Member
Quote:

CentOS Release 5.5
2.6.18-164.15.1.el5.028stab068.9


So, your kernel is not 2.6.18-164.15.1.el5.028stab068.9, because
you wrote that a bunch of patches were applied.

In that case I recommend that you get the whole list of patches, or reproduce the issue on a plain OpenVZ kernel.
Re: OpenVZ Crashing [message #40359 is a reply to message #40358] Fri, 13 August 2010 17:38
MikeDVB
Messages: 12
Registered: April 2010
Junior Member
I did list the patches applied, in the post above.

This is why I was asking whether one of those patches was the likely culprit, as I wouldn't know.

We'd boot into a newer kernel if R1Soft wasn't so incredibly slow at supporting new kernels.
Re: OpenVZ Crashing [message #40360 is a reply to message #40359] Fri, 13 August 2010 17:46
maratrus
Messages: 1495
Registered: August 2007
Location: Moscow
Senior Member
Hi,

I don't think that any of those patches was harmful in this particular case, so I would recommend that you report your problem
to bugzilla, because it is the fastest way to reach the developers.
Re: OpenVZ Crashing [message #40362 is a reply to message #38493] Fri, 13 August 2010 17:57
MikeDVB
Messages: 12
Registered: April 2010
Junior Member
Consider it done http://bugzilla.openvz.org/show_bug.cgi?id=1611
Re: OpenVZ Crashing [message #40428 is a reply to message #38493] Tue, 17 August 2010 18:08
MikeDVB
Messages: 12
Registered: April 2010
Junior Member
It'd be amazingly awesome if anybody ever responded to bug reports (you know, before several weeks had passed)...

http://bugzilla.openvz.org/show_bug.cgi?id=1611