el5 node crashing all of a sudden. Out of ideas [message #47800]
Tue, 11 September 2012 08:44
mustardman
Messages: 91 Registered: October 2009
Member
I have a node that just started crashing about a week ago. The only thing that changed is that I added a couple of new VPSes. Memory usage is still well below total RAM, and there is nothing in the log files.
The only thing I see that looks a bit suspicious is:
Route hash chain too long!
Adjust your secret_interval!
lo: 5 rebuilds is over limit, route caching disabled
After a lot of googling, it looks like this problem was supposed to be fixed many kernels ago, but apparently it isn't. I did change my secret_interval from the default of 600 to 300 afterwards, but from what I have read that is not really a fix.
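For anyone who wants to compare settings, here is a rough Python sketch for dumping the route cache tunables. It assumes they live under /proc/sys/net/ipv4/route and that the files secret_interval, gc_thresh, gc_elasticity and max_size exist on your kernel; both are assumptions on my part, and anything missing is simply reported as n/a. It also assumes a reasonably modern Python is available on the node.

```python
import os

# Assumed location of the IPv4 route cache tunables; adjust if your kernel differs.
ROUTE_SYSCTL_DIR = "/proc/sys/net/ipv4/route"
# Assumed tunable names; any that do not exist are reported as None.
TUNABLES = ("secret_interval", "gc_thresh", "gc_elasticity", "max_size")


def read_route_tunables(base_dir=ROUTE_SYSCTL_DIR, names=TUNABLES):
    """Return {name: int or None}; None means the file is missing or unreadable."""
    values = {}
    for name in names:
        path = os.path.join(base_dir, name)
        try:
            with open(path) as f:
                # Each sysctl file is expected to hold a single integer.
                values[name] = int(f.read().split()[0])
        except (OSError, ValueError, IndexError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_route_tunables().items():
        print("%s = %s" % (name, value if value is not None else "n/a"))
```

This only reads the values. It does not change anything, so it is safe to run on a production node.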
Any other ideas what could possibly be going on? As far as I can tell the kernel is not crashing. The node just becomes non-responsive (network and maybe disk I/O), but it stays responsive enough for a soft reboot as opposed to a hard reset.
I was running RHEL5 028stab099.3 x64 for months before this started happening. Since it has happened twice in the past week, and I don't know what else I could possibly do, I ran a yum update of everything, including the latest kernel as of this post, which is RHEL5 028stab101.1. Right after I rebooted into that kernel I almost immediately got:
Route hash chain too long!
Adjust your secret_interval!
lo: 5 rebuilds is over limit, route caching disabled
So that definitely didn't change. I'm not convinced it has anything to do with the lockups, though, because I am pretty sure I was getting that message well before they started happening.
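To check whether the route cache messages really predate the lockups, something like the following Python sketch could pull the matching lines out of the kernel log. The log path /var/log/messages and the function name find_route_cache_messages are my own assumptions for illustration; the patterns are just the message text quoted above, matched case-insensitively.

```python
import re

# Message fragments taken from the kernel output quoted in this post.
PATTERNS = {
    "hash_chain_too_long": re.compile(r"route hash chain too long", re.IGNORECASE),
    "caching_disabled": re.compile(
        r"rebuilds is over limit, route caching disabled", re.IGNORECASE
    ),
}


def find_route_cache_messages(lines):
    """Return {label: [matching lines in order]} for the route cache messages."""
    hits = dict((label, []) for label in PATTERNS)
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[label].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Assumed syslog location; change it to wherever kernel messages are logged.
    try:
        with open("/var/log/messages", errors="replace") as f:
            results = find_route_cache_messages(f)
    except OSError:
        results = find_route_cache_messages([])
    for label, matched in results.items():
        first = matched[0] if matched else "no matches"
        print("%s: %d hits, first: %s" % (label, len(matched), first))
```

The first matching line for each message carries the earliest timestamp still in the log, which can be compared against the dates of the two lockups.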
[Updated on: Tue, 11 September 2012 08:46]