NOHZ: local_softirq_pending 100 - is there something to worry about? [message #45131]
Tue, 31 January 2012 20:07
insider
Messages: 11 Registered: January 2012
Junior Member
Hello,
Once again, after we rebooted the node because of a network device freeze (an e1000e driver bug? The driver version in the OpenVZ kernel still has not been updated), the node has been up for 22 hours and we see this message in the logs:
NOHZ: local_softirq_pending 100
Is this something we have to worry about? So far it has not caused any problems. Or should we just ignore this message?
P.S.
Well, I think we made a mistake putting the RHEL6 OpenVZ 2.6.32 kernel into production too early... the more we use it, the more bugs, locks, freezes and problems we have. It is far, far away from "stable"...
All our other CentOS 5 OpenVZ nodes with the 2.6.18 kernel run very well.
We are now considering going back to RHEL 5 with the 2.6.18 kernel for one more year, even though RHEL 5 reaches end-of-life much sooner than RHEL 6...
Or maybe there is a way to use the 2.6.18 kernel on CentOS 6, hold off on 2.6.32 until it becomes really "stable", and then just switch to 2.6.32? What do you think about it?
Thank you for any answers or suggestions.
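A note on reading the message itself: the value after local_softirq_pending is printed in hexadecimal and is a bitmask with one bit per softirq type, so "100" means 0x100. The sketch below decodes such a value. It assumes the softirq order from include/linux/interrupt.h in mainline Linux 2.6.32; the SOFTIRQ_NAMES table and the decode_softirq_pending helper are illustrative names, and a vendor kernel may order the bits differently.

```python
# Softirq names in bit order (bit 0 first).
# ASSUMPTION: this order is taken from mainline Linux 2.6.32
# (include/linux/interrupt.h); a patched vendor kernel may differ.
SOFTIRQ_NAMES = [
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
    "BLOCK_IOPOLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
]


def decode_softirq_pending(hex_value):
    """Return the names of the softirqs whose bits are set in hex_value."""
    mask = int(hex_value, 16)
    names = [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]
    # Report any bits beyond the known table instead of silently dropping them.
    leftover = (mask >> len(SOFTIRQ_NAMES)) << len(SOFTIRQ_NAMES)
    if leftover:
        names.append("UNKNOWN(0x%x)" % leftover)
    return names


print(decode_softirq_pending("100"))
```

Under that assumed bit order, 0x100 is bit 8, the high-resolution timer softirq. This only identifies which softirq was pending when the CPU tried to stop its tick; it does not by itself say whether the message is harmful.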
Re: NOHZ: local_softirq_pending 100 - is there something to worry about? [message #45134 is a reply to message #45131]
Tue, 31 January 2012 20:56
Paparaciz
Messages: 302 Registered: August 2009
Senior Member
I just wanted to say that I have a small set of servers running CentOS 6 with RHEL6-based OpenVZ kernels, with CentOS 5 or CentOS 6 CTs inside them, and I have not had any kernel panics or other issues. It works like a charm.
I suggest that you use the latest stable kernel, and if you still have issues and can show why the server crashes, then submit a bug report.
P.S. I use 2.6.32-042stab044.xx kernel versions.
Re: NOHZ: local_softirq_pending 100 - is there something to worry about? [message #45135 is a reply to message #45134]
Tue, 31 January 2012 21:30
insider
Messages: 11 Registered: January 2012
Junior Member
Paparaciz wrote on Tue, 31 January 2012 22:56:
I just wanted to say that I have a small set of servers running CentOS 6 with RHEL6-based OpenVZ kernels, with CentOS 5 or CentOS 6 CTs inside them, and I have not had any kernel panics or other issues. It works like a charm.
I suggest that you use the latest stable kernel, and if you still have issues and can show why the server crashes, then submit a bug report.
P.S. I use 2.6.32-042stab044.xx kernel versions.
Yes, on our production CentOS 6 nodes we use only the latest stable 2.6.32 versions. I have already submitted three bug reports in Bugzilla about our previous issues with this kernel, so I hope they will help get these bugs fixed and make the 2.6.32 kernel more stable.
So, until the 2.6.32 kernel becomes more stable, is it reasonable to temporarily use the stable 2.6.18 kernel from CentOS 5 on the CentOS 6 nodes, and then, once 2.6.32 is stable enough, switch back from 2.6.18 to 2.6.32?
The reason to use CentOS 6 (not CentOS 5) is its longer support period. Currently this is our only reason for starting to use CentOS 6, because all our CentOS 5 nodes run stably. But the day will come when CentOS 5 reaches end-of-life, and we will be forced to move all our CentOS 5 nodes to CentOS 6 anyway; the more CentOS 5 nodes we deploy, the more migration will be needed, with more downtime, more work and more problems.
An upgrade from CentOS 5 to CentOS 6 is not officially supported. So, to go from 5 to 6 you have to install CentOS 6 from scratch and then move the containers. With just a few nodes this is not a big problem, but with tens or hundreds of nodes it will be.
Re: NOHZ: local_softirq_pending 100 - is there something to worry about? [message #50865 is a reply to message #45131]
Sun, 17 November 2013 23:22
I am still experiencing this, including the NOHZ message, and horrible instability in general. It's a nightmare!
My uname -a:
Linux vhost-amc-01 2.6.32-042stab081.5 #1 SMP Mon Sep 30 16:52:24 MSK 2013 x86_64 x86_64 x86_64 GNU/Linux
I can't go back to 5, I have to remain on 6. Is there a known kernel version that works really well with CentOS 6 and OpenVZ?
Re: NOHZ: local_softirq_pending 100 - is there something to worry about? [message #50868 is a reply to message #50867]
Mon, 18 November 2013 05:29
The problem is that when the machine crashes, there is nothing in the logs, no indication at all. It just happens, and I am left with no data to hand to the developers that might help them debug it and understand the cause. I don't know how to reproduce it; it happens randomly.
All I can see is those small "warnings" in the logs, such as the NOHZ error message, CPU locking messages, etc.
For example:
[42071.390012] hrtimer: interrupt took 13881 ns
Or this one:
[30754.200039] NOHZ: local_softirq_pending 100
Or this one, which happened a few days ago and completely froze the machine:
[249348.095995] BUG: soft lockup - CPU#4 stuck for 67s! [flush-8:0:866]
(for all CPUs, not just CPU#4).
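Since these warnings are the only data available, it may help to collect them in time order before attaching them to a bug report. The sketch below pulls the three message types quoted above out of kernel log text. The patterns are written against the exact lines shown in this post, and the find_warnings helper and its category names are illustrative, not part of any OpenVZ tooling.

```python
import re

# Patterns modelled on the three log lines quoted in this post.
PATTERNS = {
    "hrtimer_slow": re.compile(r"hrtimer: interrupt took (\d+) ns"),
    "nohz_softirq": re.compile(r"NOHZ: local_softirq_pending ([0-9a-fA-F]+)"),
    "soft_lockup": re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!"),
}
# Leading "[seconds.microseconds]" timestamp, as printed by dmesg.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_warnings(lines):
    """Return (timestamp, kind, captured_values) tuples sorted by timestamp."""
    found = []
    for line in lines:
        ts_match = TIMESTAMP.match(line)
        ts = float(ts_match.group(1)) if ts_match else None
        for kind, pattern in PATTERNS.items():
            m = pattern.search(line)
            if m:
                found.append((ts, kind, m.groups()))
    # Lines without a timestamp sort after the timestamped ones.
    found.sort(key=lambda item: (item[0] is None, item[0] or 0.0))
    return found


sample = [
    "[42071.390012] hrtimer: interrupt took 13881 ns",
    "[30754.200039] NOHZ: local_softirq_pending 100",
    "[249348.095995] BUG: soft lockup - CPU#4 stuck for 67s! [flush-8:0:866]",
]
for entry in find_warnings(sample):
    print(entry)
```

The timestamps are seconds since boot, so entries from different boots are not comparable; the sketch is meant to be run on the log from one boot at a time.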