OpenVZ Forum


NOHZ: local_softirq_pending 100 - is there something to worry about? (Kernel 2.6.32-042stab044.17 64bit) [message #45131] Tue, 31 January 2012 20:07
insider
Messages: 11
Registered: January 2012
Junior Member
Hello,

Once again, after rebooting the node because of a network device freeze (an e1000e driver bug? The driver version in the OpenVZ kernel still has not been updated), our node has been up for 22 hours and we see this message in the logs:

NOHZ: local_softirq_pending 100

Is this something we have to worry about? So far it has not caused any problems. Or should we just ignore this message?

P.S.
Well, I think we made a mistake putting the RHEL6 OpenVZ 2.6.32 kernel into production too early... the more we use it, the more bugs, locks, freezes and problems we have. It is far, far away from "stable"... :(
All our other 2.6.18 CentOS 5 OpenVZ nodes run very well.
We are now considering going back to RHEL 5 with the 2.6.18 kernel for one more year, even though the end-of-life for RHEL5 comes a lot sooner than for RHEL6...
Or maybe there is a way to use the 2.6.18 kernel on CentOS 6 and not use 2.6.32 until it becomes really "stable", and then just switch to 2.6.32? What do you think about it?

Thank you for any answers or suggestions.
Re: NOHZ: local_softirq_pending 100 - is there something to worry about? [message #45134 is a reply to message #45131] Tue, 31 January 2012 20:56
Paparaciz
Messages: 302
Registered: August 2009
Senior Member
I just wanted to say that I have a small set of servers running CentOS 6 with RHEL6-based OpenVZ kernels, with CentOS 5 or CentOS 6 CTs inside them, and I have not had any kernel panics or other issues. It works like a charm.

I suggest that you use the latest stable kernel, and if you still have issues and can show why the server crashes, then submit a bug report.


P.S. I use 2.6.32-042stab044.xx kernel versions.
Re: NOHZ: local_softirq_pending 100 - is there something to worry about? [message #45135 is a reply to message #45134] Tue, 31 January 2012 21:30
insider
Messages: 11
Registered: January 2012
Junior Member
Paparaciz wrote on Tue, 31 January 2012 22:56
I just wanted to say that I have a small set of servers running CentOS 6 with RHEL6-based OpenVZ kernels, with CentOS 5 or CentOS 6 CTs inside them, and I have not had any kernel panics or other issues. It works like a charm.

I suggest that you use the latest stable kernel, and if you still have issues and can show why the server crashes, then submit a bug report.


P.S. I use 2.6.32-042stab044.xx kernel versions.


Yes, on our production CentOS 6 nodes we use only the latest stable 2.6.32 versions. I have already submitted three bug reports in Bugzilla regarding our previous issues with this kernel, so I hope this will help to fix these bugs and make the 2.6.32 kernel more and more stable.

So, until the 2.6.32 kernel becomes more stable, is it reasonable to temporarily use the stable 2.6.18 kernel from CentOS 5 on the CentOS 6 nodes, and after some time, when 2.6.32 is stable enough, switch back from 2.6.18 to 2.6.32?
The reason to use CentOS 6 (not CentOS 5) is that it will be supported for longer. Currently this is our only reason for starting to use CentOS 6, because all our CentOS 5 nodes run stably. But the day will come when CentOS 5 support reaches end-of-life, so we will be forced to move all our CentOS 5 nodes to CentOS 6 anyway, and the more CentOS 5 nodes we deploy, the more migration will be needed: more downtime, more work and more problems.
An upgrade from CentOS 5 to CentOS 6 is not officially supported. So, to upgrade 5=>6 you have to install CentOS 6 from scratch and then move the containers. With just a few nodes this is not a big problem, but with a few tens or hundreds of nodes it will be.
Re: NOHZ: local_softirq_pending 100 - is there something to worry about? [message #45161 is a reply to message #45131] Thu, 02 February 2012 14:38
datahunter
Messages: 1
Registered: January 2012
Location: HK
Junior Member

I also have a network device freeze daily.

This happens on CentOS 6; the hardware is a Dell R410 (RAM: 16 GB).

I tested it with memtest86+.

There is no useful info in the log files or on the console...

There are only 4 VPSes, and they were configured with "vzsplit -n4".

Then I switched to Debian 6, but the network device still freezes -___-

The good news is that it freezes only about once every 2 months.

I think that is acceptable ^ ^

Re: NOHZ: local_softirq_pending 100 - is there something to worry about? [message #48853 is a reply to message #45131] Thu, 20 December 2012 21:12
mangelot
Messages: 14
Registered: January 2012
Junior Member
Same errors here too, around 2 times a day.
2 system hangs in about 1 month; I don't know if they are related to this error, because I needed to reset the server
(not responding to anything / blank screen / network down).

Is it possible to add a fix to the OpenVZ kernel?
Perhaps https://lkml.org/lkml/2007/5/22/35 is related?

Using kernel 2.6.32-042stab065.3

Regards,

Marco


www.mangelot-hosting.nl
Re: NOHZ: local_softirq_pending 100 - is there something to worry about? [message #50865 is a reply to message #45131] Sun, 17 November 2013 23:22
skaag
Messages: 3
Registered: November 2013
Location: New York, NY
Junior Member

I am still experiencing this, including the NOHZ message, and horrible instability in general. It's a nightmare!

My uname -a:

Linux vhost-amc-01 2.6.32-042stab081.5 #1 SMP Mon Sep 30 16:52:24 MSK 2013 x86_64 x86_64 x86_64 GNU/Linux

I can't go back to 5, I have to remain on 6. Is there a known kernel version that works really well with CentOS 6 and OpenVZ?
Re: NOHZ: local_softirq_pending 100 - is there something to worry about? [message #50867 is a reply to message #45131] Mon, 18 November 2013 00:37
Ales
Messages: 330
Registered: May 2009
Senior Member
Quote:
Is there a known kernel version that works really well with CentOS 6 and OpenVZ?
I'd say that the majority of CentOS 6 / OpenVZ servers work quite well, so that's the wrong question to be asking. Try filing a bug report with the information specific to your server and the problems you're having, that's probably the best course of action.
Re: NOHZ: local_softirq_pending 100 - is there something to worry about? [message #50868 is a reply to message #50867] Mon, 18 November 2013 05:29
skaag
Messages: 3
Registered: November 2013
Location: New York, NY
Junior Member

The problem is that when the machine crashes, there's nothing in the logs, no indication. It just happens, and I am left hanging with no data to hand to developers that might aid them in debugging and understanding the cause. I don't know how to reproduce it, it happens randomly.

All I can see is those small "warnings" in the logs, such as the NOHZ error message, CPU locking messages, etc.

For example:
[42071.390012] hrtimer: interrupt took 13881 ns

Or this one:
[30754.200039] NOHZ: local_softirq_pending 100

Or this one, which happened a few days ago and completely froze the machine:
[249348.095995] BUG: soft lockup - CPU#4 stuck for 67s! [flush-8:0:866]
(for all CPUs, not just CPU#4).
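
When there is nothing else to go on, it can help to at least tally these warnings and see which softirq the NOHZ mask refers to. The following is only a rough sketch, not an official tool: the function names (`decode_softirq_mask`, `classify`) are made up for illustration, the regular expressions are written against the three log lines quoted above, and the softirq bit order is an assumption based on the softirq enum of a 2.6.32-era kernel (it differs on other kernel versions). It also assumes the number after `local_softirq_pending` is printed in hexadecimal.

```python
import re

# ASSUMPTION: softirq bit order as in a 2.6.32-era kernel
# (include/linux/interrupt.h). Other kernel versions order these differently.
SOFTIRQ_NAMES = [
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
    "BLOCK_IOPOLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
]

def decode_softirq_mask(hex_mask):
    """Return the names of the softirqs whose bits are set in a hex mask."""
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]

# Patterns written against the messages quoted in this thread.
PATTERNS = {
    "nohz": re.compile(r"NOHZ: local_softirq_pending ([0-9a-fA-F]+)"),
    "soft_lockup": re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!"),
    "hrtimer": re.compile(r"hrtimer: interrupt took (\d+) ns"),
}

def classify(line):
    """Return (kind, captured_groups) for a known warning line, else None."""
    for kind, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            return kind, match.groups()
    return None

if __name__ == "__main__":
    samples = [
        "[42071.390012] hrtimer: interrupt took 13881 ns",
        "[30754.200039] NOHZ: local_softirq_pending 100",
        "[249348.095995] BUG: soft lockup - CPU#4 stuck for 67s! [flush-8:0:866]",
    ]
    for line in samples:
        result = classify(line)
        print(result)
        if result and result[0] == "nohz":
            print("  pending softirqs:", decode_softirq_mask(result[1][0]))
```

Under the assumed bit order, a mask of 100 (hex) has only bit 8 set, which the table above labels HRTIMER. Saving the output of 'dmesg' to a file and feeding each line through `classify` would give a count per warning type that could be attached to a bug report.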
Re: NOHZ: local_softirq_pending 100 - is there something to worry about? [message #50871 is a reply to message #45131] Mon, 18 November 2013 11:38
Ales
Messages: 330
Registered: May 2009
Senior Member
Perhaps if you post your HW configuration, someone will be able to provide more specific answers.

Also, what steps did you take to get additional debug information? Perhaps someone can see a missing step...
Re: NOHZ: local_softirq_pending 100 - is there something to worry about? [message #50897 is a reply to message #50871] Mon, 25 November 2013 19:54
skaag
Messages: 3
Registered: November 2013
Location: New York, NY
Junior Member

This is my hardware configuration (output of 'lshw'):

gist.github.com/skaag/7647697

And my 'dmesg' output:

gist.github.com/skaag/7647719

It would be great if you could give me some ideas for other commands to run, besides 'lshw' and 'dmesg', whose output I can then post here.