Re: HW Node Lockup Issue [message #10734 is a reply to message #10711]
Tue, 27 February 2007 13:07
dev
Messages: 1693 Registered: September 2005 Location: Moscow
Senior Member
shushry wrote on Tue, 27 February 2007 00:30 | Have a number of hardware nodes, all configured identically (hardware/OS). Running into a strange problem that has been occurring quite frequently, where the hw node will suddenly stop accepting any kind of tcp connection (ssh, smtp, etc). Existing sessions just freeze up, and even my direct connection to the serial port (system configured to allow com port console login), freezes up.
|
The fact that the console freezes tells us a lot. Thanks for such details.
Can you please collect the output of Alt-SysRq-P (press it 5-8 times, please) and Alt-SysRq-T when it hangs?
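If a root shell still responds (for example during a partial freeze, or for a dry run beforehand), the same dumps can be requested through /proc/sysrq-trigger instead of the keyboard. Below is a minimal sketch of that, not an official tool: the function names, the repeat count and the delay are my own assumptions. It needs root and SysRq must be enabled in /proc/sys/kernel/sysrq; the output goes to the kernel log and console. During a full hang this will not run, so on a serial console send a BREAK followed by the command key (p or t) instead.

```python
import time

# Standard Linux interface for triggering SysRq commands; writing needs root.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def write_key(key, path=SYSRQ_TRIGGER):
    """Send one SysRq command key (e.g. 'p' or 't') to the kernel."""
    with open(path, "w") as f:
        f.write(key)


def collect_dumps(repeats=6, delay=1.0, send=write_key):
    """Request `repeats` register dumps (p), then one task dump (t).

    `send` is injectable so the sequence can be checked without root.
    Returns the list of keys sent, in order.
    """
    sent = []
    for _ in range(repeats):
        send("p")          # same as Alt-SysRq-P: registers of the current CPU
        sent.append("p")
        time.sleep(delay)  # space the samples out to see where the CPU is stuck
    send("t")              # same as Alt-SysRq-T: state of all tasks
    sent.append("t")
    return sent


# Usage (as root): collect_dumps(), then read the result with dmesg
# or from the serial console log.
```

Several P samples are taken so that a CPU looping in one place shows up as the same instruction pointer in consecutive dumps.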
shushry wrote on Tue, 27 February 2007 00:30 | The freeze up will happen for anywhere from 30 seconds to 15 minutes (or as long as I can tolerate before power-bouncing the box).
This is a high-bandwidth system, but not very heavy in memory or CPU (other than what supports the traffic).
One question - there is a lot of inter-vps traffic occurring; for example http to proxy servers and back...upwards of 10-50 mbs. Could the tcp communication amongst virtual servers along the venet0 interface be somehow "overloading" and freezing up the box?
|
There should be no problem with lots of inter-VE traffic. If there is, it is a bug.
shushry wrote on Tue, 27 February 2007 00:30 |
Output of user_beancounters attached.
In general, I have all 63,000+ ports open for connectivity on the system. Result of "cat /proc/net/sockstat" is generally never more than 7000 - 10000 sockets in use.
HW node system specs:
64bit Centos 4.4
4GB Ram (fully utilized per allocation, but no swap dipping)
2.6.18-ovz028test015.1-smp
vzctl version 3.0.13
Any other details I can provide?
Anyone that can solve this gets lunch on me.
|
Yes, see above for the information that can help us resolve this.
At first I thought it was a TCP/IP problem, but the fact that your console freezes up makes me believe it is something else, e.g. a long loop somewhere in the kernel or something similar.
BTW, is it a binary kernel from openvz.org, or did you build it yourself?