SSH stop responding on HN [message #9030] |
Wed, 13 December 2006 23:21 |
stec
Messages: 9 Registered: December 2006
|
Junior Member |
|
|
This happens regularly, especially after a long uptime. When I try to connect via ssh to the host node, it replies either "ssh: connect to host hostname port 22: Operation timed out" or "ssh_exchange_identification: Connection closed by remote host". After shutting down all VMs (which work properly and respond to ssh), the reply is more likely to be the second one.
Any clue what could be wrong?
Denis
PS: This is a self-built 2.6.16-026test018 kernel, and I have no physical access to the host.
[Updated on: Fri, 15 December 2006 08:35]
|
|
|
Re: SSH stop responding on host node [message #9035 is a reply to message #9030] |
Thu, 14 December 2006 07:08 |
rickb
Messages: 368 Registered: October 2006
|
Senior Member |
|
|
Hi, do you have ssh access to the HN now? If not, one can only theorize about what the problem might be. sshd is not processing your authentication request; there are really no further details beyond that. This could be due to many things, but without access to the server anything we tell you would be speculation. My recommendation is to reboot the HN and check /var/log/messages and /var/log/secure on the HN around the times when you could not ssh to it.
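To narrow the logs down to the period of an outage, something like the following could help. It is only a sketch: it assumes traditional syslog lines that start with a timestamp such as "Dec 13 21:38:31" (no year, so the year is passed in), and the function name `lines_in_window` is made up for illustration. The sample lines are in the format of the log excerpt posted later in this thread.

```python
from datetime import datetime


def lines_in_window(lines, start, end, year=2006):
    """Return the syslog lines whose timestamp lies within [start, end]."""
    selected = []
    for line in lines:
        # Traditional syslog timestamps are the first 15 characters,
        # e.g. "Dec 13 21:38:31"; they carry no year, so we supply one.
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line, skip it
        if start <= stamp <= end:
            selected.append(line.rstrip("\n"))
    return selected


# Sample lines in the format used by /var/log/messages on this host.
SAMPLE = [
    "Dec 13 21:38:01 ns22021 cron[8714]: (CRON) error (can't fork)",
    "Dec 13 21:38:31 ns22021 Fatal resource shortage: kmemsize, UB 0.",
    "Dec 13 21:38:31 ns22021 sshd[3014]: error: fork: Cannot allocate memory",
    "Dec 13 21:39:01 ns22021 cron[8714]: (CRON) error (can't fork)",
]

for hit in lines_in_window(
    SAMPLE,
    datetime(2006, 12, 13, 21, 38, 30),
    datetime(2006, 12, 13, 21, 38, 59),
):
    print(hit)
```

On the real host, the same function can be fed `open("/var/log/messages")` instead of the sample list.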
Rick
-------------
Common Terms I post with: http://wiki.openvz.org/Category:Definitions
UBC. Learn it, love it, live it: http://wiki.openvz.org/Proc/user_beancounters
|
|
|
Re: SSH stop responding on host node [message #9039 is a reply to message #9035] |
Thu, 14 December 2006 08:12 |
stec
Messages: 9 Registered: December 2006
|
Junior Member |
|
|
Oh, sorry, I forgot to tell you that I found another issue in the logs: some processes on the host complain that fopen reports "Too many open files in system". Last time there was nothing about ssh at all, but this time I checked again, and here is what I found:
* A few days ago, sshd reported "error: fork: Cannot allocate memory"
* Before I shut down all the VMs, sshd logged no connection attempts
* After these shutdowns, sshd complained:
sshd[3014]: error: fork: Cannot allocate memory
Fatal resource shortage: kmemsize, UB 0.
I really don't understand where these issues come from. Can you suggest a path for investigation?
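One possible starting point, assuming the "Too many open files in system" message is the standard Linux ENFILE error (the system-wide file table is full): on Linux, /proc/sys/fs/file-nr holds three numbers, namely the allocated file handles, the allocated but unused handles, and the system-wide maximum. The sketch below parses that content; the helper names and the sample values are hypothetical, not taken from this host.

```python
def parse_file_nr(text):
    """Parse the three fields of /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def file_table_usage(text):
    """Return the fraction of the system-wide file table in use (0.0 to 1.0)."""
    info = parse_file_nr(text)
    return (info["allocated"] - info["unused"]) / info["max"]


# Hypothetical content of /proc/sys/fs/file-nr, for illustration only.
SAMPLE_FILE_NR = "9800\t0\t10000\n"

usage = file_table_usage(SAMPLE_FILE_NR)
if usage > 0.9:
    print(f"file table is {usage:.0%} full; ENFILE errors are likely")
```

On a live host the input would come from `open("/proc/sys/fs/file-nr").read()`. A value close to the maximum would be consistent with the fopen errors, but it would not by itself say which processes hold the files.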
|
|
|
Re: SSH stop responding on host node [message #9040 is a reply to message #9039] |
Thu, 14 December 2006 08:25 |
rickb
Messages: 368 Registered: October 2006
|
Senior Member |
|
|
OK, I see now. When you said:
"When I try to connect to ssh on the host node"
you meant using ssh from the HN to a VE. Exhaustion of metrics such as numfile, kmemsize, or privvmpages would explain why software malfunctions in a VE. To verify, run:
cat /proc/user_beancounters
and paste the output here. All you need to do is raise the limits for the metrics whose failcnt is increasing: work out how many resources your applications need and give them more than that. Have you read the UBC page in the wiki?
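As a rough illustration of "look for metrics with a non-zero failcnt", the sketch below parses text in the /proc/user_beancounters layout (uid, resource, held, maxheld, barrier, limit, failcnt) and lists the rows whose failcnt is above zero. The function names are made up, and the sample values (including container ID 101) are invented for the example.

```python
def parse_beancounters(text):
    """Parse /proc/user_beancounters text into a list of row dicts."""
    rows = []
    uid = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Version:" line, and the column header.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # The first row of each beancounter starts with "<uid>:".
        if fields[0].endswith(":") and fields[0][:-1].isdigit():
            uid = int(fields[0][:-1])
            fields = fields[1:]
        # Expect: resource held maxheld barrier limit failcnt
        if uid is None or len(fields) != 6:
            continue
        if not all(value.isdigit() for value in fields[1:]):
            continue
        held, maxheld, barrier, limit, failcnt = (int(v) for v in fields[1:])
        rows.append({
            "uid": uid, "resource": fields[0], "held": held,
            "maxheld": maxheld, "barrier": barrier, "limit": limit,
            "failcnt": failcnt,
        })
    return rows


def failing_resources(text):
    """Return (uid, resource, failcnt) for every row with failcnt > 0."""
    return [(row["uid"], row["resource"], row["failcnt"])
            for row in parse_beancounters(text) if row["failcnt"] > 0]


# Invented sample data; the failcnt of 37 is for illustration only.
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2752000    2752512    2752512    2936012         37
            numfile            1339       1736       7200       7200          0
"""

for uid, resource, failcnt in failing_resources(SAMPLE):
    print(f"UB {uid}: {resource} failed {failcnt} times")
```

On the HN, the input would be `open("/proc/user_beancounters").read()`, which normally requires root.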
Rick
|
|
|
|
|
Re: SSH stop responding on host node [message #9043 is a reply to message #9040] |
Thu, 14 December 2006 10:00 |
stec
Messages: 9 Registered: December 2006
|
Junior Member |
|
|
Dear Rick,
Here is an excerpt of my logs. I have removed many "fopen: Too many open files in system" lines. Note that I tried to connect using ssh both before and after shutting down the VEs.
Dec 13 21:24:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:24:25 ns22021 Neighbour table overflow.
Dec 13 21:25:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:25:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:25:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:26:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:26:21 ns22021 IPv6 addrconf: prefix with wrong length 56
Dec 13 21:27:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:28:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:29:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:29:36 ns22021 IPv6 addrconf: prefix with wrong length 56
Dec 13 21:30:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:30:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:30:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:30:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:31:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:32:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:32:29 ns22021 IPv6 addrconf: prefix with wrong length 56
Dec 13 21:33:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:34:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:35:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:35:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:35:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:35:28 ns22021 VPS: 18639: stopped
Dec 13 21:35:38 ns22021 IPv6 addrconf: prefix with wrong length 56
Dec 13 21:35:52 ns22021 VPS: 18638: stopped
Dec 13 21:36:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:37:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:38:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:38:01 ns22021 Uncharging too much 2220 h 1368, res othersockbuf ub 18637
Dec 13 21:38:01 ns22021 Uncharging too much 2220 h 0, res othersockbuf ub 18637
Dec 13 21:38:01 ns22021 Uncharging too much 2220 h 0, res othersockbuf ub 18637
Dec 13 21:38:01 ns22021 Uncharging too much 2220 h 0, res othersockbuf ub 18637
Dec 13 21:38:01 ns22021 Uncharging too much 2220 h 0, res othersockbuf ub 18637
Dec 13 21:38:01 ns22021 Uncharging too much 2220 h 0, res othersockbuf ub 18637
Dec 13 21:38:07 ns22021 VPS: 18637: stopped
Dec 13 21:38:11 ns22021 IPv6 addrconf: prefix with wrong length 56
Dec 13 21:38:26 ns22021 VPS: 18636: stopped
Dec 13 21:38:31 ns22021 Fatal resource shortage: kmemsize, UB 0.
Dec 13 21:38:31 ns22021 sshd[3014]: error: fork: Cannot allocate memory
Dec 13 21:38:35 ns22021 Fatal resource shortage: kmemsize, UB 0.
Dec 13 21:38:35 ns22021 sshd[3014]: error: fork: Cannot allocate memory
Dec 13 21:38:36 ns22021 Fatal resource shortage: kmemsize, UB 0.
Dec 13 21:38:36 ns22021 sshd[3014]: error: fork: Cannot allocate memory
Dec 13 21:38:37 ns22021 Fatal resource shortage: kmemsize, UB 0.
Dec 13 21:38:37 ns22021 sshd[3014]: error: fork: Cannot allocate memory
Dec 13 21:39:01 ns22021 cron[8714]: (CRON) error (can't fork)
Dec 13 21:39:16 ns22021 Ub 18636 helds 13320 in tcpsndbuf on put
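An excerpt like this is easier to read once repeated messages are counted. The sketch below is one way to do that, assuming the "timestamp host message" layout seen above; the `summarize` helper is made up for illustration, and the sample is a handful of lines copied from the excerpt.

```python
import re
from collections import Counter

# "Dec 13 21:38:31 ns22021 <message>"
LINE_RE = re.compile(r"^\w{3}\s+\d+ \d\d:\d\d:\d\d (\S+) (.*)$")


def summarize(lines):
    """Count log messages, ignoring timestamps, host name, and PIDs."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        # Drop the "[pid]" part so that "cron[8714]" and "cron[9000]" merge.
        message = re.sub(r"\[\d+\]", "", match.group(2), count=1)
        counts[message] += 1
    return counts


SAMPLE = [
    "Dec 13 21:38:01 ns22021 cron[8714]: (CRON) error (can't fork)",
    "Dec 13 21:38:31 ns22021 Fatal resource shortage: kmemsize, UB 0.",
    "Dec 13 21:38:31 ns22021 sshd[3014]: error: fork: Cannot allocate memory",
    "Dec 13 21:38:35 ns22021 Fatal resource shortage: kmemsize, UB 0.",
    "Dec 13 21:38:35 ns22021 sshd[3014]: error: fork: Cannot allocate memory",
    "Dec 13 21:39:01 ns22021 cron[8714]: (CRON) error (can't fork)",
]

for message, count in summarize(SAMPLE).most_common():
    print(f"{count:4d}  {message}")
```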
Here are the beancounters as of now. Of course, I cannot produce them while the issue is occurring, since I can no longer ssh in.
Version: 2.5
   uid  resource           held    maxheld    barrier      limit failcnt
    0:  kmemsize      109296870  109841586 2147483647 2147483647       0
        lockedpages           0          0 2147483647 2147483647       0
        privvmpages        3262      14897 2147483647 2147483647       0
        shmpages            679        695 2147483647 2147483647       0
        dummy                 0          0 2147483647 2147483647       0
        numproc              61         82 2147483647 2147483647       0
        physpages          1560       6760 2147483647 2147483647       0
        vmguarpages           0          0 2147483647 2147483647       0
        oomguarpages       1560       6760 2147483647 2147483647       0
        numtcpsock            3          5 2147483647 2147483647       0
        numflock              1          2 2147483647 2147483647       0
        numpty                1          1 2147483647 2147483647       0
        numsiginfo            0          2 2147483647 2147483647       0
        tcpsndbuf         31080      35520 2147483647 2147483647       0
        tcprcvbuf         87564      87564 2147483647 2147483647       0
        othersockbuf      11100      16224 2147483647 2147483647       0
        dgramrcvbuf           0       8364 2147483647 2147483647       0
        numothersock         23         27 2147483647 2147483647       0
        dcachesize            0          0 2147483647 2147483647       0
        numfile          525860     526009 2147483647 2147483647       0
        dummy                 0          0 2147483647 2147483647       0
        dummy                 0          0 2147483647 2147483647       0
        dummy                 0          0 2147483647 2147483647       0
        numiptent             0          0 2147483647 2147483647       0
18636:  kmemsize        2162620    2702410   18486804   20335484       0
        lockedpages           0          0        902        902       0
        privvmpages       63043      69808     152556     167811       0
        shmpages             14        350      15255      15255       0
        dummy                 0          0          0          0       0
        numproc              46         58        800        800       0
        physpages         39341      42286          0 2147483647       0
        vmguarpages           0          0     152556 2147483647       0
        oomguarpages      39341      42286     152556 2147483647       0
        numtcpsock            7         19        800        800       0
        numflock              6          7        720        792       0
        numpty                0          0         80         80       0
        numsiginfo            0          2       1024       1024       0
        tcpsndbuf         62160     417360    2885468    6162268       0
        tcprcvbuf        114688     147456    2885468    6162268       0
        othersockbuf     133012     376720    1442734    4719534       0
        dgramrcvbuf           0       4268    1442734    1442734       0
        numothersock         95        100        800        800       0
        dcachesize            0          0    4026407    4147200       0
        numfile            1339       1736       7200       7200       0
        dummy                 0          0          0          0       0
        dummy                 0          0          0          0       0
        dummy                 0          0          0          0       0
        numiptent             0          0        200        200       0
18637:  kmemsize         890020    1425116   18486804   20335484       0
        lockedpages           0          0        902        902       0
        privvmpages       12696      19885     152556     167811       0
        shmpages             14        350      15255      15255       0
        dummy                 0          0          0          0       0
        numproc              18         31        800        800       0
        physpages          3081       6911          0 2147483647       0
        vmguarpages           0          0     152556 2147483647       0
        oomguarpages       3081       6911     152556 2147483647       0
        numtcpsock           11         25        800        800       0
        numflock              2          3        720        792       0
        numpty                0          0         80         80       0
        numsiginfo            0          2       1024       1024       0
        tcpsndbuf         97680     426240    2885468    6162268       0
        tcprcvbuf        180224     368984    2885468    6162268       0
        othersockbuf      15540      19128    1442734    4719534       0
        dgramrcvbuf           0     606056    1442734    1442734       0
        numothersock          7         11        800        800       0
        dcachesize            0          0    4026407    4147200       0
        numfile             619        820       7200       7200       0
        dummy                 0          0          0          0       0
        dummy                 0          0          0          0       0
        dummy                 0          0          0          0       0
        numiptent             0          0        200        200       0
18638:  kmemsize         454989    1046206   18486804   20335484       0
        lockedpages           0          0        902        902       0
        privvmpages        1253       8677     152556     167811       0
        shmpages              0        336      15255      15255       0
        dummy                 0          0          0          0       0
        numproc               8         20        800        800       0
        physpages           837       1734          0 214
...
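Since every failcnt in this snapshot is 0, another way to read it is to compare each resource's peak usage (maxheld) with its barrier. The sketch below does that under a few assumptions: rows with a barrier of 0 or 2147483647 are skipped (treating 2147483647 as "effectively unlimited" is an assumption), truncated rows are ignored, and the name `peak_usage` is made up. The sample rows are copied from UB 18636 above.

```python
UNLIMITED = 2147483647  # value shown as barrier/limit for UB 0 in this output


def peak_usage(text):
    """Return (uid, resource, maxheld / barrier) tuples, highest ratio first."""
    results = []
    uid = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # The first row of each beancounter starts with "<uid>:".
        if fields[0].endswith(":") and fields[0][:-1].isdigit():
            uid = int(fields[0][:-1])
            fields = fields[1:]
        # Ignore anything that is not a complete six-column row.
        if uid is None or len(fields) != 6:
            continue
        if not all(value.isdigit() for value in fields[1:]):
            continue
        resource = fields[0]
        maxheld, barrier = int(fields[2]), int(fields[3])
        if barrier in (0, UNLIMITED):
            continue  # no meaningful ratio for these rows
        results.append((uid, resource, maxheld / barrier))
    return sorted(results, key=lambda item: item[2], reverse=True)


SAMPLE = """\
18636: kmemsize 2162620 2702410 18486804 20335484 0
numproc 46 58 800 800 0
physpages 39341 42286 0 2147483647 0
numfile 1339 1736 7200 7200 0
"""

for uid, resource, ratio in peak_usage(SAMPLE):
    print(f"UB {uid}: {resource} peaked at {ratio:.1%} of its barrier")
```

For UB 0 every barrier in the output above is 2147483647, so this ratio says nothing about the host itself; there the absolute held values (for example numfile and kmemsize) are what remain to look at.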