OpenVZ Forum


Active SSH Session to Container is Lost when Suspend/Resume Container [message #53447] Wed, 03 October 2018 22:32
sindimo
Messages: 2
Registered: October 2018
Location: USA
Junior Member
Dear All,

I am trying to install OpenVZ on a Red Hat 7.5 machine (AWS instance). I have successfully installed it and its utilities (mainly with yum; openvz-release-7.0.8-4.vz7.x86_64), and I am able to start, suspend, and resume a container. The container launched is CentOS 7.

[root@ip-172-31-7-139 ec2-user_scripts]# uname -r
3.10.0-862.11.6.vz7.64.7

When I launch a container "ct1", I can ssh to it and run something simple like "top -d 1", which updates every second. If I suspend the container while that ssh session is active, the shell with the top command obviously freezes. My expectation is that when I resume the container, the shell and the top command should resume as well. Instead, the ssh/tcp connection on that shell breaks and I get kicked out of the container once the resume completes. I can open a new ssh connection to the container afterwards with no problem, but I shouldn't be losing the original session in the first place: it should just hang during suspend/resume and then continue normally. The error I get is "packet_write_wait: Connection to ct1 port 22: Broken pipe".

I googled the error and tried adjusting some ssh configuration parameters on both the client and server side, but none of them helped. For example, I adjusted these:
- ServerAliveInterval
- ServerAliveCountMax
- ClientAliveInterval
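For reference, this is roughly how I set them; the values shown are just what I tried, not recommendations:

```shell
# Client side (~/.ssh/config on the machine I ssh from):
cat >> ~/.ssh/config <<'EOF'
Host ct1
    ServerAliveInterval 60
    ServerAliveCountMax 10
EOF

# Server side (/etc/ssh/sshd_config inside ct1), then reload sshd:
#   ClientAliveInterval 60
#   ClientAliveCountMax 10
```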

I also tried adjusting some kernel parameters on the hardware node and in the container, as I'd seen suggested online, but that didn't help either. These are the ones I adjusted:
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_intvl = 300
net.ipv4.tcp_keepalive_probes = 100
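They were applied along these lines (standard sysctl names; values as shown above):

```shell
# Apply at runtime on the hardware node (and similarly inside ct1):
sysctl -w net.ipv4.tcp_keepalive_time=7200
sysctl -w net.ipv4.tcp_keepalive_intvl=300
sysctl -w net.ipv4.tcp_keepalive_probes=100

# To persist across reboots, the same key=value lines go into /etc/sysctl.conf
```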


Are there any settings to adjust on the hardware node (e.g. kernel parameters) or when launching the container to fix this behavior? Any suggestions are really appreciated.

This is the summary of the scenario I am describing:

# Create container
prlctl create ct1 --vmtype ct
# Set up network, DNS, etc.
# ssh to ct1 and run a simple command like "top -d 1"
# Suspend the container while the ssh session is active
prlctl suspend ct1
# Resume the container
prlctl resume ct1
# Expectation: the ssh session should still be intact after resuming; instead it gets broken

Thank you for your help.

Mohamad Sindi
Massachusetts Institute of Technology (MIT)

Re: Active SSH Session to Container is Lost when Suspend/Resume Container [message #53453 is a reply to message #53447] Thu, 04 October 2018 09:20
vaverin
Messages: 682
Registered: September 2005
Senior Member
Dear Mohamad,
please use Jira to report issues you notice.

I've submitted
https://bugs.openvz.org/browse/OVZ-7063
Re: Active SSH Session to Container is Lost when Suspend/Resume Container [message #53455 is a reply to message #53453] Thu, 04 October 2018 15:20
TomvB
Messages: 27
Registered: July 2017
Location: -Root-
Junior Member
vaverin wrote on Thu, 04 October 2018 11:20
Dear Mohamad,
please use Jira to report issues you notice.

I've submitted
https://bugs.openvz.org/browse/OVZ-7063


Quote:
"I am trying to install OpenVZ on a RedHat 7.5 machine (AWS instance). I have successfully installed it and its utilities (with yum mainly, openvz-release-7.0.8-4.vz7.x86_64) and I am able to start a container, suspend, and resume it. The container launched is Centos 7."

Installing OpenVZ 7 on top of a stock CentOS 7 or Red Hat 7 system is not supported.

@Mohamad, please use one of the official installation methods. Replacing the RHEL 7.5 kernel is not supported.
ISO: https://download.openvz.org/virtuozzo/releases/7.0/x86_64/iso/

Please check https://blogs.oracle.com/ravello/install-iso-image-aws-google-cloud for AWS.

Re: Active SSH Session to Container is Lost when Suspend/Resume Container [message #53457 is a reply to message #53455] Mon, 08 October 2018 07:13
sindimo
Dear Vasily,

Thank you for opening the ticket and informing me about jira.

Dear Tom,

Thank you for the clarification and pointing me to the supported OpenVZ iso.

I finally got an AWS instance installed with the latest OpenVZ release ISO. Unfortunately, the problem remains even with the supported ISO. For reference, below are the versions in use after running "yum update" on the system:

[ec2-user@openvz-node ~]$ cat /etc/redhat-release
Virtuozzo Linux release 7.5

[ec2-user@openvz-node ~]$ uname -r
3.10.0-862.14.4.vz7.72.4

[ec2-user@openvz-node ~]$ rpm -qa | egrep "openvz-release|criu|prlctl|prl-disp-service|vzkernel|ploop|python-subprocess32|yum-plugin-priorities|libprlsdk"

criu-3.10.0.7-1.vz7.x86_64
libprlsdk-7.0.220-6.vz7.x86_64
libprlsdk-python-7.0.220-6.vz7.x86_64
openvz-release-7.0.9-2.vz7.x86_64
ploop-7.0.131-1.vz7.x86_64
ploop-lib-7.0.131-1.vz7.x86_64
prlctl-7.0.156-1.vz7.x86_64
prl-disp-service-7.0.863-1.vz7.x86_64
prl-disp-service-tests-7.0.863-1.vz7.x86_64
python-criu-3.10.0.7-1.vz7.x86_64
python-ploop-7.0.131-1.vz7.x86_64
python-subprocess32-3.2.7-1.vz7.5.x86_64
vzkernel-3.10.0-862.14.4.vz7.72.4.x86_64
yum-plugin-priorities-1.1.31-46.vl7.noarch


I tried to investigate this further and was able to figure out what triggers the issue, but I am not sure how to fix it.

The container I am launching has an NFS4 mount inside it.

If I disable that NFS mount and then suspend/resume the container, everything works fine: active ssh sessions to the container resume once the resume operation completes.

However, if I keep the NFS mount inside the container and suspend/resume it, any active ssh session to the container breaks after the resume completes (broken pipe error). Note that once the container is resumed, I can establish a new ssh session to it, and the NFS mount inside is active and accessible with no issues. So the mount itself survives the resume intact; it is only the presence of an NFS mount inside the container that seems to break the restoration of active ssh sessions.
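As a temporary workaround sketch (the mount point /mnt/nfsdata and the export server:/export below are placeholders, not my actual setup), unmounting NFS before suspend and remounting after resume avoids losing the sessions:

```shell
# Drop the NFS4 mount inside the container before suspending
prlctl exec ct1 umount /mnt/nfsdata

prlctl suspend ct1
prlctl resume ct1

# Remount after the resume completes
prlctl exec ct1 mount -t nfs4 server:/export /mnt/nfsdata
```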

I hope this gives more insight for investigating the problem further; if you have any suggestions for working around this, I would truly appreciate your feedback.

Many thanks for your help.

Sincerely,

Mohamad