OpenVZ Forum


localhost issues on multi-VM machine [message #48919] Wed, 16 January 2013 17:47
copperfrog
Messages: 2
Registered: January 2013
Location: Portland, OR
Junior Member

From: 12.155.34*
Hi,

I have an OpenVZ host machine that runs 10 guests. The system is configured so that each of those VMs uses a certain amount of RAM and CPUs without over-allocating the physical machine: it is a 32-core machine with 64 GB of RAM, and all containers together are configured to use 63 GB of RAM and 28 CPU cores.
I run Apache on all of the VMs, and also Munin. To monitor behavior and traffic patterns, Munin uses the 'apache_accesses', 'apache_processes' and 'apache_volume' plugins on each of these VMs. Munin comes along every 5 minutes, runs all those scripts, and retrieves the values. I have observed that when these scripts check /server-status via localhost (http://localhost:80/server-status?auto), they time out on some of the VMs. This can also be reproduced with a simple while loop requesting the status with curl from the command line:

while true; do curl http://localhost/server-status?auto > /dev/null; done

It runs fine until Munin comes along, and then curl reports 'host unreachable' for about 30 seconds before it recovers.
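A slightly tighter version of that curl loop (a sketch, not part of Munin itself) timestamps each probe and caps it at 2 seconds, so a stalled connect shows up as a discrete FAIL line with curl's exit code instead of a silently hanging request:

```shell
#!/bin/sh
# probe URL -> prints "HH:MM:SS OK" or "HH:MM:SS FAIL (curl exit N)"
probe() {
    if curl -s --max-time 2 "$1" > /dev/null; then
        echo "$(date +%T) OK"
    else
        # $? here is still curl's exit status from the if-condition
        echo "$(date +%T) FAIL (curl exit $?)"
    fi
}

# Probe a few times; during the outage window every line should read
# FAIL with exit 7 (could not connect) or 28 (operation timed out).
for i in 1 2 3 4 5; do
    probe "http://localhost/server-status?auto"
    sleep 1
done
```

Correlating the FAIL timestamps with Munin's 5-minute cron runs would make the overlap visible directly.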
I checked the beancounters and monitored the lo interface, and I do not see any errors on either.
It looks to me like the localhost interface is shared across all VMs and is running out of capacity to serve all of the requests. The math would dictate that at 10 VMs with 3 scripts running each there are 30 requests within a very small time frame. Maybe I'm wrong but I'd expect that load to be acceptable.
I can alleviate the problem by using the VM's actual IP address (the one assigned to venet0) for these calls, so the venet0 interface seems to have a much higher capacity for requests. Another way to relieve the affected VMs is to turn off Munin on some of the other VMs; I can run up to 7 VMs fine, while 8 or more cause timeouts.
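For what it's worth, the IP-address workaround can be hard-wired in the Munin node configuration, assuming the stock apache_* plugins, which read env.url and env.ports (the file path is illustrative, and 10.10.20.45 is the container IP from the ifconfig output below):

```
# /etc/munin/plugin-conf.d/apache — point the apache_* plugins at the
# venet0:0 address instead of localhost; %d is replaced by each port
# listed in env.ports.
[apache_*]
env.url http://10.10.20.45:%d/server-status?auto
env.ports 80
```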
Does anybody have an idea what the problem might be?

Here is some of the information requested by the stickies:

- Host running CentOS 6.3
- 8 VMs running CentOS 6.3 (2 affected)
- 1 VM running Ubuntu 10.04 (fine)
- 1 VM running CentOS 5.8 (fine)

One of the affected VMs:
--------------------
>ifconfig
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1845251 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1845251 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:189093532 (180.3 MiB)  TX bytes:189093532 (180.3 MiB)

venet0    Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:127.0.0.1  P-t-P:127.0.0.1  Bcast:0.0.0.0  Mask:255.255.255.255
          UP BROADCAST POINTOPOINT RUNNING NOARP  MTU:1500  Metric:1
          RX packets:13190677 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17361191 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:5631175102 (5.2 GiB)  TX bytes:17001995532 (15.8 GiB)

venet0:0  Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:10.10.20.45  P-t-P:10.10.20.45  Bcast:10.10.20.45  Mask:255.255.255.255
          UP BROADCAST POINTOPOINT RUNNING NOARP  MTU:1500  Metric:1

> ip rule list
0:	from all lookup local 
32766:	from all lookup main 
32767:	from all lookup default

> ip route list table all
169.254.0.0/16 dev venet0  scope link  metric 1002 
default dev venet0  scope link 
broadcast 127.255.255.255 dev lo  table local  proto kernel  scope link  src 127.0.0.1 
local 10.10.20.45 dev venet0  table local  proto kernel  scope host  src 10.10.20.45 
broadcast 10.10.20.45 dev venet0  table local  proto kernel  scope link  src 10.10.20.45 
broadcast 127.0.0.0 dev lo  table local  proto kernel  scope link  src 127.0.0.1 
local 127.0.0.1 dev lo  table local  proto kernel  scope host  src 127.0.0.1 
local 127.0.0.1 dev venet0  table local  proto kernel  scope host  src 127.0.0.1 
local 127.0.0.0/8 dev lo  table local  proto kernel  scope host  src 127.0.0.1 
unreachable ::/96 dev lo  metric 1024  error -101 mtu 16436 advmss 16376 hoplimit 4294967295
unreachable ::ffff:0.0.0.0/96 dev lo  metric 1024  error -101 mtu 16436 advmss 16376 hoplimit 4294967295
unreachable 2002:a00::/24 dev lo  metric 1024  error -101 mtu 16436 advmss 16376 hoplimit 4294967295
unreachable 2002:7f00::/24 dev lo  metric 1024  error -101 mtu 16436 advmss 16376 hoplimit 4294967295
unreachable 2002:a9fe::/32 dev lo  metric 1024  error -101 mtu 16436 advmss 16376 hoplimit 4294967295
unreachable 2002:ac10::/28 dev lo  metric 1024  error -101 mtu 16436 advmss 16376 hoplimit 4294967295
unreachable 2002:c0a8::/32 dev lo  metric 1024  error -101 mtu 16436 advmss 16376 hoplimit 4294967295
unreachable 2002:e000::/19 dev lo  metric 1024  error -101 mtu 16436 advmss 16376 hoplimit 4294967295
unreachable 3ffe:ffff::/32 dev lo  metric 1024  error -101 mtu 16436 advmss 16376 hoplimit 4294967295
default dev venet0  metric 1  mtu 1500 advmss 1440 hoplimit 4294967295
unreachable default dev lo  table unspec  proto kernel  metric -1  error -101 hoplimit 255
local ::1 via :: dev lo  table local  proto none  metric 0  mtu 16436 advmss 16376 hoplimit 4294967295
unreachable default dev lo  table unspec  proto kernel  metric -1  error -101 hoplimit 255

> iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination 

> tcpdump -i lo -e host 127.0.0.1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
09:19:47.896467 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 74: localhost.localdomain.33038 > localhost.localdomain.http: Flags [S], seq 2650559739, win 32768, options [mss 16396,sackOK,TS val 3647429171 ecr 3647428160,nop,wscale 9], length 0
09:19:47.896502 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 74: localhost.localdomain.http > localhost.localdomain.33038: Flags [S.], seq 3952410189, ack 2650559740, win 32768, options [mss 16396,sackOK,TS val 3647429171 ecr 3647429171,nop,wscale 9], length 0
09:19:47.896530 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 66: localhost.localdomain.33038 > localhost.localdomain.http: Flags [.], ack 1, win 64, options [nop,nop,TS val 3647429171 ecr 3647429171], length 0
09:19:47.896626 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 248: localhost.localdomain.33038 > localhost.localdomain.http: Flags [P.], seq 1:183, ack 1, win 64, options [nop,nop,TS val 3647429171 ecr 3647429171], length 182
09:19:47.896664 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 66: localhost.localdomain.http > localhost.localdomain.33038: Flags [.], ack 183, win 67, options [nop,nop,TS val 3647429171 ecr 3647429171], length 0
09:19:47.897343 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 428: localhost.localdomain.http > localhost.localdomain.33038: Flags [P.], seq 1:363, ack 183, win 67, options [nop,nop,TS val 3647429172 ecr 3647429171], length 362
09:19:47.897369 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 66: localhost.localdomain.33038 > localhost.localdomain.http: Flags [.], ack 363, win 67, options [nop,nop,TS val 3647429172 ecr 3647429172], length 0
09:19:47.897589 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 66: localhost.localdomain.33038 > localhost.localdomain.http: Flags [F.], seq 183, ack 363, win 67, options [nop,nop,TS val 3647429172 ecr 3647429172], length 0
09:19:47.897697 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 66: localhost.localdomain.http > localhost.localdomain.33038: Flags [F.], seq 363, ack 184, win 67, options [nop,nop,TS val 3647429172 ecr 3647429172], length 0
09:19:47.897724 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 66: localhost.localdomain.33038 > localhost.localdomain.http: Flags [.], ack 364, win 67, options [nop,nop,TS val 3647429172 ecr 3647429172], length 0
09:19:48.907586 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 74: localhost.localdomain.33040 > localhost.localdomain.http: Flags [S], seq 1138244790, win 32768, options [mss 16396,sackOK,TS val 3647430182 ecr 3647429172,nop,wscale 9], length 0
09:19:48.907615 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 74: localhost.localdomain.http > localhost.localdomain.33040: Flags [S.], seq 1886956057, ack 1138244791, win 32768, options [mss 16396,sackOK,TS val 3647430182 ecr 3647430182,nop,wscale 9], length 0
09:19:48.907632 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 66: localhost.localdomain.33040 > localhost.localdomain.http: Flags [.], ack 1, win 64, options [nop,nop,TS val 3647430182 ecr 3647430182], length 0
09:19:48.907693 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 248: localhost.localdomain.33040 > localhost.localdomain.http: Flags [P.], seq 1:183, ack 1, win 64, options [nop,nop,TS val 3647430182 ecr 3647430182], length 182
09:19:48.907715 0
...

Re: localhost issues on multi-VM machine [message #49028 is a reply to message #48919] Wed, 27 February 2013 13:33
seanfulton
Messages: 100
Registered: May 2007
Senior Member
From: *dyn.optonline.net
Check the Apache config in the affected containers. OpenVZ has no bearing on HTTP or Munin inside a container; we run both on many containers here with no problems. By default, every VM has its own localhost. So I suspect you have a problem with the Apache config on those two machines: either it is not set up to respond on 127.0.0.1, or you have access permissions (in .htaccess or httpd.conf) that are preventing the script from connecting.
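Concretely, the kind of stanza to look for on CentOS 6's Apache 2.2 would be something like this (a hypothetical example of a correctly opened server-status endpoint, not the poster's actual config):

```
# httpd.conf — mod_status must be loaded, and the Location block must
# allow connections from the loopback address the Munin plugins use.
ExtendedStatus On
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>
```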

I don't think it has anything to do with OpenVZ.

sean
Re: localhost issues on multi-VM machine [message #49029 is a reply to message #49028] Wed, 27 February 2013 15:49
copperfrog
Messages: 2
Registered: January 2013
Location: Portland, OR
Junior Member

From: 12.155.34*
Thanks for the reply. However, this is not likely related to the Apache configs, because the problem goes away if I simply turn off enough containers, and I do that without changing any of the configurations on the machines that previously had trouble connecting to localhost. Also remember that I can connect to localhost fine as long as I don't hit the 5-minute sweet spot when Munin comes around and all containers try to access their respective localhost at the same time.
This has something to do with the number of open connections on localhost at any given point in time, and I still think it suggests that the localhost interfaces are connected at a lower level.
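If loopback traffic inside the containers really is funneled through shared host state, one shared kernel resource worth checking on the hardware node is the connection-tracking table — this is a hypothesis to test, not a confirmed diagnosis, and the /proc paths assume the nf_conntrack module is loaded on a stock CentOS 6 host:

```shell
#!/bin/sh
# Run on the hardware node, not in a container. All containers share the
# host kernel, so a shared table filling up could stall new connections
# in every container at once, matching the ~30-second outages.
usage_pct() {
    # usage_pct COUNT MAX -> integer percentage (0 if MAX is 0)
    awk -v c="$1" -v m="$2" 'BEGIN { printf "%d", (m > 0) ? c * 100 / m : 0 }'
}

count=$(cat /proc/sys/net/netfilter/nf_conntrack_count 2>/dev/null || echo 0)
max=$(cat /proc/sys/net/netfilter/nf_conntrack_max 2>/dev/null || echo 0)
echo "conntrack usage: ${count}/${max} ($(usage_pct "$count" "$max")%)"
```

If the count sits near the maximum during the 5-minute Munin sweep, raising net.netfilter.nf_conntrack_max (or exempting loopback from tracking) would be the next thing to try.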
Re: localhost issues on multi-VM machine [message #52107 is a reply to message #49028] Thu, 02 July 2015 00:32
hfb9
Messages: 6
Registered: November 2008
Junior Member
From: *telstraglobal.net
We see the same problem on host machines running CentOS 6 with the latest kernel (2.6.32-042stab108.2). Connections to localhost time out for about a minute at a time, while connecting to another IP on the same machine works fine. It seems to be related to how many containers are running at the same time.

Any solution or work around for this?