OpenVZ Forum


TCP Packets get lost under moderate network load [message #34644] Thu, 22 January 2009 22:22
klathor
Hello!

I've filed this as a bug (http://bugzilla.openvz.org/show_bug.cgi?id=1156),
but I was hoping to get some feedback/suggestions to make sure I'm not doing
something idiotic.
I apologize in advance for the length of this note...

Background:
I'm converting a fairly basic legacy webserver to an OpenVZ VPS. I've been using
OpenVZ for a few months on some infrastructure stuff and it's been working fine,
but this is the first production server I've tried converting. It runs
Apache and PHP and gets about 2000 hits/min (mostly images; PHP is just for some
templates). The new OpenVZ machine is a Dell with dual 3 GHz Xeons and 4 GB of RAM,
and this new VPS is the only thing running on it.

Issue:
The morning after I converted, http monitors started to throw alerts just as
traffic was ramping up. Upon logging in, I saw 500+ sockets in SYN_RECV state.
Restarting the webserver would clear them out, but within a minute or two they
would reach 100+. Also, there were many "Orphaned socket dropped" messages in
the kernel logs, but no barriers were reached. After many false leads, I
discovered that the problem was very similar to the problem reported by Max
Deineko in January of 2008:
http://forum.openvz.org/index.php?t=msg&goto=25678
Unfortunately no resolution was posted.
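For anyone chasing the same symptom, a rough way to watch the SYN_RECV pile-up without eyeballing netstat is to tally the state column of /proc/net/tcp, which is where netstat gets its IPv4 socket list. This is a minimal sketch; the function names are hypothetical, it ignores IPv6 sockets (/proc/net/tcp6), and the state codes follow the kernel's include/net/tcp_states.h numbering.

```python
from collections import Counter

# TCP state codes as they appear (in hex) in the "st" column of
# /proc/net/tcp; numbering follows include/net/tcp_states.h.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(text):
    """Tally socket states from the text of /proc/net/tcp."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # skip the column-header line
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or truncated line
        code = fields[3].upper()  # "st" is the fourth column
        counts[TCP_STATES.get(code, "UNKNOWN_" + code)] += 1
    return counts


def read_tcp_states(path="/proc/net/tcp"):
    """Read the live table (hypothetical helper; needs a Linux /proc)."""
    with open(path) as f:
        return count_tcp_states(f.read())
```

Calling print(read_tcp_states()["SYN_RECV"]) every few seconds while traffic ramps up gives the same count as grepping netstat -an for SYN_RECV on IPv4.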

I have reverted to the original server and done some testing, but
unfortunately I have to break our live site to do so, because the issue only
occurs when the VPS is hit from many different client IPs. I can't reproduce the
behavior even by running ab from ~7 machines. (Doing that only makes the load climb
and saturates the network link; SYN_RECV states do not pile up.)

Possible clues:
-Out of desperation I turned off iptables entirely (no rules, no modules loaded)
on both the HN & VPS, which appeared to clear up everything.
-The issue reappears with iptables turned ON on the HN but turned OFF on the VPS
(IPTABLES="" in /etc/vz/vz.conf).
-I've tried logging all INVALID state packets in case it was some sort of
conntrack issue, but none of the problem connections showed up in that log.
-Happens whether syncookies are turned on or off.
-I've looked over the links referenced in Mr. Deineko's thread, but I don't
believe this is a buffer size or window size issue as the problem happens while
initiating the connection.

Does anyone have any hints? Is anyone out there running very busy webservers? I've
seen some random Google hits from people complaining about DDoS problems with
their OpenVZ sites, which may be related, since the issue looks like a DDoS at
first glance (many, many SYN_RECV states). Any info at all would be greatly
appreciated.

Thanks,
-JayKim

Sample connection:
netstat -anp:
tcp        0      0 SERVER:80           CLIENT:41842          SYN_RECV    -


tcpdump (same thing seen on HN and VPS):
11:09:38.296692 IP CLIENT.41842 > SERVER.80: S 4100993638:4100993638(0) win 64512 <mss 1460,nop,nop,sackOK>
11:09:41.275425 IP CLIENT.41842 > SERVER.80: S 4100993638:4100993638(0) win 64512 <mss 1460,nop,nop,sackOK>
11:09:41.275468 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
11:09:41.319954 IP CLIENT.41842 > SERVER.80: . ack 1 win 64512
11:09:41.346429 IP CLIENT.41842 > SERVER.80: P 1:462(461) ack 1 win 64512
11:09:44.982597 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
11:09:45.025814 IP CLIENT.41842 > SERVER.80: . ack 1 win 64512
11:09:47.295413 IP CLIENT.41842 > SERVER.80: P 1:462(461) ack 1 win 64512
11:09:50.983478 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
11:09:51.030111 IP CLIENT.41842 > SERVER.80: . ack 1 win 64512
11:09:59.334340 IP CLIENT.41842 > SERVER.80: P 1:462(461) ack 1 win 64512
11:10:03.185371 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
11:10:03.226847 IP CLIENT.41842 > SERVER.80: . ack 1 win 64512
11:10:27.195543 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
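In this trace the server keeps retransmitting its SYN-ACK even though the client ACKs the earlier copies, and the gaps between retransmissions roughly double. A small sketch comparing those gaps with plain exponential backoff, assuming an initial SYN-ACK timeout of 3 s (the TCP_TIMEOUT_INIT value in 2.6.18-era kernels; later kernels lowered it to 1 s) and the net.ipv4.tcp_synack_retries = 5 setting listed further down. The timestamps are copied from the trace; the function names are hypothetical.

```python
# Server SYN-ACK timestamps from the tcpdump excerpt, expressed as
# seconds past 11:09:00 (11:10:03.185371 -> 63.185371, etc.).
OBSERVED_SYNACKS = [41.275468, 44.982597, 50.983478, 63.185371, 87.195543]


def synack_schedule(initial_timeout=3.0, retries=5):
    """Expected offsets (s) of each SYN-ACK retransmission after the first.

    Assumes the timeout starts at initial_timeout and doubles after every
    retransmission, so retransmission k fires at initial_timeout * (2**k - 1).
    """
    return [initial_timeout * (2 ** k - 1) for k in range(1, retries + 1)]


def observed_offsets(timestamps):
    """Offsets of each retransmission relative to the first SYN-ACK."""
    first = timestamps[0]
    return [t - first for t in timestamps[1:]]


for seen, expected in zip(observed_offsets(OBSERVED_SYNACKS), synack_schedule()):
    print(f"observed {seen:6.2f} s   expected {expected:5.1f} s")
```

Under these assumptions the four observed gaps (about 3.7, 9.7, 21.9 and 45.9 s) land within a second of the expected 3, 9, 21 and 45 s, and the fifth and final retransmission would follow at about 93 s, so each stuck connection stays in SYN_RECV for at least a minute and a half.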


uname -a:
Linux SERVER 2.6.18-92.1.13.el5.028stab059.6PAE #1 SMP Fri Nov 14 20:46:53 MSK 2008 i686 i686 i386 GNU/Linux


rpm -qa | grep vz:
vzquota-3.0.12-1
vzctl-3.0.23-1
vztmpl-centos-5-2.0-3
ovzkernel-PAE-2.6.18-92.1.13.el5.028stab059.6
vzyum-2.4.0-11
vzctl-lib-3.0.23-1
vzpkg-2.7.0-18
vzrpm43-python-4.3.3-7_nonptl.6
vzrpm43-4.3.3-7_nonptl.6
vzrpm44-4.4.1-22.5
vzrpm44-python-4.4.1-22.5


/etc/redhat-release (both HN and VPS):
CentOS release 5.2 (Final)


various sysctls:
net.core.rmem_default = 113664
net.core.wmem_default = 113664
net.core.rmem_max = 131071
net.core.wmem_max = 131071
net.ipv4.tcp_mem = 16384     20480  24576
net.ipv4.tcp_rmem = 4096     87380  655360
net.ipv4.tcp_wmem = 4096     16384  655360
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent = 120
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syn_retries = 5


VPS limits (generated from vzsplit):
       uid  resource           held    maxheld    barrier      limit    failcnt
     3000:  kmemsize        5098959   26116859   29720985   32693083          0
            numtcpsock            9        359       1333       1333          0
            tcpsndbuf         71552    2126436    4447027    9906995          0
            tcprcvbuf        132912     999560    4447027    9906995          0
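The excerpt above shows only four resources, and the full /proc/user_beancounters table is much longer. A reasonable first filter when reading it is to pull out rows with a non-zero failcnt or a maxheld that has reached the barrier. A minimal sketch of that, assuming the whitespace-separated layout shown here (a uid token ending in ':' starts each container's block); the function names are hypothetical.

```python
def parse_beancounters(text):
    """Parse /proc/user_beancounters-style text into a list of dicts.

    A line whose first token ends in ':' (e.g. '3000:') starts a new
    container; following lines belong to that uid until the next one.
    """
    rows = []
    uid = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("Version:", "uid"):
            continue  # skip blank, version and column-header lines
        if fields[0].endswith(":"):
            uid = fields[0].rstrip(":")
            fields = fields[1:]
        if len(fields) != 6:
            continue  # not a resource row
        name, *nums = fields
        held, maxheld, barrier, limit, failcnt = (int(n) for n in nums)
        rows.append({"uid": uid, "resource": name, "held": held,
                     "maxheld": maxheld, "barrier": barrier,
                     "limit": limit, "failcnt": failcnt})
    return rows


def suspicious(rows):
    """Rows with failed allocations, or whose peak usage reached the barrier."""
    return [r for r in rows if r["failcnt"] > 0 or r["maxheld"] >= r["barrier"]]
```

For the four rows quoted above, suspicious() returns an empty list, matching the statement that no barriers were reached.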

Re: TCP Packets get lost under moderate network load [message #34649 is a reply to message #34644] Fri, 23 January 2009 11:28
maratrus
Hello,

Quote:


-Out of desperation I turned off iptables entirely (no rules, no modules loaded)
on both the HN & VPS, which appeared to clear up everything.
-The issue reappears with iptables turned ON on the HN but turned OFF on the VPS
(IPTABLES="" in /etc/vz/vz.conf).


Sorry, I don't quite follow. Doesn't this mean that the problem is caused by the iptables rules on the HN?


Quote:


tcpdump (same thing seen on HN and VPS)


Are you sure that we get exactly the same output inside the VPS?

Is there anything in the logs or in dmesg?

Could you please attach the full output of /proc/user_beancounters?
And please attach the output of the following files:

- /proc/net/netstat
- /proc/net/softnet_stat
- /proc/net/snmp
- /proc/net/sockstat

Not just a single snapshot, though. Could you please collect them over time, i.e. take readings while the problem is happening, so we can see what changes and which error counters increase?
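Collecting the counters "in the dynamics" means diffing successive snapshots. A minimal sketch of that, assuming the two-line layout that /proc/net/netstat and /proc/net/snmp use (a line of counter names followed by a line of values, both starting with the same prefix such as "TcpExt:"); /proc/net/softnet_stat and /proc/net/sockstat have different layouts and are not handled. The function names, the 10-second interval and the round count are arbitrary choices for illustration.

```python
import time


def parse_counters(text):
    """Parse /proc/net/netstat or /proc/net/snmp text into {"Prefix:Name": int}."""
    counters = {}
    lines = text.splitlines()
    # The files come as (names line, values line) pairs.
    for names_line, values_line in zip(lines[0::2], lines[1::2]):
        names, values = names_line.split(), values_line.split()
        if not names or not values or names[0] != values[0]:
            continue  # pair is out of step; skip it
        prefix = names[0]  # e.g. "TcpExt:"
        for name, value in zip(names[1:], values[1:]):
            counters[prefix + name] = int(value)
    return counters


def diff_counters(before, after):
    """Counters whose value changed between two snapshots, mapped to the delta."""
    return {k: after[k] - before.get(k, 0)
            for k in after if after[k] != before.get(k, 0)}


def snapshot(paths=("/proc/net/netstat", "/proc/net/snmp")):
    """Read and merge the counters from each file."""
    counters = {}
    for path in paths:
        with open(path) as f:
            counters.update(parse_counters(f.read()))
    return counters


def watch(interval=10.0, rounds=6):
    """Print only the counters that moved during each interval."""
    prev = snapshot()
    for _ in range(rounds):
        time.sleep(interval)
        cur = snapshot()
        for name, delta in sorted(diff_counters(prev, cur).items()):
            print(f"{name} {delta:+d}")
        print("---")
        prev = cur
```

Running watch() once while things are calm and again while SYN_RECV states are piling up makes it easy to see which error counters start climbing.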

P.S. It would be better to attach the requested information to your bug report rather than here.
Re: TCP Packets get lost under moderate network load [message #34651 is a reply to message #34649] Fri, 23 January 2009 13:52
klathor
Quote:


Quote:


-Out of desperation I turned off iptables entirely (no rules, no modules loaded)
on both the HN & VPS, which appeared to clear up everything.
-The issue reappears with iptables turned ON on the HN but turned OFF on the VPS
(IPTABLES="" in /etc/vz/vz.conf).


Sorry, I don't quite follow. Doesn't this mean that the problem is caused by the iptables rules on the HN?


Actually that's a good question... I assumed the rules were ok since they're pretty simple and the problems aren't constant. I will try another test with just the modules loaded and no rules.

Quote:


Quote:


tcpdump (same thing seen on HN and VPS)


Are you sure that we get exactly the same output inside the VPS?


I haven't taken dumps simultaneously on the HN and VPS, but I do see the same behavior either way (SYN-ACK sent multiple times, the client's responding ACK ignored). I can take simultaneous dumps if you wish.

Quote:


Is there anything in the logs or in dmesg?


The only thing I see is the infamous "Orphaned socket dropped" messages in dmesg. They start about 2-3 minutes after the SYN_RECV states start to pile up, and they repeat a lot.

Quote:


Could you please attach the full output of /proc/user_beancounters?
And please attach the output of the following files:

- /proc/net/netstat
- /proc/net/softnet_stat
- /proc/net/snmp
- /proc/net/sockstat

Not just a single snapshot, though. Could you please collect them over time, i.e. take readings while the problem is happening, so we can see what changes and which error counters increase?

P.S. It would be better to attach the requested information to your bug report rather than here.

Will do. Unfortunately I'm out of the office till Monday but I'll get to it asap.

Thank you very much for the help.
-JayKim
Re: TCP Packets get lost under moderate network load [message #34710 is a reply to message #34644] Tue, 27 January 2009 15:03
klathor
Ok so I'm an idiot.

I tried reproducing my iptables failures but was surprised to find that the problem happened regardless of whether iptables rules/modules were loaded. It looks like the times I tested successfully were just times of relatively low load.

Long story short, it was an Apache misconfiguration. MaxClients was being overridden by an Apache include file, which made it seem that Apache wasn't spawning new children and was somehow dropping connections instead. I just happened to miss the error messages because the logs are full of cruft.

In the end I just reset everything external to OpenVZ (iptables, sysctls, syncookies), tweaked the UBC limits and Apache's MaxClients/KeepAliveTimeout settings, and everything works fine now. Sorry for wasting your time, and thanks for the help (at least I learned more in the process).
Re: TCP Packets get lost under moderate network load [message #34722 is a reply to message #34710] Wed, 28 January 2009 07:29
maratrus
Thanks for reporting how the story ended and the reason for that behavior. It's really important, because I suppose somebody has faced or will face a similar problem. Thank you.