OpenVZ Forum


TCP Packets get lost under moderate network load [message #34644] Thu, 22 January 2009 22:22
klathor
Hello!

I've filed this as a bug (http://bugzilla.openvz.org/show_bug.cgi?id=1156),
but I was hoping to get some feedback/suggestions to make sure I'm not doing
something idiotic.
I apologize in advance for the length of this note...

Background:
I'm converting a fairly basic legacy webserver to an OpenVZ VPS. I've been using
OpenVZ for a few months on some infrastructure stuff and it's been working fine,
but this is the first production server I've tried converting. It runs Apache
and PHP and gets about 2000 hits/min (mostly images; PHP is only used for some
templates). The new OpenVZ machine is a Dell with dual 3 GHz Xeons and 4 GB of
RAM, and this new VPS is the only thing running on it.

Issue:
The morning after I converted, HTTP monitors started to throw alerts just as
traffic was ramping up. Upon logging in, I saw 500+ sockets in SYN_RECV state.
Restarting the webserver would clear them out, but within a minute or two they
would climb back past 100. There were also many "Orphaned socket dropped"
messages in the kernel logs, although no beancounter barriers were reached
(failcnt is 0 in the limits listed below). After many false leads, I discovered
that the problem is very similar to the one reported by Max Deineko in January
of 2008:
http://forum.openvz.org/index.php?t=msg&goto=25678
Unfortunately no resolution was posted.

I have reverted to the original server and done some testing, but
unfortunately I have to break our live site to do so, because the issue only
occurs when the VPS is hit from many different IPs. I can't replicate the
behavior even by running ab from ~7 machines. (Doing that only makes load climb
and the net link saturate, but SYN_RECV states do not pile up.)

Possible clues:
-Out of desperation I turned off iptables entirely (no rules, no modules loaded)
on both the HN & VPS, which appeared to clear up everything.
-The issue reappears with iptables turned ON on the HN but turned OFF on the VPS
(IPTABLES="" in /etc/vz/vz.conf).
-I've tried logging all INVALID state packets in case it was some sort of
conntrack issue, but none of the problem connections I captured produced a hit.
-Happens whether syncookies are turned on or off.
-I've looked over the links referenced in Mr. Deineko's thread, but I don't
believe this is a buffer size or window size issue as the problem happens while
initiating the connection.

Does anyone have any hints? Anyone out there running very busy webservers? I've
seen some random Google hits for people complaining about DDoS problems with
their OpenVZ sites, which may be related since the issue looks like a DDoS at
first glance (a great many SYN_RECV states). Any info at all would be greatly
appreciated.

Thanks,
-JayKim

Sample connection:
netstat -anp:
tcp        0      0 SERVER:80           CLIENT:41842          SYN_RECV    -
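
To watch the SYN_RECV pile-up over time rather than eyeballing netstat, the
per-state socket counts can be tallied from saved `netstat -an` output. This is
a minimal sketch, not something from the original setup: the function name
count_tcp_states is made up, and the second and third rows of the sample input
are invented for illustration (only the first row comes from the post).

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP sockets per state in `netstat -an` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto, recv-q, send-q, local addr, foreign addr, state, ...
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# First row is from the post; the other two rows are hypothetical.
sample = """\
tcp        0      0 SERVER:80           CLIENT:41842          SYN_RECV    -
tcp        0      0 SERVER:80           CLIENT:41900          ESTABLISHED 1234/httpd
tcp        0      0 SERVER:80           CLIENT:41901          SYN_RECV    -
"""

counts = count_tcp_states(sample)
print(counts["SYN_RECV"], "sockets in SYN_RECV")
```

Running this against periodic netstat snapshots would give a simple time series
of SYN_RECV counts to compare with and without iptables loaded.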


tcpdump (same thing seen on HN and VPS):
11:09:38.296692 IP CLIENT.41842 > SERVER.80: S 4100993638:4100993638(0) win 64512 <mss 1460,nop,nop,sackOK>
11:09:41.275425 IP CLIENT.41842 > SERVER.80: S 4100993638:4100993638(0) win 64512 <mss 1460,nop,nop,sackOK>
11:09:41.275468 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
11:09:41.319954 IP CLIENT.41842 > SERVER.80: . ack 1 win 64512
11:09:41.346429 IP CLIENT.41842 > SERVER.80: P 1:462(461) ack 1 win 64512
11:09:44.982597 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
11:09:45.025814 IP CLIENT.41842 > SERVER.80: . ack 1 win 64512
11:09:47.295413 IP CLIENT.41842 > SERVER.80: P 1:462(461) ack 1 win 64512
11:09:50.983478 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
11:09:51.030111 IP CLIENT.41842 > SERVER.80: . ack 1 win 64512
11:09:59.334340 IP CLIENT.41842 > SERVER.80: P 1:462(461) ack 1 win 64512
11:10:03.185371 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
11:10:03.226847 IP CLIENT.41842 > SERVER.80: . ack 1 win 64512
11:10:27.195543 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
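
In the trace above, the server keeps resending its SYN-ACK even though the
client's ACK and data show up in the capture, which reads as if the ACK never
reaches the listening socket. A rough sketch for pulling the SYN-ACK
retransmission gaps out of such a capture follows. The regular expression
assumes the old tcpdump text format shown here, and the function name
synack_gaps is hypothetical.

```python
import re
from datetime import datetime

# Matches: "<time> IP <src> > <dst>: <flags> ..." in the old tcpdump format.
LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d\.\d+) IP (\S+) > (\S+): (\S+)")

TRACE = """\
11:09:38.296692 IP CLIENT.41842 > SERVER.80: S 4100993638:4100993638(0) win 64512 <mss 1460,nop,nop,sackOK>
11:09:41.275425 IP CLIENT.41842 > SERVER.80: S 4100993638:4100993638(0) win 64512 <mss 1460,nop,nop,sackOK>
11:09:41.275468 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
11:09:41.319954 IP CLIENT.41842 > SERVER.80: . ack 1 win 64512
11:09:41.346429 IP CLIENT.41842 > SERVER.80: P 1:462(461) ack 1 win 64512
11:09:44.982597 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
11:09:45.025814 IP CLIENT.41842 > SERVER.80: . ack 1 win 64512
11:09:47.295413 IP CLIENT.41842 > SERVER.80: P 1:462(461) ack 1 win 64512
11:09:50.983478 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
11:09:51.030111 IP CLIENT.41842 > SERVER.80: . ack 1 win 64512
11:09:59.334340 IP CLIENT.41842 > SERVER.80: P 1:462(461) ack 1 win 64512
11:10:03.185371 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
11:10:03.226847 IP CLIENT.41842 > SERVER.80: . ack 1 win 64512
11:10:27.195543 IP SERVER.80 > CLIENT.41842: S 1902995022:1902995022(0) ack 4100993639 win 5840 <mss 1460,nop,nop,sackOK>
"""


def synack_gaps(trace: str, server: str = "SERVER.80") -> list:
    """Return the seconds between consecutive SYN-ACKs sent by `server`."""
    times = []
    for line in trace.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        stamp, src, _dst, flags = m.groups()
        # A SYN-ACK prints as flag "S" plus an "ack <n>" field.
        if src == server and flags == "S" and " ack " in line:
            times.append(datetime.strptime(stamp, "%H:%M:%S.%f"))
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]


print(synack_gaps(TRACE))
```

For this capture the gaps come out at roughly 3.7, 6, 12.2 and 24 seconds,
i.e. each wait is about double the previous one, which looks like ordinary
SYN-ACK retransmission backoff rather than anything random.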


uname -a:
Linux SERVER 2.6.18-92.1.13.el5.028stab059.6PAE #1 SMP Fri Nov 14 20:46:53 MSK 2008 i686 i686 i386 GNU/Linux


rpm -qa | grep vz:
vzquota-3.0.12-1
vzctl-3.0.23-1
vztmpl-centos-5-2.0-3
ovzkernel-PAE-2.6.18-92.1.13.el5.028stab059.6
vzyum-2.4.0-11
vzctl-lib-3.0.23-1
vzpkg-2.7.0-18
vzrpm43-python-4.3.3-7_nonptl.6
vzrpm43-4.3.3-7_nonptl.6
vzrpm44-4.4.1-22.5
vzrpm44-python-4.4.1-22.5


/etc/redhat-release (both HN and VPS):
CentOS release 5.2 (Final)


various sysctls:
net.core.rmem_default = 113664
net.core.wmem_default = 113664
net.core.rmem_max = 131071
net.core.wmem_max = 131071
net.ipv4.tcp_mem = 16384     20480  24576
net.ipv4.tcp_rmem = 4096     87380  655360
net.ipv4.tcp_wmem = 4096     16384  655360
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent = 120
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syn_retries = 5
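
For reference, here is a back-of-the-envelope sketch of when SYN-ACK
retransmits would fire with tcp_synack_retries = 5. It ASSUMES a 3 second
initial timeout that doubles on every retry; that figure is picked to match the
spacing seen in the tcpdump above and is not taken from the kernel source, so
treat the numbers as approximate. The function name is made up. Whether the
60 second conntrack syn_recv timeout matters here is an open question, not a
diagnosis.

```python
def synack_retransmit_offsets(retries: int, initial_timeout: float = 3.0) -> list:
    """Seconds after the first SYN-ACK at which each retransmit would be sent.

    Assumes simple exponential backoff: the wait doubles after every retry.
    """
    offsets = []
    elapsed = 0.0
    timeout = initial_timeout
    for _ in range(retries):
        elapsed += timeout
        offsets.append(elapsed)
        timeout *= 2
    return offsets


offsets = synack_retransmit_offsets(retries=5)  # tcp_synack_retries = 5
print(offsets)  # [3.0, 9.0, 21.0, 45.0, 93.0]

# Retransmits that would fall after ip_conntrack_tcp_timeout_syn_recv = 60.
late = [t for t in offsets if t > 60]
print(late)  # [93.0]
```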


VPS limits (generated from vzsplit):
       uid  resource           held    maxheld    barrier      limit    failcnt
     3000:  kmemsize        5098959   26116859   29720985   32693083          0
            numtcpsock            9        359       1333       1333          0
            tcpsndbuf         71552    2126436    4447027    9906995          0
            tcprcvbuf        132912     999560    4447027    9906995          0

 