Home » General » Support » Weird performance problem
Weird performance problem [message #35785] |
Wed, 22 April 2009 12:27 |
ugob
Messages: 271 Registered: March 2007
|
Senior Member |
|
|
Hi,
I'm running a CentOS 4. server and I sometimes face a weird problem.
It is a weird performance problem, and here is how I discovered it.
This server runs OpenVZ virtual machines, and one of them is an asterisk
server for my personal use. The first symptom of the problem is that
the voice quality became flaky. So I logged on the server to see what
could be eating cpu cycles, when I ran top, it took almost one minute
before top actually showed. Another hint is that when I run dstat (a
monitoring utility that is a mix of iostat and vmstat and other stats),
I often get a "missed xx ticks", where xx is a number.
Example (current) (sorry for the wrap):
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
3 2 93 2 0 0| 106k 273k| 0 0 | 0.2 0.4 |1039 389
3 6 91 0 0 0| 0 6416k| 276k 275k| 0 0 |2160 6822
missed 55 ticks
4 10 84 2 0 0|1200k 1992k| 82k 93k| 0 0 |1188 6275
missed 29 ticks
1 0 99 0 0 0| 0 1312k| 65k 66k| 0 0 |1050 1114
missed 38 ticks
2 1 96 0 0 0| 0 1168k| 57k 59k| 0 0 | 491 877
missed 13 ticks
3 1 94 1 0 0| 0 6016k| 181k 176k| 0 0 |2169 5996
missed 50 ticks
4 2 91 1 0 0| 28k 8744k| 216k 214k| 0 0 |2159 5438
missed 37 ticks
1 1 98 0 0 0| 0 2632k| 93k 91k| 0 0 | 983 1381
missed 34 ticks
1 1 98 1 0 0| 0 5624k| 113k 110k| 0 0 |1569 2643
missed 52 ticks
1 1 98 1 0 0| 0 2432k| 29k 28k| 0 0 | 679 647
missed 12 ticks
0 0 100 0 0 0| 0 0 | 60B 374B| 0 0 | 13 15
2 3 94 0 0 0| 0 1872k| 209k 210k| 0 0 |1375 3590
missed 30 ticks
The problem is currently occuring, but it doesn't seem to be affecting
voice quality for now, so I have some time to try to find the cause.
The only solution I've found up to now is to reboot... But hey, this
isn't a Windows 98 machine !
I tried restarting the VZ system, which restarts all the VMs, but it
didn't solve the problem. I can't tell if the problem occurs on a stock
centos kernel, because the server is running production (but
non-critical) virtual machines, so it is always running the openVZ kernel.
So here is what I've done for now:
- Top shows a load of about 0.4
- vmstat 1 10 shows this:
procs -----------memory---------- ---swap-- -----io---- --system--
----cpu----
r b swpd free buff cache si so bi bo in cs us
sy id wa
0 0 592 191092 381720 537956 0 0 53 68 4 3 3
2 93 2
0 0 592 190720 381720 537956 0 0 0 0 32 60 1
1 98 0
0 0 592 191092 381720 537956 0 0 0 0 41 59 0
0 100 0
1 0 592 191092 381728 537948 0 0 0 2584 311 96 10
4 66 19
0 0 592 189968 381732 537944 0 0 0 2080 222 174 2
3 79 16
0 1 592 189968 381732 537944 0 0 0 3244 170 73 10
4 73 12
0 0 592 190216 381732 537944 0 0 0 136 76 113 1
2 93 4
0 0 592 189844 381732 537944 0 0 0 0 33 69 1
1 98 0
0 0 592 189844 381732 537944 0 0 0 0 24 32 0
0 100 0
0 0 592 190340 381732 537944 0 0 0 0 28 42 0
0 100 0
iostat -x 1 (excerpt)
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sda 0.00 171.00 0.00 124.00 0.00 2368.00 0.00 1184.00
19.10 0.14 1.13 0.02 0.20
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00
sdc 0.00 171.00 0.00 124.00 0.00 2368.00 0.00 1184.00
19.10 0.17 1.35 0.02 0.30
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 294.00 0.00 2352.00 0.00 1176.00
8.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 294.00 0.00 2352.00 0.00 1176.00
8.00 0.30 1.01 0.02 0.50
dm-3 0.00 0.00 0.00 294.00 0.00 2352.00 0.00 1176.00
8.00 0.30 1.01 0.02 0.50
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00
Another hint is that pings are really slow. Even pinging localhost is
very long. The first reply is fast, but the second takes ages to come.
It seems to be blocking here:
recvmsg(3, 0xbfbf84b0, MSG_DONTWAIT) = -1 EAGAIN (Resource
temporarily unavailable)
gettimeofday({1239887784, 389347}, NULL) = 0
poll(
The rest comes as soon as there is another response:
[{fd=3, events=POLLIN|POLLERR}], 1, 999) = 0
gettimeofday({1239887903, 119727}, NULL) = 0
gettimeofday({1239887903, 119791}, NULL) = 0
sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("127.0.0.1")},
msg_iov(1)=[{" \10\0\335\2018)\0\4\0370\347I\357\323\1\0\10\t\n\v\f\r\16\17 \20\21\22\23\24\25\26\27 "...,
64}], msg_controllen=0, msg_flags=0}, MSG_CONFIRM) = 64
recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("127.0.0.1")},
msg_iov(1)=[{" E\0\0T\26\264\0\0@\1e\363\177\0\0\1\177\0\0\1\0\0\345\2018)\ 0\4\0370\347I "...,
192}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET,
cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 84
write(1, "64 bytes from hn01.domain"..., 82) = 82
recvmsg(3, 0xbfbf84b0, MSG_DONTWAIT) = -1 EAGAIN (Resource
temporarily unavailable)
gettimeofday({1239887903, 120785}, NULL) = 0
poll(
Then it blocks again...
This confuses Nagios that is running in a VM on this server.
Can the 'gettimeofday' be the problem? 'date' runs w/o delay.
Any insight would be greatly appreciated. It is not critical, but I'd
be glad to be able to finally pinpoint and solve the problem.
Hardware: HP Netserver, software raid, SCSI disks, 1.7 GB RAM.
I can provide more information if needed.
Thanks,
Ugo
Please read the manual before asking questions:
http://download.openvz.org/doc/OpenVZ-Users-Guide.pdf
Please have a look at the wiki before asking questions:
http://wiki.openvz.org/Main_Page
[Updated on: Wed, 22 April 2009 12:36] Report message to a moderator
|
|
|
|
|
|
Goto Forum:
Current Time: Wed Nov 13 13:12:32 GMT 2024
Total time taken to generate the page: 0.02999 seconds
|