OpenVZ Forum


nagios: Warning: The check of host '****' could not be performed due to a fork() error: [message #40380] Mon, 16 August 2010 06:22
romeor
Messages: 11
Registered: April 2010
Junior Member
Hello, sirs!

I've installed Nagios with the NagVis plugin, and from time to time it stops responding, although I can still vzctl enter the container.
I get this message in /var/log/messages:

Warning: The check of host '****' could not be performed due to a fork() error: 'Cannot allocate memory'.

Here is the config of this container:

# Primary parameters
NUMPROC="8000:8000"
NUMTCPSOCK="9223372036854775807:9223372036854775807"
NUMOTHERSOCK="9223372036854775807:9223372036854775807"
VMGUARPAGES="603785:9223372036854775807"

# Secondary parameters
KMEMSIZE="9223372036854775807:9223372036854775807"
OOMGUARPAGES="603785:9223372036854775807"
PRIVVMPAGES="603785:664163"
TCPSNDBUF="9223372036854775807:9223372036854775807"
TCPRCVBUF="9223372036854775807:9223372036854775807"
OTHERSOCKBUF="9223372036854775807:9223372036854775807"
DGRAMRCVBUF="9223372036854775807:9223372036854775807"

# Auxiliary parameters
NUMFILE="9223372036854775807:9223372036854775807"
NUMFLOCK="9223372036854775807:9223372036854775807"
NUMPTY="512:512"
NUMSIGINFO="1024:1024"
DCACHESIZE="9223372036854775807:9223372036854775807"
LOCKEDPAGES="20126:20126"
SHMPAGES="9223372036854775807:9223372036854775807"
NUMIPTENT="9223372036854775807:9223372036854775807"
PHYSPAGES="0:9223372036854775807"

# Disk quota parameters
DISKSPACE="10485760:11534336"
DISKINODES="2000000:2200000"
QUOTATIME="0"
QUOTAUGIDLIMIT="0"

And its current resource counters:

sisemon:~# cat /proc/bc/105/resources
kmemsize 30949316 42515317 9223372036854775807 9223372036854775807 0
lockedpages 0 0 20126 20126 0
privvmpages 327761 1143370 603785 664163 98961
shmpages 671 687 9223372036854775807 9223372036854775807 0
numproc 41 92 8000 8000 0
physpages 293667 294962 0 9223372036854775807 0
vmguarpages 0 0 603785 9223372036854775807 0
oomguarpages 293668 294963 603785 9223372036854775807 0
numtcpsock 16 19 9223372036854775807 9223372036854775807 0
numflock 17 21 9223372036854775807 9223372036854775807 0
numpty 1 2 512 512 0
numsiginfo 0 20 1024 1024 0
tcpsndbuf 320000 1016320 9223372036854775807 9223372036854775807 0
tcprcvbuf 262144 210688 9223372036854775807 9223372036854775807 0
othersockbuf 19968 81920 9223372036854775807 9223372036854775807 0
dgramrcvbuf 0 497920 9223372036854775807 9223372036854775807 0
numothersock 19 39 9223372036854775807 9223372036854775807 0
dcachesize 1405444 1438558 9223372036854775807 9223372036854775807 0
numfile 67141 67298 9223372036854775807 9223372036854775807 0
numiptent 14 14 9223372036854775807 9223372036854775807 0
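
In output like the above, the rows worth looking at are the ones whose last column (failcnt) is non-zero. A minimal sketch of filtering them out, assuming the six whitespace-separated columns shown here (resource, held, maxheld, barrier, limit, failcnt); the function name `failed_resources` is made up for illustration:

```python
def failed_resources(text):
    """Return {resource: failcnt} for rows whose failcnt is non-zero.

    Assumes each data row is: name held maxheld barrier limit failcnt.
    Rows that do not have exactly six fields are skipped.
    """
    failed = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        name, failcnt = fields[0], fields[5]
        if failcnt.isdigit() and int(failcnt) > 0:
            failed[name] = int(failcnt)
    return failed


# A few rows copied from the output above.
sample = """\
kmemsize 30949316 42515317 9223372036854775807 9223372036854775807 0
privvmpages 327761 1143370 603785 664163 98961
numproc 41 92 8000 8000 0
"""
print(failed_resources(sample))  # {'privvmpages': 98961}
```

On the container itself, the text would presumably come from reading /proc/bc/105/resources instead of the inline sample.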

My kernel is 2.6.24-11 and the containers are managed by Proxmox.
The top output is:


PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
533 nagios 20 0 1229m 1.0g 2292 S 112.7 44.7 1060:29 nagios
1 root 20 0 10360 752 628 S 0.0 0.0 0:10.18 init
29 root 20 0 100 12 4 S 0.0 0.0 0:28.20 init-logger
79 root 16 -4 12616 672 352 S 0.0 0.0 0:00.00 udevd
344 root 20 0 5920 664 532 S 0.0 0.0 0:01.00 syslogd
366 root 20 0 62636 1212 652 S 0.0 0.1 0:00.00 sshd
375 root 20 0 21652 916 704 S 0.0 0.0 0:00.00 xinetd
408 root 20 0 11936 1408 1168 S 0.0 0.1 0:00.00 mysqld_safe
458 mysql 20 0 233m 25m 4936 S 0.0 1.1 6:19.71 mysqld
491 root 20 0 62808 2328 804 S 0.0 0.1 0:27.76 sendmail
499 smmsp 20 0 57704 1772 616 S 0.0 0.1 0:00.02 sendmail
509 root 20 0 251m 10m 5964 S 0.0 0.4 0:48.22 httpd
544 root 20 0 20876 1160 580 S 0.0 0.0 0:00.12 crond
562 xfs 20 0 20264 1244 752 S 0.0 0.1 0:00.08 xfs
570 root 20 0 46744 828 428 S 0.0 0.0 0:00.00 saslauthd
571 root 20 0 46744 560 160 S 0.0 0.0 0:00.00 saslauthd
622 root 20 0 86072 3352 2608 S 0.0 0.1 0:01.70 sshd
630 root 20 0 12200 1808 1300 S 0.0 0.1 0:00.26 bash
10143 apache 20 0 323m 15m 3124 S 0.0 0.7 0:27.96 httpd
16508 apache 20 0 253m 9380 2364 S 0.0 0.4 0:00.00 httpd
16512 apache 20 0 253m 9380 2364 S 0.0 0.4 0:00.02 httpd
16513 apache 20 0 253m 9388 2364 S 0.0 0.4 0:00.00 httpd
16514 apache 20 0 315m 9436 2364 S 0.0 0.4 0:00.02 httpd
16527 apache 20 0 253m 9408 2364 S 0.0 0.4 0:00.02 httpd
16534 apache 20 0 251m 4940 628 S 0.0 0.2 0:00.00 httpd
22034 apache 20 0 324m 17m 3132 S 0.0 0.7 2:18.70 httpd
25014 root 20 0 12620 1192 920 R 0.0 0.0 0:00.00 top
31795 apache 20 0 322m 16m 3124 S 0.0 0.7 1:30.98 httpd

Considering this:
privvmpages 327761 1143370 603785 664163 98961
it seems like there is a memory leak somewhere... why the hell does it want to use 4.4 GB of RAM? :(


Hmm... it seems I've got it. I gave the VEs more total guaranteed memory than I physically have... How can I reset those failcnt values to zero, so that changes would be easy to monitor?
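
One way to watch for changes without resetting anything is to compare two snapshots of the counters and report only what grew. A rough sketch, assuming the same six-column row layout as the output above; the helper names and the second snapshot are invented for illustration:

```python
def read_failcnt(text):
    """Map resource name -> failcnt from one snapshot of the counters."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected row shape: name held maxheld barrier limit failcnt
        if len(fields) == 6 and fields[5].isdigit():
            counts[fields[0]] = int(fields[5])
    return counts


def failcnt_delta(before, after):
    """Return only the resources whose failcnt grew between two snapshots."""
    return {
        name: count - before.get(name, 0)
        for name, count in after.items()
        if count > before.get(name, 0)
    }


# The first snapshot uses rows from the output above; the second is
# invented purely to illustrate a counter that grew.
earlier = (
    "privvmpages 327761 1143370 603785 664163 98961\n"
    "numproc 41 92 8000 8000 0\n"
)
later = (
    "privvmpages 330000 1143370 603785 664163 98970\n"
    "numproc 45 92 8000 8000 0\n"
)

print(failcnt_delta(read_failcnt(earlier), read_failcnt(later)))  # {'privvmpages': 9}
```

In practice the two strings would come from reading /proc/bc/105/resources at different times, for example from a cron job that stores the previous reading.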

[Updated on: Mon, 16 August 2010 09:08]


Re: nagios: Warning: The check of host '****' could not be performed due to a fork() error: [message #40429 is a reply to message #40380] Tue, 17 August 2010 18:31
curx
Messages: 739
Registered: February 2006
Location: Nürnberg, Germany
Senior Member

Hi,

Use Nagios, Icinga, or whatever monitoring software you like ... ;)

See:

http://wiki.openvz.org/Category:Monitoring

Bye,
Thorsten
Re: nagios: Warning: The check of host '****' could not be performed due to a fork() error: [message #40444 is a reply to message #40380] Thu, 19 August 2010 06:13
romeor
Nope, rearranging the memory didn't give any result. Still the same problem after some time. I even changed the kernel back to 2.6.18, but it seems pointless. Nagios still logs these messages, even though it has a lot of free RAM.

I've attached a few graphs and the other VEs' configs. Can anyone help, please?


The problem is with VE id=5.
  • Attachment: Hnode_ram.PNG
    (Size: 121.38KB, Downloaded 203 times)
  • Attachment: nagios_ram.PNG
    (Size: 75.69KB, Downloaded 199 times)
  • Attachment: 105.txt
    (Size: 1.13KB, Downloaded 229 times)
  • Attachment: 104.txt
    (Size: 1.12KB, Downloaded 240 times)
  • Attachment: 103.txt
    (Size: 1.12KB, Downloaded 246 times)
Re: nagios: Warning: The check of host '****' could not be performed due to a fork() error: [message #40445 is a reply to message #40380] Thu, 19 August 2010 06:14
romeor
And two more config files, since only 5 attachments are allowed per message.

And here is the part of /proc/bc/resources with a non-zero failcnt for VE 105:


       uid  resource                     held              maxheld              barrier                limit              failcnt
      105:  kmemsize                  7690217             19317234  9223372036854775807  9223372036854775807                    0
            lockedpages                     0                    0               131072               131072                    0
            privvmpages                124484               327251               262144               274644                  418



The other VEs are fine.
How come maxheld is 327251 pages, or 1278 MB, while the graphs show no more than 500 MB!? Is there some kind of memory leak somewhere in OpenVZ?
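
The 1278 MB figure above follows from the page count if pages are 4 KiB, which is an assumption here (it is the usual size on x86). A small sketch of the arithmetic, applied to the privvmpages row quoted above; `pages_to_mib` is a made-up helper name:

```python
PAGE_SIZE = 4096  # bytes per page; assumed, not stated in the output above


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a beancounter page count to mebibytes."""
    return pages * page_size / (1024 * 1024)


# privvmpages row for VE 105: held, maxheld, barrier, limit
for label, pages in [
    ("held", 124484),
    ("maxheld", 327251),
    ("barrier", 262144),
    ("limit", 274644),
]:
    print(f"{label:8s} {pages_to_mib(pages):8.1f} MiB")
```

With these numbers maxheld comes out at about 1278.3 MiB, above the limit of about 1072.8 MiB. As far as the OpenVZ documentation describes it, privvmpages accounts for memory that has been allocated, whether or not it is actually in use, so it may legitimately read higher than a graph of used RAM.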
  • Attachment: 102.txt
    (Size: 1.12KB, Downloaded 242 times)
  • Attachment: 101.txt
    (Size: 1.13KB, Downloaded 225 times)

[Updated on: Thu, 19 August 2010 06:18]


Re: nagios: Warning: The check of host '****' could not be performed due to a fork() error: [message #40448 is a reply to message #40380] Thu, 19 August 2010 08:32
romeor
I filed a bug report with a link to this thread.