strange problem with nagios nrpe server [message #24520]
Wed, 05 December 2007 20:19
Steve Wray
Hi there,
I just took some filesystems from servers which have been running under
Xen for some time and converted them to OpenVZ.
I've found a very strange issue.
We monitor our servers with Nagios, and each server runs the Nagios NRPE
server.
Our Nagios configuration system uses several NRPE config files in
/etc/nagios/nrpe.d/ and a config directive in nrpe.cfg pointing to this
directory:
include_dir=/etc/nagios/nrpe.d
I have found that while this directive works under Xen, it does not
work under OpenVZ.
I've tested this by moving the filesystem back and forth between Xen and
OpenVZ, and it is definitely only a problem in OpenVZ.
Also, our VMs are running Debian.
Debian Etch does not exhibit this problem; only Debian Sarge does.
I'm at a loss to explain this... it seems really weird.
/proc/user_beancounters shows no failcnt for anything.
Surely I am missing something here? Any advice on debugging this would
be appreciated.
Thanks
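The failcnt check mentioned above can be scripted so that no counter is overlooked when a host carries many containers. This is a minimal sketch, assuming the version 2.5 layout of /proc/user_beancounters (columns uid, resource, held, maxheld, barrier, limit, failcnt, with the container ID printed only on the first row of each container's block); the function name `nonzero_failcnt` is hypothetical.

```python
def nonzero_failcnt(text):
    """Return {(ctid, resource): failcnt} for every counter with failcnt > 0.

    Assumes the version 2.5 layout of /proc/user_beancounters:
    uid resource held maxheld barrier limit failcnt, where the "<ctid>:"
    prefix appears only on the first row of each container's block.
    """
    failures = {}
    ctid = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("Version:", "uid"):
            continue  # blank line, version banner, or column header
        if fields[0].endswith(":"):
            ctid = fields[0][:-1]  # start of a new container's block
            fields = fields[1:]
        if len(fields) != 6:
            continue  # not a resource row in the expected layout
        resource, failcnt = fields[0], int(fields[5])
        if failcnt > 0:
            failures[(ctid, resource)] = failcnt
    return failures
```

Calling `nonzero_failcnt(open("/proc/user_beancounters").read())` (usually as root) lists every counter that has ever hit its limit; an empty dict corresponds to the "no failcnt for anything" result reported here.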
Re: strange problem with nagios nrpe server [message #25000 is a reply to message #24976]
Wed, 12 December 2007 20:38
Steve Wray
Listaccount wrote:
> Quoting Gregor Mosheh <gregor@hostgis.com>:
>
>> Kirill Korotaev wrote:
>>> Just for the history/other users the resolution of the problem Steve
>>> had:
>>> OpenVZ was installed on XFS
>>
>> WOW, good work Kirill. That must have been a gnarly one to figure out,
>> I never even thought of the filesystem type combined with a bug in NRPE.
>
> Sorry, I haven't followed this closely. Why is it a problem with
> XFS? The filesystem should not matter to running applications, so
> wouldn't it be an XFS bug?
I'd say it was an NRPE bug: the NRPE shipped in Debian Sarge wasn't
correctly handling the value returned by the filesystem.
As a result, NRPE's option to read config files from a directory didn't
work; the directory listing returned no entries, so as far as NRPE was
concerned, the directory was empty.
Later versions of NRPE do appear to handle this value, which supports the
view that the bug was in NRPE.
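The thread does not name the filesystem value involved. A likely candidate (an assumption here, not confirmed above) is the d_type field of struct dirent: readdir() may set it to DT_UNKNOWN on filesystems that do not record file types in directory entries, as XFS of that era typically did, so code that keeps only entries reported as DT_REG sees an empty directory. The sketch below simulates that failure and the usual fix, a stat() fallback. It also assumes include_dir only picks up files ending in .cfg; the DT_* strings and both function names are hypothetical stand-ins, since Python does not expose the raw d_type value.

```python
import os
import stat
import tempfile

# Hypothetical stand-ins for the DT_* constants in <dirent.h>; each entry
# below is a (name, d_type) pair, as a C readdir() loop would see it.
DT_UNKNOWN, DT_DIR, DT_REG = "unknown", "dir", "reg"


def naive_cfg_files(entries):
    """Trust d_type blindly: keep *.cfg entries only if reported as DT_REG."""
    return sorted(name for name, d_type in entries
                  if name.endswith(".cfg") and d_type == DT_REG)


def portable_cfg_files(dirpath, entries):
    """Like naive_cfg_files, but fall back to stat() when d_type is DT_UNKNOWN."""
    found = []
    for name, d_type in entries:
        if not name.endswith(".cfg"):
            continue
        if d_type == DT_UNKNOWN:
            # No type hint from the filesystem: ask the inode directly.
            mode = os.stat(os.path.join(dirpath, name)).st_mode
            is_regular = stat.S_ISREG(mode)
        else:
            is_regular = d_type == DT_REG
        if is_regular:
            found.append(name)
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical include_dir contents.
        for name in ("check_disk.cfg", "check_load.cfg", "README"):
            open(os.path.join(d, name), "w").close()
        # Simulate a filesystem that reports every entry as DT_UNKNOWN.
        xfs_like = [(name, DT_UNKNOWN) for name in os.listdir(d)]
        print(naive_cfg_files(xfs_like))        # []
        print(portable_cfg_files(d, xfs_like))  # ['check_disk.cfg', 'check_load.cfg']
```

With every entry reported as DT_UNKNOWN the naive filter returns an empty list, matching the "directory was empty" symptom, while the stat() fallback still finds both files. Real Python code would just use os.scandir() and DirEntry.is_file(), which already make this fallback when d_type is DT_UNKNOWN.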
When I was diagnosing the issue and comparing the Xen instances with VZ,
I forgot that the important part of the VZ system (where the 'private'
and 'root' directories are) was under an XFS mount point rather than ext3.
Hence all my testing was, unwittingly, comparing OpenVZ VMs residing on
XFS against Xen VMs residing on ext3.
This made my diagnosis somewhat less than useful until I created a whole
new OpenVZ Debian Sarge VM in a fresh partition which was formatted with
ext3.
When this one worked fine I started to look more closely and realised my
blunder.
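The mix-up described above (container directories on XFS, Xen images on ext3) can be ruled out early by asking which mount actually holds a directory. This is a minimal sketch that reads the text of /proc/mounts and picks the longest mount point containing the path; the function name `fs_type_for` is hypothetical, and octal escapes in mount points (such as `\040` for a space) are not decoded.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts, one mount per line:
    "device mountpoint fstype options dump pass". The longest mount point
    that contains `path` wins; among identical mount points the later line
    wins, since a later mount shadows an earlier one (e.g. rootfs vs. /).
    """
    path = os.path.normpath(path)
    best_mnt, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mnt, fstype = fields[1], fields[2]
        inside = path == mnt or path.startswith(mnt.rstrip("/") + "/")
        if inside and len(mnt) >= len(best_mnt):
            best_mnt, best_type = mnt, fstype
    return best_type
```

On a host using OpenVZ's usual default layout (an assumption; check VE_PRIVATE and VE_ROOT in your own configuration), `fs_type_for("/vz/private", open("/proc/mounts").read())` reports the type of the filesystem holding the containers; from a shell, `stat -f -c %T /vz/private` gives similar information.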
I have to say, I've used both Nagios and XFS for many, many years and
never had anything like this happen.
Had I realised that the important directories were on XFS, I might have
found this:
http://osdir.com/ml/network.nagios.devel/2004-05/msg00044.html
It's from 2004! Amazing, really, that this wasn't fixed in Debian Sarge.
:(
Sorry to have wasted anyone's time...
Re: strange problem with nagios nrpe server [message #25013 is a reply to message #24976]
Wed, 12 December 2007 21:56
porridge
On Wed, Dec 12, 2007 at 05:59:13PM +0100, Listaccount wrote:
> Quoting Gregor Mosheh <gregor@hostgis.com>:
>
> >Kirill Korotaev wrote:
> >>Just for the history/other users the resolution of the problem Steve had:
> >>OpenVZ was installed on XFS
> >
> >WOW, good work Kirill. That must have been a gnarly one to figure out,
> >I never even thought of the filesystem type combined with a bug in NRPE.
>
> Sorry, I haven't followed this closely. Why is it a problem with
> XFS? The filesystem should not matter to running applications, so
> wouldn't it be an XFS bug?
The application (nrpe) relies on a non-portable feature of some filesystems
without checking whether a particular filesystem actually supports that
feature. In this case, XFS happened not to support it, which made the
application misbehave.
To summarize: it's a bug in the application, but it can be worked around
by using ext3.
--
Marcin Owsiany <marcin@owsiany.pl> http://marcin.owsiany.pl/
GnuPG: 1024D/60F41216 FE67 DA2D 0ACA FC5E 3F75 D6F6 3A0D 8AA0 60F4 1216
"Every program in development at MIT expands until it can read mail."
-- Unknown