OpenVZ Forum


strange problem with nagios nrpe server [message #24520] Wed, 05 December 2007 20:19
Steve Wray
Hi there,
I just took some filesystems from servers which have been running under
Xen for some time and converted them to OpenVZ.

I've found a very strange issue.

We monitor our servers with Nagios and each server runs the Nagios nrpe 
server.

Our config system for Nagios involves several NRPE config files in
/etc/nagios/nrpe.d/ and a config directive in nrpe.cfg pointing to this
directory with:

include_dir=/etc/nagios/nrpe.d

I have found that this directive works under Xen but does not work
under OpenVZ.

I've tested this by moving the filesystem back and forth between Xen and
OpenVZ, and it is definitely a problem only under OpenVZ.

Also, our VMs are running Debian.

Debian Etch does not exhibit this problem; only Debian Sarge.

I'm at a loss to explain this... it seems really weird.

/proc/user_beancounters shows no failcnt for anything.
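
For anyone repeating this check, a minimal sketch of scanning the
beancounters output for non-zero fail counters. It assumes the usual column
layout (held, maxheld, barrier, limit, failcnt) and that the text comes from
inside a single container; the function name and the sample values are made
up for illustration.

```python
def nonzero_failcnt(text):
    """Return {resource: failcnt} for every resource whose failcnt is non-zero."""
    failures = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows end with five numeric columns:
        # held, maxheld, barrier, limit, failcnt.
        if len(fields) < 6 or not all(f.isdigit() for f in fields[-5:]):
            continue
        resource, failcnt = fields[-6], int(fields[-1])
        if failcnt:
            failures[resource] = failcnt
    return failures


# Hypothetical sample in the /proc/user_beancounters layout (values invented).
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize         803866    1246758    2457600    2621440          0
            lockedpages           0          0         32         32          0
            numproc              12         20         65         65          3
"""

# Real use: nonzero_failcnt(open("/proc/user_beancounters").read())
```

An empty result corresponds to "no failcnt for anything", as reported above.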

Surely I am missing something here? Any advice on debugging this would 
be appreciated.



Thanks

Re: strange problem with nagios nrpe server [message #24522 is a reply to message #24520] Wed, 05 December 2007 20:28
Gregor Mosheh
The good news is that I use Nagios with our VPSs, and it works brilliantly.

> include_dir=/etc/nagios/nrpe.d
> I have found that while this directive works under Xen this does not 
> work under openvz.

I find that surprising. Are you sure that the permissions didn't get
mangled when you copied it over to ovz? That's the first thing I'd
check: making sure that /etc/nagios/nrpe.d is in fact a directory, and
that it's readable by the user who runs nrpe (user nagios?).
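
A rough sketch of that check, to be run as whichever user nrpe runs as
(possibly nagios; that is a guess). The function name is illustrative, and
os.access() only reflects the permissions of the user running the script.

```python
import os
import stat


def check_include_dir(path):
    """Return a list of human-readable problems found with an include directory."""
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        return [f"cannot stat {path}: {exc.strerror}"]
    if not stat.S_ISDIR(mode):
        return [f"{path} is not a directory"]
    problems = []
    # Listing a directory needs read permission; opening the files
    # inside it also needs execute (search) permission.
    if not os.access(path, os.R_OK | os.X_OK):
        problems.append(f"{path} is not readable/searchable by the current user")
    return problems


# Example: print(check_include_dir("/etc/nagios/nrpe.d"))
```

An empty list means the directory exists and the current user can list it.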

-- 
Gregor Mosheh / Greg Allensworth
System Administrator, HostGIS cartographic development & hosting services
http://www.HostGIS.com/

"Remember that no one cares if you can back up,
  only if you can restore." - AMANDA

Re: strange problem with nagios nrpe server [message #24523 is a reply to message #24522] Wed, 05 December 2007 20:58
Steve Wray
Gregor Mosheh wrote:
> The good news is that I use Nagios with our VPSs, and it works brilliantly.
> 
>> include_dir=/etc/nagios/nrpe.d
>> I have found that while this directive works under Xen this does not 
>> work under openvz.
> 
> I find that surprising. Are you sure that the permissions didn't get 
> mangled when you copied it over to ovz? That's the first thing I'd 
> check: making sure that /etc/nagios/nrpe.d is in fact a directory, and 
> that's readable by the user who runs nrpe (user nagios?).

Believe me, that's the first thing I checked.

I've run nrpe under strace and see nothing out of the ordinary; it finds
the correct number of files in the nrpe.d directory. I'm not much of a
whiz with strace, though, so I don't know where to go from here.

Re: strange problem with nagios nrpe server [message #24526 is a reply to message #24523] Wed, 05 December 2007 21:45
Steve Wray
Just one other possible data point.

I may have just dismissed these problems as some kind of creeping 
senility but I've seen some other bizarre issues with VMs migrated into 
OpenVZ.

One of these is to do with Samba filesharing.

When the VM is migrated into OpenVZ from Xen, samba fileshares on the VM 
can be accessed from Windows *only* by FQDN not by bare hostname.

Note that this broke *existing* mapped network drives for Windows users.

Also note that this did *not* affect Linux nor OSX clients; only Windows.

Since I've verified that this weirdness is apparent *only* when the VM
is running under OpenVZ, not under Xen, I'm not inclined to believe that
I am going insane when I find that NRPE under Debian Sarge has a problem
when running under OpenVZ and not under Xen.

It starts to seem that OpenVZ can produce all *kinds* of unpredictable 
behavior... either that or I really am going mad complete with 
hallucinations :-/ Not discounting that possibility out of hand...




Re: strange problem with nagios nrpe server [message #24549 is a reply to message #24523] Thu, 06 December 2007 08:35
dev

Steve Wray wrote:
> Gregor Mosheh wrote:
> 
>>The good news is that I use Nagios with our VPSs, and it works brilliantly.
>>
>>
>>>include_dir=/etc/nagios/nrpe.d
>>>I have found that while this directive works under Xen this does not 
>>>work under openvz.
>>
>>I find that surprising. Are you sure that the permissions didn't get 
>>mangled when you copied it over to ovz? That's the first thing I'd 
>>check: making sure that /etc/nagios/nrpe.d is in fact a directory, and 
>>that's readable by the user who runs nrpe (user nagios?).
> 
> 
> Believe me, that's the first thing I checked.
> 
> I've run nrpe under strace and see nothing out of the ordinary; it finds 
> the correct number of files in the nrpe.d directory. Not much of a whiz 
> with strace tho so don't know where to go from here.

Can you capture the output of
# strace -f -o somefile <nrpe>
from both working (Xen) and non-working (OVZ) installations?
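
Once both traces exist, one way to compare them is to list the files that
were opened successfully in the working trace but not in the other. This is
only a sketch: it assumes the usual `strace -f -o` line format (pid, syscall,
"= result"), and the trace file names in the closing comment are placeholders.

```python
import re

# Matches e.g.:  1234  open("/etc/nagios/nrpe.cfg", O_RDONLY) = 3
OPEN_RE = re.compile(r'\b(?:open|openat)\(.*?"([^"]*)".*\)\s*=\s*(-?\d+)')


def opened_paths(trace_text):
    """Return the set of paths that open()/openat() calls succeeded on."""
    paths = set()
    for line in trace_text.splitlines():
        match = OPEN_RE.search(line)
        # A negative return value means the call failed (e.g. -1 ENOENT).
        if match and int(match.group(2)) >= 0:
            paths.add(match.group(1))
    return paths


def only_in_working(working_trace, broken_trace):
    """Return, sorted, the paths opened in the working trace but not the broken one."""
    return sorted(opened_paths(working_trace) - opened_paths(broken_trace))


# Example (placeholder file names):
# print(only_in_working(open("xen.trace").read(), open("ovz.trace").read()))
```

Config files that show up only on the working side point at the step where
the two installations diverge.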

Is it possible to get a login, with exact instructions on how to reproduce
the issue on your box? This would help to resolve it ASAP and make sure
it's not black magic :@)

Thanks,
Kirill

Re: strange problem with nagios nrpe server [message #24550 is a reply to message #24526] Thu, 06 December 2007 09:06
Kirill Korotaev
Steve Wray wrote:
> Just one other possible data point.
> 
> I may have just dismissed these problems as some kind of creeping 
> senility but I've seen some other bizarre issues with VMs migrated into 
> OpenVZ.
> 
> One of these is to do with Samba filesharing.
> 
> When the VM is migrated into OpenVZ from Xen, samba fileshares on the VM 
> can be accessed from Windows *only* by FQDN not by bare hostname.
> 
> Note that this broke *existing* mapped network drives for Windows users.
> 
> Also note that this did *not* affect Linux nor OSX clients; only Windows.
> 
> Since I've verified that this weirdness is *only* apparent when the VM 
> was running under OpenVZ not under Xen I'm not inclined to believe that 
> I am going insane when I find that NRPE under Debian Sarge has a problem 
> when running under OpenVZ and not under Xen.
>
>
> It starts to seem that OpenVZ can produce all *kinds* of unpredictable 
> behavior... either that or I really am going mad complete with 
> hallucinations :-/ Not discounting that possibility out of hand...

Oh, don't say so. Everything should have a logical explanation.
And I guess I know the answer to this one.

First of all, please check that you don't have any firewall rules
in the host system or in the VE, using 'iptables -L'.

But the real suspect is the broadcast network messages of the NetBIOS protocol.
A working FQDN means that the host can be found via DNS and reached by IP.
Non-working short hostnames mean that your hosts are not set up
in the default domain in DNS and that name resolution via NetBIOS failed.
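
To see which names a given client can resolve, a small sketch along these
lines may help. It uses whatever resolver the client OS provides, so it only
shows whether a name resolves, not which mechanism answered; the function
name and the host names in the closing comment are placeholders.

```python
import socket


def resolve_all(names, resolver=socket.gethostbyname):
    """Map each name to its resolved address, or None if the lookup fails."""
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror and socket.herror are OSError subclasses
            results[name] = None
    return results


# Example (placeholder names):
# print(resolve_all(["fileserver", "fileserver.example.com"]))
```

A short name mapping to None while the FQDN resolves matches the symptom
described above.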

You need to connect your VE to the ethX adapter using a veth (virtual ethernet)
adapter and a Linux bridge. This will allow the use of network broadcasts.
The default venet networking is secure IP-level networking which filters
out broadcasts.

http://wiki.openvz.org/Virtual_Ethernet_device
http://wiki.openvz.org/Differences_between_venet_and_veth
http://forum.openvz.org/index.php?t=msg&goto=7295&&srch=samba#msg_7295
http://en.wikipedia.org/wiki/NetBIOS

Foreseeing your question about why venet is the default networking type:
1. venet is more secure (see wiki).
2. venet is more scalable, up to hundreds and thousands of VEs,
   while veth/ethernet/bridge broadcasts/multicasts will simply kill (DoS) the node
   in case of many VEs.

Thanks,
Kirill

Re: strange problem with nagios nrpe server [message #24551 is a reply to message #24550] Thu, 06 December 2007 09:34
Kirill Korotaev
BTW,

Do you have or use a WINS server? It is usually used for name resolution
and can be used without broadcasts, so it should work even with your current
configuration.

http://www.oreilly.com/catalog/samba/chapter/book/ch07_03.html

Thanks,
Kirill



Re: strange problem with nagios nrpe server [message #24928 is a reply to message #24520] Wed, 12 December 2007 08:57
dev

Just for the history and for other users, here is the resolution of the problem Steve had:

OpenVZ was installed on XFS, while Xen used the ext3 file system.
XFS doesn't support the filetype feature (the d_type field returned by readdir()/getdents64()),
so d_type is always reported as DT_UNKNOWN.

The nrpe application from the Debian Sarge repository has a bug
(fixed in Etch?), so it didn't handle files with unknown types and thus
didn't see the config files.

Steve switched to the ext3 file system and the problem went away.
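
The failure mode can be sketched as follows. This is an illustration of the
pattern, not NRPE's actual code: Python does not expose d_type directly, so
the directory entries are passed in as (name, d_type) pairs, and all function
names here are made up. The DT_* values are the ones from Linux <dirent.h>.

```python
import os
import stat

# Linux <dirent.h> values for the d_type field.
DT_UNKNOWN = 0
DT_DIR = 4
DT_REG = 8


def is_regular_buggy(directory, name, d_type):
    """Trust d_type alone: entries reported as DT_UNKNOWN are silently skipped."""
    return d_type == DT_REG


def is_regular_robust(directory, name, d_type):
    """Use d_type when the filesystem supplies it, otherwise fall back to stat()."""
    if d_type != DT_UNKNOWN:
        return d_type == DT_REG
    mode = os.stat(os.path.join(directory, name)).st_mode
    return stat.S_ISREG(mode)


def config_files(directory, entries, is_regular):
    """Filter (name, d_type) pairs down to the names of regular files."""
    return [name for name, d_type in entries if is_regular(directory, name, d_type)]
```

With every entry reported as DT_UNKNOWN, the first check yields an empty
list even though the files exist, while the second still finds them.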

Thanks,
Kirill




Re: strange problem with nagios nrpe server [message #24975 is a reply to message #24928] Wed, 12 December 2007 16:30
Gregor Mosheh
Kirill Korotaev wrote:
> Just for the history/other users the resolution of the problem Steve had:
> OpenVZ was installed on XFS

WOW, good work Kirill. That must have been a gnarly one to figure out, I 
never even thought of the filesystem type combined with a bug in NRPE.

Hopefully, now Steve can join the ranks of us highly satisfied OpenVZ
users. :)


Re: strange problem with nagios nrpe server [message #24976 is a reply to message #24975] Wed, 12 December 2007 16:59
lst_hoe01
Quoting Gregor Mosheh <gregor@hostgis.com>:

> Kirill Korotaev wrote:
>> Just for the history/other users the resolution of the problem Steve had:
>> OpenVZ was installed on XFS
>
> WOW, good work Kirill. That must have been a gnarly one to figure out,
> I never even thought of the filesystem type combined with a bug in NRPE.

Sorry, I have not followed this closely enough. Why is it a problem with
XFS? The filesystem should not matter to the applications running on it,
so wouldn't this be an XFS bug?

Regards

Andreas

Re: strange problem with nagios nrpe server [message #25000 is a reply to message #24976] Wed, 12 December 2007 20:38
Steve Wray
Listaccount wrote:
> Quoting Gregor Mosheh <gregor@hostgis.com>:
> 
>> Kirill Korotaev wrote:
>>> Just for the history/other users the resolution of the problem Steve 
>>> had:
>>> OpenVZ was installed on XFS
>>
>> WOW, good work Kirill. That must have been a gnarly one to figure out,
>> I never even thought of the filesystem type combined with a bug in NRPE.
> 
> Sorry, have not listen close enough to this. Why is it a problem with 
> XFS. The filesystem should not matter for applications running so it 
> would be a XFS bug?

I'd say it was an NRPE bug, as NRPE (as in Debian Sarge) wasn't handling
the value returned by the filesystem.

Hence, the NRPE option to read conf files from a directory didn't work;
the directory listing returned no entries. So far as NRPE was concerned,
the directory was empty.

Since later versions of NRPE appear to handle it, I'd say it was a bug in NRPE.


When I was diagnosing the issue and comparing the Xen instances with VZ 
I forgot that the important part of the VZ system (where the 'private' 
and 'root' directories are) was under an XFS mount point rather than ext3.

Hence all my testing was, unwittingly, comparing OpenVZ VMs residing on 
XFS against Xen VMs residing on ext3.

This made my diagnosis somewhat less than useful until I created a whole 
new OpenVZ Debian Sarge VM in a fresh partition which was formatted with 
ext3.

When this one worked fine I started to look more closely and realised my 
blunder.

I have to say, I've used both Nagios and XFS for many, many years and
never has something like this occurred.

Had I realised that the important directories were XFS I may have found 
this:

http://osdir.com/ml/network.nagios.devel/2004-05/msg00044.html

From 2004!!! Amazing that this wasn't fixed in Debian Sarge, really.

:(

Sorry to have wasted anyone's time...

Re: strange problem with nagios nrpe server [message #25013 is a reply to message #24976] Wed, 12 December 2007 21:56
porridge
On Wed, Dec 12, 2007 at 05:59:13PM +0100, Listaccount wrote:
> Quoting Gregor Mosheh <gregor@hostgis.com>:
> 
> >Kirill Korotaev wrote:
> >>Just for the history/other users the resolution of the problem Steve had:
> >>OpenVZ was installed on XFS
> >
> >WOW, good work Kirill. That must have been a gnarly one to figure out,
> >I never even thought of the filesystem type combined with a bug in NRPE.
> 
> Sorry, have not listen close enough to this. Why is it a problem with  
> XFS. The filesystem should not matter for applications running so it  
> would be a XFS bug?

The application (nrpe) uses a non-portable feature of some filesystems
without checking whether a particular filesystem actually
supports this feature. In this case, XFS happened not to support it,
which resulted in invalid behavior of the application.

To summarize: it's a bug in the application, but can be worked around by
using ext3.

-- 
Marcin Owsiany <marcin@owsiany.pl>              http://marcin.owsiany.pl/
GnuPG: 1024D/60F41216  FE67 DA2D 0ACA FC5E 3F75  D6F6 3A0D 8AA0 60F4 1216
 
"Every program in development at MIT expands until it can read mail."
                                                              -- Unknown