Nagios

Using Nagios plugins and NRPE to check network services and metrics on remote hosts

This article is part of our Academy Course titled Nagios Tutorial for IT Monitoring.


In our previous article (“Nagios Core Installation and Configuration on Ubuntu Server”), we defined Nagios Core plugins as extensions (either compiled binaries or executable scripts) that process command-line arguments, perform specific checks, and then return the results to the main program. Under the hood, plugins are an abstraction layer that links the Nagios daemon to the hosts and services being monitored.

In this guide we will dive more deeply into Nagios Core plugins and what you can do with them.

Additionally, we will introduce you to NRPE (Nagios Remote Plugin Executor), a utility that allows you to run Nagios Core plugins remotely on both Linux and Microsoft Windows machines to check metrics such as disk space usage and CPU load.

To follow this tutorial and understand the concepts and practices presented here, you should have already installed Nagios Core and the plugins package as explained earlier. To do so, make sure you have followed Steps 1 through 10 in the previous guide before proceeding.

A closer look at Nagios Core plugins

In the first guide, we also downloaded and installed the official Nagios Core plugins, which consist of ~50 binary files (located in /usr/local/nagios/libexec) that are officially developed and maintained by the project. With these plugins you can check the status of common services like SMTP, SSH, HTTP, and DNS (to name a few examples), plus some other machine-specific information such as uptime and disk space usage.

In addition, there are around 3000 plugins that have been developed and are maintained by the community under the name of Nagios Exchange. Together with the official plugins, they can be used to monitor “just about everything that runs on electricity”, as the Nagios project advertises.

Remember that without Nagios plugins, you can only monitor whether a host or device connected to the network is up or down. We need plugins to check on the status of services and metrics as explained in the above paragraph.

Let’s take a closer look at the configuration file for the CentOS 7 system we defined previously. The service definition is of particular interest:

define service {
use                     local-service
host_name               centos7
service_description     HTTP
check_command           check_http!-I 192.168.0.29 -u /index.html
notifications_enabled   1
}

Let’s examine the line beginning with check_command. This directive is followed by the command Nagios will use to check the status of the service in the remote host (check_http) and the required parameters after the exclamation sign (-I 192.168.0.29 -u /index.html in this case).
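To make the split concrete, the sketch below takes the check_command value from the definition above and separates the command name from its argument at the first exclamation sign. It assumes, purely for illustration, a command definition whose command line is `$USER1$/check_http $ARG1$`; the check_http definition in your own commands.cfg may add options of its own, so treat the output as an approximation of what Nagios actually runs.

```shell
# Value copied from the service definition above.
check_command='check_http!-I 192.168.0.29 -u /index.html'

# Plugin directory used throughout this article ($USER1$ in Nagios terms).
plugin_dir=/usr/local/nagios/libexec

command_name=${check_command%%!*}   # everything before the first "!"
arg1=${check_command#*!}            # everything after the first "!" (becomes $ARG1$)

# Assumed template: $USER1$/check_http $ARG1$ (your commands.cfg may differ).
expanded="$plugin_dir/$command_name $arg1"
echo "$expanded"
```

If a check_command carried several arguments separated by further exclamation signs, each one would map to $ARG2$, $ARG3$, and so on; this sketch handles only the single-argument case used here.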

Fortunately, plugins are highly customizable in that you can pass parameters to indicate exactly what you need to monitor and where. For example, what happens if the remote HTTP service is listening on a port other than the default 80? What if you need to check whether the certificate on an SSL-enabled server will still be valid after a given number of days? The check_http plugin allows you to perform these checks and many more.

Let’s find out more about the usage of the plugin:

cd /usr/local/nagios/libexec
./check_http --usage

A more detailed help guide is available at http://nagios-plugins.org/doc/man/index.html or via the command line (./check_plugin --help, where plugin can be dns, ftp, ssh, load, ups, or any of the plugins inside /usr/local/nagios/libexec).
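Because plugins are ordinary executables, you can also run one by hand and inspect its exit code, which is how Nagios learns the result of a check: 0 means OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN. The following is a minimal sketch; status_name is a hypothetical helper written for this example, and the plugin is only run if it is installed at the path used in this article.

```shell
# Hypothetical helper: translate a plugin exit code into the Nagios state name.
# 3 is UNKNOWN; any other unexpected value is also reported as UNKNOWN here.
status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

plugin=/usr/local/nagios/libexec/check_http

# Run the same check as the service definition, but only if the plugin exists here.
if [ -x "$plugin" ]; then
    rc=0
    "$plugin" -I 192.168.0.29 -u /index.html || rc=$?
    echo "State: $(status_name "$rc")"
fi
```

Running a check this way before editing the service definition is a quick way to confirm that the options you picked do what you expect.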

The help guide indicates that you can use the -p option (followed by the port number) to indicate that the HTTP server is running on a different port. Likewise, the -S option tells the plugin to connect via SSL. To specify the protocol version, you can optionally follow -S with a number between 1 and 3 (1 = TLSv1, 2 = SSLv2, and 3 = SSLv3).

Let’s change the port where Apache is listening on the CentOS 7 host to 8080. This will require the following change in the service definition:

check_command           check_http!-I 192.168.0.29 -u /index.html -p 8080

You can refer to the Apache HTTP server tutorial if you feel you could use a little help to change the default port or to set up an SSL-enabled server. In addition, make sure that connections from the Nagios server to your remote host / new port are not blocked by an active firewall rule.

The first two notifications in Fig. 1 show the results of the checks BEFORE and AFTER port 8080 was enabled in the remote host’s firewall:

Figure 1: Viewing Nagios notifications BEFORE and AFTER enabling traffic through the remote port in the firewall

If you later enable SSL on the HTTP server, modify the check_command directive in the service definition as follows:

check_command           check_http!-I 192.168.0.29 -u /index.html -S 1

Fig. 2 shows what you can expect in the Apache logs in the remote host under this scenario:

Figure 2: The check_http plugin in action: viewing the Apache logs on the remote host

As you can see in Fig. 2 above, checks are performed at 5-minute intervals by default. If you want to change this setting, add the check_interval directive to the service definition followed by the desired number of minutes.

We can even go one step further and check whether the SSL certificate on the remote host will still be valid for the next 366 days (it won’t if we followed the instructions given in the Apache enable SSL / TLS tutorial). Feel free to choose a higher number if you wish.

To do this, change the check_command line as follows (please note that, according to the help guide, the URL is not checked when verifying the validity of a certificate – that is why we removed the -u option):

check_command           check_http!-I 192.168.0.29 -C 366

Nagios will perform the check and return a warning, as you can see in Fig. 3:

Figure 3: A warning message is issued when the certificate validity is not within the specified limit

After you’re done with the above test, you should change the certificate expiration check to a more reasonable value (60 days, for example). This will warn you to renew the certificate when its expiration date is less than 2 months ahead:

check_command           check_http!-I 192.168.0.29 -C 60
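The effect of the two thresholds used above can be sketched as a small comparison. cert_state is a hypothetical helper that only approximates what check_http does with a single -C value; the 365 days of remaining validity is an assumed figure for illustration, and the exact behavior of the real plugin at the boundary may differ.

```shell
# Hypothetical helper approximating "check_http -C <days>" with a single threshold:
#   $1 = days until the certificate expires, $2 = minimum days required (-C value)
cert_state() {
    if [ "$1" -lt 0 ]; then
        echo "CRITICAL"      # certificate already expired
    elif [ "$1" -lt "$2" ]; then
        echo "WARNING"       # expires sooner than the requested minimum
    else
        echo "OK"
    fi
}

cert_state 365 366   # assumed 365 days left, checked against -C 366
cert_state 365 60    # same certificate, checked against -C 60
```

With the first threshold the certificate falls short of the requested minimum, while the 60-day threshold leaves it well inside the limit until roughly two months before it expires.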

Introducing Nagios Remote Plugin Executor (NRPE)

Up to this point you have learned how to use plugins to check the status of network services on remote hosts from a centralized Nagios server, without having to install any agents on those remote hosts. If, additionally, you need to verify other metrics such as disk usage, CPU load, or the number of logged-on users, you will need to consider using NRPE.

This tool consists of a service that allows the Nagios server to execute plugins on remote machines and report the results back to the server. In short, the check_nrpe plugin on the Nagios server communicates with the NRPE service running on the remote host, which in turn runs the desired plugin (check_load, for example) locally.

In order to use NRPE (the package name is nrpe in Fedora-based distributions and nagios-nrpe-server in Debian and derivatives), let’s set up the remote CentOS 7 host:

Step 1 – Download and compile NRPE and configure it to run under xinetd:

yum install xinetd
wget https://sourceforge.net/projects/nagios/files/nrpe-2.x/nrpe-2.15/nrpe-2.15.tar.gz
tar xvf nrpe-2.15.tar.gz
cd nrpe-2.15
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
make install-xinetd

Edit /etc/xinetd.d/nrpe and allow connections from the Nagios server (192.168.0.32), as shown in Fig. 4:

only_from       = 127.0.0.1 192.168.0.32

Figure 4: Allowing connections to the xinetd daemon from the Nagios server (192.168.0.32)
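If you would rather script this edit than make it by hand, something along these lines could work. allow_from is a hypothetical helper written for this example, and it assumes the only_from line is written with spaces around the equal sign, as in the line shown above. The demonstration runs against a scratch file rather than the live /etc/xinetd.d/nrpe.

```shell
# Hypothetical helper: append an address to the only_from line of an xinetd service file.
#   $1 = file to edit, $2 = address to allow
allow_from() {
    tmp=$(mktemp) || return 1
    # Append the address to the only_from line unless it is already listed.
    awk -v ip="$2" '
        $1 == "only_from" {
            listed = 0
            for (i = 3; i <= NF; i++) if ($i == ip) listed = 1
            if (!listed) $0 = $0 " " ip
        }
        { print }
    ' "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Demonstration on a scratch file that mimics the relevant line.
demo=$(mktemp)
printf '\tonly_from       = 127.0.0.1\n' > "$demo"
allow_from "$demo" 192.168.0.32
cat "$demo"
rm -f "$demo"
```

Writing the result back with cat, rather than moving the temporary file into place, keeps the ownership and permissions of the original file. Calling the helper a second time with the same address leaves the file unchanged.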

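Step 2 – Create a user account and group (nagios). If any of the install targets in Step 1 failed because this account did not exist yet, create it now and run them again: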

useradd nagios

Step 3 – Define the commands to be used in the service declaration in the server:

Open /usr/local/nagios/etc/nrpe.cfg and make sure the following lines are present (see Fig. 5). You will want to replace the partition to be checked (/dev/mapper/centos_centos7--2-root) with the one that applies to your case.

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/mapper/centos_centos7--2-root

Figure 5: Viewing command definitions in the NRPE configuration file

In each case, the -w and -c flags indicate the warning and critical thresholds, respectively. In other words, when free disk space drops to the 20% threshold, NRPE reports a warning, and when it drops to 10% or less, it reports a critical state. Likewise, check_load takes one threshold for each of the 1-, 5-, and 15-minute load averages, so you will receive warning and critical notifications when those averages reach 15, 10, 5 and 30, 25, 20, respectively. The same applies to the number of logged-on users: a warning at 5 users and a critical state at 10.
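As a rough illustration of how the two disk thresholds interact, consider the sketch below. disk_state is a hypothetical helper, it works with whole percentages only, and its treatment of values that sit exactly on a threshold is a simplification rather than a statement about the real check_disk plugin.

```shell
# Hypothetical helper mirroring "check_disk -w 20% -c 10%":
#   $1 = free space in percent, $2 = warning threshold, $3 = critical threshold
disk_state() {
    if [ "$1" -lt "$3" ]; then
        echo "CRITICAL"      # less free space than the critical threshold
    elif [ "$1" -lt "$2" ]; then
        echo "WARNING"       # below the warning threshold but not yet critical
    else
        echo "OK"
    fi
}

disk_state 35 20 10   # plenty of free space
disk_state 15 20 10   # between the two thresholds
disk_state 5 20 10    # below the critical threshold
```

The critical test comes first so that a value below both thresholds is reported with the more severe state.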

Step 4 – Open port 5666/tcp (the default port NRPE listens on) in the built-in firewall:

firewall-cmd --add-port=5666/tcp
firewall-cmd --add-port=5666/tcp --permanent

And add the service description at the bottom of /etc/services:

echo "nrpe 		5666/tcp 		# NRPE" >> /etc/services
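Note that running the echo command above more than once appends a duplicate entry each time. A guarded variant is sketched here; add_nrpe_service is a hypothetical helper that takes the file as a parameter so it can be tried on a copy first.

```shell
# Hypothetical helper: add the nrpe entry to a services file only if it is missing.
#   $1 = services file (normally /etc/services)
add_nrpe_service() {
    if grep -q '^nrpe[[:space:]]' "$1"; then
        echo "nrpe entry already present in $1"
    else
        printf 'nrpe\t\t5666/tcp\t\t# NRPE\n' >> "$1"
    fi
}

# Intended use, as root, on the remote host:
# add_nrpe_service /etc/services
```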

Finally, start xinetd and enable it to start on boot (this will manage the NRPE daemon):

systemctl start xinetd
systemctl enable xinetd

In this example, we are using xinetd to provide access control based on the IP address of the Nagios server, as we can see in Figure 4 above.

Step 5 – Set up the service definitions in the Nagios server (/usr/local/nagios/etc/servers/centos7.cfg):

define service {
use                     generic-service
host_name               centos7
service_description     Check load
check_command           check_nrpe!check_load
notifications_enabled   1
}
define service {
use                     generic-service
host_name               centos7
service_description     Check disk
check_command           check_nrpe!check_disk
notifications_enabled   1
}
define service {
use                     generic-service
host_name               centos7
service_description     Check logged-on users
check_command           check_nrpe!check_users
notifications_enabled   1
}

Please note how the parameter passed to check_nrpe in each service matches the name of a command defined on the remote host in Step 3.
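Before relying on the web interface, you may want to run the same three checks by hand from the Nagios server. The sketch below assumes that the check_nrpe plugin is installed under /usr/local/nagios/libexec on the server and that 192.168.0.29 is the address of the CentOS 7 host, as in the check_http examples earlier; adjust both if your setup differs.

```shell
plugin=/usr/local/nagios/libexec/check_nrpe
remote_host=192.168.0.29   # the CentOS 7 host used in this article

if [ -x "$plugin" ]; then
    # Ask the remote NRPE service to run each command defined in nrpe.cfg.
    for remote_command in check_users check_load check_disk; do
        printf '%s: ' "$remote_command"
        "$plugin" -H "$remote_host" -c "$remote_command" || true
    done
else
    echo "check_nrpe not found at $plugin" >&2
fi
```

The `|| true` keeps the loop going when a check returns a warning or critical state, so that all three results are printed. For the service definitions to work, Nagios also needs a check_nrpe command definition that passes the host address and the command name to this plugin.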

Testing NRPE

As always, don’t forget to restart Nagios for the changes to take effect. Then go to the Nagios web interface and check the status of the metrics being monitored (refer to Fig. 6 for details):
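It is usually worth validating the configuration before restarting, since a typo in a service definition can prevent Nagios from starting. One possible approach is sketched below; verify_then_restart is a hypothetical helper, and the restart command shown in the usage comment is an assumption that depends on how Nagios was set up as a service on your server.

```shell
# Hypothetical helper: run a restart command only if a validation command succeeds.
#   $1 = command that validates the configuration
#   $2 = command that restarts the service
verify_then_restart() {
    if sh -c "$1"; then
        sh -c "$2"
    else
        echo "Configuration check failed; service not restarted." >&2
        return 1
    fi
}

# Intended use on the Nagios server (paths follow the install location used in this
# series; the restart command is an assumption):
# verify_then_restart \
#     "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg" \
#     "systemctl restart nagios"
```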

Figure 6: Checking metrics in the remote hosts via the Nagios web interface

At this point, your Nagios server should be monitoring the disk usage, CPU load, and number of logged-on users in the remote CentOS 7 server.

If you face any issues, you can refer to the Nagios log (/usr/local/nagios/var/nagios.log) in the server and the generic message log (/var/log/messages) in the remote host as the first source of information for troubleshooting.

Summary

In this article we explained how to use Nagios plugins to monitor network services on remote hosts, and NRPE to monitor several machine-specific metrics on those hosts. How do you decide which one (generic plugins or NRPE) you should use? For simplicity, use the answers to the following questions to determine which approach applies to your case:

  • Do you need to verify the status of network services running on remote hosts? Use plugins running on the central Nagios server.
  • Do you need to check machine-specific metrics on the remote hosts? Use NRPE to run plugins on the remote hosts.

Last but not least, remember that Nagios is a monitoring tool and not Aladdin’s lamp. It does still require user intervention to prevent hardware damage when metrics go beyond the established limits, and human brains in action to troubleshoot issues when they occur. As with any other tool, you need to learn how to use it in order to make the most of it. We hope that this series will provide you with the necessary starting skills to accomplish that goal.

Happy monitoring!

Gabriel Canepa

Gabriel Canepa is a Linux Foundation Certified System Administrator (LFCS-1500-0576-0100) and web developer from Villa Mercedes, San Luis, Argentina. He works for a worldwide leading consumer product company and takes great pleasure in using FOSS tools to increase productivity in all areas of his daily work. When he's not typing commands or writing code or articles, he enjoys telling bedtime stories with his wife to his two little daughters and playing with them, the great pleasure of his life.