Nagios Powered

Nagios is a powerful monitoring system that enables organizations to identify and resolve IT infrastructure problems before they affect critical business processes.

After experiencing problems with an HP ProLiant DL380 G6 that restarted unexpectedly because of Automatic Server Recovery (ASR), I needed to monitor the status of the Citrix XenServers running on HP ProLiant servers in Nagios.

Nagios is a flexible solution that can be extended with plugins. Plugins can be found at Nagios Exchange, which is where I found the check_hpasm plugin. Unfortunately, this plugin does not check the ASR status.

In this article I describe how I configured Groundwork (using Nagios) to monitor the health of HP ProLiant servers and extended the check_hpasm plugin to check ASR health.

check_hpasm operation modes

The check_hpasm plugin can operate in two modes: local and remote.

[Figure: check_hpasm modes]

In local mode the check_hpasm plugin is installed on the Citrix XenServer together with a Nagios agent (NRPE). The Nagios server asks the NRPE agent to run the check_hpasm plugin. This requires at least two additional agents to be installed on the Citrix XenServer, which is not recommended.

The remote mode uses SNMP to query the HP System Health Monitor. This avoids installing plugins on the Citrix XenServer but requires SNMP to be configured and accessible from the Nagios server.

For this setup I’ve configured the remote mode using SNMP, so no plugins have to be installed on the Citrix XenServers.

Prerequisite – HP System Health Monitor (hpasmd)

On the Citrix XenServer, the “HP ProLiant Support Pack” and the “HP Health Application and Insight Management Agent” need to be installed. These can be downloaded from hp.com under “Support & Drivers”. Alternatively, you can download the “HP SNMP Agent for Citrix XenServer 5.x”.

Before installing the check_hpasm plugin, make sure the hpasm agent (daemon) is accessible via SNMP from the Nagios server.

# snmpwalk -c public -v1 <IP-address of your XenServer> 1.3.6.1.4.1.232

SNMPv2-SMI::enterprises.232.1.1.1.0 = INTEGER: 1
SNMPv2-SMI::enterprises.232.1.1.2.0 = INTEGER: 23
SNMPv2-SMI::enterprises.232.1.1.3.0 = INTEGER: 2
SNMPv2-SMI::enterprises.232.1.2.1.4.1.0 = INTEGER: 30
SNMPv2-SMI::enterprises.232.1.2.1.4.2.1.1.1 = INTEGER: 1
SNMPv2-SMI::enterprises.232.1.2.1.4.2.1.2.1 = STRING: "Compaq Standard Equipment Agent for Linux"
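
The reachability test above can also be scripted. The following is only a minimal sketch: the helper name has_hp_agent is my own and not part of check_hpasm, and the address 10.0.0.1 in the comment is a placeholder. It reads snmpwalk output on standard input and succeeds when at least one object below the HP/Compaq enterprise subtree (232) was returned.

```shell
#!/bin/sh
# Hypothetical helper (not part of check_hpasm): succeed if the snmpwalk
# output on stdin contains at least one object below the HP/Compaq
# enterprise subtree (enterprises.232 or numeric 1.3.6.1.4.1.232).
has_hp_agent() {
    grep -q -E '(enterprises|1\.3\.6\.1\.4\.1)\.232\.'
}

# Example use (host address and community are placeholders):
#   snmpwalk -c public -v1 10.0.0.1 1.3.6.1.4.1.232 | has_hp_agent \
#       && echo "hpasm agent reachable" \
#       || echo "no HP agent data returned"
```

If the helper reports no data, check the SNMP configuration and firewall on the XenServer before continuing.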

Installing check_hpasm plugin

Since we’re using the remote operation mode of the check_hpasm plugin, the plugin needs to be installed on the Nagios (or Groundwork) server.

The check_hpasm plugin can be downloaded from ConSol Labs. Either download the files directly on the Nagios server or upload them, for instance using WinSCP.

Connect to the Nagios server via Secure Shell (SSH), for instance with PuTTY, and change to the directory where you placed the tarball (.tar.gz).

Unpack the tarball

# tar xzvf check_hpasm-4.1.2.tar.gz

From the unpacked directory, configure the plugin with the following options

Option               Explanation
--disable-hwinfo     Switches off the type, serial number and BIOS release in the output.
--enable-hpacucli    Activates checking of RAID controllers.
--enable-perfdata    Adds performance data to the output by default.
--with-degrees       Displays temperature values in Celsius instead of Fahrenheit.
--with-nagios-user   Defines the Nagios user.
--with-nagios-group  Defines the Nagios group.
--with-noinst-level  Defines the exit code if no hpasm rpm was installed.

# ./configure --prefix=/opt/plugins/custom/hp-insight --with-nagios-user=monitor --with-nagios-group=users --enable-hpacucli --enable-perfdata --with-degrees

Compile

# make
...
# make install
...

And finally test the plugin

# /opt/plugins/custom/hp-insight/libexec/check_hpasm -H 10.0.0.1 -C public

OK - System: 'proliant dl360 g3', S/N: '7J31LMW6N01D', ROM: 'P31 01/28/2004', hardware working fine, da: 1 logical drives, 1 physical drives | fan_1=50% fan_2=50% temp_1_cpu=16;50;50 temp_2_cpu=15;65;65 temp_3_ioBoard=21;56;56 temp_4_cpu=20;65;65
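
The output line follows the usual Nagios plugin layout: a status text, then a ‘|’, then performance data as label=value pairs. As a rough sketch (the function names plugin_status_text and plugin_perfdata are my own, and labels containing spaces are not handled), the two parts can be separated like this:

```shell
#!/bin/sh
# Hypothetical helpers for inspecting one line of plugin output.
# Simplification: perfdata labels containing spaces are not supported.

# Print the status text (everything before the first '|').
plugin_status_text() {
    printf '%s\n' "${1%%"|"*}" | sed 's/[[:space:]]*$//'
}

# Print the performance data, one label=value item per line.
# Prints nothing when the line has no '|' section.
plugin_perfdata() {
    case "$1" in
        *"|"*) printf '%s\n' "${1#*"|"}" | tr -s ' ' '\n' | sed '/^$/d' ;;
    esac
}

# Example:
#   out=$(/opt/plugins/custom/hp-insight/libexec/check_hpasm -H 10.0.0.1 -C public)
#   plugin_status_text "$out"
#   plugin_perfdata "$out"
```

This is handy when testing from the shell to see quickly which fan and temperature values the plugin reports.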

You have now configured the check_hpasm plugin, and hardware monitoring is running.

Altering check_hpasm to incorporate ASR

The check_hpasm plugin checks the health of processors, power supplies, memory modules, fans, CPU and board temperatures, and RAID controllers. Unfortunately the status of ASR isn’t part of the health check.

Determine ASR status

I downloaded the cpqhlth.mib MIB (part of the HP Insight Management MIB Kit) and queried one of the Citrix XenServers via SNMP using iReasoning MIB Browser (free and really useful). I found that there is an OID that specifies the overall condition of the ASR feature (1.3.6.1.4.1.232.6.2.5.17.0 or .iso.org.dod.internet.private.enterprises.compaq.cpqHealth.cpqHeComponent.cpqHeAsr.cpqHeAsrCondition.0).

[Screenshot: SNMP Query cpqHeAsrCondition - VH04]

I checked the status of VH04 (one of the servers that restarted unexpectedly) and saw that the reported status was degraded (3). A check on VH01, which didn’t suffer from ASR problems, returned a different result: ok (2), as expected.

[Screenshot: SNMP Query cpqHeAsrCondition - VH01]
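
The same query can be run from the command line of the Nagios server with the Net-SNMP tools instead of a MIB browser. The sketch below is an illustration only: the function name asr_condition_name and the ‘unknown’ fallback are my own additions, and the host address in the comment is a placeholder. The value names are the ones reported for cpqHeAsrCondition in this article.

```shell
#!/bin/sh
# Hypothetical helper: translate the integer returned for
# cpqHeAsrCondition (1.3.6.1.4.1.232.6.2.5.17.0) into its name.
# The 'unknown' fallback is an assumption for unexpected values.
asr_condition_name() {
    case "$1" in
        1) echo other ;;
        2) echo ok ;;
        3) echo degraded ;;
        4) echo failed ;;
        *) echo unknown ;;
    esac
}

# Example (host address and community are placeholders):
#   value=$(snmpget -v1 -c public -Oqv 10.0.0.1 1.3.6.1.4.1.232.6.2.5.17.0)
#   asr_condition_name "$value"
```

The -Oqv output option makes snmpget print only the value, which keeps the mapping simple.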

Editing script

I downloaded the check_hpasm script via WinSCP (it can be found in /opt/plugins/custom/hp-insight/libexec) and opened it in Perl Express (a free Perl IDE/editor for Windows).

I noticed there was a procedure called ‘overall_check’ where I could incorporate a simple check for the ASR status. I altered the script in a few places to incorporate the check via the OID I found in the MIB (still following me?).

sub overall_init
sub overall_init {
...
  my $cpqHeAsrCondition = '1.3.6.1.4.1.232.6.2.5.17.0';
  my $cpqHeAsrConditionValue = {
    1 => 'other',
    2 => 'ok',
    3 => 'degraded',
    4 => 'failed',
  };
...
  $self->{asrstatus} = lc SNMP::Utils::get_object_value(
      $snmpwalk, $cpqHeAsrCondition,
      $cpqHeAsrConditionValue);
...
sub overall_check
sub overall_check {
...
  if ($self->{asrstatus}) {
    if ($self->{asrstatus} eq 'degraded') {
      $result = 1;
      $self->add_message(WARNING,
          sprintf 'ASR overall status is %s', $self->{asrstatus});
    } elsif ($self->{asrstatus} eq 'failed') {
      $result = 2;
      $self->add_message(CRITICAL,
          sprintf 'ASR overall status is %s', $self->{asrstatus});
    }
  } else {
    $self->add_info('This system does not have ASR.');
  }
...
sub collect
sub collect {
...
      my $cpqHeAsr     = "1.3.6.1.4.1.232.6.2.5";
...
      # Walk for ASR
      $tic = time;
      my $response2a = $session->get_table(
          -maxrepetitions => 1,
          -baseoid => $cpqHeAsr);
      if (scalar (keys %{$response2a}) == 0) {
        $self->trace(2, sprintf "maxrepetitions failed. fallback");
        $response2a = $session->get_table(
            -baseoid => $cpqHeAsr);
      }
      $tac = time;
      $self->trace(2, sprintf "%03d seconds for walk $cpqHeAsr (%d oids)",
          $tac - $tic, scalar(keys %{$response2a}));
...
      map { $response->{$_} = $response2a->{$_} } keys %{$response2a};
...

The end result would look like this:

[Screenshots: check_hpasm - collect 1, collect 2, collect 3, overall_check, overall_init]

The script is uploaded again and tested from the Nagios server to determine the health of the VH04 server.

[Screenshot: check_hpasm from Nagios on VH04]

The result is as expected: check_hpasm gives a warning about a degraded ASR on VH04.
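
Nagios decides the state of a service from the exit code of the plugin: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. As a rough standalone illustration of the decision made in the modified overall_check (this is a shell sketch of my own with a hypothetical function name, not code from check_hpasm, and the OK messages are my own wording), the mapping from ASR status to message and exit code looks like this:

```shell
#!/bin/sh
# Hypothetical sketch of the ASR decision: print a Nagios-style message
# and return the matching plugin exit code (0 OK, 1 WARNING, 2 CRITICAL).
# $1 = ASR status name ('ok', 'degraded', 'failed', 'other' or empty).
asr_nagios_result() {
    case "$1" in
        degraded) echo "WARNING - ASR overall status is $1";  return 1 ;;
        failed)   echo "CRITICAL - ASR overall status is $1"; return 2 ;;
        "")       echo "OK - This system does not have ASR."; return 0 ;;
        *)        echo "OK - ASR overall status is $1";       return 0 ;;
    esac
}

# Example:
#   asr_nagios_result degraded; echo "exit code: $?"
```

As in the Perl change, only ‘degraded’ and ‘failed’ raise an alert; every other value leaves the overall result untouched.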

 

Groundwork / Nagios

Now the plugin has to be incorporated in the Nagios (or in my case Groundwork which uses Nagios) configuration. This can be done by editing the configuration files or via the GUI.

Edit configuration files

The command ‘check_hpasm’ is stored in the checkcommands.cfg file. This file is located in the ‘?/nagios/etc/’ directory for Nagios and in the ‘/usr/local/groundwork/core/monarch/workspace/’ directory for Groundwork.

checkcommands.cfg
# command 'check_hpasm'
define command{
    command_name                   check_hpasm
    command_line                   /opt/plugins/custom/hp-insight/libexec/check_hpasm -H $HOSTADDRESS$ -C $ARG1$
    }
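
When the check runs, Nagios replaces $HOSTADDRESS$ with the address of the host and $ARG1$ with the first argument of the service’s check command (the part after the ‘!’, for example check_hpasm!public). The helper below only imitates that substitution so you can see the resulting command line. It is a sketch with a hypothetical name, and it assumes the values contain no ‘|’ or ‘&’ characters.

```shell
#!/bin/sh
# Hypothetical helper that imitates Nagios macro substitution.
# $1 = command_line template, $2 = host address, $3 = value for $ARG1$
# Assumption: $2 and $3 contain no '|' or '&' (both are special to sed here).
expand_command_line() {
    printf '%s\n' "$1" |
        sed -e 's|\$HOSTADDRESS\$|'"$2"'|g' -e 's|\$ARG1\$|'"$3"'|g'
}

# Example (address and community string are placeholders):
#   expand_command_line \
#     '/opt/plugins/custom/hp-insight/libexec/check_hpasm -H $HOSTADDRESS$ -C $ARG1$' \
#     10.0.0.1 public
```

The expanded line has the same form as the manual test command used earlier, which makes it easy to reproduce a failing check by hand.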

GUI

Since I’m using Groundwork I prefer to add the command and the service via the GUI.

Command ‘check_hpasm’

  1. Open the ‘Configuration’ tab
  2. Click on the ‘Commands’ tab
  3. On the left pane click ‘Commands > New’
  4. Enter the details of the command
    • Name of the command ‘check_hpasm’
    • Type : check
    • Command line : /opt/plugins/custom/hp-insight/libexec/check_hpasm -H $HOSTADDRESS$ -C $ARG1$
  5. Click Save

[Screenshots: GroundWork Monitor Enterprise 6 - Commands 1 and 2]

Service ‘HP_Insight_Manager’

  1. Open the ‘Configuration’ tab
  2. Click on the ‘Services’ tab
  3. On the left pane click ‘Services > New Service’
  4. Enter the basics of the service
    • Service name : HP_Insight_Manager
    • Service template : generic-service
  5. Click Add
  6. Enter the details of the service
    • Check command : check_hpasm
    • Command line : <SNMP read-only community string> (for instance public)
  7. Click Save (important!)
  8. Select ‘Service Profiles’ tab
  9. Select the service profile you want to add the service to (in my case ‘Xen service profile’)
  10. Click Save

[Screenshots: GroundWork Monitor Enterprise 6 - Services 1, 2, 3 and 4]

Hosts

Since the change in the service profile (adding a service) isn’t pushed to all hosts by default, I will add the service to the hosts manually.

  1. Open the ‘Configuration’ tab
  2. Click on the ‘Hosts’ tab
  3. Expand ‘Hosts’ until you reach the node requested.
  4. Click on the ‘Detail’ node
  5. Click on the ‘Services’ tab
  6. Select the service ‘HP_Insight_Manager’
  7. Click ‘Add Service(s)’

Repeat the task for all Citrix XenServers (…)

[Screenshots: GroundWork Monitor Enterprise 6 - Hosts 1, 2 and 3]

Apply configuration

The changes in the configuration need to be applied to the Groundwork / Nagios engine.

Pre flight check

Before the configuration is applied, a pre flight check can be executed to determine whether the configuration is valid.

  1. Open the ‘Configuration’ tab
  2. Click on the ‘Control’ tab
  3. On the left pane click ‘Pre flight check’
  4. If the result in the right pane says ‘Success’ you’re good to go!

[Screenshot: GroundWork Monitor Enterprise 6 - Pre-flight check]

Commit

After a successful Pre flight check the configuration can be applied to the Groundwork / Nagios engine.

  1. Open the ‘Configuration’ tab
  2. Click on the ‘Control’ tab
  3. On the left pane click ‘Commit’
  4. On the right pane click ‘Backup’ (just to be sure)
  5. On the right pane click ‘Commit’
  6. If the result in the right pane says ‘Success’ you don’t have to restore the backup ;-)

[Screenshots: GroundWork Monitor Enterprise 6 - Commit 1, 2 and 3]

Check!

The final step is to check whether the configuration has been applied and the servers are indeed monitored with the HP_Insight_Manager service.

  1. Open the ‘Status’ tab
  2. Expand the Hosts treeview until you reach a Citrix XenServer (VH01 in my case)
  3. Determine if there is a ‘HP_Insight_Manager’ service

[Screenshot: GroundWork Monitor Enterprise 6 - Status]

Edit 22-07-2010:

Just received an e-mail from Gerhard Lausser (author of check_hpasm). He updated the plugin (v4.2.4) and included the ASR check (changelog).

 

Ingmar Verheij

2 Comments

    1. Hi Mike,

      Sorry that the post didn’t suit your needs. In this case I monitored the servers with Groundwork, so I used Groundwork to explain the steps. Groundwork uses Nagios for monitoring, so the servers are monitored by Nagios (the plugin is a Nagios plugin).
      In the text I’ve explained which files you should change to get the plugin working. If you need any help, let me know.

      Regards,
      Ingmar
