
Failed heartbeat unnoticed in Distributed Application

Written by Ingmar Verheij on July 12th, 2011. Posted in Monitoring


System Center Operations Manager (SCOM) monitors the health of systems with an agent. One of the most basic checks is a health check of the agent itself, which includes a heartbeat between the agent and the RMS (Root Management Server). If three consecutive heartbeats are missed (the number is configurable), the agent is considered unavailable.

A ‘Health Service Heartbeat Failure’ alert is generated and (if configured) a notification is sent to inform the administrator that there is a problem.
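To make the threshold logic concrete, here is a minimal Python sketch of a missed-heartbeat check. It is not SCOM's actual implementation: the class name `HeartbeatMonitor`, its methods, and the 60-second interval are assumptions made for illustration. Only the threshold of three missed heartbeats comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical model of a server-side heartbeat check (not a SCOM API)."""

    interval_s: float = 60.0   # assumed heartbeat interval, in seconds
    missed_allowed: int = 3    # threshold from the text (configurable)
    last_seen_s: float = 0.0   # timestamp of the last heartbeat received

    def record_heartbeat(self, now_s: float) -> None:
        # Called whenever a heartbeat arrives from the agent.
        self.last_seen_s = now_s

    def missed(self, now_s: float) -> int:
        # Number of whole intervals that passed without a heartbeat.
        return max(0, int((now_s - self.last_seen_s) // self.interval_s))

    def is_available(self, now_s: float) -> bool:
        # The agent counts as unavailable once the threshold is reached.
        return self.missed(now_s) < self.missed_allowed


monitor = HeartbeatMonitor()
monitor.record_heartbeat(now_s=0.0)
still_up = monitor.is_available(now_s=179.0)   # two intervals missed
gone = not monitor.is_available(now_s=180.0)   # third interval missed
```

With these assumed values the agent is flagged as unavailable three minutes after its last heartbeat, which is the moment the alert would be raised.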

But if a Distributed Application is configured to monitor a chain of components, this failure goes unnoticed in the Distributed Application itself.

Node state 'Healthy'

Unmonitored nodes are shown in grey and appear to be ‘Healthy’, which is strange for a node whose heartbeat has not been reported for quite some time.
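The effect can be illustrated with a small Python sketch of a "worst state wins" rollup that skips unmonitored members. This is only a model of the behaviour described above, not SCOM's actual rollup algorithm: the `State` enum, the `rollup` function, and the `strict` flag are hypothetical names introduced for the example.

```python
from enum import IntEnum


class State(IntEnum):
    """Hypothetical health states, ordered from best to worst."""

    UNMONITORED = 0  # shown in grey
    HEALTHY = 1
    WARNING = 2
    CRITICAL = 3


def rollup(states, strict=False):
    """Return the worst state in a chain of components.

    By default, unmonitored members are ignored, so they cannot make the
    result worse. With strict=True (an assumed alternative policy), any
    unmonitored member counts as CRITICAL.
    """
    if strict:
        states = [State.CRITICAL if s is State.UNMONITORED else s for s in states]
    monitored = [s for s in states if s is not State.UNMONITORED]
    return max(monitored, default=State.UNMONITORED)


# A chain in which the middle node has stopped sending heartbeats.
chain = [State.HEALTHY, State.UNMONITORED, State.HEALTHY]
default_result = rollup(chain)
strict_result = rollup(chain, strict=True)
```

In this model the default rollup reports `State.HEALTHY` for the chain even though one node is no longer reporting, while the strict variant reports `State.CRITICAL`.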
