Maintenance Mode History with SQL

Unfortunately, SCOM 2012 R2 has no native report or view that lets you quickly see the maintenance history of a specific server or set of servers. I have used this handy SQL query many times to find out when a machine entered Maintenance Mode (MM). As written, the query below targets the Data Warehouse database (OperationsManagerDW). Run it there, specifying the server(s) you are interested in and the date range:

---
USE OperationsManagerDW
SELECT ManagedEntity.DisplayName, MaintenanceModeHistory.*
FROM ManagedEntity WITH (NOLOCK)
INNER JOIN
MaintenanceMode ON ManagedEntity.ManagedEntityRowId = MaintenanceMode.ManagedEntityRowId
INNER JOIN
MaintenanceModeHistory ON MaintenanceMode.MaintenanceModeRowId = MaintenanceModeHistory.MaintenanceModeRowId

WHERE DisplayName Like 'server%.domain.net' AND ScheduledEndDateTime BETWEEN 'fromDateRange' AND 'toDateRange'
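
The 'fromDateRange' and 'toDateRange' values in the query are placeholders that must be replaced with real dates. As a minimal sketch, the PowerShell helper below fills them in and returns the finished query text. The function name New-MaintenanceHistoryQuery and the sample dates are hypothetical and not part of the original query; the SQL itself is unchanged.

```powershell
# Hypothetical helper (not from the original post): fills in the query's placeholders.
function New-MaintenanceHistoryQuery {
    param(
        [Parameter(Mandatory)] [string]   $ServerPattern,  # e.g. 'server%.domain.net'
        [Parameter(Mandatory)] [datetime] $From,
        [Parameter(Mandatory)] [datetime] $To
    )
    # Double any single quotes so the pattern stays inside its SQL string literal
    $pattern  = $ServerPattern.Replace("'", "''")
    # The 's' format yields yyyy-MM-ddTHH:mm:ss regardless of the local culture
    $fromText = $From.ToString('s')
    $toText   = $To.ToString('s')

    @"
USE OperationsManagerDW
SELECT ManagedEntity.DisplayName, MaintenanceModeHistory.*
FROM ManagedEntity WITH (NOLOCK)
INNER JOIN MaintenanceMode ON ManagedEntity.ManagedEntityRowId = MaintenanceMode.ManagedEntityRowId
INNER JOIN MaintenanceModeHistory ON MaintenanceMode.MaintenanceModeRowId = MaintenanceModeHistory.MaintenanceModeRowId
WHERE DisplayName LIKE '$pattern' AND ScheduledEndDateTime BETWEEN '$fromText' AND '$toText'
"@
}

# Sample values only; substitute your own server pattern and dates
$query = New-MaintenanceHistoryQuery -ServerPattern 'server%.domain.net' -From '2015-03-01' -To '2015-03-31'
$query
```

The resulting text can be pasted into SQL Server Management Studio or passed to whatever SQL client you normally use.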

Wintel Gray Agents Runbook Automation

This Orchestrator Runbook, “SCOM2012R2_Check_HealthService”, is set up to capture a “Health Service Heartbeat Failure” alert for Windows machines and then either restart the HealthService, or delete the corrupted HealthService cache folder and restart the service.

The Runbook captures the alert from SCOM, waits 60 seconds, and then pings the machine. If the ping is unsuccessful, it sends an email indicating that the machine is actually offline. If the ping is successful, it waits a further 180 seconds and then checks whether the HealthService on the machine is running.

If the HealthService is running, the cause is possibly a corrupted cache folder, so the Runbook stops the HealthService, deletes the cache folder, and restarts the service.

If the HealthService is not running, the Runbook starts the service.

In both cases, an informational email is sent to indicate that the Runbook resolved the issue.
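
The branching described above can be sketched in PowerShell. This is only an illustration of the decision logic, not the Runbook itself: the function name Get-GrayAgentAction and the action names it returns are made up for this example, and the 60 and 180 second waits are left out.

```powershell
# Hypothetical helper mirroring the Runbook's branches; the action names are illustrative only.
function Get-GrayAgentAction {
    param(
        [Parameter(Mandatory)] [bool] $PingSucceeded,
        [string] $ServiceState   # e.g. 'Running' or 'Stopped'; ignored when the ping failed
    )
    if (-not $PingSucceeded) {
        # Machine really is offline: only notify
        return @('EmailMachineOffline')
    }
    if ($ServiceState -eq 'Running') {
        # Service is up but the agent is gray: treat it as a corrupted cache
        return @('StopService', 'DeleteCache', 'StartService', 'EmailResolved')
    }
    # Service is not running: just start it
    return @('StartService', 'EmailResolved')
}

# Print the steps for a machine that answers ping and has a running HealthService
(Get-GrayAgentAction -PingSucceeded $true -ServiceState 'Running') -join ' -> '

# Against a live machine the inputs could be gathered like this (Windows PowerShell):
# $ping  = Test-Connection -ComputerName $server -Count 1 -Quiet
# $state = (Get-WmiObject Win32_Service -ComputerName $server -Filter "Name='HealthService'").State
```

Keeping the decision separate from the actions makes each branch easy to check before wiring it to real service calls.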


Details of Configuration

Monitor Alert Properties:

[Screenshot]

Link from Monitor Alert to Run Program:

[Screenshot]

Link from Run Program to Get HealthService Status:

[Screenshot]

Link from Get HealthService Status, if not running:

[Screenshot]

Start HealthService Properties:

[Screenshot]

The Stop HealthService properties are almost the same as those for Start HealthService, so they are omitted here.

Delete Folder Properties:

This pertains to SCOM 2012 R2. A duplicate Runbook with the same configuration checks against the old folder structure.

[Screenshots: Delete Folder properties]

SCOM Wintel Gray Agents Health State and Cache Flush – Part II Automation

In the previous post, we learned that clearing the agent's cache and recycling the health service will (hopefully) resolve our gray agent issue. But what happens when we have to do this for hundreds of agents? One word: PowerShell. It lets us automate this tedious task across hundreds of servers and makes it very quick! Here is the code I use.

Just make sure all of your servers are in the list you provide and, of course, that the account you are running as has local administrative rights on each server.

# Read the server names, one per line
$list = Get-Content ".\list.txt"
foreach ($server in $list)
{
    Write-Host "$server Check Service: " -NoNewline
    $svc = Get-WmiObject Win32_Service -ComputerName $server -Filter "Name='HealthService'"
    if ($svc.State -eq "Running")
    {
        # Stop the agent; a ReturnValue of 0 means the call succeeded
        if ($svc.StopService().ReturnValue -eq 0)
            { Write-Host "Successful" -ForegroundColor Green }
        else
            { Write-Host "Failed" -ForegroundColor Red }
        Start-Sleep 5
        # Delete the cache folder (8.3 short path; backslashes are doubled for the WMI filter)
        $a = Get-WmiObject Win32_Directory -ComputerName $server -Filter "Name='C:\\PROGRA~1\\SYSTEM~1\\Agent\\HEALTH~1\\HEALTH~1'"
        if ($a -and $a.DeleteEx().ReturnValue -eq 0)
            { Write-Host "Successful" }
        else
            { Write-Host "Failed" }
        # Start the agent again so it rebuilds its cache
        if ($svc.StartService().ReturnValue -eq 0)
            { Write-Host "Successful" -ForegroundColor Green }
        else
            { Write-Host "Failed" -ForegroundColor Red }
    }
    else
        { Write-Host "Stopped" }
}

SCOM Wintel Gray Agents Health State and Cache Flush

The problem: you launch your SCOM console and find your server in a gray, “Not monitored” state. You browse to the server and check the health service, and it is clearly running… So now what?

[Screenshot: Windows-based machine shown in a “Not monitored” state]

Although SCOM thinks the machine is unresponsive, we can confirm this is not the case: we can ping the machine, and we can also log in to it.

This is a result of the SCOM health service needing its cache to be cleared.

SCOM has a built-in task to do exactly what we want; however, since SCOM believes the machine is offline, it cannot run the task against the “Not monitored” machine.


SOLUTION – MANUAL PROCESS

  1. Remote into the machine and launch the Services console (services.msc). Locate the Microsoft Monitoring Agent service and stop it.


  2. Once the service has stopped, browse to the following folder: “C:\Program Files\Microsoft Monitoring Agent\Agent\”. Delete the entire “Health Service State” folder.

  3. Go back to the Services console (services.msc) and start the Microsoft Monitoring Agent service. This will rebuild the folder we just deleted.

Give SCOM a few seconds, maybe a few minutes, and the Health State of our machine will turn back to healthy!
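
The manual steps above can also be scripted. The following is a minimal sketch meant to be run locally on the affected machine from an elevated PowerShell prompt. It assumes the Microsoft Monitoring Agent service is registered under the service name HealthService and that the agent lives in the default folder mentioned in the steps; the function name Clear-HealthServiceState is hypothetical.

```powershell
# Hypothetical helper: run it locally on the affected machine from an elevated prompt.
function Clear-HealthServiceState {
    param(
        # Assumption: default SCOM 2012 R2 agent folder, as in the manual steps
        [string] $AgentRoot = 'C:\Program Files\Microsoft Monitoring Agent\Agent'
    )
    $stateFolder = "$AgentRoot\Health Service State"

    # Step 1: stop the agent service; abort if the service cannot be stopped
    Stop-Service -Name HealthService -ErrorAction Stop

    # Step 2: delete the cache folder if it is present
    if (Test-Path -LiteralPath $stateFolder) {
        Remove-Item -LiteralPath $stateFolder -Recurse -Force
    }

    # Step 3: start the service again, which rebuilds the folder
    Start-Service -Name HealthService

    # Return the service object so the caller can confirm it reports Running
    Get-Service -Name HealthService
}

# Clear-HealthServiceState    # uncomment to run on the affected machine
```

Pass -AgentRoot if the agent was installed to a different location.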