SCOM – Powershell Recovery Action – Stopped Windows Service

August 31, 2017

Hi,

Today I was at a customer who had a really specific question regarding monitoring of Windows Services with Operations Manager (SCOM).

We had already set up some basic recovery actions which restart the service automatically after it was stopped.

For some other services the customer wanted extra functionality: the recovery action should try to start the service a maximum of 3 times, and if the service still wasn’t running after 3 tries, the customer wanted to receive an email saying that the recovery action failed. Out of the box, SCOM cannot do this, so I used PowerShell to accomplish it.

Sidenote: To use PowerShell as a recovery action, you can install the free management pack provided by the community and SquaredUp, which can be downloaded from https://squaredup.com/free-powershell-management-pack/. This management pack adds PowerShell everywhere it is missing in Operations Manager, and it is one of the default management packs I always install at customers.

 

A fully working solution needs the following components:

  • A monitor that checks the status of the service
    • This monitor can be created from the Authoring pane of the SCOM console using the Windows Service template

  • A recovery action for the monitor created previously
    • The recovery action can be created from Health Explorer
  • A rule that picks up the event created by the recovery action PowerShell script
    • This is an Alert Generating Rule (NT Event Log); its configuration depends on the type and location of the event that the script logs
  • A subscription on the rule to send the email

The PowerShell script:

# Fill in the service name here
$ServiceName = "LPD Service"
$ServiceStarted = $false
$i = 0

# Create the event log source. -ErrorAction Ignore is needed because on every
# later run New-EventLog throws an error: the source already exists.
New-EventLog -LogName Application -Source "Powershell - Restart Service" -ErrorAction Ignore

Do {
    # On the second and third run, wait a minute before trying to start the service
    if ($i -gt 0) { Start-Sleep -s 60 }

    # Try to start the service
    Start-Service $ServiceName

    # Check whether the service is actually running now
    $Service = Get-Service -Name $ServiceName
    if ($Service.Status -eq "Running") {
        $ServiceStarted = $true
    }

    $i++

    # Give up after the third failed attempt
    if (($i -eq 3) -and ($ServiceStarted -eq $false)) {
        $eventmessage = "$ServiceName failed to restart after $i attempts, exiting script"

        # Log an error event in the event viewer (this is the event the rule picks up)
        Write-EventLog -LogName Application -Source "Powershell - Restart Service" -EntryType Error -EventId 101 -Message $eventmessage
        exit
    }
}
Until ($ServiceStarted -eq $true)

# The service is running: log an informational event
$eventmessage = "$ServiceName restarted after $i attempt(s)"
Write-EventLog -LogName Application -Source "Powershell - Restart Service" -EntryType Information -EventId 100 -Message $eventmessage
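
If you want to try out the retry logic without stopping a real service, it can be pulled into a small function. This is only a sketch under my own assumptions: the function name Invoke-StartWithRetry and its parameters are made up for illustration and are not part of SCOM or the SquaredUp management pack. The start action and the "is it running" check are passed in as script blocks, so they can be replaced by dummies while testing.

```powershell
# Hypothetical helper (not part of SCOM): retries a start action up to
# $MaxAttempts times and reports whether the check succeeded and on which attempt.
function Invoke-StartWithRetry {
    param(
        [Parameter(Mandatory)] [scriptblock]$StartAction,  # e.g. { Start-Service 'LPD Service' }
        [Parameter(Mandatory)] [scriptblock]$IsRunning,    # must return $true or $false
        [int]$MaxAttempts = 3,
        [int]$DelaySeconds = 60
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        # From the second attempt on, wait before trying again
        if ($attempt -gt 1) { Start-Sleep -Seconds $DelaySeconds }

        # A failing start must not abort the loop; the check below decides
        try { & $StartAction | Out-Null } catch { }

        if (& $IsRunning) {
            return [pscustomobject]@{ Started = $true; Attempts = $attempt }
        }
    }

    return [pscustomobject]@{ Started = $false; Attempts = $MaxAttempts }
}

# Example call against a real service (commented out so nothing is touched by accident):
# $result = Invoke-StartWithRetry `
#     -StartAction { Start-Service 'LPD Service' } `
#     -IsRunning   { (Get-Service 'LPD Service').Status -eq 'Running' }
```

The returned object tells you which event to write: an information event when Started is true, an error event when it is false.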

 If you have any difficulties doing this, don’t hesitate to drop a comment below.

If you find this post useful, please consider buying me a virtual beer with a bitcoin donation: 3QhpQ5z5hbPXXRS8x6R5RagWVrRQ5mDEZ1

 

Best regards,

Bert


Defining Service Levels in SCSM2012

May 24, 2012

Hello,
Today we show, step by step, how to define service levels for incidents in Service Manager 2012.
Step 1: Define the priorities of your incidents.
Open Administration -> Incident Settings -> Priorities


Select the option to not use the legacy ResolutionTime.

Step 2 : Define the service levels

Open Administration -> Service Level Management -> Service Level Objectives -> Create new:
We will create an SLO for each defined priority. This is the one for the priority 3 incidents.

Create a new Queue for the P3 incidents


All the Priority 3 incidents


Set correct priority


You can create a working hours calendar to define the hours that count towards the SLO calculation. Here a working hours calendar has already been created.
The metric used here is the default resolution time, meaning the time between the created time and the resolved time. You can create your own metric if required.
Here we set the SLO target to 48 hours, and a warning is raised 2 hours before the SLO would be breached.
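
To make the numbers concrete, here is a small sketch of the arithmetic behind those two values. Get-SloDeadline is a hypothetical helper written only for this illustration, not a Service Manager cmdlet, and it assumes a 24x7 calendar. With a real working hours calendar, Service Manager only counts the hours inside the calendar, so the actual times will be later.

```powershell
# Hypothetical helper (not a Service Manager cmdlet): computes when the warning
# and the breach would occur for an incident, assuming every hour counts (24x7).
function Get-SloDeadline {
    param(
        [Parameter(Mandatory)] [datetime]$CreatedDate,
        [int]$TargetHours  = 48,   # SLO target from the example
        [int]$WarningHours = 2     # warning threshold before the breach
    )

    $breachAt = $CreatedDate.AddHours($TargetHours)

    [pscustomobject]@{
        WarningAt = $breachAt.AddHours(-$WarningHours)
        BreachAt  = $breachAt
    }
}

# Example: a priority 3 incident created on 24 May 2012 at 09:00
$slo = Get-SloDeadline -CreatedDate ([datetime]'2012-05-24T09:00:00')
$slo
```

For this example incident the warning falls on 26 May 2012 at 07:00 and the breach on 26 May 2012 at 09:00.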



Done.


Now we can use the default views for warning and breached incidents.



You can also see notifications when the incident is in warning or breached.


Step 4 : Create an email subscription for notifications on incidents
Administration -> Notifications -> Subscriptions -> New subscription
Trigger the subscription when an object is updated, and use the class Service Level Instance Time Information.


As criteria, select a status change from “no warning” to “warning”.


Create an email template containing the necessary attributes.



Then select the recipients, in this case a specific escalation group and the Assigned To user.

And test …


Enjoy.