SCOM Alert Handling Matrix

November 14, 2012

An Operations Manager administrator shouldn’t be responsible for resolving every alert that arises within an organization. This is why I started working on an “Alert Handling Matrix”, which defines the responsibilities for resolving alerts.

SCOM is a great monitoring tool! In some cases it’s even smarter than the people managing the monitored components. How is that possible? Well, product teams and third-party vendors create Management Packs that carry deep insight into their products. IT departments use these products, but they probably don’t have the same level of expertise as the product developers. This means that IT departments must go through a learning curve just to understand the alerts that occur. There’s a lot to learn from alerts, and it can really improve your skills in managing the products.
The person managing the SCOM environment is not the one managing all the products running in that environment, which is why we must define responsibilities. To do this I use a “Monitor all Layers” approach.

Each layer has several components, and each component has a responsible person. That person must be able to see the alerts for his component (through the SCOM web console, the Operations Console, SharePoint, …) and must be able to respond to them. The matrix can also contain components that are not monitored yet; these become the feature requests that must be implemented in SCOM. The matrix would then look like this, and you can start assigning responsible persons.
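As a rough illustration, the matrix can also be kept as simple data next to the spreadsheet. The sketch below is not part of SCOM: the layer names, component names, and owners are made-up placeholders, and `Monitored` is a hypothetical flag that marks whether SCOM already covers the component.

```powershell
# Hypothetical alert handling matrix: one row per component.
# Layer, component and owner names are placeholders, not taken from a real environment.
$matrix = @(
    [pscustomobject]@{ Layer = 'Hardware';    Component = 'Servers';          Owner = 'Owner A'; Monitored = $true  }
    [pscustomobject]@{ Layer = 'OS';          Component = 'Windows Server';   Owner = 'Owner A'; Monitored = $true  }
    [pscustomobject]@{ Layer = 'Application'; Component = 'Active Directory'; Owner = 'Owner B'; Monitored = $true  }
    [pscustomobject]@{ Layer = 'Application'; Component = 'Custom web app';   Owner = 'Owner C'; Monitored = $false }
)

# Components that are not monitored yet become feature requests for SCOM.
$featureRequests = @($matrix | Where-Object { -not $_.Monitored })

# Components per owner, to see who has to respond to which alerts.
$byOwner = $matrix | Group-Object -Property Owner
$byOwner | ForEach-Object { '{0}: {1}' -f $_.Name, ($_.Group.Component -join ', ') }
```

Keeping the rows flat (one object per component) makes it easy to filter or group on any column, much like the spreadsheet itself.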


We can also start using Operations Manager’s role-based access based on this matrix, or create alert views that correspond to the responsibilities. We could report on alerts on a monthly basis, and so on.
Once we start getting control over alerts (by documenting them), we can start thinking about delegating alert resolution to a first-line helpdesk. Ultimately ITIL or MOF will come into the picture. But for starters we MUST get control over all alerts, and this is a TEAM effort. This team effort will raise the monitoring to a higher level.

Such a team consists of a few people who agree on the implementation, the design, and any changes to the monitoring.


It goes without saying that one person will probably be in charge of multiple “monitored components” and that the “monitoring owner” is most likely to be the SCOM administrator.
This is my vision of “how to get control over your alerts”. You may use it, or feel free to have another opinion.

Samuel.
Alert Handling.xls


SCOM: Tuning/Managing Alerts

October 24, 2012

Recently I visited a customer site where the SCOM administrator had left the firm, so Operations Manager had been running on autopilot for quite a while. In fact, they had over 8000 unclosed alerts coming from rules!
For those of you who aren’t too familiar with the differences between rules and monitors, here is an important one you should know.

  • Rules generate alerts. They do not contribute to the Health State of objects. So closing an alert coming from a rule is not too big of an issue. The rule will probably trigger a new alert if the bad condition still exists.
  • Monitors decide on the Health State of objects (green or red). They don’t necessarily generate alerts, but many of them do (e.g. a free disk space monitor).
    This is why you shouldn’t close alerts that are coming from a monitor. Monitors are self-healing: if, for example, free disk space is back to normal, the alert will automatically be closed and the object’s health state will turn from red to green.

So how do you find out whether it’s a monitor alert or a rule alert? Click on an alert in the “Active Alerts” pane and look at the alert details.
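The same distinction is available in PowerShell through the `IsMonitorAlert` property of an alert object. Here is a minimal sketch that counts alerts per source type; `Get-AlertSourceSummary` is a hypothetical helper name, not a SCOM cmdlet, and it only assumes that the objects piped in have an `IsMonitorAlert` property.

```powershell
# Hypothetical helper: counts alerts by source type (rule or monitor).
# It only reads the IsMonitorAlert property of each object it receives.
function Get-AlertSourceSummary {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Alert
    )
    begin {
        $counts = @{ Rule = 0; Monitor = 0 }
    }
    process {
        if ($Alert.IsMonitorAlert) { $counts['Monitor']++ } else { $counts['Rule']++ }
    }
    end {
        # Return one object with a Rule and a Monitor property.
        New-Object psobject -Property $counts
    }
}
```

Against a live management group this could be used as `Get-SCOMAlert -Criteria 'ResolutionState = 0' | Get-AlertSourceSummary`, assuming the OperationsManager module is loaded.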


Back to my case.

The first thing I did was close all alerts coming from a rule, with the help of our friend PowerShell, of course.

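# Resolve (close) all open alerts that were created by a rule, not by a monitor
Import-Module OperationsManager   # not needed inside the Operations Manager Shell
Get-SCOMAlert -Criteria 'ResolutionState = 0' | Where-Object { $_.IsMonitorAlert -eq $false } | Set-SCOMAlert -ResolutionState 255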

Secondly, I ran the “Most Common Alerts” report for the last month or so.

The “Most Common Alerts” report is very useful for alert tuning and managing. If you address the most common alerts first, you’ll get an immediate gain in the form of fewer alerts. You could also schedule this report per Management Pack, e.g. Active Directory. The people responsible for Active Directory can then see which alerts they have to work on first.
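For a quick look without the reporting server, a similar ranking can be approximated from the alerts that are still present in the operational database. This is only a sketch and not the report itself: `Get-MostCommonAlert` is a hypothetical helper, and it assumes the objects piped in expose `Name` and `TimeRaised` properties, as the alerts returned by `Get-SCOMAlert` do.

```powershell
# Hypothetical helper: ranks alerts by name over the last N days.
function Get-MostCommonAlert {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Alert,
        [int]$Days = 30,   # look-back window in days
        [int]$Top  = 10    # number of alert names to return
    )
    begin {
        # TimeRaised is assumed to be in UTC, so the cutoff is computed in UTC as well.
        $cutoff = [DateTime]::UtcNow.AddDays(-$Days)
        $recent = New-Object System.Collections.ArrayList
    }
    process {
        # Keep only alerts raised inside the look-back window.
        if ($Alert.TimeRaised -ge $cutoff) { [void]$recent.Add($Alert) }
    }
    end {
        # Most frequent alert names first.
        $recent |
            Group-Object -Property Name |
            Sort-Object -Property Count -Descending |
            Select-Object -First $Top -Property Count, Name
    }
}
```

A possible call would be `Get-SCOMAlert | Get-MostCommonAlert -Days 30`. Alerts that have already been groomed out of the operational database are not counted, so the report remains the better source for longer periods.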




The next thing to do is to set responsibilities. As a SCOM administrator you shouldn’t have to worry too much about alerts coming out of the SQL Management Pack. But we’ll address this issue some other time.

Grtz,
Samuel.