Thursday, February 26, 2009

EventID 4502

This post is in addition to an earlier post about moving the OpsMgr-database to another SQL-server. It seems that there is one more thing which needs to be done.

Normally when one creates a group which will be dynamically populated and this group is being saved, all is well. So it was until I moved the OpsMgr-database to another SQL-server. When I made a new group and configured this group to be dynamically populated and tried to save this group, I got this error message:


I was a bit surprised at first, but since I had already created many dynamically populated groups without any error, I knew the error had to be related to the move of the OpsMgr-database. Moreover, that was the only change in the SCOM-environment.

The OpsMgr eventlog showed this event:


How nice! This event told me what was wrong: the execution of user code in the .NET Framework is disabled. That is the default on a freshly installed SQL 2005 server. When one installs OpsMgr this feature gets enabled, but in this case the database now resides on a different SQL-server.

Cause
By default SQL accepts only Stored Procedures based on T-SQL. But when one creates a group which is to be dynamically populated, certain Stored Procedures are used which are not written in T-SQL but run on the .NET Common Language Runtime (CLR).

Solution
In order to resolve this issue, Common Language Runtime has to be enabled on the SQL instance where the OpsMgr database resides. Look here how to enable it. (Also a good source for any other programming and/or SQL related information)

See screendump below how the CLR must be set on the SQL-server:
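
For those who prefer a query over the GUI, CLR integration can also be switched on with sp_configure. This is a minimal sketch, assuming it is run by a sysadmin on the SQL instance that now hosts the OpsMgr database:

```sql
-- Enable CLR integration on the instance hosting the OpsMgr database.
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;

-- Show the setting again; run_value should now be 1.
EXEC sp_configure 'clr enabled';
```

'clr enabled' is not an advanced option, so 'show advanced options' does not have to be switched on first.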

Wednesday, February 25, 2009

SMS-enabled devices attaching to the RMS

On many occasions customers ask me about notifications in general and SMS-notifications in particular, and how to make an SMS-enabled device work in conjunction with the RMS.

Actually it is very straightforward. Windows Server has to detect the SMS-enabled device (a mobile telephone, for instance) as a modem. That is all there is to it. When this happens, SCOM will be able to send SMS messages. Nothing more, nothing less.

However, here is some advice:

Drivers:
Where do you find a good driver (x86 / x64, W2K03 and W2K08)? I myself have good experiences with a universal cable driver from Nokia. No, I do not want to start advertising Nokia, but this cable driver really works very well with many types of Nokias. A network engineer of a customer of mine pointed me to this driver software and from that moment on, it has become a regular tool in my personal SCOM toolbox. It can be found here. Also drivers for other brands can be found there.

Type of Mobile
Just a plain one. No fancy stuff on it like a dual screen or clamshell (breaks easily when attaching it to the RMS with Velcro). A simple, straightforward type of mobile telephone will do the work. Hey, we want to send SMS, not play all kinds of fancy games or impress our colleagues with a bling bling mobile, do we? And the simpler the telephone is, the better it will work. Not all too many bugs in it.

Prepaid or Not?
This really happened. I got it out of the field. A prepaid mobile was attached to the RMS, nobody checked its balance, so it ran out of credit and not a single SMS message got out. Before this was found out, it took the people involved many frustrating hours. So do not use a prepaid mobile: it is bound to go wrong.

Why SMS-enabled devices must be attached to the RMS, explained for managers...

On many occasions customers ask me about notifications. For instance, why - when using SMS - they must attach an SMS-enabled device (like a mobile telephone) to the RMS and not just any other Management Server.

Since this question pops up quite often I have decided to write a posting about it, even though there are many good (technical) postings already about it.

So I will stay away from the technical details and try to give a more down to earth explanation (more for IT managers..) instead of a deep dive into the techniques behind it. For the techies who want to know 'just' that, go here. It is a good starting point for a technical deep dive into the world of notifications.

RMS & notifications
For starters, an RMS is not just a Management Server. Without it, the underlying SCOM Management Group wouldn't exist. Therefore the RMS runs some special services. (These services are present on any other Management Server but disabled by default.)

One of these special services is the SDK Service. This service plays an important role, also in handling notifications. Whenever an Alert is raised, it is shown in the console. When a SCOM administrator makes a subscription he/she doesn't do anything more than create a filter which 'tells' SCOM when certain Alerts have to be put into a message and sent off to SMS, E-mail or IM.

This very same subscription checks the SDK service to see whether any newly raised Alerts match the applied filters. When one does, the Alert is sent off to SMS, E-mail or IM (as specified in the subscription).

This also explains why an SMS-enabled device must be attached to the RMS, since the RMS is the only Management Server in a Management Group with the SDK Service in a running state.

Monday, February 23, 2009

Agent cannot be installed: not manually nor pushed from a MS - rectified

Thanks to input from the community I have rectified this posting. It seems that the recommended practice by Microsoft is not to delete lines from the DB but to use the stored procedure for it. Kevin Holman has a posting about it which can be found here. It covers in detail what steps have to be taken.

It is good to get input like this from the community since I can't possibly know everything and therefore I am prone to error. So please send feedback and keep me sharp.

At a customer’s site I bumped into this situation where one couldn’t install a SCOM Agent on a particular server. No matter what they tried, it just didn’t work. It took me some time to figure it out but I solved it. (the wrong way, but hey, I got it working...)

Situation
When one tries to install an Agent on a server it fails. A manual installation seems to go right but when one approves or rejects the installation this error is being shown:
The service threw an unknown exception. See inner exception for details
The OpsMgr eventlog of the RMS shows EventID 26319 with a content like this:
An exception was thrown while processing ApproveAgentPendingActions for session id uuid
When one tries to push the Agent this error is being shown:
One or more computers you are trying to manage are already in the process of being managed. Please resolve these issues via the Pending Management view in Administration, prior to attempting to manage them again.
The OpsMgr eventlog of the RMS shows EventID 33333 with a content like this:
~ Request: AgentPendingActionProcessChange -- (AgentName=), (PendingActionType=1), (AgentPendingActionId=11536ed1-29bc-d397-5396-2e6fb1e20fdc),~

Cause
It seemed that the server was an Agentless managed system before. So the system was already present in the OpsMgr database.

Before the Agent was installed on the very same server, they had forgotten to remove it from the Agentless Managed systems.

When the installation of the SCOM Agent failed (manually or pushed) they removed the system from the Agentless Managed systems.

But from that point on it was already 'too late': the OpsMgr database did have entries about this particular server which didn’t match with the real situation.

Solution
Instead of following my previous information, where one had to delete lines from the OpsMgr database, it is better to use the stored procedure as described in Kevin Holman's blog posting, found here.

Sunday, February 22, 2009

SRS Server Validation Error

Situation
When one wants to install the Reporting functionality of SCOM this error pops up:
SRS Server Validation Error
Setup was unable to validate SQL Reporting Services
Please use the Reporting Services Configuration tool which is installed with SQL Server to validate your SQL Reporting Services configuration.
Cause
The account of the Web Service Identity of SRS doesn't match with the account being used in SCOM.

Solution
Of course running the Reporting Services Configuration tool is the first step in the process of making it all work.

But sometimes this will not suffice. Therefore Microsoft has included a special tool on the installation media of SCOM. It is located in the folder 'SupportTools' and named ResetSRS.exe.

Run it from the command prompt with this syntax: ResetSRS.exe MSSQLSERVER.

When this is done, start the Reporting Services Configuration tool again, go to the option WebService Identity and click the Apply button. This way SRS is brought back to the state it was in before the SCOM Reporting installation was run.

Tuesday, February 17, 2009

Moving the SCOM database to another server

Sometimes one bumps into a situation where one has to move the SCOM database to another SQL-server. In itself it is not a big issue. When one follows these guidelines, there is nothing amiss.

However, there is one small pitfall. It seems that the master database on the original SQL-server has been changed during the installation of SCOM. The application eventlog of the new SQL-server will display this event:
Event Type: Error
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 18054
Description:
Error 7779800008, severity 16, state 1 was raised, but no message with that error number was found in sys.messages. If error is larger than 50000, make sure the user-defined message is added using sp_addmessage.
Matt Goedtel has posted an article about how to solve it. One must run a sql-script on the new SQL-server and all is well again.

Look here for the article and the sql-script
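
To see whether the new SQL-server really lacks these messages, one can list the user-defined messages it knows about. This is only a sketch, assuming (as the event text suggests) that the OpsMgr messages are user-defined messages with a number above 50000:

```sql
-- List all user-defined messages (numbers above 50000) on this SQL-server.
-- On the new server the OpsMgr messages will be missing from this list
-- until the sql-script from the article has been run.
SELECT message_id, language_id, severity, text
FROM sys.messages
WHERE message_id > 50000
ORDER BY message_id;
```

Running the same query on the original SQL-server gives a list to compare against.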

Friday, February 13, 2009

SCOM and virtualization

'Should it be virtualized or not?'
is a question that comes to mind when designing SCOM environments.

On the internet many topics can be found on this subject but - as far as I am concerned - the articles posted on System Center Forum are the best since they cover this topic in great detail and are objective as well.

These people have gone to great lengths in order to get answers by building themselves a lab based on VMware and Hyper-V.

A must-read for everyone designing SCOM environments.

Look here for the lab they built and here for part 1.

More topics will arrive: one about storage and another is a discussion about Virtualization Architecture.

Thursday, February 12, 2009

EventID 31400

Situation
The OpsMgr eventlog of the RMS shows this event:
Event Type: Error
Event Source: Health Service Modules
Event Category: None
Event ID: 31400
Date: xx-xx-2009
Time: xx:xx:xx
User: N/A
Computer:
Description:
An exception occured processing a group membership rule. The rule will be unloaded.
Subscription ID: f0aaa92f-8a7a-4b91-98ff-6b931463b6a3
Rule ID: e51c2c21-2456-0738-3051-b1a7f5bb687b
Group ID: 88340b40-e0a4-1909-499d-777927ca3c80
Group type name: bwren.MaintenanceGroup1
Exception: System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection.
Cause
There are two possibilities.

First, one has to check whether the OpsMgr database supports regular expressions, which requires CLR Integration to be enabled. This can be checked with a SQL query but also via the GUI:

- Start SQL Server Surface Area Configuration Tool
- Select Surface Area Configuration for Features
- Select component Surface Area Configuration for Features
- Check whether Enable CLR Integration is selected. If not, do so and apply changes
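
The same check can be done with a SQL query. A minimal sketch, to be run on the SQL instance hosting the OpsMgr database:

```sql
-- value_in_use = 1 means CLR Integration is enabled,
-- value_in_use = 0 means it is disabled.
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'clr enabled';
```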

When this is already the case, there is something else amiss: a collection rule of a certain MP isn't configured properly. By running two or three queries one can find the culprit:

Query 1
The event contains an entry Rule ID. The GUID of this entry is needed here:
SELECT * FROM dbo.[Discovery]
where [DiscoveryId] = 'GUID Rule ID'
From this result, copy the GUID from the column ManagementPackID. This GUID is needed for the second query.

Query 2
Run this query:
SELECT * from dbo.[ManagementPack]
WHERE [ManagementPackId] = 'GUID ManagementPackID'
This result will have a column named MPFriendlyName. This will show the MP containing the faulty collection rule.

Query 3
A third query will show the xml-output of the problematic MP:
DECLARE @mpxml xml;
SELECT @mpxml = [MPXML] FROM dbo.[ManagementPack]
WHERE [ManagementPackId] = 'GUID ManagementPackID';
SELECT @mpxml as MPXML;
The column DiscoveryName shows the culprit.
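
Queries 1 and 2 can also be combined into a single query. This is a sketch only, using the table and column names mentioned above; it assumes DiscoveryName is a column of dbo.[Discovery], so check that against your own database first:

```sql
-- Look up the discovery from the event together with the MP it belongs to.
-- Replace 'GUID Rule ID' with the Rule ID taken from EventID 31400.
SELECT d.[DiscoveryId],
       d.[DiscoveryName],
       mp.[ManagementPackId],
       mp.[MPFriendlyName]
FROM dbo.[Discovery] AS d
JOIN dbo.[ManagementPack] AS mp
    ON mp.[ManagementPackId] = d.[ManagementPackId]
WHERE d.[DiscoveryId] = 'GUID Rule ID';
```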

Source: Eggheadcafe

Tuesday, February 10, 2009

Steps for building monitor to check membership of Domain Admin Global Groups

  1. In the Console go to Authoring, Management Pack Objects, Monitors. Go to 'Change Scope'. In the 'Look For' box type 'Windows Server 2003 Computer'. Only select this target, click OK.

  2. Expand 'Windows Server 2003 Computer', 'Entity Health', right click 'Security', select 'Create a monitor', 'Unit Monitor'.

  3. Create a new MP with a logical name. Select as type of monitor 'Windows Events', 'Simple Event Detection', 'Timer Reset' , Next

  4. Specify a logical name, for instance 'Domain Admins Watcher' when this particular monitor checks the Global Group 'Domain Admins'. Deselect 'Monitor is Enabled', Next

  5. Log name 'Security', Next

  6. EventID equals '632'. Beside Parameter Name 'EventSource' is a button with 3 dots. Click it. Select the 3rd option 'Use parameter name not specified above', type 'EventDescription', OK. For Operator select 'Contains'. For Value type 'Domain Admins', Next


  7. Set Auto Reset Timer to 10 minutes. (When notifications are configured and this Alert will be sent by mail/sms/pager, this time can be reduced to two minutes), Next

    A timer has been set since otherwise this monitor won't fire a new alert until it has been reset manually. As long as the monitor isn't reset, membership of the monitored global group can be changed without SCOM alerting on it...

  8. Set Health State to 'Critical' when an Event is raised, Next

  9. Select 'Generate alerts for this monitor'. Set Priority to 'High'. Set as Alert Description '$Data/Context/EventDescription$' (the eventdescription of the eventid will be displayed in the Alert), Create.

Create two more monitors, each following these steps but instead of 'Domain Admins' for the EventDescription, one uses 'Enterprise Admins' for the second monitor and 'Schema Admins' for the third monitor. Be sure to put these monitors into the same Management Pack.

Enabling these monitors for the DCs
By using overrides these monitors have to be enabled for the Domain Controllers. Do this by selecting 'Overrides', 'Override the Monitor', 'For a Group…' and select the group 'AD Domain Controller Group (Windows 200x Server)' (when the AD MP is loaded). Otherwise create a group containing the DCs (store this group in the same MP as these monitors) and use that group as the override target.

This article is based upon a blog posting of Kevin Holman. Look here for this blog posting.

Monitoring membership of Domain Admin Global Groups

Situation
When one wants to monitor whether a user is added to the AD global groups 'Domain Admins', 'Enterprise Admins' & 'Schema Admins' it can be a challenge to make this monitor work.

However, when one follows the steps in this blogpost of mine, the monitor will run like clockwork. Mostly I prefer to use monitors since they are nicely displayed within the HealthExplorer of the monitored object, so it is easy to see whether the monitor is being used.

Example of the Alerts raised in SCOM
Of course, there are multiple ways to make these monitors even better. One can add monitors to watch whether members are removed from these Global Groups (EventID 633), or one can change the description of the Alerts, only displaying the name of the Global Group and the name of the user being added/removed. This can be done by using the correct parameter. For this a logfile parser is needed in order to find out the correct parameter numbers. But the steps above are a way to make things work, and later on one can adjust everything as needed.
This article is based upon a blogposting of Kevin Holman. Look here for this blogpost.

Wednesday, February 4, 2009

EventID 7024

Symptom
The SCOM Agent won't run on a monitored system. When one tries to start the OpsMgr HealthService this event is logged:

Log Name: System
Source: Service Control Manager
Date: xx/xx/xxxx xx:xx:xx
Event ID: 7024
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: xxxxx.xxx
Description: The OpsMgr Health Service service terminated with service-specific error 2147500037 (0x80004005).

Cause
Most of the time, one or more registry keys are missing:
  1. HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\State Directory
    This entry points to where the Agent is installed and where the State Directory is to be found. This directory is the repository for the SCOM Agent. When this entry is missing, import it from a system where a working Agent is installed.

  2. HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Group\<Management Group >\Windows AccountLockDownSD
    This entry is needed by the Agent service to run. When it is not there, import it from a system where a working Agent is installed.

The Agent should work now.

26-08-2009 Update: PFE Jimmy Harper also ran into this issue and solved it in a different manner. He also blogged about it, found here.