Tuesday, January 26, 2016

Lessons learned - FIM_TemporalEventsJob

Here are some lessons learned while troubleshooting the titular SQL Agent job.  If you think to yourself, "Self, why would I ever need to care about this?" then you're either (a) lucky or (b) blissfully ignorant.  Don't worry though, you can blame FIM for your ignorance.  ;)  This is one of those features of FIM that won't actually tell you that something's wrong until you stumble upon the fact that, for example, your managers stopped getting notifications when contractor accounts expired.

So, where do you start troubleshooting such a problem?  Well, the keyword "expiration" may trigger thoughts of whether you remembered to renew your car's registration, but in FIMLand it should make you think, "temporal set."  And how are temporal events triggered?  Well, one way is when a datetime attribute on a user is updated and it falls into the scope of a temporal set.  But think about the other way that temporal events are triggered: the user's datetime attribute has been the same for several weeks, and now it's been long enough that the user now satisfies some expiration criteria.  If you've never had to consider how that happens, then I envy you!  But if you really want to know, or if you've read this far and feel like you can't turn back now, then I'll tell you that those events are triggered by a SQL Agent job that's installed with the FIM Service product.

If you run SQL Server Management Studio (SSMS) and connect to the database engine on the FIM SQL server, you can expand the SQL Server Agent -> Jobs branch, right click on FIM_TemporalEventsJob and select View History.  If you see all green checkmarks, then congratulations on keeping your set criteria simple enough for FIM to handle!  However, if you see red icons like below, then congratulations on giving FIM a challenge!  (But unfortunately you're going to have to fix it.)




Lesson 1: FIM_TemporalEventsJob SQL Agent job continues after error in step 1, but fails if error in step 2

The steps of the FIM_TemporalEventsJob SQL Agent job are shown below.  The first lesson I learned in this troubleshooting exercise is that, if the job encounters an error in step 1, the job still continues on to the next step.  However, if an error is encountered in step 2, the job is stopped and the last two steps aren't executed.  (And in case you're not SQL literate, the steps below all execute SQL stored procedures.)

Step 1
EXECUTE [fim].[TriggerTemporalEvents] @extendedOutput = 0
Step 2
EXECUTE [fim].[MaintainSets]
Step 3
EXECUTE [fim].[MaintainGroups]
Step 4
EXECUTE [fim].[OptimizeSetMembershipConditionsUsingPartitions]

Step 1 sample error

Here's an example of an error that you might see in step 1 of the job history.  If you've got a keen eye, you may ask yourself, "What the heck is a SetKey???"  Don't worry, keep reading... 

Executed as user: DOMAIN\SVC-SQLAgent. Reraised Error 50000, Level 16, State 1, Procedure TriggerTemporalEvents, Line 667, Message: <_x0040_failedTable SetKey="6062948" ErrorMsg="Reraised Error 50000, Level 16, State 1, Procedure ReRaiseException, Line 37, Message: Reraised Error 8623, Level 16, State 1, Procedure ?, Line 2, Message: The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information."/> [SQLSTATE 42000] (Error 50000).  The step failed.

Step 2 sample error

Here's a similar error that you might see in step 2 of the job history.  This is essentially the same error, but it occurred in a different stored procedure.

Executed as user: DOMAIN\SVC-SQLAgent. Reraised Error 50000, Level 16, State 1, Procedure MaintainSets, Line 556, Message: <Failures><_x0040_failedSetCorrections SetKey="6062948" ErrorMessage="Reraised Error 8623, Level 16, State 1, Procedure ?, Line 2, Message: The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information."/></Failures> [SQLSTATE 42000] (Error 50000).  The step failed.

Lesson 2: Step 2 (fim.MaintainSets) will continue after individual errors

Okay, we've learned that the job will continue after errors in step 1 but not step 2.  But what about the substeps within each step?  I.e., if an error occurs while evaluating one set, will it stop or continue to evaluate the rest of the sets?  The answer is that it will evaluate *all* sets, even if one set causes an error.

To prove that to myself, I added some print statements to the fim.MaintainSets stored procedure and ran it manually...

Sample of adding print statements to fim.MaintainSets procedure

SUCCESS: @setKey                         = 2732
SUCCESS: @setKey                         = 2733
SUCCESS: @setKey                         = 2734
...
SUCCESS: @setKey                         = 6062663
SUCCESS: @setKey                         = 6062802
SUCCESS: @setKey                         = 6062814
FAIL: @setKey                         = 6062948
SUCCESS: @setKey                         = 6063066
SUCCESS: @setKey                         = 7185483
SUCCESS: @setKey                         = 7185525
...
SUCCESS: @setKey                         = 10711150
SUCCESS: @setKey                         = 11024517
SUCCESS: @setKey                         = 11596415

Lesson 3: Find problematic set

Okay, so how do you find which set is causing the problem?  In the error message above we saw a "SetKey," but what the heck do you do with that?  That's a great question, because all identifiers in the FIM Service are GUIDs.  Well, the FIM product group certainly did you no favors here.  In order to find the problematic set, you have to run a SQL query against the FIMService database:

SELECT
    s.SetKey
    ,o.ObjectID
    ,x.DisplayName
FROM [fim].[Set] s
join fim.[Objects] o on s.SetKey = o.ObjectKey
CROSS APPLY
(
    SELECT
        ValueString as DisplayName
    FROM fim.ObjectValueString ovs
    join fim.AttributeInternal ai on ai.[Key] = ovs.AttributeKey
    WHERE Name = 'DisplayName'
    and ovs.ObjectKey = o.ObjectKey
) x
where s.SetKey = '6062948'

Et voila!  Here's your problematic set!

SetKey
ObjectID
DisplayName
6062948
0DD436A3-C7B5-4D9E-974C-B24E546ED789
Contractor Extended Date 56 days without login

"Great!  Thanks Joe!  ...  Wait, how do I fix it???"  Well, this is out of scope of this blog post, but here are a few web resources to help you.  Good luck!




1 comment: