The AppFabric Distributed Cache Service is an important yet fragile part of SharePoint 2013. Retrieving its latest status, for example to feed it into your monitoring, is a good way to get alerted when something is wrong with it.
The command “Get-CacheClusterHealth” returns a lot of textual information about the current state of all caches and cache hosts. That is useful, but difficult to process and interpret. The result is of type “Microsoft.ApplicationServer.Caching.Commands.ClusterHealth”.
When displayed, it is formatted as text rather than as the objects we are used to in PowerShell. Furthermore, the structure does not lend itself to easy conversion into objects: the host name is mentioned once, and then all NamedCaches on that host are listed without referencing the host name again.
The output is explained in this Microsoft article: technet.microsoft.com/en-us/library/ff921010.aspx
The health categories are described in this table from the referenced TechNet article:
Health Category | Description |
---|---|
Healthy | The cache is operating normally. This is the target state for all caches. |
UnderReconfiguration | The cache is under reconfiguration. This is an internal state that may have several causes, but it should be temporary and resolve to healthy. |
NotPrimary | The cache is not currently available. This can happen when secondary copies are promoted to primary. During this transition, the cache may temporarily have a state of NotPrimary. This state should typically resolve to healthy. |
NoWriteQuorum | The cache is read-only, because the cache is unable to create the required number of replicas on secondary cache hosts. This occurs when the cache has the high availability option enabled (Secondaries = 1). In this scenario, there must be at least two running cache hosts in the cluster, one for the primary copy of the cached item and another for the secondary copy. |
Throttled | The cache is read-only, because the cache host is in a throttled memory state. This is a low-memory condition. |
The actual output looks like this (fragment):
```
PS D:\> Use-CacheCluster
PS D:\> Get-CacheClusterHealth

Cluster health statistics
=========================

HostName = SP2013FE2.local.nl
-------------------------
    NamedCache = DistributedSecurityTrimmingCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3
        Healthy = 5,00
        UnderReconfiguration = 0,00
        NotPrimary = 0,00
        InadequateSecondaries = 0,00
        Throttled = 0,00
...
    NamedCache = DistributedActivityFeedCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3
        Healthy = 5,00
        UnderReconfiguration = 0,00
        NotPrimary = 0,00
        InadequateSecondaries = 0,00
        Throttled = 0,00

PS D:\>
```
Writing code to convert the result
I couldn’t find a working solution to handle this data, so I created my own, based on regular expressions. My inspiration came from this example: msgoodies.blogspot.nl/2008/12/matching-multi-line-text-and-converting.
It took me some time to get the regex working, but I finally found a way to use the output of Get-CacheClusterHealth in an automated script that regularly checks the health and triggers an alert if anything is wrong.
In each line of output, the property name is stated before the value, as in “HostName = SP2013FE2.local.nl” and “Healthy = 5,00”. In the end result, we want every property associated with its NamedCache, per host. That last part needed some more fine-tuning. The desired end result looks like this:
HostName | NamedCache | Healthy | UnderReconfiguration | NotPrimary | InadequateSecondaries | Throttled |
---|---|---|---|---|---|---|
SP2013FE | DistributedSecurityTrimmingCache_<<GUID>> | 5 | 0 | 0 | 0 | 0 |
SP2013FE | DistributedSearchCache_<<GUID>> | 5 | 0 | 0 | 0 | 0 |
SP2013FE | DistributedViewStateCache_<<GUID>> | 5 | 0 | 0 | 0 | 0 |
… | … | … | … | … | … | … |
SP2013FE2 | DistributedSecurityTrimmingCache_<<GUID>> | 5 | 0 | 0 | 0 | 0 |
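For comparison, the same host-to-cache association can also be made without one large regex, by reading the output line by line and remembering the last HostName seen. The sketch below is an alternative to the approach in this post, not part of the original script: the function name ConvertFrom-CacheHealthText and the sample text are hypothetical, and the counter values are kept as strings (converting them to numbers would depend on the decimal separator of the current culture).

```powershell
# Hypothetical sample in the layout of the Get-CacheClusterHealth output shown earlier
$sampleText = @"
Cluster health statistics
=========================

HostName = SP2013FE.local.nl
-------------------------
    NamedCache = DistributedDefaultCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3
        Healthy = 5,00
        UnderReconfiguration = 0,00
        NotPrimary = 0,00
        InadequateSecondaries = 0,00
        Throttled = 0,00

HostName = SP2013FE2.local.nl
-------------------------
    NamedCache = DistributedSecurityTrimmingCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3
        Healthy = 5,00
        UnderReconfiguration = 0,00
        NotPrimary = 0,00
        InadequateSecondaries = 0,00
        Throttled = 0,00
    NamedCache = DistributedActivityFeedCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3
        Healthy = 5,00
        UnderReconfiguration = 0,00
        NotPrimary = 0,00
        InadequateSecondaries = 0,00
        Throttled = 0,00
"@

# Hypothetical helper: turns the text into one object per NamedCache
function ConvertFrom-CacheHealthText {
    param([string]$Text)

    $hostName = $null   # last HostName seen; applies to all following NamedCache lines
    $current  = $null   # properties collected for the NamedCache being read

    foreach ($line in $Text -split '\r?\n') {
        # Only "Name = Value" lines are relevant; headers and separator lines are skipped
        if (-not ($line -match '^\s*(?<name>\w+)\s*=\s*(?<value>\S+)\s*$')) { continue }
        $name  = $Matches['name']
        $value = $Matches['value']

        if ($name -eq 'HostName') {
            $hostName = $value
        }
        elseif ($name -eq 'NamedCache') {
            # A new cache starts: emit the previous one first
            if ($null -ne $current) { [pscustomobject]$current }
            $current = [ordered]@{ HostName = $hostName; NamedCache = $value }
        }
        elseif ($null -ne $current) {
            # Counter lines (Healthy, Throttled, ...) belong to the most recent NamedCache
            $current[$name] = $value
        }
    }
    # Emit the last cache
    if ($null -ne $current) { [pscustomobject]$current }
}

ConvertFrom-CacheHealthText -Text $sampleText | Format-Table -AutoSize
```

In a real run, the text would come from `Get-CacheClusterHealth | Out-String` instead of `$sampleText`. The trade-off is readability versus compactness: the state machine is longer than a single pattern, but each step is easy to follow and debug.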
The script: regex
The first part is the regex. It consists of two alternatives, each enclosed in parentheses: the first one includes the host name and the second one does not. That’s because the first NamedCache of each host is preceded by the HostName line, while the remaining NamedCaches for that host are listed without repeating the host name.
In this regex, we look up each property by its name followed by the equals sign. \s+ means “one or more whitespace characters”, and the actual value is extracted with \S+, which means “one or more non-whitespace characters”. Each value is captured in a named group, (?<PropertyName>…), with the property name between the less-than and greater-than signs; these group names are used later as property names when creating the PowerShell objects. The options at the start, (?msx), make ^ match at the start of every line (m), let . also match line breaks (s) so that the lazy .+? can skip ahead to the next property across lines, and ignore unescaped whitespace in the pattern (x), which is why the literal spaces in the HostName part are escaped with a backslash.
```powershell
$regex = [regex] "(?msx)
(
    (
        ^HostName\ =\ (?<HostName>\S+)
        .+?
        ^\s+NamedCache\s+=\s+(?<NamedCache>\S+)
        .+?
        \s+Healthy\s+=\s+(?<Healthy>\S+)
        .+?
        \s+UnderReconfiguration\s+=\s+(?<UnderReconfiguration>\S+)
        .+?
        \s+NotPrimary\s+=\s+(?<NotPrimary>\S+)
        .+?
        \s+InadequateSecondaries\s+=\s+(?<InadequateSecondaries>\S+)
        .+?
        \s+Throttled\s+=\s+(?<Throttled>\S+)
    )
    |
    (
        ^\s+NamedCache\s+=\s+(?<NamedCache>\S+)
        .+?
        \s+Healthy\s+=\s+(?<Healthy>\S+)
        .+?
        \s+UnderReconfiguration\s+=\s+(?<UnderReconfiguration>\S+)
        .+?
        \s+NotPrimary\s+=\s+(?<NotPrimary>\S+)
        .+?
        \s+InadequateSecondaries\s+=\s+(?<InadequateSecondaries>\S+)
        .+?
        \s+Throttled\s+=\s+(?<Throttled>\S+)
    )
)
"
```
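To see what one property pattern captures, it can be tried on a single line of the output. This is a small illustrative check, with the sample line taken from the fragment shown earlier; it uses the static .NET methods [regex]::Match and [regex]::IsMatch on short patterns rather than the full regex.

```powershell
# One line of the Get-CacheClusterHealth output shown earlier
$line = '        Healthy = 5,00'

# \s+ = one or more whitespace characters, \S+ = one or more non-whitespace characters,
# (?<Healthy>...) = a named group; its name is the text between < and >
$m = [regex]::Match($line, '(?x) \s+Healthy\s+=\s+(?<Healthy>\S+)')
$m.Groups['Healthy'].Value    # 5,00

# With the x option, unescaped spaces in the pattern are ignored, so a literal
# space has to be written as '\ ', as in the HostName part of the full pattern
[regex]::IsMatch('HostName = SP2013FE2.local.nl', '(?x) ^HostName\ =\ (?<HostName>\S+)')   # True
[regex]::IsMatch('HostName = SP2013FE2.local.nl', '(?x) ^HostName = ')                       # False
```

The second IsMatch call fails because, with (?x), the pattern is effectively `^HostName=`, which does not occur in the input.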
The script: matches to objects
The next part converts the regex matches into objects. It is important to keep track of the host name: it appears only once for all caches on that host, so we save it in the $hostname variable and reuse it for the following matches.
```powershell
$ClusterHealthResult = (
    $regex.Matches( (Get-CacheClusterHealth | Out-String) ) | ForEach-Object {
        # Save the current pipeline object, so it is available inside the nested ForEach-Object
        $match = $_
        # Construct a new, empty object to return for each NamedCache
        $obj = New-Object object
        # Not every NamedCache has a HostName line before it, so re-use the last
        # host name when the HostName group is empty for the current match
        $currentHostName = $match.Groups[$regex.GroupNumberFromName("HostName")].Value
        if ($currentHostName -ne "") { $hostname = $currentHostName }
        # Add the host name as a property to the result object
        Add-Member -InputObject $obj NoteProperty "HostName" $hostname
        # Get all group names defined in the pattern; ignore the automatic numeric ones
        # and HostName, which we already have
        $regex.GetGroupNames() | Where-Object { $_ -notmatch '^\d+$' -and $_ -ne "HostName" } | ForEach-Object {
            $simple_value = $match.Groups[$regex.GroupNumberFromName($_)].Value
            # Convert values to numbers when applicable; TryParse uses the current
            # culture (on the machine shown, "5,00" becomes 5)
            $res = 0.0
            if ([double]::TryParse($simple_value, [ref]$res)) { $simple_value = $res }
            # And add each match as a property. When multiple results are returned, the
            # value must be picked up using an index number, hence the GroupNumberFromName call
            Add-Member -InputObject $obj NoteProperty $_ $simple_value -ErrorAction:SilentlyContinue
        }
        # Emit the object to the pipeline
        $obj
    }
)
```
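The host name carry-forward can be tried without a cache cluster. The sketch below uses a reduced, hypothetical version of the pattern ($demoRegex, with only HostName and NamedCache) and a made-up sample text in the same layout, so it runs anywhere PowerShell runs. It also shows why the script filters out numeric group names: GetGroupNames() returns the automatically numbered groups before the named ones.

```powershell
# Reduced version of the pattern (HostName and NamedCache only), for illustration
$demoRegex = [regex] "(?msx)
(
    ( ^HostName\ =\ (?<HostName>\S+) .+? ^\s+NamedCache\s+=\s+(?<NamedCache>\S+) )
    |
    ( ^\s+NamedCache\s+=\s+(?<NamedCache>\S+) )
)"

# Hypothetical sample in the same layout as the real output (counter lines left out)
$demoText = @"
HostName = SP2013FE.local.nl
-------------------------
    NamedCache = DistributedDefaultCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3
    NamedCache = DistributedSearchCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3
HostName = SP2013FE2.local.nl
-------------------------
    NamedCache = DistributedSecurityTrimmingCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3
"@

# Numbered groups come first, then the named ones; the conversion script skips the numbered ones
$demoRegex.GetGroupNames() -join ','    # 0,1,2,3,HostName,NamedCache

$lastHost = $null
$rows = foreach ($m in $demoRegex.Matches($demoText)) {
    # HostName only takes part in the first alternative; otherwise its Value is an empty string
    $currentHost = $m.Groups['HostName'].Value
    if ($currentHost -ne '') { $lastHost = $currentHost }
    [pscustomobject]@{ HostName = $lastHost; NamedCache = $m.Groups['NamedCache'].Value }
}
$rows | Format-Table -AutoSize
```

The second NamedCache has no HostName line of its own, so it inherits SP2013FE.local.nl, while the third picks up SP2013FE2.local.nl from the first alternative.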
Examples to determine the health based on the script
Update: CAUTION: This next part is fundamentally wrong!
The last part contains some examples of how to use the data:
```powershell
$ClusterHealthResult | ft

$MeasureClusterHealth = ($ClusterHealthResult | measure -Property Healthy -Average -Maximum -Minimum)
if($MeasureClusterHealth.Minimum -ne $MeasureClusterHealth.Maximum){
    "Unhealthy:"
    $ClusterHealthResult | ?{$_.Healthy -ne $MeasureClusterHealth.Maximum}
}else{
    "All healthy"
}
```
The end result might look like this:
```
PS C:\> $ClusterHealthResult | ft -AutoSize

HostName           NamedCache                                                                        Healthy UnderReconfiguration NotPrimary InadequateSecondaries Throttled
--------           ----------                                                                        ------- -------------------- ---------- --------------------- ---------
SP2013FE.local.nl  DistributedDefaultCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                            5                    0          0                     0         0
SP2013FE.local.nl  DistributedSearchCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                             5                    0          0                     0         0
SP2013FE.local.nl  DistributedViewStateCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                          5                    0          0                     0         0
SP2013FE.local.nl  DistributedBouncerCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                            5                    0          0                     0         0
SP2013FE.local.nl  DistributedLogonTokenCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                         5                    0          0                     0         0
SP2013FE.local.nl  DistributedSecurityTrimmingCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                   5                    0          0                     0         0
SP2013FE.local.nl  DistributedActivityFeedCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                       5                    0          0                     0         0
SP2013FE.local.nl  DistributedActivityFeedLMTCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                    5                    0          0                     0         0
SP2013FE.local.nl  DistributedAccessCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                             5                    0          0                     0         0
SP2013FE.local.nl  DistributedServerToAppServerAccessTokenCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3       5                    0          0                     0         0
SP2013FE2.local.nl DistributedSecurityTrimmingCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                   5                    0          0                     0         0
SP2013FE2.local.nl DistributedActivityFeedLMTCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                    5                    0          0                     0         0
SP2013FE2.local.nl DistributedLogonTokenCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                         5                    0          0                     0         0
SP2013FE2.local.nl DistributedBouncerCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                            5                    0          0                     0         0
SP2013FE2.local.nl DistributedDefaultCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                            5                    0          0                     0         0
SP2013FE2.local.nl DistributedServerToAppServerAccessTokenCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3       5                    0          0                     0         0
SP2013FE2.local.nl DistributedViewStateCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                          5                    0          0                     0         0
SP2013FE2.local.nl DistributedSearchCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                             5                    0          0                     0         0
SP2013FE2.local.nl DistributedAccessCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                             5                    0          0                     0         0
SP2013FE2.local.nl DistributedActivityFeedCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3                       5                    0          0                     0         0
```
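One weakness of the min/max comparison in the examples above is that it reports “All healthy” whenever every cache has the same Healthy value, whatever the other counters say. A hedged alternative, assuming that a non-zero count in any of the non-Healthy categories signals a problem, checks those counters directly. $sampleResult and its values are made up for illustration; on a real farm you would use $ClusterHealthResult instead, whose counters the conversion script has already turned into numbers.

```powershell
# Hypothetical rows in the shape produced by the conversion script; the values of the
# second row are made up to show what an unhealthy cache would look like.
$sampleResult = @(
    [pscustomobject]@{ HostName = 'SP2013FE.local.nl'; NamedCache = 'DistributedDefaultCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3'
                       Healthy = 5; UnderReconfiguration = 0; NotPrimary = 0; InadequateSecondaries = 0; Throttled = 0 }
    [pscustomobject]@{ HostName = 'SP2013FE2.local.nl'; NamedCache = 'DistributedDefaultCache_7c230f58-7204-4c04-88f2-0b6ca0da5ad3'
                       Healthy = 4; UnderReconfiguration = 0; NotPrimary = 0; InadequateSecondaries = 0; Throttled = 1 }
)

# Assumption: a non-zero count in any of these categories is a problem
$problemProperties = 'UnderReconfiguration', 'NotPrimary', 'InadequateSecondaries', 'Throttled'

# Keep every cache for which at least one problem counter is greater than zero
$unhealthy = @($sampleResult | Where-Object {
    $row = $_
    @($problemProperties | Where-Object { $row.$_ -gt 0 }).Count -gt 0
})

if ($unhealthy.Count -gt 0) {
    'Unhealthy:'
    $unhealthy | Format-Table -AutoSize
} else {
    'All healthy'
}
```

Because the TechNet table describes UnderReconfiguration and NotPrimary as states that should resolve to healthy on their own, you may want to raise an alert only when they persist across consecutive checks.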