Sunday, August 4, 2013

Alwayson availability groups failover monitoring - Part 1

There are a lot of ways to monitor AG, sql server errorlog, window event log, cluster log......, here I list another way to check and monitor AG state with the log files below:

1. AlwaysOn Health Diagnostics Log
These files in the SQL Server Log directory have the following format: <HOSTNAME>_<INSTANCENAME>_SQLDIAG_X_XXXXXXXXX.xel.

2. Extended Events session "AlwaysOn_health" log
it is auto created when setup alwayson AG,

and by default it does not auto startup, so you can enable it manually

Now, let's use the 2 log files to monitor alwayson AG failover
>2 nodes  alwayson AG, SQL2012-01(Principle), SQL2012-02(Secondary)
>AG Name: MyAG
>Availability mode: Synchronous commit
>Failover mode:Automatica

1. Manually Failover :   Failover AG from SQL2012-01(01) to SQL2012-02(02)
Please note: Do not use the Failover Cluster Manager to move availability groups to different nodes or to fail over availability groups. The Failover Cluster Manager is not aware of the synchronization status of the availability replicas, and doing so can lead to extended downtime.
ou must use Transact-SQL or SQL Server Management Stud

We run the T-sql below on 02

a) On 01, Open "Merge Extended Event Files"

b)Open "AlwaysOn Health Diagnostics Log" file and Extended Events session "AlwaysOn_health" log

c)Sort result by "timestamp"

you can also click "Choose Columns" to add more column to display.

d)Check the log by failover time.
> the first log we find is AVAILABILITY GROUP "offline Pending" State

so AG is taking offline.

>Next, the replica state on 01 is changed from "Primary" to "Resolving"

>Next, AG is offline completely

>finally, replica state is changed from "Resolving" to "Secondary"

e) On 02, use the same way to check the log file
>first, the failover command was issue on 02, begin failover

>Next, replica state is changed from "Secondary" to "Resolving_Pending_Failover"

there are also some "info" message among all events, for instance, it shows the "Diagnostics" is started on new primary 02

>Next, replica state is changed from "Resolving_Pending_Failover" to "Resolving_Normal", then changed to "Primary_Pending"

>Next, AG came online on 02

>Finally, "Alter availability group" command was commit, and replica became to "Primary"

Next, I will try to monitor the AG in other conditions.


  1. Awesome Artical Really i have searching this type of valuable information From a lot of days i found satisfaction when Read your blog Thanks for giving this type blog and also please Read link bvba Woodstone which provide information server monitoring software & server monitoring tools

  2. Awsome article, was looking for an article on AlwaysON AG event monitoring...Thank you

  3. Good Blog, well descrided, Thanks for sharing this information.
    Big Data and Hadoop Online Training