Monday, 8 October 2012

Part 3: 4-NODE MULTI-SITE DISASTER, DYNAMIC QUORUM


In Part 3 of this 4-node multi-site clustering series, I will go over various disaster recovery scenarios.







PowerShell cmdlet to verify NODE WEIGHT:


FOR THE CLUSTER TO BE ONLINE: 3 OUT OF 5 VOTES (NODE WEIGHT) ARE NEEDED.

COMBINED NODE WEIGHT = 5 (FOOPRIMARY + FOOSECONDARY + DRPRIMARY + DRSECONDARY + FILE SHARE WITNESS)

In other words, we can lose 2 nodes before the cluster goes down.
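The arithmetic above is plain majority voting. Here is a minimal Python sketch of it, assuming one vote per node plus one vote for the file share witness. The function names are illustrative only and are not part of any clustering API.

```python
# Simplified model of Node and File Share Majority vote arithmetic.
# Assumption: each of the 4 nodes and the file share witness carries exactly one vote.

def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of total_votes."""
    return total_votes // 2 + 1


def tolerable_losses(total_votes: int) -> int:
    """How many votes can be lost while a strict majority is still present."""
    return total_votes - votes_needed(total_votes)


voters = ["FOOPRIMARY", "FOOSECONDARY", "DRPRIMARY", "DRSECONDARY", "FileShareWitness"]
total = len(voters)
print(f"total={total} needed={votes_needed(total)} can_lose={tolerable_losses(total)}")
# prints: total=5 needed=3 can_lose=2
```

`votes_needed(2)` also returns 2. This is the "majority of 2 is 2" case that comes up in the disaster scenarios below.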


MULTI-SITE SCENARIO:


  1. You want the cluster to be online on the DR site only when all the servers in the primary site are down.

This is possible by setting the Preferred Owners option in the properties of the SQL Server cluster role.


With the preferred owners set this way, the SQL resources will reside only on FOOPRIMARY or FOOSECONDARY as long as any 1 server in the primary site is online (and the cluster still has its required 3 votes).
DRPRIMARY/DRSECONDARY will kick in only when all the nodes in the primary site are down.
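The preferred-owner behaviour described above can be pictured with a small Python model. This is a simplified sketch and not how the cluster service is actually implemented. It assumes the cluster already has quorum, that online preferred owners are tried in list order, and that the role falls back to any other online possible owner only when no preferred owner is up. `pick_owner` is a hypothetical name.

```python
from __future__ import annotations

from typing import Optional


def pick_owner(preferred: list[str], possible: list[str], online: set[str]) -> Optional[str]:
    """Return the node that should host the SQL role, or None if no node is available.

    Simplified model (assumption): the cluster already has quorum; preferred owners
    are tried first in list order, then any other online possible owner.
    """
    for node in preferred:
        if node in online:
            return node
    for node in possible:
        if node in online:
            return node
    return None


preferred = ["FOOPRIMARY", "FOOSECONDARY"]
possible = ["FOOPRIMARY", "FOOSECONDARY", "DRPRIMARY", "DRSECONDARY"]

# FOOPRIMARY is down, but FOOSECONDARY (primary site) is still up, so the role stays in the primary site.
print(pick_owner(preferred, possible, {"FOOSECONDARY", "DRPRIMARY", "DRSECONDARY"}))  # FOOSECONDARY
# Both primary-site nodes are down, so the role moves to the DR site.
print(pick_owner(preferred, possible, {"DRPRIMARY", "DRSECONDARY"}))  # DRPRIMARY
```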

  2. Failback option:


Let's assume a scenario where both nodes in the primary site fail and the resources move to the secondary (DR) site.
Now, if a node in the primary site comes back online, how do you want the resources to be handled?

Options:

  1. Prevent failback: this setting requires manual intervention to move the resources back to the primary site.
  2. Allow failback: either immediately or within a scheduled time window. When the primary site comes back online, you choose whether to move the resources to the primary site immediately or at a particular time.

The recommended option in a multi-site cluster is "Prevent failback", because after the primary site is back online you want to test it and make sure the site is stable before moving the resources back.
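The three failback choices can be summarised as a small decision function. This Python sketch is a simplified model, not the cluster service's code. The enum and function names are hypothetical. The window is assumed to be given as (start_hour, end_hour), with failback allowed when start <= hour < end, and to wrap past midnight when start is greater than end.

```python
from __future__ import annotations

from enum import Enum


class Failback(Enum):
    PREVENT = "prevent"      # administrator moves the role back manually
    IMMEDIATE = "immediate"  # move back as soon as a preferred node returns
    WINDOW = "window"        # move back only inside a scheduled time window


def should_fail_back(policy: Failback, hour: int, window: tuple[int, int] | None = None) -> bool:
    """Decide whether the role moves back to a recovered primary-site node at `hour` (0-23).

    Assumption: window=(start, end) allows failback when start <= hour < end;
    if start > end, the window wraps past midnight.
    """
    if policy is Failback.PREVENT:
        return False  # manual intervention required
    if policy is Failback.IMMEDIATE:
        return True
    if window is None:
        raise ValueError("WINDOW policy needs a (start, end) window")
    start, end = window
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


print(should_fail_back(Failback.PREVENT, 3))             # False
print(should_fail_back(Failback.WINDOW, 23, (22, 2)))    # True, inside an overnight window
print(should_fail_back(Failback.WINDOW, 12, (22, 2)))    # False
```

Prevent failback is the only branch that never moves the role automatically, which is why it suits a site that still has to be verified after a disaster.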


DISASTER SCENARIO: CLUSTER ONLINE


DRSECONDARY NODE DOWN:



Cluster: total node weight is 4, so the cluster stays online.




Cluster: total node weight is 2, below the required 3, so the cluster is down.



Now, to bring the cluster online, we need to force the cluster to start without quorum, then start the other nodes with the prevent quorum option and re-establish the quorum.

This is where DYNAMIC QUORUM, if enabled, reconfigures the quorum on the fly and keeps the cluster working even when only 1 node is online.

HOW TO ENABLE: please check Part 1 of this series, in the section where I configure the quorum.




DYNAMIC QUORUM

Note: http://technet.microsoft.com/en-us/library/jj612870.aspx. I highly recommend going through this link, which explains it in depth.
Dynamic quorum works only when:
  1. The cluster has already achieved quorum, meaning the quorum must already be established before a node goes down.
  2. Nodes fail sequentially. If a couple of nodes fail simultaneously, dynamic quorum does not recalculate the votes; instead, the remaining nodes regroup and reassess whether they still have quorum, and dynamic quorum then kicks in again for any further node failure.


Dynamic quorum is enabled by default. In short, it recalculates the votes of the online nodes on the fly and adjusts the quorum accordingly to keep the cluster online.
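To make the vote recalculation concrete, here is a simplified Python model of the rules described above. It is an illustration under stated assumptions, not Microsoft's actual algorithm. The names `simulate`, `majority` and the "FSW" label for the file share witness are hypothetical. The model makes these assumptions:

- Each surviving voter counts one vote against the current vote total.
- The cluster stays up only if the survivors hold a strict majority of that total.
- After a failure batch the cluster survives, dynamic quorum removes the failed nodes' votes. The witness vote is never removed, matching the Windows Server 2012 behaviour seen in scenario 4.
- A batch with several nodes models a simultaneous failure.

It does not try to model how Windows handles the case where only two node votes remain, so it should not be used to predict the node majority result mentioned at the end of this post.

```python
from __future__ import annotations


def majority(total_votes: int) -> int:
    """Smallest strict majority of total_votes."""
    return total_votes // 2 + 1


def simulate(voters: list[str], failure_batches: list[list[str]],
             dynamic: bool = True, witness: str = "FSW") -> list[tuple[list[str], bool]]:
    """Replay failure batches and return (batch, cluster_up) after each one.

    Simplified model (assumption): a batch is checked against the current vote total;
    if the cluster survives and dynamic is True, the failed nodes' votes are removed
    from the total (the witness vote is never removed); once down, it stays down.
    """
    counted = set(voters)  # voters whose vote is currently part of the total
    alive = set(voters)
    up = True
    history = []
    for batch in failure_batches:
        alive -= set(batch)
        if up:
            up = len(alive & counted) >= majority(len(counted))
        if up and dynamic:
            counted -= {n for n in batch if n != witness}
        history.append((batch, up))
    return history


nodes = ["FOOPRIMARY", "FOOSECONDARY", "DRPRIMARY", "DRSECONDARY", "FSW"]
# Scenarios 1-4: three nodes fail one after another, then the file share witness fails.
sequence = [["DRSECONDARY"], ["DRPRIMARY"], ["FOOSECONDARY"], ["FSW"]]

print([up for _, up in simulate(nodes, sequence)])                 # [True, True, True, False]
print([up for _, up in simulate(nodes, sequence, dynamic=False)])  # [True, True, False, False]

# Node majority (no witness): two nodes lost at the same time leaves 2 of 4 votes.
node_majority = ["FOOPRIMARY", "FOOSECONDARY", "DRPRIMARY", "DRSECONDARY"]
print(simulate(node_majority, [["DRPRIMARY", "DRSECONDARY"]])[0][1])  # False
```

In the first run, the cluster survives the three sequential node losses and stops only when the witness goes down, which matches scenarios 1 to 4 below. With `dynamic=False`, the vote total stays at 5, so it drops as soon as only FOOPRIMARY and the witness are left.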



Let's see DYNAMIC QUORUM in action.


DISASTER SCENARIO 1: DRSECONDARY down




The dynamic weight for DRSECONDARY is 0, so dynamic quorum kicks in.
As the total node weight is 4 with the file share, the cluster is up and running.

DISASTER SCENARIO 2: DRPRIMARY, DRSECONDARY down





Total weight is 3 with the file share, so the cluster is working.

DISASTER SCENARIO 3: FOOSECONDARY, DRPRIMARY, DRSECONDARY down

Note: I took the 3 nodes down sequentially.





Note: I set up a continuous ping to the SQL cluster name FOOMULTISQL to see what happens when the 3 nodes go down.



Note: I did not see a single lost packet.







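Without dynamic quorum, keeping the cluster up would need 3 votes (for example 2 node votes + 1 file share witness vote). Here only FOOPRIMARY and the file share witness are left, so dynamic quorum kicked in and reconfigured the votes on the fly to keep the cluster working.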


DISASTER SCENARIO 4: FOOSECONDARY, DRPRIMARY, DRSECONDARY down + FILE SHARE QUORUM SITE down




In this scenario the cluster fails. The file share witness has no dynamic weight, so dynamic quorum does not recalculate its vote. With 2 votes left in play (FOOPRIMARY and the witness), the majority of 2 is 2, so once the witness is down the cluster goes down.


For more information, check this very nice article:



Previously, for the same scenario without dynamic weight, we needed 3 votes (node weight) online at all times. With the new dynamic weight configuration, we can have either 2 nodes, or 1 node plus the file share witness, online and still keep the cluster up and running.



STEPS TO BRING THE CLUSTER ONLINE FOR DISASTER SCENARIO 4:


  1. Investigate why the quorum share could not be brought online before you perform the next step.
  2. Force the cluster to start on the last node online. The cluster will use this node's copy of the cluster configuration and replicate it to the other nodes when they come online.
    net start clussvc /fq   or   Start-ClusterNode -FixQuorum


Note: We had dynamic quorum enabled, so when we force-started the cluster, it reconfigured the quorum and brought the cluster online. Without dynamic quorum, the cluster would have started in forced quorum mode, and every node we add next would have to be started in prevent quorum mode so that the remaining nodes cannot form a separate (split) cluster.
Now we keep adding back the nodes and the file share.



    1. Let's bring FOOSECONDARY online...



As soon as the cluster saw the other node coming online, the Cluster service on that server was started in prevent quorum mode and made to join the existing cluster. We now see a warning that the Node and File Share Majority quorum is in a failed state.

Note: If a node fails at this point, the whole cluster will go down, because the majority of 2 is 2.

Microsoft doesn't recommend the Node Majority quorum model for multi-site clustering. However, when I configured it on this 4-node cluster, I found that the cluster stays online even if 3 nodes fail.


Dynamic quorum is, I think, a welcome addition in Windows Server 2012.


3 comments:

  1. Thanks a lot for info!!! U save my life! :)

  2. Very useful article. Thanks Naveen. I will have to agree with MSFT as node majority is not a suitable quorum model for multi-site clusters. If you have 4 node multi site cluster (2 nodes per site),
    you can sustain 3 node failures but only if they are sequential. If you have a disaster in one of the sites and two of the nodes go down at the same time, dynamic quorum will not be recalculated and the cluster
    will be down as you will have just 2 nodes up from 4. For multi-site clusters I prefer to use node and file share majority with 'Cluster managed voting' enabled.

  3. Hi Naveen, thanks for the wonderful blog. I have some queries on multi-site clustering... What is the best way to contact you, email or phone? Any help is appreciated.
