In Part 3 of this 4-Node Multi-Site clustering series I will go over various disaster recovery scenarios.
PowerShell cmdlet to verify NODE WEIGHT:
NODE WEIGHT NEEDED FOR THE CLUSTER TO BE ONLINE: 3 OUT OF 5.
COMBINED NODE WEIGHT = 5 (fooprimary + foosecondary + drprimary + drsecondary + quorum)
That being said, we can lose any 2 of the 5 votes at the same time and the cluster stays online; losing a third takes it down.
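The arithmetic above can be sketched in PowerShell. This is only an illustration: `Get-QuorumMajority` is a hypothetical helper written for this post, not a built-in cmdlet, and the last step assumes it runs on a cluster node with the FailoverClusters module installed.

```powershell
# Hypothetical helper: votes needed for quorum = a strict majority of all votes.
function Get-QuorumMajority {
    param([int]$TotalVotes)
    [int]([math]::Floor($TotalVotes / 2.0) + 1)
}

# 4 nodes + 1 file share witness, as in this series.
$totalVotes = 5
$needed     = Get-QuorumMajority -TotalVotes $totalVotes
"Votes needed: $needed of $totalVotes; votes we can lose at once: $($totalVotes - $needed)"

# On a cluster node, list the configured vote of each node.
# Skipped silently when the failover clustering cmdlets are not available.
if (Get-Command Get-ClusterNode -ErrorAction SilentlyContinue) {
    Get-ClusterNode | Format-Table Name, State, NodeWeight -AutoSize
}
```

`NodeWeight` covers the nodes only; the file share witness is the fifth vote and does not appear in that list.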
MULTI-SITE SCENARIO:
- You want the cluster to be online on the DR site only when all the servers in the Primary site are down.
This is possible by setting the Preferred owners option in the properties of the cluster name.
With preferred owners set, as long as at least 1 server in the Primary site is online and the required Node weight of 3 is met, the SQL resources will reside only on either the FOOPRIMARY or the FOOSECONDARY node.
The DRPRIMARY/DRSECONDARY nodes will kick in only when all the Nodes in the Primary site are down.
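The same setting can also be made from PowerShell. A minimal sketch, assuming the SQL role is called "SQL Server (MSSQLSERVER)" (a placeholder; `Get-ClusterGroup` shows the real name) and that only the two Primary-site nodes are listed as preferred owners:

```powershell
Import-Module FailoverClusters

# Assumption: placeholder role name, replace with the name shown by Get-ClusterGroup.
$group = "SQL Server (MSSQLSERVER)"

# Prefer the Primary-site nodes. The DR nodes remain possible owners,
# so the role can still move there when no preferred owner is available.
Set-ClusterOwnerNode -Group $group -Owners FOOPRIMARY, FOOSECONDARY

# Confirm the preferred owner list.
Get-ClusterOwnerNode -Group $group
```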
- Fail-back option:
Let's assume a scenario where both Nodes on the Primary site fail and the resources are on the Secondary site.
Now if a Node in the Primary site comes online, how do you want the resources to be handled?
Options:
- Prevent fail-back: This setting requires manual intervention to move the resources back to the Primary site.
- Allow fail-back: Either immediately or at a scheduled time. When the Primary site comes online, you choose whether to move the resources to the Primary site immediately or at a particular time.
The recommended option in MULTI-SITE is "Prevent fail-back", because after the Primary site is back online you want to test and make sure the site is stable before moving the resources.
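For reference, the fail-back choice is stored on the clustered role and can be read or set from PowerShell. A minimal sketch, again assuming a placeholder role name:

```powershell
Import-Module FailoverClusters

# Assumption: placeholder role name, replace with the name shown by Get-ClusterGroup.
$group = Get-ClusterGroup -Name "SQL Server (MSSQLSERVER)"

# 0 = prevent fail-back, 1 = allow fail-back
$group.AutoFailbackType = 0

# Review the fail-back settings, including the window used when fail-back is allowed.
$group | Format-List Name, AutoFailbackType, FailbackWindowStart, FailbackWindowEnd
```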
DISASTER SCENARIO / CLUSTER ONLINE
DRSECONDARY NODE DOWN:
Cluster: Total Node weight is 4, so the cluster will be online.
Cluster: Total Node weight is 2, so the cluster is down.
Now, to bring the cluster online, we need to force the cluster to start without quorum, then add the other nodes with the prevent quorum option, and recreate the quorum.
This is where DYNAMIC QUORUM helps: if it is enabled, it reconfigures the quorum on the fly and keeps the cluster working even if only 1 Node is online.
HOW TO ENABLE: Please check Part 1 of this series, in the section where I configure the quorum.
DYNAMIC QUORUM
Note: I highly recommend going through http://technet.microsoft.com/en-us/library/jj612870.aspx, which explains this in depth.
Dynamic Quorum will work only under these conditions:
- The cluster should already have achieved quorum, meaning a quorum must be configured and in place before a Node goes down.
- Nodes should fail sequentially. If a couple of Nodes fail simultaneously, Dynamic quorum will not recalculate the votes; instead the surviving Nodes regroup and re-assess whether quorum can still be formed, and only then will Dynamic quorum kick in for any further Node failure.
Dynamic Quorum is enabled by default. In short, it recalculates on the fly which Nodes are online and configures the quorum accordingly to keep the cluster online.
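To check this on your own cluster, something along these lines should work from any cluster node with the FailoverClusters module. It is a quick sketch, not a full health check:

```powershell
Import-Module FailoverClusters

# 1 = dynamic quorum enabled, 0 = disabled
(Get-Cluster).DynamicQuorum

# NodeWeight is the configured vote; DynamicWeight is the vote the cluster
# is currently counting for that node.
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight -AutoSize
```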
Let's see DYNAMIC QUORUM in action.
DISASTER SCENARIO 1: DRSECONDARY down
The Dynamic weight for DRSECONDARY is 0, so Dynamic quorum kicks in.
As the Total Node weight is 4 with the file share, the cluster is up and running.
DISASTER SCENARIO 2: DRPRIMARY, DRSECONDARY down
Total weight is 3 with the File share, so the cluster is working.
DISASTER SCENARIO 3: FOOSECONDARY, DRPRIMARY, DRSECONDARY down
Note: I took the 3 nodes down sequentially.
Note: I set up a continuous ping to the SQL CLUSTER NAME FOOMULTISQL to watch what happens while the 3 nodes go down.
Note: I did not see a single packet lost.
Normally we would need 3 votes to keep the cluster up, so Dynamic quorum kicked in and reconfigured the votes on the fly to keep the cluster working (2 Node weight + 1 File share witness).
DISASTER SCENARIO 4: FOOSECONDARY, DRPRIMARY, DRSECONDARY down + FILE SHARE quorum SITE down
In this scenario the cluster fails because there is no Dynamic Weight for the File share witness, so the quorum is not recalculated when the witness is lost. The majority of 2 is 2, therefore the cluster goes down.
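The vote counting across the four scenarios can be modelled in a few lines. This is a deliberately simplified model of my own, not the cluster service's actual algorithm: `Test-DynamicQuorum` is a hypothetical function, every voter (nodes and witness) counts as 1, and each entry in `-Failures` is one failure event that may take down several voters at once.

```powershell
# Simplified model of dynamic quorum vote counting (illustration only).
function Test-DynamicQuorum {
    param(
        [int]$TotalVotes,
        [int[]]$Failures
    )
    $votes = $TotalVotes
    foreach ($lost in $Failures) {
        # Majority of the votes counted just before this failure event.
        $needed    = [int]([math]::Floor($votes / 2.0) + 1)
        $remaining = $votes - $lost
        if ($remaining -lt $needed) {
            return "down"
        }
        # Quorum survived, so the total is recalculated from the survivors.
        $votes = $remaining
    }
    return "online with $votes vote(s)"
}

Test-DynamicQuorum -TotalVotes 5 -Failures 1,1,1      # scenarios 1 to 3: online with 2 vote(s)
Test-DynamicQuorum -TotalVotes 5 -Failures 1,1,1,1    # scenario 4, witness lost last: down
Test-DynamicQuorum -TotalVotes 5 -Failures 3          # 3 nodes lost at the same time: down
```

The last call shows why sequential failure matters: losing 3 of 5 votes in one event leaves 2, which is below the majority of 3, so there is never a surviving quorum to recalculate from.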
For more information, check this very nice article:
Earlier, for the same scenario without Dynamic weight, we needed 3 Node weight online at all times, but with the new Dynamic Weight configuration we can have either 2 Nodes or 1 Node and 1 File share online and still have the cluster up and running.
STEPS TO BRING CLUSTER ONLINE FOR DISASTER SCENARIO 4:
- Investigate why the quorum share couldn't be brought online before you perform the next step.
- Force the cluster to start on the last Node that was online. The cluster will use that Node's copy of the cluster configuration and replicate it to the other nodes when they come online: Net start clussvc /fq or Start-ClusterNode -FixQuorum
Note: We had dynamic quorum enabled, so when we force-started the cluster it reconfigured the quorum and brought the cluster online. Without dynamic quorum, the cluster would have started in force quorum mode, and any node we add next would need to be started in prevent quorum mode to keep the remaining nodes from forming a split cluster.
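Put together, the recovery commands look roughly like this. A sketch only, assuming FOOPRIMARY is the last Node that was online; run it only after you have confirmed the other nodes are really down:

```powershell
Import-Module FailoverClusters

# On the last surviving node: force the cluster service to start without quorum.
Start-ClusterNode -Name FOOPRIMARY -FixQuorum
# Equivalent from a command prompt on that node:
#   net start clussvc /fq

# On each node brought back afterwards (only needed when it does not join on its own):
# start the service in prevent quorum mode so it joins the running cluster
# instead of forming a second one.
#   net start clussvc /pq

# Verify which nodes are up and what votes are being counted.
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight -AutoSize
```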
Now we keep adding back the Nodes and the file share.
- Let's bring foosecondary online...
As soon as the Cluster saw the other node coming online, the Cluster service on that server was started in prevent quorum mode and made to join the existing cluster. Now we see a warning that Node and File Share Majority is in a failed state.
Note: If a Node fails at this point, the whole cluster will go down, because the majority of 2 is 2.
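To get out of this fragile state the file share witness has to come back. A hedged sketch of how that could be checked and retried from PowerShell, assuming the witness share itself is reachable again:

```powershell
Import-Module FailoverClusters

# Show the current quorum configuration and the witness resource it uses.
Get-ClusterQuorum

# Find the file share witness resource and check its state.
$witness = Get-ClusterResource | Where-Object { $_.ResourceType.Name -eq "File Share Witness" }
$witness | Format-Table Name, State, OwnerGroup -AutoSize

# Once the share is reachable, try to bring the witness back online.
$witness | Start-ClusterResource
```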
Microsoft doesn't recommend the Node Majority quorum model for Multi-site clustering. When I configured it on a 4 Node cluster, I found that the cluster stays online even if 3 Nodes fail.
Dynamic Quorum configuration is, I think, a welcome option in Windows 2012.