Monday 13 August 2012

Part 2 - 2 NODE MULTI-SITE CLUSTER, WINDOWS 2008 R2 SP1


In Part 1 of this series I went over the steps for setting up an iSCSI SAN with HA on StarWind software for a 2-node multi-site cluster. In this part I will go over the steps for adding the iSCSI HA storage to the cluster nodes, configuring the multi-site cluster, and testing it. So let’s jump in.


Step-by-step guide to configuring a multi-site failover cluster:

1.       Configuring Primary Site Cluster Node:
Server Name: VM2008c
Ø  Install the Failover Clustering feature.
Ø  Install the MPIO (Multipath I/O) feature.
Ø  Go to Services, start the Microsoft iSCSI Initiator Service, and set its startup type to Automatic.
Ø  Click Start, open iSCSI Initiator, and under Target enter the IP address of the primary site iSCSI SAN (10.92.76.1), then click Quick Connect.


You will see 3 disks, all inactive. Click Connect, tick the “Enable multi-path” checkbox, then click OK and OK again.
Ø  Open MPIO under Administrative Tools and click the Discover Multi-Paths tab.

You will see the option “Add support for iSCSI devices”. Enable it, and the server will reboot. After the reboot you will see something like this when you open MPIO:
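
The GUI steps above can also be scripted. The following is a minimal sketch, assuming an elevated PowerShell session on Windows Server 2008 R2; the target login with multipath enabled is still done in the iSCSI Initiator GUI as described above, and the device string passed to mpclaim is the standard identifier for Microsoft iSCSI devices.

```powershell
# Sketch only: run in an elevated PowerShell session on the node (VM2008c here).
Import-Module ServerManager

# Install the Failover Clustering and MPIO features.
Add-WindowsFeature Failover-Clustering, Multipath-IO

# Start the iSCSI initiator service and make it start automatically.
Set-Service -Name MSiSCSI -StartupType Automatic
Start-Service -Name MSiSCSI

# Add the primary-site SAN as a target portal and list the targets it exposes.
iscsicli QAddTargetPortal 10.92.76.1
iscsicli ListTargets

# Claim iSCSI devices for MPIO. The -r switch reboots the server,
# just like enabling iSCSI support in the MPIO GUI does.
mpclaim -r -i -d "MSFT2005iSCSIBusType_0x9"
```

On the secondary node the only change would be the portal address (172.168.0.1).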



2.       Configuring Secondary Site Cluster Node:
Server: VM2008d
Follow the same steps as for node VM2008c, except that for the iSCSI SAN IP address you provide the address of the local (secondary) site iSCSI SAN, which is 172.168.0.1. (Note: this is the partner server we configured when creating the iSCSI target.)


3.       Disk Management and Configuring:
Note: You need to perform these steps on only one cluster node.
Go to Disk Management on any cluster node; for example, I am performing the steps on node VM2008c.
Ø  Right-click Disk Management and choose Refresh. You will now see 3 disks with status Unknown and Offline.



Ø  Right-click each disk, bring it online, and initialize it. Format each disk using New Simple Volume and assign drive letters to all 3 disks.
Ø  Now go to Disk Management on the secondary site cluster node and bring the 3 offline disks online. You will notice that all 3 disks come online with some placeholder drive letters. The important point here is that both servers can see the same disks.
       We will configure the drive letters of the disks using Failover Cluster Manager later.


Ex: this shows that both cluster nodes, VM2008c and VM2008d, can see the disks.

Ø  Now go to Disk Management on both servers and take the disks offline.




Step 4:
Launch Failover Cluster Manager on either node. Click “Validate a Configuration” under Management, which launches the Validate a Configuration Wizard.

Select the nodes that will be part of the cluster and click Next.


Select the option “Run only tests I select” and click Next.



Uncheck the Storage tests and click Next. Note: in a multi-site cluster configuration the storage tests will fail, so we skip them.

See this KB article, which covers it: http://support.microsoft.com/kb/943984
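
The same validation can be run from PowerShell. This is a minimal sketch, assuming the FailoverClusters module is available on the node and that the category name "Storage" matches the Storage tests unchecked in the wizard.

```powershell
# Sketch only: validate both nodes but skip the Storage test category,
# which is expected to fail in a multi-site configuration.
Import-Module FailoverClusters

Test-Cluster -Node VM2008c, VM2008d -Ignore Storage
```

Test-Cluster writes a validation report file, which corresponds to the report opened with View Report in the wizard.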


If you see something like this, validation found errors. Click View Report and fix all the errors, then re-run the validation wizard.


After the re-run you should see something like this.
Click “Create the cluster now using the validated nodes”, or click Finish, go back to Failover Cluster Manager, and click “Create a Cluster” under Management.
I will just click Finish and launch “Create a Cluster” from Failover Cluster Manager.



Click Next


Add the cluster Nodes and click next 


Select the “No” option and click Next.
Note: Even though we are selecting No, I believe this cluster is still fully supported by Microsoft for multi-site clustering, as they are aware of this process.



Now you need to provide a cluster name and one unique IP address for each subnet (2 in total).
Cluster Name: MULTICLUSTER
IP Address: 10.92.76.21
IP Address: 172.168.0.21
Note: Before you click Next, make sure you are either a domain administrator or have pre-staged the cluster name computer object in Active Directory.
Please refer to this TechNet article, which explains this in depth.
In my lab I am logged in as a domain admin, so I will proceed.
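
For reference, the wizard pages above correspond roughly to a single New-Cluster call. This is a sketch under the same assumptions as the wizard run (logged in as a domain admin, or with the cluster name object pre-staged), using the name and addresses from this lab.

```powershell
# Sketch only: create the cluster with one static IP address per subnet.
Import-Module FailoverClusters

New-Cluster -Name MULTICLUSTER `
            -Node VM2008c, VM2008d `
            -StaticAddress 10.92.76.21, 172.168.0.21
```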



Click next…



The wizard automatically picked Node and Disk Majority as the quorum model; we need to change it to Node and File Share Majority (see Step 5).
Click Finish….

Note: depending on AD replication, the cluster name computer object may show up in one site before the other. Be patient and make sure you see the cluster computer object in all sites.
Cluster MVP Symon Perriman has an excellent video that explains the different quorum models and the DNS replication issues in multi-site clustering in detail. I highly encourage you to watch it before proceeding further: http://technet.microsoft.com/en-us/video/disaster-recovery-cluster-deployment-demo-multi-site-failover-clustering

Step 5:
Changing the Quorum Model:
Note: best practice is to host the file share on a common third site, but in my lab I am creating it on my primary site. So if my primary site goes down, I have to manually force the cluster online on the other site.
Please check the link below, which goes into detail on the quorum models.
http://technet.microsoft.com/en-us/library/cc770620%28WS.10%29.aspx

Ø  Create a folder, share it, and give the cluster computer object read and write permissions under both Sharing and Security.

Changing the quorum model to Node and File Share Majority:



Select Node and File Share Majority as the quorum model.


   Click next


Click next….


Click Finish.
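
The quorum change can also be made from PowerShell. This is a minimal sketch; the share path \\FILESERVER\ClusterWitness is a hypothetical placeholder for the folder created above and must be replaced with the real UNC path.

```powershell
# Sketch only: switch the quorum model to Node and File Share Majority.
Import-Module FailoverClusters

# Hypothetical share path; the cluster computer object needs
# read/write access to it, as configured in the step above.
Set-ClusterQuorum -NodeAndFileShareMajority "\\FILESERVER\ClusterWitness"

# Confirm the quorum type and witness resource now in use.
Get-ClusterQuorum
```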

If we run Validate Cluster now, you will see a warning on the quorum configuration.
The rule of thumb in multi-site clustering is: with an even number of nodes, always use Node and File Share Majority; with an odd number of nodes, use Node Majority. There are exceptions to this when more than one node is in the same site. I highly recommend going through these articles, which go over it in more detail.
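
The reason a witness matters for a 2-node cluster comes down to vote counting: the cluster stays up only while more than half of the votes are reachable. The helper below is a hypothetical illustration of that arithmetic (Get-QuorumMajority is not a cluster cmdlet).

```powershell
# Hypothetical helper: each node has one vote, a witness adds one more,
# and the cluster needs a strict majority of all votes to stay online.
function Get-QuorumMajority {
    param([int]$NodeCount, [bool]$HasWitness)

    $votes    = $NodeCount + [int]$HasWitness
    $majority = [math]::Floor($votes / 2) + 1

    New-Object PSObject -Property @{
        Votes             = $votes
        Majority          = $majority
        FailuresTolerated = $votes - $majority
    }
}

# 2 nodes + file share witness: 3 votes, majority of 2, tolerates 1 lost vote.
Get-QuorumMajority -NodeCount 2 -HasWitness $true

# 2 nodes without a witness: 2 votes, majority of 2, tolerates 0 lost votes.
Get-QuorumMajority -NodeCount 2 -HasWitness $false
```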



Step 6:
This is the step we have been waiting for, testing the cluster.


Run the Move-ClusterGroup “Cluster Group” command from PowerShell (or cluster group “Cluster Group” /move from CMD) to move the cluster group. Alternatively, right-click the node, choose Stop Cluster Service under More Actions, which moves the cluster group as well, and then bring the node back online by starting the cluster service.
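
A minimal sketch of the PowerShell variant, assuming the core group still has its default name "Cluster Group" and is currently owned by VM2008c:

```powershell
# Sketch only: move the core cluster group to the other site and check the result.
Import-Module FailoverClusters

# Show the current owner node of the core cluster group.
Get-ClusterGroup "Cluster Group"

# Move the group to the secondary-site node.
Move-ClusterGroup "Cluster Group" -Node VM2008d

# List the group's resources; one of the two IP address resources
# is expected to stay offline, as discussed in this step.
Get-ClusterGroup "Cluster Group" | Get-ClusterResource
```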


So I could successfully move the cluster group to the other site’s subnet.
The question most would ask: even though one IP address is always offline, how is the cluster name still up and running? The answer is that, starting with Windows Server 2008, Microsoft added OR dependencies to clustering to support multi-site clusters across subnets, so the cluster name stays online as long as one of its IP addresses is online.
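
The OR dependency can be inspected directly. This is a small sketch, assuming the network name resource has its default name "Cluster Name"; the dependency expression in the output should join the two IP address resources with "or".

```powershell
# Sketch only: show the dependency expression of the cluster network name.
Import-Module FailoverClusters

Get-ClusterResourceDependency -Resource "Cluster Name"
```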




Step 7:  We have a problem.
Let’s see how many IP addresses are registered in DNS for the cluster name.




As you can see, only 1 IP address, 172.168.0.21 (secondary site subnet), is registered in DNS for the cluster name “MULTICLUSTER”. When we fail over the cluster, the cluster name (MULTICLUSTER) updates its DNS record on the 10.92.76.0 site from 172.168.0.21 to 10.92.76.21. At that moment, the passive node and all client computers in the site whose DNS server has not yet received the update will not be able to connect to the cluster until DNS replicates.
The other problem is the host record TTL, which by default is 1200 seconds (20 minutes), so client computers may have to wait up to 20 minutes before their cached host record expires and they request the updated host record from DNS.
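
Before changing anything, it can help to look at what the cluster name is currently configured to do. A minimal sketch, assuming the network name resource is called "Cluster Name":

```powershell
# Sketch only: inspect the current DNS-related settings of the cluster name.
Import-Module FailoverClusters

# Lists the resource's private properties, including
# RegisterAllProvidersIP and HostRecordTTL.
Get-ClusterResource "Cluster Name" | Get-ClusterParameter

# Check which addresses DNS currently returns for the cluster name.
nslookup MULTICLUSTER
```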



Steps to fix:
 We need to run 2 PowerShell commands.
1. The first one registers all IP addresses of the cluster name in DNS, so whenever a client requests the host record, both IP addresses are returned.
Get-ClusterResource "Cluster Name" | Set-ClusterParameter RegisterAllProvidersIP 1
After running the command and failing over the cluster nodes, let’s check the DNS records.

We see both IP addresses are registered in DNS for the cluster name (MULTICLUSTER).


2. The second one changes the host record TTL from the default of 1200 seconds (20 minutes) to 300 seconds (5 minutes). This means clients request a fresh host record every 5 minutes, so after a failover they have to wait at most 5 minutes before they can connect again.

Get-ClusterResource "Cluster Name" | Set-ClusterParameter HostRecordTTL 300
After you run the PowerShell commands and fail over the nodes, the host record TTL shows 300 seconds (5 minutes). The parameters are stored in the cluster configuration, so running the commands once from any node is enough.
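
Put together, the fix can be sketched as follows, assuming the default resource and group names ("Cluster Name" and "Cluster Group"). The group move takes the network name offline and online again, which is when the new settings take effect.

```powershell
# Sketch only: apply both DNS settings, fail over, and verify.
Import-Module FailoverClusters

$name = Get-ClusterResource "Cluster Name"

# Register every IP address of the cluster name in DNS.
$name | Set-ClusterParameter RegisterAllProvidersIP 1

# Lower the host record TTL from 1200 to 300 seconds.
$name | Set-ClusterParameter HostRecordTTL 300

# Fail the core group over to another node so the changes are picked up.
Move-ClusterGroup "Cluster Group"

# Confirm the stored value.
$name | Get-ClusterParameter HostRecordTTL
```

On a client, ipconfig /flushdns clears the cached record if you do not want to wait for the old TTL to expire while testing.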




Note: Microsoft Technical Evangelist Symon Perriman has an excellent video that goes into detail about this 2-step process. I highly recommend watching it, as there are a couple of other settings, such as cross-subnet delay, that need to be looked into before putting the cluster into production.
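
The heartbeat settings mentioned in the note are exposed as cluster properties. A small sketch for viewing the current values before deciding whether they need tuning:

```powershell
# Sketch only: show the heartbeat delay (milliseconds) and threshold
# (missed heartbeats) for same-subnet and cross-subnet communication.
Import-Module FailoverClusters

Get-Cluster | Format-List SameSubnetDelay, SameSubnetThreshold, CrossSubnetDelay, CrossSubnetThreshold
```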


Additional Step:
Reverse lookup for the cluster name will fail. To fix it, right-click the cluster name, go to Properties, tick the “Publish PTR records” checkbox, apply the change, and fail over the cluster nodes.
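
The same checkbox maps to a private property of the network name resource. This is a sketch, assuming the default resource name "Cluster Name" and that reverse lookup zones for both subnets already exist in DNS.

```powershell
# Sketch only: publish PTR records for the cluster name and verify.
Import-Module FailoverClusters

Get-ClusterResource "Cluster Name" | Set-ClusterParameter PublishPTRRecords 1

# Fail over so the network name re-registers its records.
Move-ClusterGroup "Cluster Group"

# A reverse lookup of a cluster IP address should now resolve to the cluster name.
nslookup 10.92.76.21
```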


This ends Part 2 of the series. In Part 3 I will go over configuring and testing File Server services on the multi-site cluster.




Recommended Articles:


Ø  Cluster Resource Dependency Expressions blog: http://blogs.msdn.com/b/clustering/archive/2008/01/28/7293705.aspx
Ø  The Microsoft Support Policy for Windows Server 2008 or Windows Server 2008 R2 Failover Clusters: http://support.microsoft.com/kb/943984
Ø  What’s New in Failover Clusters for Windows Server 2008 R2: http://technet.microsoft.com/en-us/library/dd621586(WS.10).aspx
Ø  Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Failover Cluster: http://technet.microsoft.com/en-us/library/cc770620(WS.10).aspx
Ø  Requirements and Recommendations for a Multi-site Failover Cluster: http://technet.microsoft.com/en-us/library/dd197575(WS.10).aspx







