In Part 1 of this series I went over the steps for setting up
iSCSI SAN HA with StarWind software for a 2-node multi-site cluster.
In this part I will go over the steps for adding the iSCSI SAN HA storage to the cluster nodes,
configuring the multi-site cluster, and testing it. So let's jump in.
Step-by-Step: Configuring a Multi-site Failover Cluster:
1. Configuring Primary Site Cluster Node:
Server Name: VM2008c
Ø Install the Failover Clustering feature.
Ø Install the MPIO feature.
Ø Go to Services, start the Microsoft iSCSI Initiator Service, and set its startup type to Automatic.
Ø Click Start, open iSCSI Initiator, enter the IP address of the primary site iSCSI SAN (10.92.76.1) under Target, and click Quick Connect. You will see 3 disks, all inactive. For each disk, click Connect, select the checkbox “Enable multipath”, click OK, and then OK again.
Ø Open MPIO under Administrative Tools and click the Discover Multi-Paths tab. You will see the option “Add support for iSCSI devices”; enable it and the server will reboot. After the reboot you will see something like this when you open MPIO.
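For readers who prefer the command line, most of the steps above can be sketched in PowerShell. This is a minimal sketch, assuming an elevated session on Windows Server 2008 R2; it is not what was run in this lab. Connecting each target with multipath enabled is still left to the iSCSI Initiator GUI, and on VM2008d the portal address would be 172.168.0.1 instead.

```powershell
# Assumption: elevated PowerShell session on Windows Server 2008 R2.
Import-Module ServerManager

# Install the Failover Clustering and Multipath I/O features.
Add-WindowsFeature Failover-Clustering, Multipath-IO

# Start the Microsoft iSCSI Initiator Service and make it start automatically.
Set-Service -Name MSiSCSI -StartupType Automatic
Start-Service -Name MSiSCSI

# Register the primary site iSCSI SAN as a target portal and list its targets.
iscsicli QAddTargetPortal 10.92.76.1
iscsicli ListTargets

# Claim iSCSI-attached disks for MPIO. The -r switch reboots the server,
# just as the "Add support for iSCSI devices" option does in the GUI.
mpclaim -r -i -d "MSFT2005iSCSIBusType_0x9"
```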
2. Configuring Secondary Site Cluster Node:
Server: VM2008d
Follow the same steps as for node VM2008c, except that for the iSCSI SAN IP address you provide
the iSCSI SAN of the local site, 172.168.0.1. (Note: this is
the partner server we configured when creating the iSCSI target.)
3. Disk Management and Configuring:
Note: You need to perform these steps on only one cluster node.
Go to Disk Management on any cluster node; for example, I am
performing the steps on node VM2008c.
Ø Right-click Disk Management and click Refresh. You will now see 3 disks with status Unknown and Offline.
Ø Right-click each disk, bring it online, and initialize it. Format each disk using New Simple Volume and assign drive letters to all 3 disks.
Ø Now go to Disk Management on the secondary site cluster node and bring the 3 offline disks online. You will notice that all 3 disks come online with some dummy drive letters. The important point here is that both servers can see the same disks. We will configure the drive letters of the disks using Failover Cluster Manager later.
Ex: this shows that both cluster nodes VM2008c and VM2008d can see the disks.
Ø Now go to Disk Management on both servers and take the disks offline.
Step 4:
Launch Failover Cluster Manager on any node. Click “Validate a Configuration” under the
Management tab, which launches the Validate a Configuration Wizard.
Select the nodes which are part of the cluster and click Next.
Select the option “Run only tests I select” and click Next.
Uncheck the Storage tests and click Next. Note: in a multi-site cluster configuration the storage tests will fail, so we skip them.
Please check this KB article, which goes over it: http://support.microsoft.com/kb/943984
If you see something like this, there are errors in the validation, so click View Report and fix all the errors. When done, re-run the validation wizard.
After the re-run you should see something like this.
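The same validation can also be started from PowerShell. The following is a sketch rather than the procedure used here; it assumes the FailoverClusters module is available on the node and uses the -Ignore parameter to skip the Storage test category, mirroring the unchecked box in the wizard.

```powershell
# Assumption: run on one of the nodes with the Failover Clustering feature installed.
Import-Module FailoverClusters

# Validate both nodes but skip the Storage test category,
# which is expected to fail in a multi-site configuration.
Test-Cluster -Node VM2008c, VM2008d -Ignore Storage
```

Test-Cluster writes a validation report file that can be reviewed in the same way as the report shown by the wizard.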
Click “Create the cluster now using the validated nodes”, or click Finish, go back to Failover Cluster Manager, and click “Create Cluster” under the Management tab.
I will just click Finish and launch “Create Cluster” from Failover Cluster Manager.
Click Next.
Add the cluster nodes and click Next.
Select the option “No” and click Next.
Note: Even though we are saying No, I believe this cluster is still supported by Microsoft for multi-site clustering, as they are aware of this process.
Now you need to provide a cluster name and a unique IP address for each of the 2 subnets.
Cluster Name: MULTICLUSTER
IP Address: 10.92.76.21
IP Address: 172.168.0.21
Note: Before you click Next, make sure you are either a domain administrator or have pre-populated the cluster name computer object.
Please refer to this TechNet article, which explains it in depth.
In my lab I am logged in as a domain admin, so I will proceed.
Click Next.
The cluster automatically picked Node and Disk Majority as the quorum model;
we need to change it to Node and File Share Majority.
Click Finish.
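As a rough command-line equivalent of the Create Cluster wizard, the cluster could be created with New-Cluster. This is only a sketch; it assumes the same permissions as above (domain administrator or a pre-staged cluster name computer object) and uses the cluster name and the two static addresses listed in this step.

```powershell
# Assumption: run as a domain administrator on one of the nodes.
Import-Module FailoverClusters

# Create the cluster with one static IP address per subnet.
New-Cluster -Name MULTICLUSTER -Node VM2008c, VM2008d `
    -StaticAddress 10.92.76.21, 172.168.0.21

# Confirm that both nodes joined the cluster.
Get-ClusterNode -Cluster MULTICLUSTER
```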
Note: depending on AD replication, the cluster name computer object may show up in
one site before the other. So be patient and make sure you see the
cluster computer object in all the sites.
Cluster MVP Symon Perriman has an excellent video that explains in detail the different quorum
models and the DNS replication issues in multi-site clustering. I highly encourage you to
watch it before proceeding further: http://technet.microsoft.com/en-us/video/disaster-recovery-cluster-deployment-demo-multi-site-failover-clustering
Step 5: Changing the Quorum Model:
Note: best practice is to host the file share at a common
third site, but in my lab I am creating it at my primary site. So if my primary site goes down, I have to manually force the cluster online at the other site.
Please check the link below, which goes into detail on quorum models.
http://technet.microsoft.com/en-us/library/cc770620%28WS.10%29.aspx
Ø Create a folder, share it, and give the cluster computer object read and write
permission under both the share permissions and the Security tab.
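A minimal sketch of creating such a share from the command line follows. The folder path C:\FSW, the share name FSW, and the domain name CONTOSO are hypothetical placeholders (the lab's real names are not given in this article); only the computer account name MULTICLUSTER$ follows from the cluster name used here.

```powershell
# Assumptions: run on the server that hosts the witness share.
# C:\FSW, the share name FSW, and the CONTOSO domain are placeholders.
New-Item -ItemType Directory -Path C:\FSW

# Share the folder and grant the cluster computer object change (read/write) access.
net share FSW=C:\FSW '/GRANT:CONTOSO\MULTICLUSTER$,CHANGE'

# Grant matching NTFS modify rights, inherited by files and subfolders.
icacls C:\FSW /grant 'CONTOSO\MULTICLUSTER$:(OI)(CI)M'
```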
Changing the quorum model to Node and File Share Majority:
Select the quorum model Node and File Share Majority.
Click Next.
Click Next.
Click Finish.
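The quorum change can also be made with Set-ClusterQuorum. In this sketch the UNC path \\FILESERVER\FSW is a hypothetical placeholder for the witness share; substitute the server and share actually created for the cluster.

```powershell
# Assumption: \\FILESERVER\FSW is a placeholder for the real witness share path.
Import-Module FailoverClusters

# Switch the cluster to Node and File Share Majority.
Set-ClusterQuorum -NodeAndFileShareMajority \\FILESERVER\FSW

# Show the resulting quorum type and witness resource.
Get-ClusterQuorum
```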
If we run cluster validation now, we will see a warning on the
quorum configuration.
The rule of thumb in multi-site clustering: if there is an
even number of nodes, always use Node and File Share Majority; if there is an odd number of nodes,
use Node Majority. There is an exception to this when we have more than 1
node in the same site. I highly recommend going through the articles listed at the end of this post, which go
over it in more detail.
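The reasoning behind the rule of thumb is simple vote arithmetic: the cluster stays up only while more than half of the votes are present. The helper below is a hypothetical illustration (Get-QuorumMajority is not a cluster cmdlet) that counts one vote per node plus one for a witness.

```powershell
# Hypothetical helper for illustration only; not part of the FailoverClusters module.
function Get-QuorumMajority([int]$NodeCount, [bool]$HasWitness) {
    # One vote per node, plus one vote if a disk or file share witness exists.
    $votes = $NodeCount + [int]$HasWitness
    # A majority is more than half of all votes.
    $majority = [int][math]::Floor($votes / 2.0) + 1
    New-Object PSObject -Property @{
        Votes             = $votes
        Majority          = $majority
        TolerableFailures = $votes - $majority
    }
}

Get-QuorumMajority -NodeCount 2 -HasWitness $false  # 2 votes, majority 2, tolerates 0 failures
Get-QuorumMajority -NodeCount 2 -HasWitness $true   # 3 votes, majority 2, tolerates 1 failure
```

With 2 nodes and no witness, losing either node loses quorum; adding the file share gives a third vote, so the cluster survives the loss of one voter.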
Step 6:
This is the step we have been waiting for: testing the
cluster.
Run Move-ClusterGroup "Cluster Group" from PowerShell (or cluster group "Cluster Group" /move from CMD) to move the
cluster group. Alternatively, right-click the node and, under More Actions,
click Stop Cluster Service; that moves the cluster group as well. Then
bring the node back online by starting the
cluster service.
So I could successfully move the cluster group to the other
subnet and site.
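A short sketch of the PowerShell variant of this test, assuming the group is currently owned by VM2008c and should move to VM2008d:

```powershell
Import-Module FailoverClusters

# Show which node currently owns the core cluster group.
Get-ClusterGroup "Cluster Group"

# Move the group to the node in the other site.
Move-ClusterGroup "Cluster Group" -Node VM2008d

# List the resources in the group and their state after the move.
Get-ClusterGroup "Cluster Group" | Get-ClusterResource
```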
The question most people would ask: even though 1 IP
address is always offline, how is the cluster still up and running? Because
starting with Windows Server 2008, Microsoft added OR dependencies to clustering to support
multi-site clusters across subnets. The cluster name depends on one IP address OR the other, so it needs only one of them online.
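The dependency can be inspected from PowerShell. This is a sketch; the exact IP address resource names in the output depend on how the cluster was created, so none are shown here.

```powershell
Import-Module FailoverClusters

# Show the dependency expression of the cluster network name.
# In a multi-subnet cluster it is expected to join the two
# IP address resources with "or" rather than "and".
Get-ClusterResourceDependency "Cluster Name"
```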
Step 7: We have a problem.
Let's see how many IP addresses are registered in DNS for the
cluster name.
As you can see, only 1 IP address, 172.168.0.21
(the secondary site subnet), is registered in DNS for the cluster name
“MULTICLUSTER”. When we fail over the cluster, the cluster name (MULTICLUSTER)
updates the DNS record at the 10.92.76.0 site from 172.168.0.21 to
10.92.76.21. At that moment the passive node and all the client
computers on the other subnet will not be
able to connect to the cluster until DNS has replicated.
The other problem is the host record
TTL, which by default is 1200 seconds (20 minutes), so client computers have to
wait up to 20 minutes before their cached host record expires and they request the
updated host record from DNS.
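Both observations can be checked from a command prompt or PowerShell session. A minimal sketch, assuming it is run on a cluster node that can resolve the cluster name:

```powershell
Import-Module FailoverClusters

# Ask DNS which addresses are currently registered for the cluster name.
nslookup MULTICLUSTER

# Show the current DNS-related settings of the cluster network name.
Get-ClusterResource "Cluster Name" |
    Get-ClusterParameter HostRecordTTL, RegisterAllProvidersIP
```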
Steps to fix:
We need to run 2 PowerShell
commands.
1. The first one registers all IP addresses of the cluster name
in DNS, so whenever a client requests the host record, both IP addresses
are handed to it.
Get-ClusterResource "Cluster Name" | Set-ClusterParameter RegisterAllProvidersIP 1
After running the PowerShell command and
failing over the cluster nodes, let's check the DNS settings.
We see both IP
addresses are registered in DNS for the cluster name (MULTICLUSTER).
2. The second one changes the host record TTL from the default of
1200 seconds (20 minutes) to 300 seconds (5 minutes). This means
clients request a new host record every 5 minutes, so after a failover
clients have to wait up to 5 minutes to reconnect.
Get-ClusterResource "Cluster Name" | Set-ClusterParameter HostRecordTTL 300
After you run the PowerShell commands on a cluster node
and fail over the nodes, the host record TTL shows 300 seconds (5
minutes).
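Put together, the two settings might be applied as in the sketch below. Taking the cluster name offline and online again is an assumption on my part, used here as an alternative to failing over the nodes so that the new values take effect; it briefly interrupts name resolution for the cluster.

```powershell
Import-Module FailoverClusters

$name = Get-ClusterResource "Cluster Name"

# Register every IP address of the network name in DNS.
$name | Set-ClusterParameter RegisterAllProvidersIP 1

# Lower the host record TTL from 1200 to 300 seconds.
$name | Set-ClusterParameter HostRecordTTL 300

# Assumption: cycle the network name instead of failing over the nodes.
Stop-ClusterResource "Cluster Name"
Start-ClusterResource "Cluster Name"

# Confirm the stored values.
$name | Get-ClusterParameter HostRecordTTL, RegisterAllProvidersIP
```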
Note: Microsoft Technical Evangelist Symon Perriman has an excellent video which goes into detail about this 2-step process. I highly recommend
watching it, as there are a couple of other settings, such as cross-subnet delay, which need to be looked into before putting the cluster into production.
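To see where those heartbeat settings currently stand, the cluster object's properties can be listed. This is only a way to inspect the values; which values suit a given WAN link is not covered here.

```powershell
Import-Module FailoverClusters

# List the heartbeat delay and threshold settings for nodes
# in the same subnet and in different subnets.
Get-Cluster | Format-List SameSubnet*, CrossSubnet*
```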
Additional Step:
Reverse lookup for the cluster name will fail. To fix it,
right-click the cluster name, go to Properties, select the checkbox “Publish
PTR records”, apply it, and fail over the cluster nodes.
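The checkbox corresponds to the PublishPTRRecords parameter of the network name resource, so the same change can be sketched in PowerShell. A reverse lookup zone for each subnet is assumed to exist in DNS already.

```powershell
Import-Module FailoverClusters

# Equivalent of ticking "Publish PTR records" on the cluster name.
Get-ClusterResource "Cluster Name" | Set-ClusterParameter PublishPTRRecords 1

# After failing over the cluster nodes, test a reverse lookup
# of one of the cluster IP addresses.
nslookup 10.92.76.21
```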
This ends Part 2 of the series. In Part 3 I will go
over configuring and testing File Server services on the multi-site cluster.
Recommended Articles:
Ø Cluster Resource Dependency Expressions blog: http://blogs.msdn.com/b/clustering/archive/2008/01/28/7293705.aspx
Ø The Microsoft Support Policy for Windows Server 2008 or Windows Server 2008 R2 Failover Clusters: http://support.microsoft.com/kb/943984
Ø What’s New in Failover Clusters for Windows Server 2008 R2: http://technet.microsoft.com/en-us/library/dd621586(WS.10).aspx
Ø Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Failover Cluster: http://technet.microsoft.com/en-us/library/cc770620(WS.10).aspx
Ø Requirements and Recommendations for a Multi-site Failover Cluster: http://technet.microsoft.com/en-us/library/dd197575(WS.10).aspx