On December 18, 2012, Eucalyptus v3.2 was released. One of the main focuses of this release was to harden the High Availability design of Eucalyptus. I have recently been looking at additional configuration features of DRBD – which is used by Eucalyptus Walrus HA – that can be used in the enterprise to add robustness and more efficient disaster recovery. *NOTE* This blog entry's main focus is DRBD, which is separate from Eucalyptus. The goal is to shed light on the additional configuration options that help make DRBD the robust and reliable product that it is, AND to show how Eucalyptus works with various open source products.
The Baseline
Before getting into the resource configuration options with DRBD, let's talk about the disk setup that was used. I recommend using LVM for the backing device used with DRBD. The features that LVM provides allow a cloud admin to add additional backup measures (e.g. LVM snapshots) and recover from outages more efficiently – minimizing end-user perceived outages. For more information regarding what LVM is and all of its features, CentOS/RHEL provide great documentation around this application. Once you feel comfortable with LVM, check out DRBD's LVM Primer to see how you can leverage LVM with DRBD. For this blog entry, the LVM/DRBD setup uses a Logical Volume as the DRBD backing device.
Additional Resource Configuration Options in DRBD
The additional resource configuration options cover the following areas:
Traffic integrity checking
Checksum-based synchronization
Automated LVM snapshots during DRBD synchronization
All of these options, except the last, are covered in the DRBD 8.3 User Guide. Although the scripts associated with the automated LVM snapshots are mentioned in the DRBD 8.4 User Guide, the scripts are available and can be used in DRBD 8.3.x. *NOTE* When enabling these options, make sure that Eucalyptus Walrus is stopped. Also, make sure the configuration options are applied on both nodes. After the configurations have been made, just run drbdadm adjust [resource name] on both nodes for them to take effect. Typically, I like to test out the configuration changes, then test failover of the DRBD nodes, before starting Eucalyptus Walrus.
Traffic Integrity Checking
Making sure that all the data replicated between the DRBD nodes is intact is very important. DRBD has a resource configuration option to use cryptographic message digest algorithms such as MD5, SHA-1, or CRC-32C for end-to-end message integrity checking. If verification of a replicated block against the digest fails, the peer requests retransmission.
To enable this option for SHA-1 integrity checking, add the following entry to the resource configuration file:
...
net {
    ...
    data-integrity-alg sha1;
    ...
}
...
Checksum-based Synchronization
DRBD offers checksum-based synchronization to make syncing between the DRBD nodes more efficient. As mentioned in the section “Efficient Synchronization” in the DRBD 8.3 User Guide:
When using checksum-based synchronization, then rather than performing a brute-force overwrite of blocks marked out of sync, DRBD reads blocks before synchronizing them and computes a hash of the contents currently found on disk. It then compares this hash with one computed from the same sector on the peer, and omits re-writing this block if the hashes match. This can dramatically cut down synchronization times in situations where a filesystem re-writes a sector with identical contents while DRBD is in disconnected mode.
To enable this configuration option, add the following to the resource configuration file:
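A minimal sketch of what this could look like, assuming SHA-1 as the checksum algorithm; in DRBD 8.3, checksum-based synchronization is enabled with the csums-alg option in the net section of the resource configuration:
...
net {
    ...
    csums-alg sha1;
    ...
}
...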
Automated LVM Snapshots During DRBD Synchronization
When doing DRBD synchronization between nodes, there is a chance that if the SyncSource fails mid-sync, the result will be a node with good data that is dead, and a surviving node with bad data. When serving DRBD off an LVM Logical Volume, you can mitigate this problem by creating an automated snapshot when synchronization starts, and automatically removing that same snapshot once synchronization has completed successfully.
There are a couple of things to keep in mind when configuring this option:
Make sure the volume group has enough space on each node to handle the LVM snapshot
You should review dangling snapshots as soon as possible. A full snapshot causes both the snapshot itself and its origin volume to fail.
To enable this configuration option, add the following to the resource configuration file:
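A sketch of what this could look like, using the snapshot/unsnapshot helper scripts that ship with DRBD (the /usr/lib/drbd path is the usual location on CentOS/RHEL packages, but verify it on your systems):
handlers {
    before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
    after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
}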
High availability for applications and physical machines is key to having services “appear” to never be down. With cloud computing, deploying failure-resilient applications is a necessity for services that need to be always available.
The purpose of this blog is to provide more of the technical information behind the HA Open iSCSI setup that my good friend and colleague, Lester, mentioned in his blog. Our goal was to set up HA with Open iSCSI without access to a SAN. To accomplish this, we used Pacemaker, Open iSCSI, and DRBD. The great guys from Linbit provided us with the documentation to deploy this environment.
The end result is a Pacemaker-managed cluster whose status looks like the following:
Resource Group: rg_clustervol
     p_lvm_clustervol       (ocf::heartbeat:LVM):               Started viking-07.eucalyptus-systems.com
     p_target_clustervol    (ocf::heartbeat:iSCSITarget):       Started viking-07.eucalyptus-systems.com
     p_lu_clustervol_lun1   (ocf::heartbeat:iSCSILogicalUnit):  Started viking-07.eucalyptus-systems.com
     p_ip_clustervolip      (ocf::heartbeat:IPaddr2):           Started viking-07.eucalyptus-systems.com
Master/Slave Set: ms_drbd_clustervol
     Masters: [ viking-07.eucalyptus-systems.com ]
     Slaves:  [ viking-08.eucalyptus-systems.com ]
Installation
The installation instructions are pretty straightforward. The instructions are the same for both machines – unless otherwise noted. These instructions assume that CentOS 5.7 has already been installed. If CentOS 5.7 is not already installed, please go to the CentOS 5 Documentation for installation instructions.
Both Nodes
Installing SCSI Target Framework
The SCSI Target Framework (tgt) provides the iSCSI target that we will use in the cluster setup. To install it, run the following command:
yum install scsi-target-utils
Once you have done this, make sure that the tgtd service is part of the system startup:
/sbin/chkconfig tgtd on
Installing Pacemaker cluster manager
Pacemaker is an open source, high availability resource manager. The packages for the Pacemaker project are provided by clusterlabs.org. To install Pacemaker, do the following:
Download the clusterlabs.repo file with wget or curl to the /etc/yum.repos.d directory:
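A sketch of those steps, assuming the clusterlabs.org EL5 repository layout at the time of writing (the URL is an assumption and may have changed):
wget -O /etc/yum.repos.d/clusterlabs.repo http://www.clusterlabs.org/rpm/epel-5/clusterlabs.repo
yum install -y pacemaker corosync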
Make sure Pacemaker is part of the system startup (with these CentOS 5 packages, Pacemaker is started via the corosync service):
/sbin/chkconfig corosync on
Install DRBD
DRBD provides us the storage backend for the cluster. It mirrors the data written to the disk to the peer node. For more information about what DRBD does, refer to the Mirroring section on the DRBD site. To install DRBD, run the following command:
yum install drbd83 kmod-drbd83
Configuration
Configure DRBD resource
In order to configure DRBD, we need to create and edit a resource file clustervol.res under /etc/drbd.d on both nodes. One thing to note here: we used a separate device (/dev/sdd2) for DRBD, with LVM layered on top of it (for syncing the content served up by tgtd). We did this to make recovery of disks easier in case of failure.
Both Nodes
First, use an editor (such as vi) to open the file clustervol.res under /etc/drbd.d:
vi /etc/drbd.d/clustervol.res
Edit the file accordingly to match the environment. Our resource file looks like the following:
resource clustervol {
  device    /dev/drbd1;   # DRBD device name (assumed; match whatever device you use)
  disk      /dev/sdd2;
  meta-disk internal;     # internal metadata (assumed)
  on viking-07.eucalyptus-systems.com {
    address 192.168.39.107:7790;
  }
  on viking-08.eucalyptus-systems.com {
    address 192.168.39.108:7790;
  }
}
The main parts of the configuration file are as follows:
resource – refers to the resource managed by DRBD
disk – refers to the device that DRBD will use
address – IP Address/port that DRBD will use
DRBD can be tuned for better disk-sync performance and failover responsiveness. For more information on configuring DRBD, please refer to the sections Configuring DRBD and Optimizing DRBD performance of the DRBD 8.3 User’s Guide. This document is a *must have* as a reference source. I suggest reading it and trying out different configurations before putting any service using DRBD into a production environment.
LVM Configuration
There are a few ways that LVM can be utilized with DRBD. They are as follows:
Using an LVM Logical Volume as the backing device of a DRBD resource
Using a DRBD device as a Physical Volume for an LVM Volume Group
Nesting the two approaches (LVM below and on top of DRBD)
For our setup, we configured a DRBD resource as a Physical Volume, as described in the documentation provided by Linbit. For more information concerning using LVM with DRBD, please refer to the section entitled Using LVM with DRBD in the DRBD 8.3 User’s Guide.
We need to make sure to instruct LVM to read the Physical Volume signatures from the DRBD devices only.
Both Nodes
Configure LVM to look at Physical Volume signatures from DRBD devices only by editing the /etc/lvm/lvm.conf file:
filter = [ "a|/dev/drbd.*|", "r|.*|" ]
Disable LVM cache (in /etc/lvm/lvm.conf):
write_cache_state = 0
After disabling the LVM cache, make sure to remove any stale cache entries by deleting the /etc/lvm/cache/.cache file.
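For example:
rm -f /etc/lvm/cache/.cache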
After this is done on both nodes, we need to create an LVM Volume Group by initializing the DRBD resource as an LVM Physical Volume. Before we can do that, the DRBD metadata for the resource needs to be created. Our resource name is clustervol.
Both Nodes
Run the following command:
# drbdadm create-md clustervol
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
Next, put the resource up:
# drbdadm up clustervol
Primary Node
Now we need to do the initial sync between the nodes. This needs to be done on the primary node. For us, this is viking-07:
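The exact command is not shown here; the standard DRBD 8.3 way to promote a node and force the initial synchronization (run only on the node whose data should win) is:
drbdadm -- --overwrite-data-of-peer primary clustervol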
Finally, we need to add a logical volume to represent the iSCSI Logical Unit (LU). There can be multiple LUs, but in our setup we created one 10 GB logical volume for testing purposes. We created the LV with the following command:
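The LVM commands are not shown here; a sketch, assuming the DRBD device /dev/drbd1 from the resource file above and a Volume Group named clustervol (the VG and LV names are assumptions, chosen to match the resource and LUN names used later):
pvcreate /dev/drbd1                  # initialize the DRBD device as an LVM Physical Volume
vgcreate clustervol /dev/drbd1       # create the Volume Group on top of it
lvcreate -L 10G -n lun1 clustervol   # 10 GB Logical Volume that will back the iSCSI LU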
Configure Pacemaker
Pacemaker is a cluster resource manager which handles resource-level failover. Corosync is the messaging layer which handles node membership in the cluster and node failure at the infrastructure level.
To configure Pacemaker, do the following:
Primary Node
Generate corosync key:
[root@viking-07 ~]# corosync-keygen
chmod the authkey file to be read-only by root, then copy it to the other node:
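A sketch of those two steps, assuming the default authkey location /etc/corosync/authkey and viking-08 as the peer node:
chmod 0400 /etc/corosync/authkey
scp /etc/corosync/authkey root@viking-08.eucalyptus-systems.com:/etc/corosync/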
Now we are ready to configure the Active/Passive iSCSI cluster. The following cluster resources are needed for an active/passive iSCSI Target:
A DRBD resource to replicate data. This is controlled by the cluster manager by switching between the Primary and Secondary roles.
An LVM Volume Group, which will be available on whichever node currently holds the DRBD resource in Primary Role
A virtual, floating IP for the cluster. This will allow initiators to connect to the target no matter which physical node it is running on
An iSCSI Target
At least one iSCSI Logical Unit (LU) that corresponds to a Logical Volume in the LVM Volume Group
In our setup, the Pacemaker configuration has 192.168.44.30 as the virtual IP address to use the target with iSCSI Qualified Name (IQN) iqn.1994-05.com.redhat:cfd95480cf87.clustervol. (An important note here is to make sure both nodes have the same initiatorname. The initiatorname for this configuration is iqn.1994-05.com.redhat:cfd95480cf87. This information is in the /etc/iscsi/initiatorname.iscsi file.)
The target contains the Logical Unit with LUN1, mapping to the Logical Volume named lun1.
To begin configuring the resources, open the crm shell as root and issue the following commands on the Primary node (i.e. viking-07):
crm(live)# configure
crm(live)configure# primitive p_drbd_clustervol \
ocf:linbit:drbd \
params drbd_resource="clustervol" \
op monitor interval="29" role="Master" \
op monitor interval="31" role="Slave"
Create a master/slave resource mapping to the DRBD resource clustervol:
crm(live)configure# ms ms_drbd_clustervol p_drbd_clustervol \
meta master-max="1" master-node-max="1" clone-max="2" \
clone-node-max="1" notify="true"
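The resource group created below also references four primitives that are not shown above: the LVM Volume Group, the iSCSI target, the iSCSI Logical Unit, and the floating IP. A sketch of what they could look like, reusing the IQN and virtual IP from this setup; the volgrpname, LV path, netmask, and monitor intervals are assumptions:
crm(live)configure# primitive p_lvm_clustervol ocf:heartbeat:LVM \
params volgrpname="clustervol" \
op monitor interval="30"
crm(live)configure# primitive p_target_clustervol ocf:heartbeat:iSCSITarget \
params iqn="iqn.1994-05.com.redhat:cfd95480cf87.clustervol" \
op monitor interval="10"
crm(live)configure# primitive p_lu_clustervol_lun1 ocf:heartbeat:iSCSILogicalUnit \
params target_iqn="iqn.1994-05.com.redhat:cfd95480cf87.clustervol" lun="1" \
path="/dev/clustervol/lun1" \
op monitor interval="10"
crm(live)configure# primitive p_ip_clustervolip ocf:heartbeat:IPaddr2 \
params ip="192.168.44.30" cidr_netmask="24" \
op monitor interval="30"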
To bring it all together, we need to create a resource group from the resource associated with our iSCSI target:
crm(live)configure# group rg_clustervol \
p_lvm_clustervol \
p_target_clustervol p_lu_clustervol_lun1 p_ip_clustervolip
The Pacemaker default for the resource group is ordered and co-located. This means resources contained in the resource group will always run on the same physical machine, will be started in the same order as specified, and stopped in reverse order.
To wrap things up, make sure that the resource group is started on the node where DRBD is in the Primary role:
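The exact constraints are not shown here; the usual crm approach is a colocation constraint plus an order constraint (the constraint names below are arbitrary):
crm(live)configure# colocation c_clustervol_on_drbd inf: rg_clustervol ms_drbd_clustervol:Master
crm(live)configure# order o_drbd_before_clustervol inf: ms_drbd_clustervol:promote rg_clustervol:start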
We have now finished our configuration. All that is left is to activate it. To do so, issue the following command in the crm shell:
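In the crm shell, pending configuration changes are activated with commit:
crm(live)configure# commit
Afterwards, crm_mon (or crm status) should show output similar to the following: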
Resource Group: rg_clustervol
     p_lvm_clustervol       (ocf::heartbeat:LVM):               Started viking-07.eucalyptus-systems.com
     p_target_clustervol    (ocf::heartbeat:iSCSITarget):       Started viking-07.eucalyptus-systems.com
     p_lu_clustervol_lun1   (ocf::heartbeat:iSCSILogicalUnit):  Started viking-07.eucalyptus-systems.com
     p_ip_clustervolip      (ocf::heartbeat:IPaddr2):           Started viking-07.eucalyptus-systems.com
Master/Slave Set: ms_drbd_clustervol
     Masters: [ viking-07.eucalyptus-systems.com ]
     Slaves:  [ viking-08.eucalyptus-systems.com ]
A more complex test is to take an md5sum of a file (e.g. an ISO) and copy it from a desktop/laptop to the machine that has the iSCSI target mounted (e.g. viking-09). While the copy is happening, you can use the crm CLI to fail the cluster over back and forth. You will see there is no delay. You can monitor the status of the cluster by using crm_mon. After the copy is complete, take an md5sum of the ISO on the machine to which it was copied. The md5sums should match.
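A sketch of that test with hypothetical file names and mount point; crm resource migrate/unmigrate are the standard crm commands for moving a resource group between nodes:
# on the desktop/laptop
md5sum CentOS-5.7-x86_64-bin-DVD.iso
scp CentOS-5.7-x86_64-bin-DVD.iso root@viking-09:/mnt/iscsi/
# on a cluster node, while the copy is running
crm resource migrate rg_clustervol viking-08.eucalyptus-systems.com
crm resource unmigrate rg_clustervol
# on viking-09, after the copy completes
md5sum /mnt/iscsi/CentOS-5.7-x86_64-bin-DVD.iso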
The next phase of this blog will be to script as much of this install and configuration as possible (e.g. using Puppet or Chef). Stay tuned for more information about this. Hope you enjoyed this blog. Let me know if you have questions, suggestions, and/or comments.