Advanced Configuration of DRBD: Eucalyptus 3.2 Walrus High Availability

On December 18, 2012, Eucalyptus v3.2 was released. One of the main focuses of this release was hardening the High Availability design of Eucalyptus. I have recently been looking at additional configuration features of DRBD, which Eucalyptus Walrus HA uses, that can add robustness and efficiency to disaster recovery efforts in the enterprise. *NOTE* This blog entry's main focus is DRBD, which is separate from Eucalyptus. The goal is to shed light on the additional configuration options that help make DRBD the robust and reliable product that it is, and to show how Eucalyptus works with various open source products.

The Baseline

Before getting into the resource configuration options with DRBD, let's talk about the disk setup that was used. I recommend using LVM for the backing device used with DRBD. The features that LVM provides allow a cloud admin to add backup measures (e.g. LVM snapshots) and to recover from outages more efficiently, minimizing the outages that end users perceive. For more information about what LVM is and all its features, CentOS/RHEL provide great documentation. Once you feel comfortable with LVM, check out DRBD's LVM Primer to see how you can leverage LVM with DRBD. For this blog entry, the LVM/DRBD setup used a Logical Volume as the DRBD backing device.

Additional Resource Configuration Options in DRBD

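The additional resource configuration options cover the following areas:

  • Traffic integrity checking
  • Efficient (checksum-based) synchronization
  • Automated LVM snapshots during DRBD synchronization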

All of these options except the last are covered in the DRBD 8.3 User Guide. The scripts for automated LVM snapshots are documented in the DRBD 8.4 User Guide, but they are also available and can be used in DRBD 8.3.x. *NOTE* When enabling these options, make sure that Eucalyptus Walrus is stopped. Also, make sure the configuration changes are made on both nodes. After the changes have been made, run  drbdadm adjust [resource name]  on both nodes for them to take effect. Typically, I like to test the configuration changes, then test failover of the DRBD nodes, before starting Eucalyptus Walrus.

Traffic Integrity Checking

Making sure that all the data replicated between the DRBD nodes arrives intact is very important. DRBD has a resource configuration option to use message digest algorithms such as MD5, SHA-1 or CRC-32C for end-to-end message integrity checking. If verification of a replicated block against its digest fails, the peer requests retransmission.
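To make the idea concrete, here is a minimal Python sketch of digest-based integrity checking. It is only a conceptual illustration, not DRBD's actual wire protocol; the function names (make_packet, verify_packet, receive) and the "ack"/"retransmit" strings are made up for this example.

```python
import hashlib


def make_packet(block: bytes) -> tuple[bytes, bytes]:
    """Sender side: pair the data block with its SHA-1 digest."""
    return block, hashlib.sha1(block).digest()


def verify_packet(block: bytes, digest: bytes) -> bool:
    """Receiver side: recompute the digest and compare it with the one sent."""
    return hashlib.sha1(block).digest() == digest


def receive(block: bytes, digest: bytes) -> str:
    """Hypothetical receiver decision: accept the block or ask for it again."""
    # A mismatch means the block changed in transit, so request it again.
    return "ack" if verify_packet(block, digest) else "retransmit"


block, digest = make_packet(b"walrus object data")
corrupted = b"walrus object dat\x00"  # same length, last byte altered

intact_result = receive(block, digest)
corrupted_result = receive(corrupted, digest)
print(intact_result, corrupted_result)
```

The intact block is acknowledged, while the altered block triggers a retransmission request, which is the behavior the option is meant to provide.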

To enable this option for SHA-1 integrity checking, add the following entry to the resource configuration file:

.....
net {
......
......
 data-integrity-alg sha1;
 }
......

For more information regarding this resource configuration option, please refer to the section “Configuring replication traffic integrity checking” in the DRBD 8.3 User Guide.

Efficient Synchronization

DRBD offers checksum-based synchronization to make syncing between the DRBD nodes more efficient. As described in the section “Efficient Synchronization” in the DRBD 8.3 User Guide:

When using checksum-based synchronization, then rather than performing a brute-force overwrite of blocks marked out of sync, DRBD reads blocks before synchronizing them and computes a hash of the contents currently found on disk.  It then compares this hash with one computed from the same sector on the peer, and omits re-writing this block if the hashes match. This can dramatically cut down synchronization times in situation where a filesystem re-writes a sector with identical contents while DRBD is in disconnected mode.
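The logic in that passage can be sketched in a few lines of Python. This is an assumption-laden toy model, not DRBD code: the devices are plain lists of byte blocks, BLOCK_SIZE is an arbitrary illustrative value, and checksum_resync is a hypothetical name.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a DRBD constant


def block_hash(block: bytes) -> bytes:
    """Hash the contents of one block."""
    return hashlib.sha1(block).digest()


def checksum_resync(source: list[bytes], target: list[bytes],
                    out_of_sync: set[int]) -> int:
    """Resync the blocks marked out of sync; return how many were rewritten."""
    rewritten = 0
    for index in sorted(out_of_sync):
        # Compare hashes first and skip the write when contents already match.
        if block_hash(source[index]) == block_hash(target[index]):
            continue
        target[index] = source[index]
        rewritten += 1
    return rewritten


source = [b"a" * BLOCK_SIZE, b"b" * BLOCK_SIZE, b"c" * BLOCK_SIZE]
target = [b"a" * BLOCK_SIZE, b"x" * BLOCK_SIZE, b"c" * BLOCK_SIZE]

# Block 0 is marked out of sync but was rewritten with identical contents.
rewritten = checksum_resync(source, target, out_of_sync={0, 1})
print(rewritten)
```

Here two blocks are marked out of sync, but only the one whose contents actually differ is rewritten. In real DRBD the saving comes from not shipping the full block over the replication link when the hashes match.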

To enable this configuration option, add the following to the resource configuration file:

........
syncer {
........
 csums-alg sha1;
 }
.........

To learn more about this option, please refer to the section “Configuring checksum-based synchronization” in the DRBD 8.3 User Guide.

Automated LVM Snapshots During DRBD Synchronization

During DRBD synchronization between nodes, there is a chance that the SyncSource fails before synchronization completes. The result is that the node with good data is dead, and the surviving node is left with bad (inconsistent) data. When serving DRBD off an LVM Logical Volume, you can mitigate this problem by automatically creating a snapshot when synchronization starts, and automatically removing that same snapshot once synchronization has completed successfully.

There are a couple of things to keep in mind when configuring this option:

  • Make sure the volume group has enough space on each node to handle the LVM snapshot
  • You should review dangling snapshots as soon as possible. A full snapshot causes both the snapshot itself and its origin volume to fail.
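As a rough aid for the first point, the following Python sketch checks whether a volume group has room for a snapshot. It assumes that  vgs --noheadings --nosuffix --units b -o vg_free <vg>  prints a single integer byte count; the function names are hypothetical, and the required snapshot size is something you have to choose for your own environment.

```python
import subprocess


def parse_vg_free_bytes(vgs_output: str) -> int:
    """Parse the assumed single-value output of the vgs command below."""
    return int(vgs_output.strip())


def vg_free_bytes(vg_name: str) -> int:
    """Ask LVM for the free space of a volume group, in bytes (needs root)."""
    result = subprocess.run(
        ["vgs", "--noheadings", "--nosuffix", "--units", "b",
         "-o", "vg_free", vg_name],
        check=True, capture_output=True, text=True,
    )
    return parse_vg_free_bytes(result.stdout)


def has_room_for_snapshot(free_bytes: int, snapshot_bytes: int) -> bool:
    """True if the volume group can hold a snapshot of the given size."""
    return free_bytes >= snapshot_bytes
```

On a DRBD node you could call has_room_for_snapshot(vg_free_bytes("vg02"), wanted_bytes) on each node before enabling the handlers, where vg02 is the volume group from the example configuration in this post and wanted_bytes is your own estimate.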

To enable this configuration option, add the following to the resource configuration file:

.......
handlers {
 before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
 after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
 }
.........

To learn more about this option, please refer to the section “Using automated LVM snapshots during DRBD synchronization” in the DRBD 8.4 User Guide.

Conclusion

After enabling these options, the Walrus DRBD resource configuration file will look similar to the following:

resource r0 {
  on viking-01.eucalyptus-systems.com {
    device /dev/drbd1;
    disk /dev/vg02/lv_srv;
    address 192.168.39.101:7789;
    meta-disk internal;
  }
  on viking-02.eucalyptus-systems.com {
    device /dev/drbd1;
    disk /dev/vg02/lv_srv;
    address 192.168.39.102:7789;
    meta-disk internal;
  }
  syncer {
    rate 40M;
    csums-alg sha1;
  }
  net {
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    data-integrity-alg sha1;
  }
  handlers {
    before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
    after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
  }
}

As mentioned earlier, after making these changes, and after making sure the DRBD resource files on both nodes look the same, just run  drbdadm adjust [resource name]  on both nodes.
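One simple way to confirm that the two resource files match is to copy the peer's file over and compare digests. The Python sketch below assumes you have already fetched the peer's copy to a local path; the function names and the temporary file names are hypothetical and only there to demonstrate the comparison.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: str) -> str:
    """Return the SHA-1 hex digest of a file's contents."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


def configs_match(local_path: str, peer_copy_path: str) -> bool:
    """True if the local resource file and the peer's copy are identical."""
    return file_digest(local_path) == file_digest(peer_copy_path)


# Demonstration with throwaway files standing in for the two resource files.
with tempfile.TemporaryDirectory() as tmp:
    local = Path(tmp, "r0.res.local")
    peer = Path(tmp, "r0.res.peer")
    local.write_text("resource r0 { }\n")
    peer.write_text("resource r0 { }\n")
    identical = configs_match(str(local), str(peer))

    peer.write_text("resource r0 { net { } }\n")
    differs = not configs_match(str(local), str(peer))

print(identical, differs)
```

A byte-for-byte comparison is stricter than necessary (whitespace differences count as a mismatch), but it is an easy check before running the adjust step.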

Enjoy!
