On December 18, 2012, Eucalyptus v3.2 was released. One of the main focuses of this release was hardening the High Availability design of Eucalyptus. I have recently been looking at additional configuration features of DRBD, which is used by Eucalyptus Walrus HA, that can be used in the enterprise to add robustness and make disaster recovery efforts more efficient. *NOTE* This blog entry’s main focus is DRBD, which is separate from Eucalyptus. The goal is to shed light on the additional configuration options that help make DRBD the robust and reliable product that it is, and to show how Eucalyptus works with various open source products.
The Baseline
Before getting into the resource configuration options with DRBD, let’s talk about the disk setup that was used. I recommend using LVM for the backing device used with DRBD. The features that LVM provides allow a cloud admin to add additional backup measures (e.g. LVM snapshots) and recover from outages more efficiently, minimizing the outages that end users perceive. For more information on what LVM is and all of its features, CentOS/RHEL provide great documentation. Once you feel comfortable with LVM, check out DRBD’s LVM Primer to see how you can leverage LVM with DRBD. For this blog entry, the LVM/DRBD setup used a Logical Volume as the DRBD backing device.
Additional Resource Configuration Options in DRBD
The additional resource configuration options cover the following areas:
- Replication traffic integrity checking
- Efficient Synchronization
- Using automated LVM snapshots during DRBD synchronization
All of these options except the last are covered in the DRBD 8.3 User Guide. Although the scripts associated with the automated LVM snapshots are only mentioned in the DRBD 8.4 User Guide, they are available and can be used in DRBD 8.3.x. *NOTE* Before enabling these options, make sure that Eucalyptus Walrus is stopped. Also, make sure the configuration changes are made on both nodes. After the changes have been made, just run drbdadm adjust [resource name] on both nodes for them to take effect. Typically, I like to test out the configuration changes, then test failover of the DRBD nodes, before starting Eucalyptus Walrus.
Traffic Integrity Checking
Making sure that all the data replicated between the DRBD nodes arrives intact is very important. DRBD has a resource configuration option to use message digest algorithms such as MD5, SHA-1 or CRC-32C for end-to-end message integrity checking. If verification of a replicated block against its digest fails, the peer requests retransmission.
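To make the idea concrete, here is a minimal Python sketch of digest-based integrity checking. It is only an illustration of the concept, not DRBD's actual wire protocol: the function names, the retry limit, and the `retransmit` callback are all hypothetical.

```python
import hashlib


def make_packet(block: bytes):
    """Sender side: pair a data block with its SHA-1 digest."""
    return block, hashlib.sha1(block).digest()


def verify_packet(block: bytes, digest: bytes) -> bool:
    """Receiver side: recompute the digest and compare it with the one sent."""
    return hashlib.sha1(block).digest() == digest


def receive(block: bytes, digest: bytes, retransmit, max_retries: int = 3) -> bytes:
    """Accept a block, asking for retransmission while verification fails.

    `retransmit` is a hypothetical callback that returns a fresh
    (block, digest) pair from the sender.
    """
    attempts = 0
    while not verify_packet(block, digest):
        if attempts >= max_retries:
            raise IOError("block failed integrity check after retries")
        attempts += 1
        block, digest = retransmit()
    return block
```

A block that is altered in transit no longer matches its digest, so `receive` keeps asking the sender for a new copy until one verifies or the retry limit is reached.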
To enable this option for SHA-1 integrity checking, add the following entry to the resource configuration file:
.....
net {
    ......
    ......
    data-integrity-alg sha1;
}
......
For more information regarding this resource configuration option, please refer to the section “Configuring replication traffic integrity checking” in the DRBD 8.3 User Guide.
Efficient Synchronization
DRBD offers checksum-based synchronization to make syncing between the DRBD nodes more efficient. As mentioned in the section “Efficient Synchronization” in the DRBD 8.3 User Guide:
When using checksum-based synchronization, then rather than performing a brute-force overwrite of blocks marked out of sync, DRBD reads blocks before synchronizing them and computes a hash of the contents currently found on disk. It then compares this hash with one computed from the same sector on the peer, and omits re-writing this block if the hashes match. This can dramatically cut down synchronization times in situation where a filesystem re-writes a sector with identical contents while DRBD is in disconnected mode.
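The behavior described in that passage can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not DRBD's implementation: the 4096-byte block size, the in-memory byte buffers standing in for disks, and the `resync` function are all illustrative.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative value, not DRBD's internal block size


def resync(source: bytes, target: bytearray, out_of_sync, block_size: int = BLOCK_SIZE) -> int:
    """Resynchronize blocks marked out of sync, skipping identical ones.

    Assumes `source` and `target` have the same length. Returns the
    number of blocks that actually had to be rewritten on the target.
    """
    written = 0
    for index in sorted(out_of_sync):
        start = index * block_size
        end = start + block_size
        src_block = source[start:end]
        # Compare hashes first; only overwrite when the contents differ.
        if hashlib.sha1(src_block).digest() == hashlib.sha1(target[start:end]).digest():
            continue
        target[start:end] = src_block
        written += 1
    return written
```

If three blocks are marked out of sync but only one really differs, `resync` rewrites just that one block. In a real deployment the saving comes from not shipping the unchanged blocks over the replication link.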
To enable this configuration option, add the following to the resource configuration file:
........
syncer {
    ........
    csums-alg sha1;
}
.........
To learn more about this option, please refer to the section “Configuring checksum-based synchronization” in the DRBD 8.3 User Guide.
Automated LVM Snapshots During DRBD Synchronization
When doing DRBD synchronization between nodes, there is a chance that if the SyncSource fails, the result will be a dead node with good data and a surviving node with bad data. When serving DRBD off an LVM Logical Volume, you can mitigate this problem by creating an automated snapshot when synchronization starts, and automatically removing that same snapshot once synchronization has completed successfully.
There are a couple of things to keep in mind when configuring this option:
- Make sure the volume group has enough space on each node to handle the LVM snapshot
- You should review dangling snapshots as soon as possible. A full snapshot causes both the snapshot itself and its origin volume to fail.
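As a rough aid for the first point, the following Python sketch checks how much free space a volume group has before relying on automated snapshots. It assumes the LVM `vgs` command is on the path and that, with `--units b --nosuffix`, it prints the free space as a plain integer number of bytes. The helper names are hypothetical, and the snapshot size you need is something you have to choose yourself.

```python
import subprocess


def parse_vg_free_bytes(vgs_output: str) -> int:
    """Parse the single value printed by the vgs call below (assumed integer bytes)."""
    return int(vgs_output.strip())


def vg_free_bytes(vg_name: str) -> int:
    """Ask LVM how many bytes are free in the given volume group."""
    result = subprocess.run(
        ["vgs", "--noheadings", "--nosuffix", "--units", "b", "-o", "vg_free", vg_name],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_vg_free_bytes(result.stdout)


def has_room_for_snapshot(free_bytes: int, snapshot_bytes: int) -> bool:
    """True if the volume group can hold a snapshot of the requested size."""
    return free_bytes >= snapshot_bytes
```

For example, `has_room_for_snapshot(vg_free_bytes("vg02"), 5 * 1024**3)` would check whether the vg02 volume group used later in this post has 5 GiB free. The check has to be run on each node.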
To enable this configuration option, add the following to the resource configuration file:
.......
handlers {
    before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
    after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
}
.........
To learn more about this option, please refer to the section “Using automated LVM snapshots during DRBD synchronization” in the DRBD 8.4 User Guide.
Conclusion
After enabling these options, the Walrus DRBD resource configuration file will look similar to the following:
resource r0 {
    on viking-01.eucalyptus-systems.com {
        device /dev/drbd1;
        disk /dev/vg02/lv_srv;
        address 192.168.39.101:7789;
        meta-disk internal;
    }
    on viking-02.eucalyptus-systems.com {
        device /dev/drbd1;
        disk /dev/vg02/lv_srv;
        address 192.168.39.102:7789;
        meta-disk internal;
    }
    syncer {
        rate 40M;
        csums-alg sha1;
    }
    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        data-integrity-alg sha1;
    }
    handlers {
        before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
        after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
    }
}
As mentioned earlier, after making these changes, and making sure both DRBD resource files look the same, just run drbdadm adjust [resource name] on both nodes.
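One way to confirm that the resource files on both nodes really do look the same is to compare them after copying the peer's file over. The Python sketch below does this by hashing the two texts with whitespace-only differences ignored. The helper names and the normalization rule are my own assumptions, not part of DRBD.

```python
import hashlib


def normalized_digest(config_text: str) -> str:
    """Hash a config file, ignoring indentation, extra spaces and blank lines."""
    lines = [" ".join(line.split()) for line in config_text.splitlines()]
    lines = [line for line in lines if line]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


def configs_match(local_text: str, peer_text: str) -> bool:
    """True if both resource files are the same apart from whitespace."""
    return normalized_digest(local_text) == normalized_digest(peer_text)
```

Read the local resource file and the copy fetched from the peer, pass both strings to `configs_match`, and only run drbdadm adjust once it returns True.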
Enjoy!