Another Great Example of AWS Fidelity – Neo4j, Cloud-Init and Eucalyptus

I recently ran across a blog entry entitled Neo4j 1.9.M01 – Self-managed HA.  I found the concept of graph databases really interesting and reached out to the folks at Neo4j for some insight on how to deploy their HA solution on Eucalyptus.  Amongst the resources they provided, they shared this little gem – how to deploy Neo4j on EC2.  Before you can run, you need to know how to walk – so before standing up HA Neo4j, I used the DIY-on-EC2 article from Neo4j as a guide and deployed a single Neo4j instance on Eucalyptus, with a little help from Cloud-Init.  The follow-up blog will show how to use the same setup to deploy an HA Neo4j environment.

The Setup

Eucalyptus

The Eucalyptus cloud I used is configured for Eucalyptus High Availability.  It's running CentOS 6.3 with KVM, and it uses Managed networking mode so that we can take advantage of network isolation of the VMs and the use of security groups – which behave very much like the security groups provided in AWS EC2.

Ubuntu Cloud Image – 12.04 LTS Precise Pangolin

The image that we will use is the Ubuntu 12.04 LTS Cloud image.  The reasons for using this image are as follows:

  • Ubuntu cloud images come pre-packaged with cloud-init, which helps with bootstrapping the instance.
  • I wanted the solution to work on both AWS EC2 and Eucalyptus; since Ubuntu cloud images run on both, it's a great choice.

Registering the Ubuntu Cloud Image with Eucalyptus

In order for us to get started, we need to get the Ubuntu Cloud image into Eucalyptus so that we can use it for our instance.  To upload, bundle and register the Ubuntu Cloud image, ramdisk and kernel, do the following:

  1. Download the current version of Ubuntu Precise Server AMD64 from the Ubuntu Cloud Image – Precise page, then unpack (gunzip, untar) the tar-gzipped file.

    $ tar -zxvf precise-server-cloudimg-amd64.tar.gz
    x precise-server-cloudimg-amd64.img
    x precise-server-cloudimg-amd64-vmlinuz-virtual
    x precise-server-cloudimg-amd64-loader
    x precise-server-cloudimg-amd64-floppy
    x README.files

  2. Make sure to download and source your Eucalyptus credentials.
  3. We need to bundle, upload, and register precise-server-cloudimg-amd64-loader (ERI), precise-server-cloudimg-amd64-vmlinuz-virtual (EKI), and precise-server-cloudimg-amd64.img (EMI).  For more information regarding this, please refer to the “Image Overview” section of the Eucalyptus 3.1 User Guide.  

    $ euca-bundle-image -i precise-server-cloudimg-amd64-loader --ramdisk true
    $ euca-upload-bundle -b latest-ubuntu-precise -m /tmp/precise-server-cloudimg-amd64-loader.manifest.xml
    $ euca-register -a x86_64 latest-ubuntu-precise/precise-server-cloudimg-amd64-loader.manifest.xml
    $ euca-bundle-image -i precise-server-cloudimg-amd64-vmlinuz-virtual --kernel true
    $ euca-upload-bundle -b latest-ubuntu-precise -m /tmp/precise-server-cloudimg-amd64-vmlinuz-virtual.manifest.xml
    $ euca-register -a x86_64 latest-ubuntu-precise/precise-server-cloudimg-amd64-vmlinuz-virtual.manifest.xml
    $ euca-bundle-image -i precise-server-cloudimg-amd64.img
    $ euca-upload-bundle -b latest-ubuntu-precise -m /tmp/precise-server-cloudimg-amd64.img.manifest.xml
    $ euca-register -a x86_64 latest-ubuntu-precise/precise-server-cloudimg-amd64.img.manifest.xml

After bundling, uploading and registering the ramdisk, kernel and image, the latest-ubuntu-precise bucket in Walrus should have the following images:

$ euca-describe-images | grep latest-ubuntu-precise
IMAGE eki-0F3937E9 latest-ubuntu-precise/precise-server-cloudimg-amd64-vmlinuz-virtual.manifest.xml 345590850920 available public x86_64 kernel instance-store

IMAGE emi-C1613E67 latest-ubuntu-precise/precise-server-cloudimg-amd64.img.manifest.xml 345590850920 available public x86_64 machine instance-store

IMAGE eri-0BE53BFD latest-ubuntu-precise/precise-server-cloudimg-amd64-loader.manifest.xml 345590850920 available public x86_64 ramdisk instance-store

Cloud-init Config File

Now that we have the image ready to go, we need to create a cloud-init config file to pass in using the --user-data-file option of euca-run-instances.  For more examples of different cloud-init files, please refer to the cloud-init-dev/cloud-init repository on bazaar.launchpad.net.  Below is the cloud-init config file I created for bootstrapping the instance with an install of Neo4j, using the ephemeral disk for application storage, and installing a few other packages (i.e. the latest euca2ools, mlocate, less, etc.).  The script can also be accessed from GitHub, under the eucalyptus/recipes repo.

#cloud-config
apt_update: true
apt_upgrade: true
disable_root: true
package_reboot_if_required: true
packages:
 - less
 - bind9utils
 - dnsutils
 - mlocate
cloud_config_modules:
 - ssh
 - [ apt-update-upgrade, always ]
 - updates-check
 - runcmd
runcmd:
 - [ sh, -xc, "if [ -b /dev/sda2 ]; then tune2fs -L ephemeral0 /dev/sda2;elif [ -b /dev/vda2 ]; then tune2fs -L ephemeral0 /dev/vda2;elif [ -b /dev/xvda2 ]; then tune2fs -L ephemeral0 /dev/xvda2;fi" ]
 - [ sh, -xc, "mkdir -p /var/lib/neo4j" ]
 - [ sh, -xc, "mount LABEL=ephemeral0 /var/lib/neo4j" ]
 - [ sh, -xc, "if [ -z `ls /var/lib/neo4j/*` ]; then sed --in-place '$ iMETA_HOSTNAME=`curl -s http://169.254.169.254/latest/meta-data/local-hostname`\\nMETA_IP=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4`\\necho ${META_IP}   ${META_HOSTNAME} >> /etc/hosts; hostname ${META_HOSTNAME}; sysctl -w kernel.hostname=${META_HOSTNAME}\\nif [ -d /var/lib/neo4j/ ]; then mount LABEL=ephemeral0 /var/lib/neo4j; service neo4j-service restart; fi' /etc/rc.local; fi" ] 
 - [ sh, -xc, "META_HOSTNAME=`curl -s http://169.254.169.254/latest/meta-data/local-hostname`; META_IP=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4`; echo ${META_IP}   ${META_HOSTNAME} >> /etc/hosts" ]
 - [ sh, -xc, "META_HOSTNAME=`curl -s http://169.254.169.254/latest/meta-data/local-hostname`; hostname ${META_HOSTNAME}; sysctl -w kernel.hostname=${META_HOSTNAME}" ]
 - [ sh, -xc, "wget -O c1240596-eucalyptus-release-key.pub http://www.eucalyptus.com/sites/all/files/c1240596-eucalyptus-release-key.pub" ]
 - [ apt-key, add, c1240596-eucalyptus-release-key.pub ]
 - [ sh, -xc, "echo 'deb http://downloads.eucalyptus.com/software/euca2ools/2.1/ubuntu precise main' > /etc/apt/sources.list.d/euca2ools.list" ]
 - [ sh, -xc, "echo 'deb http://debian.neo4j.org/repo stable/' > /etc/apt/sources.list.d/neo4j.list" ]
 - [ apt-get, update ]
 - [ apt-get, install, -y, --force-yes, euca2ools ]
 - [ apt-get, install, -y, --force-yes, neo4j ]
 - [ sh, -xc, "sed --in-place 's/#org.neo4j.server.webserver.address=0.0.0.0/org.neo4j.server.webserver.address=0.0.0.0/' /etc/neo4j/neo4j-server.properties" ]
 - [ sh, -xc, "service neo4j-service restart" ]
 - [ sh, -xc, "export LANGUAGE=en_US.UTF-8" ]
 - [ sh, -xc, "export LANG=en_US.UTF-8" ]
 - [ sh, -xc, "export LC_ALL=en_US.UTF-8" ]
 - [ locale-gen, en_US.UTF-8 ]
 - [ dpkg-reconfigure, locales ]
 - [ updatedb ]
mounts:
 - [ ephemeral0, /var/lib/neo4j, auto, "defaults,noexec" ]

Now, we are ready to launch the instance.

Putting It All Together

Before launching the instance, we need to set up our keypair and security group that we will use with the instance.

  1. To create a keypair, run euca-create-keypair.  *NOTE* Make sure you change the permissions of the keypair to 0600 after it's been created.

    euca-create-keypair  neo4j-user > neo4j-user.priv; chmod 0600 neo4j-user.priv

  2. Next, we need to create a security group for our instance.  To create a security group, use euca-create-group.  To open any ports you need for the application, use euca-authorize.  The ports we will open up for the Neo4j application are SSH (22), ICMP, HTTP (7474), and HTTPS (7473).
    • Create security group:

      # euca-create-group neo4j-test -d "Security for Neo4j Instances"

    • Authorize SSH:

      # euca-authorize -P tcp -p 22 -s 0.0.0.0/0 neo4j-test

    • Authorize HTTP:

      # euca-authorize -P tcp -p 7474 -s 0.0.0.0/0 neo4j-test

    • Authorize HTTPS:

      # euca-authorize -P tcp -p 7473 -s 0.0.0.0/0 neo4j-test

    • Authorize ICMP:

      # euca-authorize -P icmp -t -1:-1 -s 0.0.0.0/0 neo4j-test

  3. Finally, we use euca-run-instances to launch the Ubuntu Precise image, and use cloud-init to install Neo4j:

    # euca-run-instances -k neo4j-user --user-data-file cloud-init-neo4j.config emi-C1613E67 --kernel eki-0F3937E9 --ramdisk eri-0BE53BFD --group neo4j-test

To check the status of the instance, use euca-describe-instances.

# euca-describe-instances i-A9EF448C
RESERVATION r-ED8E4699 345590850920 neo4j-test
INSTANCE i-A9EF448C emi-C1613E67 euca-192-168-55-104.wu-tang.euca-hasp.eucalyptus-systems.com 
euca-10-106-69-154.wu-tang.internal running admin 0 m1.small 2012-12-04T03:13:13.869Z 
enter-the-wu eki-0F3937E9 eri-0BE53BFD monitoring-disable 
euca-192-168-55-104.wu-tang.euca-hasp.eucalyptus-systems.com euca-10-106-69-154.wu-tang.internal instance-store

Because the cloud-init config file does an "apt-get upgrade", it takes about 5 to 7 minutes until the instance is fully configured and Neo4j is running.  Once it is running, go to https://<ip-address of instance>:7473.  It will direct you to the web administration page for monitoring and management of the Neo4j instance.  In this example, the URL is https://euca-192-168-55-104.wu-tang.euca-hasp.eucalyptus-systems.com:7473
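
If you would rather watch the bootstrap finish than just wait, something like the following works.  This is only a rough sketch – the log path assumes stock cloud-init on Ubuntu 12.04, the REST endpoint assumes Neo4j's default /db/data/ location, and the hostname should be replaced with the one returned by euca-describe-instances:

# public hostname of the instance, from euca-describe-instances
INSTANCE=euca-192-168-55-104.wu-tang.euca-hasp.eucalyptus-systems.com

# follow cloud-init as it bootstraps the instance (uses the neo4j-user keypair)
ssh -i neo4j-user.priv ubuntu@${INSTANCE} 'tail -f /var/log/cloud-init.log'

# once bootstrapping completes, the Neo4j REST root should answer over HTTPS
# (-k because the bundled certificate is self-signed)
curl -sk https://${INSTANCE}:7473/db/data/ | head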

Neo4j Monitoring and Management Tool

That's it!  The cool thing is that you can find an Ubuntu Precise AMI on AWS EC2, use the same cloud-init script with euca2ools, and follow these same instructions to get an identical deployment on AWS EC2.

As mentioned before, the follow-up blog will be how to deploy the HA solution of Neo4j on Eucalyptus. Enjoy!


Eucalyptus Recipe of the Month Challenge

It's time!

Eucalyptus swag – hoodie, shirt, coffee mug, frisbee, and stickers

We are issuing a challenge to the Open Source community.  The challenge is called the “Recipe of the Month Challenge”.

The Rules:

  • Documentation on what application(s) the recipe deploys, what their purpose is, and how to use them.  (Documentation should be easy to follow and straightforward – this will help with the judging and testing of the recipe.)
  • Any scripting language or configuration management software can be used (for example, shell scripts, Puppet, or Chef).
  • Mention the image (EMI) used by the recipe (for example: a UEC image, a Turnkey image, or an EMI from emis.eucalyptus.com).

The following categories will be used for judging [scale from 1 (lowest) to 10 (highest)]:

  • Complexity (the simpler and more elegant, the better)
  • Deployment speed/efficiency
  • Failure resiliency (how quickly can the solution return to a healthy operational state after an outage)
  • Creativity

Available swag awards:

  • Eucalyptus Hoodie (color available – dark grey)
  • Eucalyptus Electric Cloud Shirt (colors available – black or white)
  • Eucalyptus Contributor’s Coffee Mug
  • Various Eucalyptus stickers

The winning recipe will be made available on the Eucalyptus Recipes Github repository.  Submissions will be accepted at the beginning of each month.  All submissions must be made to the Recipes mailing list.  The last day allowed for submissions will be the 25th of the month (this allows us time to test out each recipe and grade accordingly).  Feel free to use the Eucalyptus Community Cloud as a testbed for your cloud recipe.   Results announcing the winner will be posted to the Recipes, Images, and Eucalyptus Open Community mailing lists (for more information concerning the mailing lists, please visit the Eucalyptus Mailing Lists page).

Look forward to seeing the recipes.  Remember, it's all about creativity, deployment speed, and failure resiliency.  Please send all questions, suggestions, and additional ideas to the Recipes mailing list.

Good luck!  Let the Challenge begin!


Brewing is Always Welcome on Cloud 10 Too: Eutester, Brew, and Mac OS X Lion

When I finished my last blog, I realized I left out a key group of developers on Mac OS X.  I didn’t want to leave out my Homebrew fans, so this blog is dedicated to the brew users out there in the world.  For those who don’t know, Homebrew is an alternative package manager to MacPorts on the Mac OS X platform.  I am a MacPorts person myself, but I always believe that you should challenge yourself by learning different tools.

Since this is a follow-up blog, it will be short and sweet.  The blog entitled "This is the Green Room" has a great entry on setting up Homebrew on Mac OS X Lion.  We will reference that post, with some minor tweaks of course. :-)

Prerequisites

As in my last blog, there are prerequisites.  The following is needed:

After the prereqs are met, it's time to get to brewing…

Dependencies

As referenced in the blog above, install Homebrew.

Next, we will install Python.  This step is also covered in the referenced blog.  Run the following commands:

brew install readline sqlite gdbm pkg-config
brew install python --framework --universal

To make sure that the newly installed python is used, create or edit the .bash_profile file by adding the following line:

export PATH=/usr/local/share/python:/usr/local/bin:$PATH

Once that is done, source the .bash_profile file:

source .bash_profile
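
A quick sanity check – purely illustrative – confirms the shell now picks up the Homebrew Python instead of the system one:

which python     # should print a /usr/local path, not /usr/bin/python
python --version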

Next, we need to change Lion’s symlink to point to the new Python installation.  Run the following commands:

cd /System/Library/Frameworks/Python.framework/Versions/
sudo rm -rf Current
sudo ln -s /usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/Current

Next we install Pip.  This also is referenced in the blog. To install Pip, run the following command:

easy_install pip

After pip is done installing, it's time to install virtualenv:

pip install virtualenv
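
As a preview of the next step (covered in detail in the previous blog), creating and activating a virtualenv looks roughly like this – the directory name here is arbitrary:

virtualenv ~/eutester-env
source ~/eutester-env/bin/activate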

From here on out, you just need to reference my previous blog, starting from the section “Setting Up Your Virtual Env and Installing the Required Modules”.

As always, I hope you enjoyed this blog.  On to more Eutester-ing…


Adventures in High Availability: HA iSCSI with DRBD, iSCSI, and Pacemaker

High availability for applications and physical machines is key to having services "appear" to never be down.  With cloud computing, deploying failure-resilient applications is needed for services that must always be available.

The purpose of this blog is to provide more of the technical detail for HA Open iSCSI that my good friend and colleague, Lester, mentioned in his blog.  Our goal was to set up HA Open iSCSI without access to a SAN.  To accomplish this, we used Pacemaker, Open iSCSI, and DRBD.  The great folks from Linbit provided us with the documentation to deploy this environment.

Setup

HA iSCSI diagram – viking-07 is the active node

In our setup, we used two machines – viking-07 and viking-08 – both running CentOS 5.7.  The resulting Pacemaker cluster looks like the following:


[root@viking-07 ~]# crm status
============
Last updated: Tue Apr 3 20:28:29 2012
Stack: openais
Current DC: viking-07.eucalyptus-systems.com - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ viking-07.eucalyptus-systems.com viking-08.eucalyptus-systems.com ]

Resource Group: rg_clustervol
p_lvm_clustervol (ocf::heartbeat:LVM): Started viking-07.eucalyptus-systems.com
p_target_clustervol (ocf::heartbeat:iSCSITarget): Started viking-07.eucalyptus-systems.com
p_lu_clustervol_lun1 (ocf::heartbeat:iSCSILogicalUnit): Started viking-07.eucalyptus-systems.com
p_ip_clustervolip (ocf::heartbeat:IPaddr2): Started viking-07.eucalyptus-systems.com
Master/Slave Set: ms_drbd_clustervol
Masters: [ viking-07.eucalyptus-systems.com ]
Slaves: [ viking-08.eucalyptus-systems.com ]

Installation

The installation instructions are pretty straightforward, and they are the same for both machines unless otherwise noted.  These instructions assume that CentOS 5.7 has already been installed.  If CentOS 5.7 is not already installed, please go to the CentOS 5 Documentation for installation instructions.

Both Nodes

Installing SCSI Target Framework

The SCSI Target Framework (tgt) provides the iSCSI target we will use in the cluster setup.  To install it, run the following command:

yum install scsi-target-utils

Once you have done this, make sure that the tgtd service is part of the system startup:

/sbin/chkconfig tgtd on

Installing Pacemaker cluster manager

Pacemaker is an open source, high availability resource manager.  The packages for the Pacemaker project are provided by clusterlabs.org.  To install Pacemaker, do the following:

  • Download the clusterlabs.repo file with wget or curl to the /etc/yum.repos.d directory:

    wget -O /etc/yum.repos.d/pacemaker.repo http://www.clusterlabs.org/rpm/epel-5/clusterlabs.repo

  • Install Pacemaker (and dependencies):

    yum install pacemaker pacemaker-libs corosync corosynclib

  • Make sure the cluster stack (corosync, which starts Pacemaker) is part of the system startup:

    /sbin/chkconfig corosync on

Install DRBD

DRBD provides us the storage backend for the cluster.  It mirrors the data written to the disk to the peer node.  For more information about what DRBD does, refer to the Mirroring section on the DRBD site. To install DRBD, run the following command:

yum install  drbd83 kmod-drbd83

Configuration

Configure DRBD resource

In order to configure DRBD, we need to create and edit a resource file, clustervol.res, under /etc/drbd.d on both nodes.  One thing to note here: we used a separate partition (/dev/sdd2) as the backing device for DRBD, with LVM layered on top of the DRBD device (for syncing the content served up by tgtd).  We did this to make it easier to recover the disks in case of failure.

Both Nodes

First use an editor (such as VI) to open the file clustervol.res under /etc/drbd.d:

vi /etc/drbd.d/clustervol.res

Edit the file to match your environment.  Our resource file looks like the following:


# cat /etc/drbd.d/clustervol.res
resource clustervol {
  device    /dev/drbd1;
  disk      /dev/sdd2;
  meta-disk internal;

  on viking-07.eucalyptus-systems.com {
    address 192.168.39.107:7790;
  }
  on viking-08.eucalyptus-systems.com {
    address 192.168.39.108:7790;
  }
}

The main parts of the configuration file are as follows:

  • resource – refers to the resource managed by DRBD
  • disk – refers to the device that DRBD will use
  • address – IP Address/port that DRBD will use

DRBD can be tuned for better disk-sync performance and failover responsiveness.  For more information on configuring DRBD, please refer to the sections Configuring DRBD and Optimizing DRBD performance of the DRBD 8.3 User's Guide.  This document is a *must have* as a reference source.  I suggest reading it and trying out different configurations before putting any service that uses DRBD into a production environment.
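
As one small example of the kind of tuning the guide covers, a resync-rate cap can be set per resource in the resource file.  The value below is only a placeholder and should be sized to your network and storage:

# added inside the existing "resource clustervol { ... }" stanza in clustervol.res
syncer {
  rate 40M;   # example value only - tune to roughly a third of available replication bandwidth
}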

LVM Configuration

There are a few ways that LVM can be utilized with DRBD – for example, using LVM Logical Volumes as backing devices for DRBD, or using a DRBD device as an LVM Physical Volume.

For our setup, we configured the DRBD resource as a Physical Volume, as described in the documentation provided by Linbit.  For more information concerning using LVM with DRBD, please refer to the section entitled Using LVM with DRBD in the DRBD 8.3 User's Guide.

We need to make sure to instruct LVM to read the Physical Volume signatures from the DRBD devices only.

Both Nodes

Configure LVM to look at Physical Volume signatures from DRBD devices only by editing the /etc/lvm/lvm.conf file:

filter = [ "a|/dev/drbd.*|", "r|.*|" ]

Disable LVM cache (in /etc/lvm/lvm.conf):

write_cache_state = 0

After disabling the LVM cache, make sure to remove any stale cache entries by deleting /etc/lvm/cache/.cache on both nodes.
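
That amounts to removing a single file (path as above):

rm -f /etc/lvm/cache/.cache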

After this is done on both nodes, we need to create an LVM Volume Group by initializing the DRBD resource as an LVM Physical Volume.  In order to do so, creation of the metadata for the resource is needed.  Our resource name is clustervol.

Both Nodes

Run the following command:


# drbdadm create-md clustervol
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.

Next, put the resource up:


# drbdadm up clustervol

Primary Node

Now we need to do the initial sync between the nodes.  This needs to be done on the primary node.  For us, this is viking-07:


[root@viking-07 ~]# drbdadm primary --force clustervol
[root@viking-07 ~]# drbdadm -- --overwrite-data-of-peer primary clustervol

To monitor the status of the sync and the state of the DRBD resource, run the following command:


[root@viking-07 ~]# drbd-overview verbose

Once the syncing has started, we are ready to initialize the DRBD resource as a LVM Physical Volume by running the following command:


[root@viking-07 ~]# pvcreate /dev/drbd/by-res/clustervol

Next, we create an LVM Volume Group that includes this PV:


[root@viking-07 ~]# vgcreate clustervol /dev/drbd/by-res/clustervol

Finally, we need to add a logical volume to represent the iSCSI Logical Unit (LU).  There can be multiple LUs, but in our setup we created one 10 GB logical volume for testing purposes.  We created the LV with the following command:


[root@viking-07 ~]# lvcreate -L 10G -n lun1 clustervol

When the DRBD sync has completed and drbd-overview verbose is executed, the output should look similar to this:


[root@viking-07 ~]# drbd-overview verbose
1:clustervol Connected Primary/Secondary UpToDate/UpToDate C r----- lvm-pv: clustervol 930.99G 10.00G

Pacemaker Configuration

Pacemaker is a cluster resource manager that handles resource-level failover.  Corosync is the messaging layer that handles node membership in the cluster and node failure at the infrastructure level.

To configure Pacemaker, do the following:

Primary Node

Generate corosync key:


[root@viking-07 ~]# corosync-keygen

Change the authkey file to be readable only by root, then copy it to the other node:


[root@viking-07 ~]# chmod 0400 /etc/corosync/authkey
[root@viking-07 ~]# scp /etc/corosync/authkey root@192.168.39.108:/etc/corosync/

Copy corosync.conf.example to corosync.conf under /etc/corosync:


[root@viking-07 ~]# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf

Edit the following fields (this example assumes eth0 is the primary interface):


[root@viking-07 ~]# export coro_port=4000
[root@viking-07 ~]# export coro_mcast=226.94.1.1
[root@viking-07 ~]# export coro_addr=`ifconfig eth0 | grep "inet addr" | awk '{print $2}' | cut -d ":" -f 2 | cut -d "." -f 1,2,3 | xargs -I {} echo {}".0"`
[root@viking-07 ~]# sed -i.gres "s/.*mcastaddr:.*/mcastaddr:\ $coro_mcast/g" /etc/corosync/corosync.conf
[root@viking-07 ~]# sed -i.gres "s/.*mcastport:.*/mcastport:\ $coro_port/g" /etc/corosync/corosync.conf
[root@viking-07 ~]# sed -i.gres "s/.*bindnetaddr:.*/bindnetaddr:\ $coro_addr/g" /etc/corosync/corosync.conf

Copy the corosync.conf files to the other node:


[root@viking-07 ~]# scp /etc/corosync/corosync.conf root@192.168.39.108:/etc/corosync/corosync.conf

Both Nodes

Now you are ready to start corosync on both nodes:


# service corosync start

For more detailed information about configuring Pacemaker, please reference the documentation provided by clusterlabs.

Primary Node

To finish up the configuration, the following commands need to be executed to prepare for the HA iSCSI target configuration for a 2-node cluster:


[root@viking-07 ~]# crm
crm(live)# configure
crm(live)configure# property stonith-enabled="false"
crm(live)configure# property no-quorum-policy="ignore"
crm(live)configure# property default-resource-stickiness="200"
crm(live)configure# commit

For more information on how to use the crm shell, please refer to the CRM CLI (command line interface) tool documentation provided by clusterlabs.

Active/Passive iSCSI Configuration

Now we are ready to configure the Active/Passive iSCSI cluster. The following cluster resources are needed for an active/passive iSCSI Target:

  • A DRBD resource to replicate data.  This is controlled by the cluster manager by switching between the Primary and Secondary roles.
  • An LVM Volume Group, which will be available on whichever node currently holds the DRBD resource in Primary Role
  • A virtual, floating IP for the cluster. This will allow initiators to connect to the target no matter which physical node it is running on
  • iSCSI Target
  • At least one iSCSI LU that corresponds to a Logical Volume in the LVM Volume Group

In our setup, the Pacemaker configuration uses 192.168.44.30 as the virtual IP address for the target with iSCSI Qualified Name (IQN) iqn.1994-05.com.redhat:cfd95480cf87.clustervol.  (An important note: make sure both nodes have the same initiatorname.  The initiatorname for this configuration is iqn.1994-05.com.redhat:cfd95480cf87.  This information is in the /etc/iscsi/initiatorname.iscsi file.)
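
A quick way to confirm the initiatornames match is simply to look at that file on each node – the IQN shown here is the one from our setup; yours will differ:

[root@viking-07 ~]# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.redhat:cfd95480cf87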

The target contains the Logical Unit with LUN1, mapping to the Logical Volume named lun1.

To begin with the configuration of the resource, open the crm shell as root, and issue the following commands on the Primary node (i.e. viking-07):


crm(live)# configure
crm(live)configure# primitive p_drbd_clustervol \
ocf:linbit:drbd \
params drbd_resource="clustervol" \
op monitor interval="29" role="Master" \
op monitor interval="31" role="Slave"
crm(live)configure# ms ms_drbd_clustervol p_drbd_clustervol \
meta master-max="1" master-node-max="1" clone-max="2" \
clone-node-max="1" notify="true"

This creates the DRBD primitive and a master/slave resource mapping to the DRBD resource clustervol.  Next, we create the cluster IP address, the LVM Volume Group, and the iSCSI target primitives:


crm(live)configure# primitive p_ip_clustervolip \
ocf:heartbeat:IPaddr2 \
params ip="192.168.44.30" cidr_netmask="24" \
op monitor interval="10s"
crm(live)configure# primitive p_lvm_clustervol \
ocf:heartbeat:LVM \
params volgrpname="clustervol" \
op monitor interval="10s" timeout="30" depth="0"
crm(live)configure# primitive p_target_clustervol \
ocf:heartbeat:iSCSITarget \
params iqn="iqn.1994-05.com.redhat:cfd95480cf87.clustervol" \
tid="1" \
op monitor interval="10s" timeout="20s"

Now we add the Logical Unit:


crm(live)configure# primitive p_lu_clustervol_lun1 ocf:heartbeat:iSCSILogicalUnit \
params target_iqn="iqn.1994-05.com.redhat:cfd95480cf87.clustervol" lun="1" path="/dev/clustervol/lun1" implementation="tgt" \
op monitor interval="10"

For information concerning addressing any security considerations for iSCSI, please refer to the section entitled Security Considerations in the document Highly Available iSCSI Storage with DRBD and Pacemaker, provided by Linbit.

To bring it all together, we need to create a resource group from the resources associated with our iSCSI target:


crm(live)configure# group rg_clustervol \
p_lvm_clustervol \
p_target_clustervol p_lu_clustervol_lun1 p_ip_clustervolip

The Pacemaker default for the resource group is ordered and co-located.  This means resources contained in the resource group will always run on the same physical machine, will be started in the same order as specified, and stopped in reverse order.

To wrap things up, make sure that the resource group is started on the node where DRBD is in the Primary role:


crm(live)configure# order o_drbd_before_clustervol \
inf: ms_drbd_clustervol:promote rg_clustervol:start
crm(live)configure# colocation c_clustervol_on_drbd \
inf: rg_clustervol ms_drbd_clustervol:Master

We have now finished our configuration.  All that is left is to activate it.  To do so, issue the following command in the crm shell:


crm(live)configure# commit

For more information about adding a DRBD-backed service to the cluster configuration, please reference Adding a DRBD-backed service to the cluster configuration in the DRBD 8.3 User’s Guide.

To see the setup of your configured resource group, run the following command using the crm CLI:


[root@viking-07 ~]# crm resource show
Resource Group: rg_clustervol
p_lvm_clustervol (ocf::heartbeat:LVM) Started
p_target_clustervol (ocf::heartbeat:iSCSITarget) Started
p_lu_clustervol_lun1 (ocf::heartbeat:iSCSILogicalUnit) Started
p_ip_clustervolip (ocf::heartbeat:IPaddr2) Started
Master/Slave Set: ms_drbd_clustervol
Masters: [ viking-07.eucalyptus-systems.com ]
Slaves: [ viking-08.eucalyptus-systems.com ]

Accessing the iSCSI Target

Make sure that the cluster is online by using the crm CLI:


[root@viking-07 ~]# crm status
============
Last updated: Tue Apr 3 19:51:25 2012
Stack: openais
Current DC: viking-07.eucalyptus-systems.com - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ viking-07.eucalyptus-systems.com viking-08.eucalyptus-systems.com ]

Resource Group: rg_clustervol
p_lvm_clustervol (ocf::heartbeat:LVM): Started viking-07.eucalyptus-systems.com
p_target_clustervol (ocf::heartbeat:iSCSITarget): Started viking-07.eucalyptus-systems.com
p_lu_clustervol_lun1 (ocf::heartbeat:iSCSILogicalUnit): Started viking-07.eucalyptus-systems.com
p_ip_clustervolip (ocf::heartbeat:IPaddr2): Started viking-07.eucalyptus-systems.com
Master/Slave Set: ms_drbd_clustervol
Masters: [ viking-07.eucalyptus-systems.com ]
Slaves: [ viking-08.eucalyptus-systems.com ]

The status of the cluster reflects the diagram at the beginning of the blog.  We are ready to access iSCSI.

On another machine – here it's viking-09 – start up Open iSCSI and attempt to access the target:


[root@viking-09 ~]# service iscsi start
iscsid is stopped
Starting iSCSI daemon: [ OK ]
[ OK ]

We need to start a discovery session on the target portal. Use the cluster IP address for this:


[root@viking-09 ~]# iscsiadm -m discovery -p 192.168.44.30 -t sendtargets

The output from this command should be the name of the target we configured:


192.168.44.30:3260,1 iqn.1994-05.com.redhat:cfd95480cf87.clustervol

Now we are ready to log into the target:


[root@viking-09 ~]# iscsiadm -m node -p 192.168.44.30 -T iqn.1994-05.com.redhat:cfd95480cf87.clustervol --login
Logging in to [iface: default, target: iqn.1994-05.com.redhat:cfd95480cf87.clustervol, portal: 192.168.44.30,3260] (multiple)
Login to [iface: default, target: iqn.1994-05.com.redhat:cfd95480cf87.clustervol, portal: 192.168.44.30,3260] successful.

For more information on connecting to iSCSI targets, please refer to the section entitled Using highly available iSCSI Targets in the document Highly Available iSCSI Storage with DRBD and Pacemaker.

Running dmesg should show us what block device is associated with the target:


......
scsi9 : iSCSI Initiator over TCP/IP
Vendor: IET Model: Controller Rev: 0001
Type: RAID ANSI SCSI revision: 05
scsi 9:0:0:0: Attached scsi generic sg4 type 12
Vendor: IET Model: VIRTUAL-DISK Rev: 0001
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sde: 20971520 512-byte hdwr sectors (10737 MB)
sde: Write Protect is off
sde: Mode Sense: 49 00 00 08
SCSI device sde: drive cache: write back
SCSI device sde: 20971520 512-byte hdwr sectors (10737 MB)
sde: Write Protect is off
sde: Mode Sense: 49 00 00 08
SCSI device sde: drive cache: write back
sde: sde1
sd 9:0:0:1: Attached scsi disk sde
sd 9:0:0:1: Attached scsi generic sg5 type 0
.....

Now the only thing left is to format the device.  For this exercise, we just used XFS.


[root@viking-09 ~]# mkfs.xfs /dev/sde

Once that's completed, we mounted the device at a mount point.  Our mount point is /mnt/iscsi_vol.


[root@viking-09 ~]# mkdir -p /mnt/iscsi_vol
[root@viking-09 ~]# mount -o noatime,nobarrier /dev/sde /mnt/iscsi_vol/

Using df -ah, we can see the filesystem mounted:


[root@viking-09 ~]# df -ah
Filesystem Size Used Avail Use% Mounted on
/dev/md1 2.7T 1.5G 2.6T 1% /
proc 0 0 0 - /proc
sysfs 0 0 0 - /sys
devpts 0 0 0 - /dev/pts
/dev/md0 487M 29M 433M 7% /boot
tmpfs 7.9G 0 7.9G 0% /dev/shm
none 0 0 0 - /proc/sys/fs/binfmt_misc
sunrpc 0 0 0 - /var/lib/nfs/rpc_pipefs
/dev/sde 10G 4.6M 10G 1% /mnt/iscsi_vol

Testing HA Failover

A simple way to test failover is to use the crm CLI. Here is the current status of our cluster:


[root@viking-07 ~]# crm status
============
Last updated: Tue Apr 3 20:22:06 2012
Stack: openais
Current DC: viking-07.eucalyptus-systems.com - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ viking-07.eucalyptus-systems.com viking-08.eucalyptus-systems.com ]

Resource Group: rg_clustervol
p_lvm_clustervol (ocf::heartbeat:LVM): Started viking-07.eucalyptus-systems.com
p_target_clustervol (ocf::heartbeat:iSCSITarget): Started viking-07.eucalyptus-systems.com
p_lu_clustervol_lun1 (ocf::heartbeat:iSCSILogicalUnit): Started viking-07.eucalyptus-systems.com
p_ip_clustervolip (ocf::heartbeat:IPaddr2): Started viking-07.eucalyptus-systems.com
Master/Slave Set: ms_drbd_clustervol
Masters: [ viking-07.eucalyptus-systems.com ]
Slaves: [ viking-08.eucalyptus-systems.com ]

To failover, issue the following crm shell command:


[root@viking-07 ~]# crm resource move rg_clustervol viking-08.eucalyptus-systems.com

This crm command moved the resource from viking-07 to viking-08.  The status of the cluster should look like the following:


[root@viking-07 ~]# crm status
============
Last updated: Tue Apr 3 20:23:30 2012
Stack: openais
Current DC: viking-07.eucalyptus-systems.com - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ viking-07.eucalyptus-systems.com viking-08.eucalyptus-systems.com ]

Resource Group: rg_clustervol
p_lvm_clustervol (ocf::heartbeat:LVM): Started viking-08.eucalyptus-systems.com
p_target_clustervol (ocf::heartbeat:iSCSITarget): Started viking-08.eucalyptus-systems.com
p_lu_clustervol_lun1 (ocf::heartbeat:iSCSILogicalUnit): Started viking-08.eucalyptus-systems.com
p_ip_clustervolip (ocf::heartbeat:IPaddr2): Started viking-08.eucalyptus-systems.com
Master/Slave Set: ms_drbd_clustervol
Masters: [ viking-08.eucalyptus-systems.com ]
Slaves: [ viking-07.eucalyptus-systems.com ]

This is a diagram of what the cluster looks like now:

HA iSCSI diagram – viking-08 is the active node

You can switch the resource back to viking-07 by running the following crm CLI command:


[root@viking-07 ~]# crm resource move rg_clustervol viking-07.eucalyptus-systems.com

The status of the cluster should look similar to this:


[root@viking-07 ~]# crm status
============
Last updated: Tue Apr 3 20:28:29 2012
Stack: openais
Current DC: viking-07.eucalyptus-systems.com - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ viking-07.eucalyptus-systems.com viking-08.eucalyptus-systems.com ]

Resource Group: rg_clustervol
p_lvm_clustervol (ocf::heartbeat:LVM): Started viking-07.eucalyptus-systems.com
p_target_clustervol (ocf::heartbeat:iSCSITarget): Started viking-07.eucalyptus-systems.com
p_lu_clustervol_lun1 (ocf::heartbeat:iSCSILogicalUnit): Started viking-07.eucalyptus-systems.com
p_ip_clustervolip (ocf::heartbeat:IPaddr2): Started viking-07.eucalyptus-systems.com
Master/Slave Set: ms_drbd_clustervol
Masters: [ viking-07.eucalyptus-systems.com ]
Slaves: [ viking-08.eucalyptus-systems.com ]

A more complex test is to take an md5sum of a file (e.g. an ISO) and copy it from a desktop/laptop to the machine that has the iSCSI target mounted (e.g. viking-09).  While the copy is happening, use the crm CLI to fail the resource group back and forth; you will see there is no delay.  You can monitor the status of the cluster with crm_mon.  After the copy is complete, take an md5sum of the ISO on the machine it was copied to.  The md5sums should match.
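
A rough sketch of that test – the ISO name is just a placeholder:

# on the desktop/laptop: note the checksum, then copy the file to the iSCSI-backed mount
md5sum CentOS-5.7-x86_64-bin-DVD.iso
scp CentOS-5.7-x86_64-bin-DVD.iso root@viking-09:/mnt/iscsi_vol/

# on viking-07, while the copy runs, bounce the resource group between the nodes
crm resource move rg_clustervol viking-08.eucalyptus-systems.com
crm resource move rg_clustervol viking-07.eucalyptus-systems.com

# on viking-09, after the copy completes - the checksum should match the original
md5sum /mnt/iscsi_vol/CentOS-5.7-x86_64-bin-DVD.iso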

The next phase of this blog will be to script as much of this installation and configuration as possible (e.g. using Puppet or Chef).  Stay tuned for more information about this.  Hope you enjoyed this blog.  Let me know if you have questions, suggestions, and/or comments.

Enjoy!


The Collaboration: Eustore with Varnish and Eucalyptus Walrus

In my last blog, I covered three ways Eucalyptus Systems uses the Varnish-Walrus architecture. This blog will cover how eustore takes advantage of this architecture.

Overview

Eustore is an image management tool developed by David Kavanagh.  Its primary goal is to automate image bundling, uploading, and registration.  The two commands provided by eustore are eustore-describe-images and eustore-install-image.


# eustore-describe-images --help
Usage: eustore-describe-images [options]

lists images from Eucalyptus.com

Options:
-h, --help show this help message and exit
-v, --verbose display more information about images

Standard Options:
-D, --debug Turn on all debugging output
--debugger Enable interactive debugger on error
-U URL, --url=URL Override service URL with value provided
--region=REGION Name of the region to connect to
-I ACCESS_KEY_ID, --access-key-id=ACCESS_KEY_ID
Override access key value
-S SECRET_KEY, --secret-key=SECRET_KEY
Override secret key value
--version Display version string

# eustore-install-image --help
Usage: eustore-install-image [options]

downloads and installs images from Eucalyptus.com

Options:
-h, --help show this help message and exit
-i IMAGE_NAME, --image_name=IMAGE_NAME
name of image to install
-b BUCKET, --bucket=BUCKET
specify the bucket to store the images in
-k KERNEL_TYPE, --kernel_type=KERNEL_TYPE
specify the type you're using [xen|kvm]
-d DIR, --dir=DIR specify a temporary directory for large files
--kernel=KERNEL Override bundled kernel with one already installed
--ramdisk=RAMDISK Override bundled ramdisk with one already installed

Standard Options:
-D, --debug Turn on all debugging output
--debugger Enable interactive debugger on error
-U URL, --url=URL Override service URL with value provided
--region=REGION Name of the region to connect to
-I ACCESS_KEY_ID, --access-key-id=ACCESS_KEY_ID
Override access key value
-S SECRET_KEY, --secret-key=SECRET_KEY
Override secret key value
--version Display version string

Eustore, by default, uses the images located on emis.eucalyptus.com, but it can be configured to use other image locations.  It utilizes two components for image management:

  • JSON configuration file – catalog.json
  • Location for tar-gzipped EMIs

The images found on emis.eucalyptus.com and the JSON configuration file associated with those images are all located in unique Walrus buckets. The images shown in this blog are in the starter-emis bucket. The ACLs for these buckets allow for the objects to be publicly accessible. For more information on Walrus ACLs, please reference the section “Access Control List (ACL) Overview” in the AWS S3 Developer’s Guide.

The two commands that eustore provides – eustore-describe-images and eustore-install-image – significantly cut down the number of commands the user needs to run.  Without eustore, a user would need to run three commands (euca-bundle-image, euca-upload-bundle, and euca-register) for each of the kernel, ramdisk, and raw disk image of an EMI – a total of nine commands.

The Collaboration

eustore-describe-images

When eustore-describe-images is run, the following occurs:

Diagram of eustore-describe-images
  1. eustore-describe-images requests information from the JSON file (stored in a Walrus bucket) via emis.eucalyptus.com (the varnishd instance).
  2. ** If the JSON file – catalog.json – is not present in emis.eucalyptus.com's cache, the file is pulled from the Walrus bucket.
  3. Data from the JSON file is returned to eustore-describe-images.


    # eustore-describe-images
    ....
    centos-x86_64-20120114 centos x86_64 2012.1.14 CentOS 5 1.3GB root, Single Kernel
    centos-lg-i386-20110702 centos i386 2011.07.02 CentOS 5 4.5GB root, Hypervisor-Specific Kernels
    centos-lg-x86_64-20110702 centos x86_64 2011.07.02 CentOS 5 4.5GB root, Hypervisor-Specific Kernels
    centos-lg-x86_64-20111228 centos x86_64 2011.12.28 CentOS 5 4.5GB root, Single Kernel
    centos-lg-x86_64-20120114 centos x86_64 2012.1.14 CentOS 5 4.5GB root, Single Kernel
    debian-i386-20110702 debian i386 2011.07.02 Debian 6 1.3GB root, Hypervisor-Specific Kernels
    debian-x86_64-20110702 debian x86_64 2011.07.02 Debian 6 1.3GB root, Hypervisor-Specific Kernels
    debian-x86_64-20120114 debian x86_64 2012.1.14 Debian 6 1.3GB root, Single Kernel
    .....

eustore-install-image

eustore-install-image follows the same steps as eustore-describe-images, except it uses the information stored in the JSON file for each EMI.  The following information is present for each EMI:


{
  "images": [
    {
      "name": "centos-x86_64-20120114",
      "description": "CentOS 5 1.3GB root, Single Kernel",
      "version": "2012.1.14",
      "architecture": "x86_64",
      "os": "centos",
      "url": "starter-emis/euca-centos-2012.1.14-x86_64.tgz",
      "date": "20120114150503",
      "recipe": "centos-based",
      "stamp": "28fc-4826",
      "contact": "images@lists.eucalyptus.com"
    },
    .....
    {
      "name": "debian-x86_64-20120114",
      "description": "Debian 6 1.3GB root, Single Kernel",
      "version": "2012.1.14",
      "architecture": "x86_64",
      "os": "debian",
      "url": "starter-emis/euca-debian-2012.1.14-x86_64.tgz",
      "date": "20120114152138",
      "recipe": "debian-based",
      "stamp": "3752-f34a",
      "contact": "images@lists.eucalyptus.com"
    },
    ....
  ]
}

When eustore-install-image -i centos-x86_64-20120114 -b centos_x86-64 is executed, the following occurs:

Diagram of eustore-install-image
  1. eustore-install-image requests the image (in tar-gzipped form) to be downloaded from emis.eucalyptus.com.
  2. ** If the image (euca-centos-2012.1.14-x86_64.tgz) is not available in the varnish cache, varnishd (emis.eucalyptus.com) pulls the image from the starter-emis bucket and stores it in ephemeral space to handle future requests.
  3. Once the tar-gzipped file is downloaded, eustore-install-image bundles, uploads, and registers the kernel (EKI), ramdisk (ERI), and image (EMI).
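
Pulling the example together, a complete run on a KVM-based cloud might look like the following – the bucket name is arbitrary, and the -k option comes from the help output shown earlier:

# downloads, bundles, uploads, and registers the EKI, ERI, and EMI in one step
eustore-install-image -i centos-x86_64-20120114 -b centos_x86-64 -k kvm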

As demonstrated above, eustore definitely makes image management efficient and user-friendly. Stay tuned for upcoming blogs discussing more on how the Varnish-Walrus architecture is utilized.

For any questions, concerns, and/or suggestions, please email images@lists.eucalyptus.com or community@lists.eucalyptus.com.  And as always, you can respond with comments here as well. :-)

Enjoy!

**This step won’t happen if the contents are cached on emis.eucalyptus.com


Fun with Varnish and Walrus on Eucalyptus, Part 2

A few weeks ago, I posted a blog entitled "Fun with Varnish and Walrus on Eucalyptus, Part 1".  This follow-up showcases a few production use cases that utilize the Varnish-Walrus architecture built on top of Eucalyptus.  *NOTE* This architecture can also be leveraged using AWS EC2 and S3 – one of the many benefits of Eucalyptus being AWS compatible.

The tools and web pages that take advantage of the Varnish-Walrus architecture on Eucalyptus are the following:

Eustore uses the Varnish-Walrus architecture by pulling images through emis.eucalyptus.com (the varnish instance).  The data for each of the images is stored in a JSON file located in a Walrus bukkit.  For more information about Eustore, please refer to David Kavanagh's Eustore blog.

The Starter Eucalyptus Machine Images (EMIs) page uses the Varnish-Walrus architecture to let users download any of the available EMIs.

Starter Eucalyptus Machine Images (EMIs) page
Since emis.eucalyptus.com is a varnish instance, you can query its logs to get statistics on how many times each EMI has been downloaded.
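
As a rough illustration – assuming the instance writes client requests to an NCSA-style access log via varnishncsa, and that the log lives at the path below – a per-EMI download count could be pulled with a one-liner like this:

# count requests per tarball; log path and format are assumptions
awk '{print $7}' /var/log/varnish/varnishncsa.log | grep '\.tgz$' | sort | uniq -c | sort -rn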

The Eucalyptus Machine Images page is a static web page for emis.eucalyptus.com, made up of HTML, CSS, and jQuery – all stored in a Walrus bukkit.

Eucalyptus Machine Images page (http://emis.eucalyptus.com)
The web page for emis.eucalyptus.com definitely shows the power of using Walrus as a data store for various information. It accesses the same JSON file that is used by Eustore. We did this to make sure that there is consistency with all tools and web pages that provide access to the EMIs we create.

Hope you enjoyed this introduction to the use cases we use here at Eucalyptus. Stay tuned to the follow-up blogs that provide a more in-depth view as to how each use case utilizes our Varnish-Walrus infrastructure.

Thanks to David Kavanagh and Ian Struble for helping in this endeavor.  This blog would have been out sooner, but I was busy at Scale 10x working the booth for Eucalyptus Systems.  To see the fun we had at the conference, check out our tumblr posts.

Till next time…

[1] Eustore was designed by David Kavanagh, one of the many great colleagues I work with at Eucalyptus Systems. It initially started as a project idea that spurred from various image management needs discussed in the Eucalyptus Image Management group.


Fun with Varnish and Walrus on Eucalyptus, Part 1

After getting some free time to put together a high-level diagram of the Varnish/Walrus setup we are using at Eucalyptus Systems, I decided to use it as an opportunity to write my first technical blog.

The Inspiration

Here at Eucalyptus Systems, we are really big on "drinking our own champagne".  We are in the process of migrating everyday enterprise services to run on Eucalyptus.  My good friend and co-worker Graziano got our team started down this path with his blog posts on Drinking Champagne and Planet Eucalyptus.

The Problem

We needed to migrate the storage of various tar-gzipped files from a virtual machine to an infrastructure running on Eucalyptus.  Since Eucalyptus Walrus is compatible with Amazon's S3, it serves as a great service for storing tons of static content and data.  Walrus – just like S3 – also has ACLs for stored data objects.

With all the coolness of storing data in Walrus, we needed to figure out a way to lessen the network load on Walrus due to multiple HTTP GET requests.  This is where Varnish comes to the rescue…

The Solution

Varnishd-Walrus Architecture

Above is the architectural diagram of how Varnishd can be set up as a caching service for objects stored in Walrus buckets.  Varnish is primarily developed as an HTTP accelerator.  In this setup, we use varnish to accomplish the following:

  • caching bucket objects requested through HTTP
  • custom URL naming for buckets
  • granular control to Walrus buckets

Bucket Objects in Walrus

We upload the bucket objects using a patched version of s3cmd.  To allow the objects to be accessed by the varnish caching instance, we use s3cmd as follows:

  • Create the bucket:

    s3cmd mb s3://bucket-name

  • Upload the object and make sure it's publicly accessible:

    s3cmd put --acl-public --guess-mime-type object s3://bucket-name/object

And that's it.  All the other configuration is done on the varnish end.  Now, on to varnish…

Varnishd Setup

The instance running varnish runs Debian 6.0.  We process any request for specific bucket objects that comes to the instance, and pull from the bucket where the object is located.  The instance is customized to take in scripts through the user-data/user-data-file option that can be used with the euca-run-instances command.  The rc.local script that enables this option in the image can be found here.  The script we use for this varnish setup – along with other deployment scripts – can be found on projects.eucalyptus.com.
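
Launching such an instance might look roughly like this – the keypair, script name, security group, and EMI ID are all placeholders:

euca-run-instances -k mykey --user-data-file varnish-setup.sh --group http-cache emi-XXXXXXXX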

That's it!  We can bring up another instance quickly without a problem – since it's scripted. :-)  We also use Walrus to store our configurations.  For extra security, we don't make those objects public; we use s3cmd to download the objects, then move the configuration files to the correct location in the instance.
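
One quick way to see the cache doing its job – the object path here is hypothetical, and the headers checked are Varnish defaults – is to request the same object twice and compare the response headers; the second response should show a non-zero Age:

curl -sI http://emis.eucalyptus.com/starter-emis/euca-centos-2012.1.14-x86_64.tgz | egrep -i 'x-varnish|age|via'
curl -sI http://emis.eucalyptus.com/starter-emis/euca-centos-2012.1.14-x86_64.tgz | egrep -i 'x-varnish|age|via'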

We hope this setup inspires other ideas that can be implemented with Eucalyptus.  Please feel free to give any feedback; we are always open to improving things here at Eucalyptus.  Enjoy, and be on the lookout for a follow-up post discussing how to add load balancing to this setup.
