Demo of Eucalyptus hotness: 3.3 milestone 6
Reblogged from Greg DeKoenigsberg Speaks:
Our demo day for milestone 6 was yesterday, and it was choice. We're at feature completeness at this point, and we're now on final approach for release sometime Soon-ish, as soon as we shake out all the code nasties. We've got some good stuff to show off on Vimeo. The basic transcript:
- 0:00 Eric Choi, Product Mktg Manager, with agenda/housekeeping.
Big Data on the Cloud using Ansible, RHadoop, AppScale, and AWS/Eucalyptus
Background
Big Data has been a hot topic over the last few years. Big Data on public clouds, through services such as AWS Elastic MapReduce, has been gaining even more popularity as cloud computing becomes more of an industry standard.
R is an open source project for statistical computing and graphics. It has been growing in popularity at universities and companies for linear and nonlinear modeling, classical statistical tests, time-series analysis, and more.
RHadoop, a collection of R packages developed by Revolution Analytics, allows R to interface with Hadoop. Revolution Analytics builds analytic software solutions using R.
AppScale is an open source PaaS that implements the Google AppEngine API on IaaS environments. One of the implemented Google AppEngine APIs is AppEngine MapReduce. AppScale's back-end support for this API uses Cloudera's Distribution for Apache Hadoop.
Ansible is open source orchestration software that uses SSH to handle configuration management for physical machines, virtual machines, and machines running in the cloud.
Amazon Web Services is a public IaaS that provides infrastructure and application services in the cloud. Eucalyptus is open source software that provides the AWS APIs for EC2, S3, and IAM in on-premises cloud environments.
This blog entry covers how to deploy AppScale (on either AWS or Eucalyptus), then use Ansible to configure each AppScale node with R and the RHadoop packages so that programs written in R can use MapReduce in the cloud.
Pre-requisites
To get started, the following is needed on a desktop/laptop computer:
- AppScale Tools installed.
- Ansible installed.
- The following AWS/Eucalyptus variables exported as environment variables in your shell:
  - EC2_ACCESS_KEY
  - EC2_SECRET_KEY
  - EC2_URL
*NOTE: These variables are used by AppScale Tools version 1.6.9. Check the AWS and Eucalyptus documentation for how to obtain user credentials.
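Before running appscale init, it can help to confirm that the three variables are actually set in the current shell. The Python sketch below is a hypothetical helper, not part of AppScale Tools; it only checks that each variable is present and non-empty, not that the credentials are valid.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the prerequisites list above.
REQUIRED_VARS = ("EC2_ACCESS_KEY", "EC2_SECRET_KEY", "EC2_URL")


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty (defaults to os.environ)."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_credentials() with no arguments checks the real environment; an empty list means all three variables are set.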
Deployment
AppScale
After installing AppScale Tools and Ansible, deploy the AppScale cluster. With the AWS/Eucalyptus variables defined, create the AppScale cluster configuration file, AppScalefile:
$ ./appscale-tools/bin/appscale init cloud
Edit the AppScalefile, providing the keypair, security group, and AppScale AMI/EMI. The keypair and security group do not need to be created beforehand; AppScale handles this. The AppScale AMI on AWS (us-east-1) is ami-4e472227. The Eucalyptus EMI depends on the Eucalyptus cloud being used. In this example, the AWS AppScale AMI is used, and the AppScale cluster size is 3 nodes. Here is the example AppScalefile:
---
group : 'appscale-rmr'
infrastructure : 'ec2'
instance_type : 'm1.large'
keyname : 'appscale-rmr'
machine : 'ami-4e472227'
max : 3
min : 3
table : 'hypertable'
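The leading --- suggests the AppScalefile is YAML, so generating it from a script is one option when standing up several clusters. The helper below is a hypothetical sketch that reproduces only the flat key : value layout of the example above; its required-key list is simply the keys that example uses, not AppScale's full schema.

```python
from typing import Dict, Union

Value = Union[str, int]

# Keys used by the example AppScalefile above (assumption: not a complete schema).
EXAMPLE_KEYS = ("group", "infrastructure", "instance_type", "keyname",
                "machine", "max", "min", "table")


def render_appscalefile(config: Dict[str, Value]) -> str:
    """Render a flat dict in the 'key : value' style of the example file."""
    missing = [key for key in EXAMPLE_KEYS if key not in config]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    if int(config["min"]) > int(config["max"]):
        raise ValueError("min must not exceed max")
    lines = ["---"]
    for key in sorted(config):
        value = config[key]
        if isinstance(value, str):
            # Single-quoted YAML scalar; an embedded quote is escaped by doubling it.
            rendered = "'" + value.replace("'", "''") + "'"
        else:
            rendered = str(value)
        lines.append(f"{key} : {rendered}")
    return "\n".join(lines) + "\n"
```

With config holding the values from the example, writing render_appscalefile(config) to a file named AppScalefile should give the same content, in the same key order.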
After editing the AppScalefile, start up the AppScale cluster by running the following command:
$ ./appscale-tools/bin/appscale up
Once the cluster finishes setting up, the status of the cluster can be seen by running the command below:
$ ./appscale-tools/bin/appscale status
R, RHadoop Installation Using Ansible
Now that the cluster is up and running, grab the Ansible playbook that installs R and the RHadoop rmr2 and rhdfs packages onto the AppScale nodes. The playbook can be downloaded from GitHub using git:
$ git clone https://github.com/hspencer77/ansible-r-appscale-playbook.git
After downloading the playbook, populate the ansible-r-appscale-playbook/production file with the AppScale cluster's node information. Grab the node hostnames by running the following command:
$ ./appscale-tools/bin/appscale status | grep amazon | grep Status | awk '{print $5}' | cut -d ":" -f 1
ec2-50-17-96-162.compute-1.amazonaws.com
ec2-50-19-45-193.compute-1.amazonaws.com
ec2-67-202-23-157.compute-1.amazonaws.com
Add those DNS entries to the ansible-r-appscale-playbook/production file. After editing, the file will look like the following:
[appscale-nodes]
ec2-50-17-96-162.compute-1.amazonaws.com
ec2-50-19-45-193.compute-1.amazonaws.com
ec2-67-202-23-157.compute-1.amazonaws.com
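The grep/awk/cut pipeline and the manual edit can be folded into one step. The sketch below mirrors that pipeline in Python (keep lines containing both "amazon" and "Status", take the fifth whitespace-separated field, drop everything from the first colon) and writes the [appscale-nodes] group. It assumes the status report goes to standard output in the format the pipeline expects; the function names are hypothetical.

```python
import subprocess
from typing import Iterable, List


def hosts_from_status(status_lines: Iterable[str]) -> List[str]:
    """Python version of: grep amazon | grep Status | awk '{print $5}' | cut -d ":" -f 1"""
    hosts = []
    for line in status_lines:
        if "amazon" not in line or "Status" not in line:
            continue
        fields = line.split()
        if len(fields) < 5:
            continue  # awk would print an empty line here; skip it instead
        hosts.append(fields[4].split(":", 1)[0])
    return hosts


def render_inventory(hosts: Iterable[str], group: str = "appscale-nodes") -> str:
    """Render an INI-style Ansible inventory containing a single group."""
    return "\n".join([f"[{group}]", *hosts]) + "\n"


def write_production_inventory(path: str = "ansible-r-appscale-playbook/production",
                               appscale: str = "./appscale-tools/bin/appscale") -> List[str]:
    """Run 'appscale status', extract the hosts, and overwrite the inventory file."""
    result = subprocess.run([appscale, "status"], capture_output=True, text=True, check=True)
    hosts = hosts_from_status(result.stdout.splitlines())
    with open(path, "w") as inventory:
        inventory.write(render_inventory(hosts))
    return hosts
```

Run from the directory that contains both appscale-tools and the cloned playbook, write_production_inventory() should leave a production file shaped like the one above. Review the file before running the playbook, since any change in the status output format would silently change what gets extracted.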
Now the playbook can be executed. The playbook requires the SSH private key to the nodes. This key will be located under the ~/.appscale folder. In this example, the key file is named appscale-rmr.key. To execute the playbook, run the following command:
$ ansible-playbook -i ansible-r-appscale-playbook/production \
--private-key=~/.appscale/appscale-rmr.key -v ansible-r-appscale-playbook/site.yml
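If the playbook run is scripted, passing the arguments as a list avoids shell quoting issues. Without a shell there is no tilde expansion at all, so the sketch below expands ~ in the key path explicitly with os.path.expanduser. The function names are hypothetical; the paths in the usage line assume the playbook was cloned into ansible-r-appscale-playbook as in the git clone step.

```python
import os
import subprocess
from typing import List


def playbook_command(inventory: str, private_key: str, playbook: str,
                     verbose: bool = True) -> List[str]:
    """Build the ansible-playbook argument list used above."""
    cmd = ["ansible-playbook", "-i", inventory,
           # No shell is involved, so expand ~ ourselves.
           "--private-key=" + os.path.expanduser(private_key)]
    if verbose:
        cmd.append("-v")
    cmd.append(playbook)
    return cmd


def run_playbook(inventory: str, private_key: str, playbook: str) -> int:
    """Run ansible-playbook and return its exit code."""
    return subprocess.run(playbook_command(inventory, private_key, playbook)).returncode
```

For example: run_playbook("ansible-r-appscale-playbook/production", "~/.appscale/appscale-rmr.key", "ansible-r-appscale-playbook/site.yml").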
Testing Out The Deployment – Wordcount.R
Once the playbook has finished running, the AppScale cluster is ready to use. To test the setup, SSH into the head node of the AppScale cluster. To find the head node, run the following command:
$ ./appscale-tools/bin/appscale status
After discovering the head node, SSH into the head node using the private key located in the ~/.appscale directory:
$ ssh -i ~/.appscale/appscale-rmr.key root@ec2-50-17-96-162.compute-1.amazonaws.com
To test the R setup on all the nodes, extract the wordcount.R test program from the rmr2 package tarball:
root@appscale-image0:~# tar zxf rmr2_2.0.2.tar.gz rmr2/tests/wordcount.R
The wordcount.R file contains the following lines:
rmr2:::hdfs.put("/etc/passwd", "/tmp/wordcount-test")
out.hadoop = from.dfs(wordcount("/tmp/wordcount-test", pattern = " +"))
When wordcount.R is executed, it copies the /etc/passwd file from the head node into HDFS, then runs wordcount on it, splitting each line into words on the regular expression " +" (one or more spaces). NOTE: wordcount.R can be edited to use any file and pattern desired.
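To make clear what the job computes, here is a local, single-process Python sketch of the same map/reduce logic: the map step splits each line on the regular expression " +" and emits (word, 1) pairs, and the reduce step sums the counts for each word. This illustrates the technique only; it is not rmr2 code, and dropping empty tokens is a choice made here that may differ slightly from R's strsplit.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def wc_map(lines: Iterable[str], pattern: str = " +") -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every token produced by splitting on pattern."""
    for line in lines:
        for word in re.split(pattern, line):
            if word:  # skip empty tokens from leading/trailing separators
                yield word, 1


def wc_reduce(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: group the pairs by word and sum the counts."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return {word: sum(counts) for word, counts in grouped.items()}


def wordcount(lines: Iterable[str], pattern: str = " +") -> Dict[str, int]:
    """Run the map step, then the reduce step, in one process."""
    return wc_reduce(wc_map(lines, pattern))
```

On a single machine, wordcount(line.rstrip("\n") for line in open("/etc/passwd")) gives counts that can be compared by eye against the Hadoop job's output.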
Run wordcount.R:
root@appscale-image0:~# R
R version 2.15.3 (2013-03-01) -- "Security Blanket"
Copyright (C) 2013 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]
> source('rmr2/tests/wordcount.R')
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: digest
Loading required package: functional
Loading required package: stringr
Loading required package: plyr
13/04/05 02:33:41 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
13/04/05 02:33:43 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
packageJobJar: [/tmp/RtmprcYtsu/rmr-local-env19811a7afd54, /tmp/RtmprcYtsu/rmr-global-env1981646cf288, /tmp/RtmprcYtsu/rmr-streaming-map198150b6ff60, /tmp/RtmprcYtsu/rmr-streaming-reduce198177b3496f, /tmp/RtmprcYtsu/rmr-streaming-combine19813f7ea210, /var/appscale/hadoop/hadoop-unjar5632722635192578728/] [] /tmp/streamjob8198423737782283790.jar tmpDir=null
13/04/05 02:33:44 WARN snappy.LoadSnappy: Snappy native library is available
13/04/05 02:33:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/05 02:33:44 INFO snappy.LoadSnappy: Snappy native library loaded
13/04/05 02:33:44 INFO mapred.FileInputFormat: Total input paths to process : 1
13/04/05 02:33:44 INFO streaming.StreamJob: getLocalDirs(): [/var/appscale/hadoop/mapred/local]
13/04/05 02:33:44 INFO streaming.StreamJob: Running job: job_201304042111_0015
13/04/05 02:33:44 INFO streaming.StreamJob: To kill this job, run:
13/04/05 02:33:44 INFO streaming.StreamJob: /root/appscale/AppDB/hadoop-0.20.2-cdh3u3/bin/hadoop job -Dmapred.job.tracker=10.77.33.247:9001 -kill job_201304042111_0015
13/04/05 02:33:44 INFO streaming.StreamJob: Tracking URL: http://appscale-image0:50030/jobdetails.jsp?jobid=job_201304042111_0015
13/04/05 02:33:45 INFO streaming.StreamJob: map 0% reduce 0%
13/04/05 02:33:51 INFO streaming.StreamJob: map 50% reduce 0%
13/04/05 02:33:52 INFO streaming.StreamJob: map 100% reduce 0%
13/04/05 02:33:59 INFO streaming.StreamJob: map 100% reduce 33%
13/04/05 02:34:02 INFO streaming.StreamJob: map 100% reduce 100%
13/04/05 02:34:04 INFO streaming.StreamJob: Job complete: job_201304042111_0015
13/04/05 02:34:04 INFO streaming.StreamJob: Output: /tmp/RtmprcYtsu/file1981524ee1a3
13/04/05 02:34:05 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
13/04/05 02:34:07 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
13/04/05 02:34:08 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
13/04/05 02:34:10 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
Deleted hdfs://10.77.33.247:9000/tmp/wordcount-test
>quit("yes")
That's it! The AppScale cluster is ready for additional R programs that use MapReduce. Enjoy the world of Big Data on public/private IaaS.
What's new in Ansible 1.1 for AWS and Eucalyptus users?
Reblogged from Take that to the bank and cash it!:
I thought the Ansible 1.0 development cycle was busy, but 1.1 is crammed full of orchestration goodness. 1.1 was released on Tuesday, and you can read more about it here: http://blog.ansibleworks.com/2013/04/02/ansible-1-1-released/
For those working on AWS and Eucalyptus, 1.1 brings some nice module improvements as well as new cloudformation and s3 modules. It's great to see the AWS-related modules becoming so popular so quickly.
Using Ansible to Deploy Neo4j HA Cluster on AWS/Eucalyptus
As a follow-up to my last Neo4j on AWS/Eucalyptus blog post, this entry demonstrates another great example of AWS/Eucalyptus fidelity by using Ansible to deploy a Neo4j High Availability (HA) cluster.
Pre-requisites
In order to use this Ansible playbook on AWS/Eucalyptus, the following is needed:
- An AWS or Eucalyptus account, with a user’s access key and secret access key.
- An IAM policy that allows launching instances and authorizing ports in security groups
- Ubuntu Cloud Image (Precise 12.04)
- EC2 API Client Tools
- git
Before deploying the cluster, create a security group for the cluster to use. The security group must allow the following:
- port 22 (SSH)
- communication among all instances in the security group (TCP ports 0 - 65535)
To create the security group and authorize the ports, make sure the user’s access key, secret access key, and EC2 URL are noted, and do the following:
- Create the security group
ec2-create-group --aws-access-key <EC2_ACCESS_KEY> --aws-secret-key <EC2_SECRET_KEY> --url <EC2_URL> -g neo4j-cluster -d "Neo4j HA Cluster"
- Authorize port for SSH in neo4j-cluster security group
ec2-authorize --aws-access-key <EC2_ACCESS_KEY> --aws-secret-key <EC2_SECRET_KEY> --url <EC2_URL> -P tcp -p 22 -s 0.0.0.0/0 neo4j-cluster
- Authorize all port communication between cluster members
ec2-authorize --aws-access-key <EC2_ACCESS_KEY> --aws-secret-key <EC2_SECRET_KEY> --url <EC2_URL> -P tcp -o neo4j-cluster -p -1 neo4j-cluster
After completing these steps, use ec2-describe-group to view the security group:
ec2-describe-group --aws-access-key <EC2_ACCESS_KEY> --aws-secret-key <EC2_SECRET_KEY> --url <EC2_URL> neo4j-cluster
GROUP sg-1cbc5777 986451091583 neo4j-cluster Neo4j HA Cluster
PERMISSION 986451091583 neo4j-cluster ALLOWS tcp 0 65535 FROM USER 986451091583 NAME neo4j-cluster ID sg-1cbc5777 ingress
PERMISSION 986451091583 neo4j-cluster ALLOWS tcp 22 22 FROM CIDR 0.0.0.0/0 ingress
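To check the rules from a script instead of by eye, the PERMISSION lines can be parsed. The sketch below is based only on the sample output above (note that -p -1 shows up as ports 0 to 65535); other versions of the EC2 API tools may format the output differently, and the helper names are hypothetical.

```python
from typing import List, NamedTuple


class Permission(NamedTuple):
    protocol: str
    from_port: int
    to_port: int
    source_kind: str  # "CIDR" or "USER" (a source security group) in the sample output
    source: str       # CIDR block, or the source group name


def parse_permissions(output: str) -> List[Permission]:
    """Parse PERMISSION lines shaped like the ec2-describe-group sample above."""
    perms = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 10 or fields[0] != "PERMISSION":
            continue
        # PERMISSION <account> <group> ALLOWS <proto> <from> <to> FROM <kind> ...
        proto, low, high, kind = fields[4], int(fields[5]), int(fields[6]), fields[8]
        if kind == "CIDR":
            source = fields[9]
        else:
            # ... FROM USER <account> NAME <group> ID <sg-id> ingress
            source = fields[fields.index("NAME", 9) + 1] if "NAME" in fields[9:] else ""
        perms.append(Permission(proto, low, high, kind, source))
    return perms


def cluster_rules_ok(perms: List[Permission], group: str = "neo4j-cluster") -> bool:
    """True if SSH is open to a CIDR block and all TCP ports are open within the group."""
    ssh_open = any(p.protocol == "tcp" and p.from_port <= 22 <= p.to_port
                   and p.source_kind == "CIDR" for p in perms)
    intra_group = any(p.protocol == "tcp" and p.from_port == 0 and p.to_port == 65535
                      and p.source_kind == "USER" and p.source == group for p in perms)
    return ssh_open and intra_group
```

Feed parse_permissions the text printed by ec2-describe-group (for example, captured with subprocess) and check cluster_rules_ok before moving on to the deployment.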
Neo4j HA Cluster Deployment
Once the security group is created with the correct ports authorized, the cluster can be deployed. To deploy the cluster, do the following:
- Obtain Ansible from git and set up the environment by following the instructions here: http://ansible.cc/docs/gettingstarted.html#getting-ansible
- Obtain the Ansible Playbook for Neo4j HA Cluster using git
git clone https://github.com/hspencer77/ansible-neo4j-cluster.git
- Change directory into ansible-neo4j-cluster
cd ansible-neo4j-cluster
- Set up /etc/ansible/hosts with the following information:
[local]
127.0.0.1
- Populate vars/ec2-config with either Eucalyptus or AWS information. vars/ec2-config contains the following variables:
keypair: <EC2/Eucalyptus Keypair>
ec2_access_key: <EC2_ACCESS_KEY>
ec2_secret_key: <EC2_SECRET_KEY>
ec2_url: <EC2_URL>
instance_type: m1.small
security_group: <AWS/Eucalyptus Security Group>
image: <AMI/EMI>
- Execute the following command:
ansible-playbook neo4j-cluster.yml \
--private-key=<AWS/Eucalyptus Private Key file> --extra-vars "node_count=3"
- After the playbook finishes, it prints a URL for accessing the cluster, similar to the example below:
TASK: [Display HAProxy URL] *********************
changed: [23.22.248.75] => {"changed": true, "cmd": "echo \"HAProxy URL for Neo4j - http://ec2-23-22-248-75.compute-1.amazonaws.com/webadmin/#/info/org.neo4j/High%20Availability/\" ", "delta": "0:00:00.006835", "end": "2013-03-30 19:54:31.104320", "rc": 0, "start": "2013-03-30 19:54:31.097485", "stderr": "", "stdout": "HAProxy URL for Neo4j - http://ec2-23-22-248-75.compute-1.amazonaws.com/webadmin/#/info/org.neo4j/High%20Availability/"}
To view the status of the cluster in a browser, open:
http://ec2-23-22-248-75.compute-1.amazonaws.com/webadmin/#/info/org.neo4j/High%20Availability/
- To get the status of the cluster, use curl:
curl -H "Content-Type:application/json" -d '["org.neo4j:*"]' http://ec2-23-22-248-75.compute-1.amazonaws.com/db/manage/server/jmx/query
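The same JMX query can be sent with Python's standard library, for example from a monitoring script. This sketch reproduces the curl call above (a POST of the JSON array ["org.neo4j:*"] to /db/manage/server/jmx/query). The helper names are hypothetical, and the code assumes nothing about the returned JSON beyond it being valid JSON.

```python
import json
import urllib.request
from typing import Any


def jmx_query_request(base_url: str) -> urllib.request.Request:
    """Build the same POST request that the curl command above sends."""
    body = json.dumps(["org.neo4j:*"]).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/db/manage/server/jmx/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_cluster_status(base_url: str, timeout: float = 10.0) -> Any:
    """Send the request through the HAProxy endpoint and decode the JSON response."""
    with urllib.request.urlopen(jmx_query_request(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example: query_cluster_status("http://ec2-23-22-248-75.compute-1.amazonaws.com").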
That's it! A Neo4j HA cluster, with an HAProxy server serving as its endpoint, is ready to use. If a bigger cluster is desired, just change the node_count value. For additional information about this playbook, including how it handles cluster membership, refer to the README: https://github.com/hspencer77/ansible-neo4j-cluster/blob/master/README.md
Hope you enjoy! As always, questions, comments, and suggestions are welcome.
Exclusive: Startup AnsibleWorks pitches open-source IT configuration, deployment tool
A couple of former Red Hat veterans think there's an easier way to configure, deploy and manage IT across an organization, and founded AnsibleWorks to attack that problem.
Systems administrators and developers want one tool for deployment, configuration and management; they don't want to deal with agents and add-ons, said Said Ziouani, CEO of Santa Barbara, Calif.-based AnsibleWorks.
Deploying Eucalyptus via Ansible playbook(s)
Reblogged from Take that to the bank and cash it!:
The first cut of the Ansible deployment playbook for deploying Eucalyptus private clouds is ready. I've merged the first "release" into the master branch here: https://github.com/lwade/eucalyptus-playbook. Feedback and contributions are very welcome, please file issues against the project.
This playbook allows a user to deploy a single front-end cloud (i.e. all components on a single system) and as many NCs (node controllers) as they want.
Run AppScale on Eucalyptus
Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet). - Wikipedia
According to Wikipedia, there are currently three popular service models:
1. Infrastructure as a service (IaaS)
2. Platform as a service (PaaS)
3. Software as a service (SaaS)
So, I have a Eucalyptus cloud, which is great and serves as an AWS-like IaaS platform.