ZaaS Building Blocks: Leveraging Amazon’s Auto Scaling Groups (ASG)

As we continue to talk about how we deliver reliability, flexibility, and scale as part of our cloud-based Zenoss as a Service (ZaaS) solution, we must talk about Auto Scaling. Auto Scaling allows you to scale AWS capacity up or down automatically, ensuring that the number of instances increases during demand spikes and decreases during lulls to minimize costs.

ZaaS leverages Amazon’s ASG (Auto Scaling Groups) feature set in three different ways to provide scalability and functionality to our architecture. Only one of these is ‘auto scaling’ in the traditional sense of the technology; however, the product also provides a few things that can be leveraged for individual instance stability as well as code scalability.

The most common use of ASGs is to provide a scalable, load-responsive group of identical machines that either perform the same action or are members of a cluster. We take advantage of this capability in our RabbitMQ message bus and ElasticSearch log analysis clusters, which run on ASGs that respond to CloudWatch metrics based on CPU usage, as well as disk and network I/O. When the message queues are flooded, a new RabbitMQ node is launched and, with the help of Chef magic, it joins the cluster and has HA-all mirroring enabled on the queues. Clusters are not only easy to run, but also easy to maintain: you can simulate a failure event to retire old versions and replace them with new ones. For example, our ElasticSearch cluster is upgraded by simply upgrading the Chef Cookbook and then marking the older instances as ‘Status Check Failed.’ This allows the new versions to gracefully replace them, with additional Chef magic rebalancing the cluster and setting appropriate sharding in the background. This ensures that ZaaS has adequate resources to meet customer demand in a responsive manner.
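To make the moving parts concrete, here is a minimal Python sketch of the three requests involved: a scale-out policy, a CloudWatch CPU alarm that triggers it, and a health override that forces an instance to be replaced. It only builds the parameter dictionaries for the corresponding boto3 calls and never contacts AWS. The group name, thresholds, periods, and instance ID are hypothetical, and mapping ‘Status Check Failed’ onto an explicit health override is our assumption for illustration, not a description of the exact ZaaS tooling.

```python
# Sketch only: builds boto3 request parameters without calling AWS.
# Every name and number here (group, policy, alarm, thresholds) is hypothetical.


def scale_out_policy(group_name):
    """Parameters for autoscaling.put_scaling_policy: add one node per trigger."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-scale-out",
        "AdjustmentType": "ChangeInCapacity",
        "ScalingAdjustment": 1,   # launch one extra instance
        "Cooldown": 300,          # seconds to wait before scaling again
    }


def high_cpu_alarm(group_name, policy_arn, threshold=70.0):
    """Parameters for cloudwatch.put_metric_alarm on the group's average CPU."""
    return {
        "AlarmName": f"{group_name}-high-cpu",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [policy_arn],  # fire the scale-out policy
    }


def mark_unhealthy(instance_id):
    """Parameters for autoscaling.set_instance_health, forcing a replacement."""
    return {
        "InstanceId": instance_id,
        "HealthStatus": "Unhealthy",
        "ShouldRespectGracePeriod": False,
    }


# With AWS credentials configured, these could be applied roughly as:
#   import boto3
#   asg = boto3.client("autoscaling")
#   cw = boto3.client("cloudwatch")
#   arn = asg.put_scaling_policy(**scale_out_policy("rabbitmq-asg"))["PolicyARN"]
#   cw.put_metric_alarm(**high_cpu_alarm("rabbitmq-asg", arn))
#   asg.set_instance_health(**mark_unhealthy("i-0123456789abcdef0"))
```

Keeping the parameters in plain functions makes them easy to review and unit test before anything touches a live account. Joining the cluster and rebalancing shards would still be handled by configuration management on the new instance.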

In addition, we rely on ASGs to build in ZaaS code scalability. Even if you do not need elasticity in your infrastructure, it is good to use ASGs to launch any Amazon Machine Image (AMI) that needs multiple copies of the same machine running. For example, we use ZaaS itself to manage the internal ZaaS infrastructure supporting client sites, and have our own instance per region. ASGs enable you to define a single architecture within CloudFormation templates without duplicating code in order to launch 4 or even 100 servers of the same type. We use this for relatively static services within our internal systems, such as master-slave pairs, workers, or servers that have predictable loads.
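As a rough illustration of the "define once, launch many" idea, the Python sketch below generates a small CloudFormation template in which the server count is the only thing that changes between a 4-node and a 100-node deployment. The resource names, AMI ID, and instance type are placeholders we made up for the example, and a launch configuration is assumed as the way the group learns which AMI to boot. A real template would also need networking, security groups, and so on.

```python
import json


def asg_template(ami_id, instance_type, count):
    """Return a CloudFormation template (as a dict) for `count` identical servers.

    Resource names "WorkerLaunchConfig" and "WorkerGroup" are hypothetical.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "WorkerLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {"ImageId": ami_id, "InstanceType": instance_type},
            },
            "WorkerGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "LaunchConfigurationName": {"Ref": "WorkerLaunchConfig"},
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                    # Pinning all three sizes gives a fixed-size, static group.
                    "MinSize": str(count),
                    "MaxSize": str(count),
                    "DesiredCapacity": str(count),
                },
            },
        },
    }


# The same definition covers 4 servers or 100; only the count differs.
small = asg_template("ami-00000000", "m3.medium", 4)
large = asg_template("ami-00000000", "m3.medium", 100)
print(json.dumps(small, indent=2))
```

Because the server definition lives in one place, a change to the AMI or instance type applies to every machine in the group rather than to dozens of copied resource blocks.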

We also use ASGs at an individual instance level. For this, we use ASG settings of 1:2:1, where Minimum=1, Maximum=2, and Desired=1. These settings ensure that you always have a single instance of that particular AMI running at any given time. This is useful for NAT, proxy, VPN, and other services that don’t require immediate elasticity but still need the reliability of a cluster-style setup. Setting Desired to 1 and Maximum to 2 allows you to issue updates to your AMI and have them roll out to your existing architecture. A Maximum of 1 would not allow the new instance to be launched. With a Maximum of 2, the new instance is launched with the updates and the existing instance is then terminated. If there are ever 0 instances, a new one will be brought up, guaranteeing that the customer’s instance of the ZaaS service never goes down.
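The reasoning behind 1:2:1 can be checked with a toy model. The Python sketch below is not how AWS schedules anything internally; it simply assumes a launch-before-terminate replacement, which needs one spare slot under the maximum, and a self-healing rule that launches instances until the desired count is reached. Both function names are ours.

```python
def rolling_replace(running, maximum):
    """Simulate replacing one instance by launching the new one first.

    Returns the instance count at each step, or None when the group is
    already at its maximum and has no headroom for the replacement.
    """
    if running + 1 > maximum:
        return None
    # Before the update, during the overlap, after the old instance is gone.
    return [running, running + 1, running]


def instances_to_launch(running, desired):
    """How many instances the group must start to get back to `desired`."""
    return max(0, desired - running)


# Min=1, Max=2, Desired=1: there is room for the overlap.
print(rolling_replace(running=1, maximum=2))      # [1, 2, 1]

# Max=1: the replacement can never start.
print(rolling_replace(running=1, maximum=1))      # None

# The only instance died: the group launches one to restore Desired=1.
print(instances_to_launch(running=0, desired=1))  # 1
```

The extra slot in Maximum costs nothing while idle, since Desired stays at 1; it only matters during the brief overlap when old and new instances run side by side.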

ASGs are just another tool in the AWS arsenal that allows us to deliver the seamless support customers need to provide IT service assurance for their own applications and services. For more about ZaaS, see our previous blogs on:

Insights from Developing and Managing ZaaS

Using AWS CloudFormation