AWS: Consolidated Billing, Reserved Instances and Availability Zones
If you're tasked with managing a large installation on AWS, and you expect to use consolidated billing, then your life will be getting more complicated than you might expect. In fact, if you're already managing such an installation, then your life might already be more complicated than you previously thought it was. Confused? So was I.
AWS gives us two highly useful tools for managing large installations: Reserved Instances and Consolidated Billing.
Reserved Instances
By purchasing a Reserved Instance (RI from here on) for a one- or three-year term, for a specific instance type in a specific Availability Zone (AZ) name (more on that later), you can reduce your amortized hourly rate by between 39% and 73%. Those are massive savings.
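To make "amortized hourly rate" concrete, here is a minimal sketch of the arithmetic. It assumes an RI is priced as an upfront fee plus a reduced hourly charge, and that you pay for every hour of the term. The prices are made up for illustration and are not real AWS prices; the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def amortized_hourly_rate(upfront_fee, hourly_rate, term_years):
    """Spread the upfront fee over every hour of the term, then add the hourly charge."""
    term_hours = HOURS_PER_YEAR * term_years
    return upfront_fee / term_hours + hourly_rate


def savings_fraction(on_demand_rate, reserved_rate):
    """Fraction saved relative to the on-demand hourly rate."""
    return 1 - reserved_rate / on_demand_rate


# Hypothetical prices, for illustration only.
on_demand = 0.10
reserved = amortized_hourly_rate(upfront_fee=262.80, hourly_rate=0.03, term_years=1)
print(f"amortized: ${reserved:.4f}/hour, "
      f"savings: {savings_fraction(on_demand, reserved):.0%}")
```

With these invented numbers the savings work out to roughly 40%, near the low end of the range quoted above.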
Consolidated Billing
Consolidated Billing allows you to roll up the bills for multiple accounts into a single payer account. This not only makes the lives of your accounting staff easier, but it also allows you to benefit from some of the bulk discounts offered by AWS.
If you're managing a large installation and you're not using RIs, you're definitely doing it wrong. If you're not using Consolidated Billing, then you're only likely doing it wrong.
Devils, Details
However, all is not roses. There are three details that work together to make your life complicated.
Obfuscation
AZs are named differently between accounts. This is likely done to avoid overloading a single AZ as everyone stampedes to provision their instances in the first choice.
This also applies between the Consolidated Billing payer account and its sub-accounts: the same AZ name can point at a different physical zone in each of them.
Reserved Instance Placement
I mentioned earlier that RIs are tied to the AZ name, not the actual building. This is conceptually silly, but likely just a consequence of needing to identify the AZ while obfuscating the actual location.
Dead Availability Zone
One of the AZs in US East (the most popular region, by far) is effectively out of commission. Dead. Kaput. You can see this by calling the DescribeAvailabilityZones API, which will return only four of the five AZs you'd expect to see. Any attempt to provision in that AZ will consistently fail.
Of course, this AZ is named differently for everyone.
Complications
Because of all of this, reserved instances purchased in the consolidated billing account aren't nearly as useful as you might have expected.
While you do gain the reduced price, these instances aren't actually "reserved". There isn't a dormant VM sitting idle, waiting for you to provision it. This is because the RI is tied to the AZ name rather than to the physical zone, and it means you'll still get the normal number of out-of-capacity errors.
You also can't hard-code AZ names in your deploy scripts if those scripts are shared across sub-accounts – for roughly one in five of those accounts, the named AZ will be the dead one. Your scripts must call the DescribeAvailabilityZones API to determine AZ names dynamically.
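A minimal sketch of that dynamic lookup, assuming Python with boto3. The helpers available_zone_names and pick_zone are hypothetical names; the only AWS call used is the EC2 client's describe_availability_zones. The client is passed in, so the same code runs unchanged against whichever sub-account's credentials are active.

```python
def available_zone_names(ec2_client):
    """Return the names of the AZs this account can currently use, sorted."""
    response = ec2_client.describe_availability_zones()
    return sorted(
        zone["ZoneName"]
        for zone in response["AvailabilityZones"]
        if zone["State"] == "available"
    )


def pick_zone(ec2_client, preferred=None):
    """Use the preferred AZ name if this account offers it, else fall back."""
    zones = available_zone_names(ec2_client)
    if not zones:
        raise RuntimeError("no available Availability Zones returned")
    if preferred in zones:
        return preferred
    # Fallback policy is an assumption: take the first name alphabetically.
    return zones[0]


# Usage (requires boto3 and AWS credentials for the sub-account):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   zone = pick_zone(ec2, preferred="us-east-1a")
```

Falling back to the first available name is only one possible policy; picking at random would spread load more evenly across zones.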
Finally, you lose some granularity in targeting your reserved instances. Let's say that every sub-account will provision a cr1.8xlarge in some AZ (remember, you can't force them all to use the same AZ name). You now have to keep track of the number of these very expensive instances for each AZ and buy RIs accordingly. If that number drops for an AZ, then you're out of luck – you've already agreed to pay for the RI for the year (or three).
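The bookkeeping described here can be sketched as a simple tally. This assumes you have already collected, by whatever means, a list of running instances across sub-accounts and a count of purchased RIs per instance type and AZ name. The data shapes, account names, and the ri_coverage function are hypothetical.

```python
from collections import Counter


def ri_coverage(running, reserved):
    """Compare running instances with purchased RIs per (instance type, AZ name).

    running:  iterable of (account, instance_type, az_name) tuples
    reserved: dict mapping (instance_type, az_name) -> number of RIs bought
    """
    demand = Counter((itype, az) for _account, itype, az in running)
    report = {}
    for key in sorted(set(demand) | set(reserved)):
        used = demand.get(key, 0)
        bought = reserved.get(key, 0)
        report[key] = {
            "running": used,
            "reserved": bought,
            "uncovered": max(used - bought, 0),          # billed at on-demand rates
            "idle_reservations": max(bought - used, 0),  # paid for but unused
        }
    return report


# Hypothetical inventory, for illustration only.
running = [
    ("acct-a", "cr1.8xlarge", "us-east-1a"),
    ("acct-b", "cr1.8xlarge", "us-east-1a"),
    ("acct-c", "cr1.8xlarge", "us-east-1b"),
]
reserved = {("cr1.8xlarge", "us-east-1a"): 1, ("cr1.8xlarge", "us-east-1c"): 2}

for key, row in ri_coverage(running, reserved).items():
    print(key, row)
```

Any non-zero idle_reservations entry is the "out of luck" case: an RI you are still paying for in an AZ name where nothing matching is running.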
What you can do about it
You can always buy reserved instances in each sub-account. This means more administrative juggling, but gives you actual reservation of the instances and maintains the economic benefits. This may not be easy or even possible if you don't control the life-cycle of your sub-accounts.
If you run a lot of sub-accounts that are roughly identical, then the law of large numbers may save you. The bigger you get, the more vague you can be about the word "roughly".
You also have a couple of options if you need to rebalance your RIs: you can sell them on the Reserved Instance Marketplace, or you can migrate them to another AZ.