AWS Glacier is a secure, durable, and extremely low-cost Amazon S3 cloud storage class for data archiving and long-term backup. It is designed to deliver 99.999999999% durability and provide comprehensive security and compliance capabilities that can help meet even the most stringent regulatory requirements.
Customers can store data for as little as $1 per terabyte per month, a significant saving compared to on-premises solutions. What you wouldn't expect from a simple service like long-term storage is its surprisingly complex pricing scheme, which consists of storage pricing, retrieval pricing, retrieval request pricing, upload request pricing, expedited request pricing, and many more items that are fully listed here. Of all the pricing items, the one that most often catches customers off guard is the seemingly low upload request pricing, which will be discussed in the customer use case below.
Customer issue
As with every AWS service, following the best practices and reading through the FAQ is necessary before implementing anything. Failing to do so will result in suboptimal performance, or in anywhere from slightly to massively increased billing. In this case, the customer created a lifecycle policy on a bucket which moves files to Infrequent Access (IA) after 30 days and to Glacier after 60 days. That is a completely reasonable approach, considering the bucket contains an archive of ingested files and is unlikely to need retrieval after those 60 days. What they failed to account for is the request pricing, along with a certain entry in the FAQ.
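For context, a lifecycle policy like the one described above can be created with a single CLI call. The sketch below is a hypothetical reconstruction (the bucket name and rule ID are made up, not the customer's actual configuration):

```shell
# Hypothetical reconstruction of the customer's lifecycle rule:
# transition objects to STANDARD_IA after 30 days and to GLACIER after 60.
aws s3api put-bucket-lifecycle-configuration \
  --bucket "ingest-archive-bucket" \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-ingested-files",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 60, "StorageClass": "GLACIER"}
      ]
    }]
  }'
```

Note that the rule applies to every object in the bucket (empty prefix), so every ingested file eventually becomes a separate Glacier transition request.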
If you ingest 350,000 files daily, and do not aggregate them, at the end of the month in addition to the storage costs you will pay for 10,248,602 requests, and your bill will look like this:
Now let's consider the math behind the process, and whether the client actually saved any money with the move to Glacier. We shall keep it simple and look only at storage costs and request costs. As you can see above, the client's request cost is $514.23, so the storage savings need to be at least that much to break even. Calculating the per-GB difference between IA storage and Glacier, we arrive at the following:
S3 Infrequent Access Tier, All Storage / Month: $0.0125 per GB
S3 Glacier, All Storage / Month: $0.004 per GB
We can see that IA storage is 3.125 times more expensive than Glacier, a savings of $0.0085 per GB, so the break-even point would be roughly 60 TB, for which the IA storage cost would be $750 and the Glacier storage cost $240. If we consider that the client actually stored only 1 TB of data (with file sizes around 100 KB), we can see that the storage cost savings are $8, while the request cost is $514.23, totalling a net loss of $506.23 monthly!
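The break-even arithmetic above can be checked with a few lines of shell; awk handles the fractional prices, and the figures are the ones quoted in this article:

```shell
#!/bin/sh
# Break-even check using the prices quoted above.
IA_PRICE=0.0125       # S3 Infrequent Access, $ per GB-month
GLACIER_PRICE=0.004   # S3 Glacier, $ per GB-month
REQUEST_COST=514.23   # the client's monthly request charge, $

# GB of stored data at which the monthly storage savings equal the request charge
awk -v ia="$IA_PRICE" -v gl="$GLACIER_PRICE" -v req="$REQUEST_COST" 'BEGIN {
    printf "savings per GB: $%.4f\n", ia - gl
    printf "break-even:     %.0f GB (~60 TB)\n", req / (ia - gl)
}'
```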
Another example is a $12,000 bill incurred by one of our clients after they moved their entire streaming data history to Glacier. The files were on average 4.5 KB each and again totalled around 1 TB in size. With 240 million requests and monthly cost savings of $8, it would take the client exactly 125 years for their move to Glacier to pay for itself.
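Sanity-checking the 125-year figure takes only shell integer arithmetic, using the rounded $8/month savings from this article:

```shell
#!/bin/sh
# How long until a $12,000 request bill is recouped by $8/month of storage savings.
REQUEST_BILL=12000   # one-off Glacier request charge, $
MONTHLY_SAVING=8     # storage savings after the move, $ per month

MONTHS=$((REQUEST_BILL / MONTHLY_SAVING))
YEARS=$((MONTHS / 12))
echo "payback: $MONTHS months (~$YEARS years)"   # payback: 1500 months (~125 years)
```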
The solution
The solution, as in most cases with AWS, is to follow the best practices and to aggregate and compress your files before storing them in Glacier. For the second scenario, the solution is simply to download all the files to an EC2 instance, zip them all up, and send the archive to a Glacier vault. Of course, one must not forget that you still have to pay for the GET requests ($0.0004 per 1,000 requests), which comes out to $96, but that is still significantly cheaper than $12,000. The first scenario, while slightly more complicated, is still very simple to resolve. In our case we decided to use the spare computing resources of one of our EC2 instances and run a bash script once a week that downloads the files, compresses and archives them, and sends them to a Glacier vault.
#!/bin/sh
# This script expects a date as the first argument, format YYYYMMDD, e.g. 20200120.
# The second argument is the number of days that will be subtracted from
# the provided date to create a date range.
# EXAMPLE: sh ./s3-to-glacier.sh 20200104 3
# The above command will download 20200104, 20200103, 20200102, 20200101

DATE=$1
DAYS=$2

SOURCE_BUCKET=""
VAULT_NAME=""

# Start of the date range; constant, so compute it once outside the loop.
BACKUP_END=$(date -d "$DATE -$DAYS days" +'%Y%m%d')

for i in $(seq 0 "$DAYS")
do
    CURRENT_DATE=$(date -d "$DATE -$i days" +'%Y%m%d')
    echo "==========Downloading s3://$SOURCE_BUCKET/$CURRENT_DATE/=========="
    aws s3 cp --recursive "s3://$SOURCE_BUCKET/$CURRENT_DATE/" "./backup_${BACKUP_END}_${DATE}/$CURRENT_DATE"
done

zip -r "./backup_${BACKUP_END}_${DATE}.zip" "./backup_${BACKUP_END}_${DATE}"

# Note: a single upload-archive call is limited to 4 GB; larger archives
# need the Glacier multipart upload API.
aws glacier upload-archive --account-id - --vault-name "$VAULT_NAME" --body "./backup_${BACKUP_END}_${DATE}.zip"

rm "./backup_${BACKUP_END}_${DATE}.zip"
rm -r "./backup_${BACKUP_END}_${DATE}"
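To run the script once a week, as described above, a crontab entry along these lines will do. The path is hypothetical, and note that the % character must be escaped in a crontab:

```shell
# m h dom mon dow  command
# Every Monday at 02:00, archive the previous 7 days.
0 2 * * 1 sh /home/ec2-user/s3-to-glacier.sh "$(date +\%Y\%m\%d)" 7
```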
Conclusion
While deceptively simple, every AWS service, and especially its billing, needs to be thoroughly examined. Failing to do so will result in unexpected, sometimes devastating billing charges that can be avoided through experience and diligent research. These two scenarios demonstrated how a seemingly simple task, and a correct business decision to archive old and very infrequently used files to Glacier, turned into a devastating blow to the bill due to a lack of understanding of an AWS service's pricing model.