How we analysed our increased AWS data transfer usage
While migrating a number of our sites from our EC2-Classic infrastructure to our new ECS (Elastic Container Service) infrastructure, we noticed a large increase in outbound data transfer and its cost.
We assumed it was due to something we missed in the migration, so we rolled a number of our larger sites back to see if that would lower the data transfer usage. It didn’t; usage was still high.
The AWS Billing console does a great job of breaking down costs per service, but not data transfer usage. It only shows the overall data transfer out to the internet, not which specific service transferred the data.
Using AWS’ Cost Explorer tool we were able to narrow down the specific service with the largest increase in data transfer after the migration.
We created a new report in Cost Explorer, filtering on all of the data transfer Usage Type Groups.
It was S3. More S3 data was transferring out to the internet when we began migrating our sites to ECS on 20th October. Was this a coincidence or did we completely miss something during the migration?
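The Cost Explorer report described above can also be approximated through the Cost Explorer API. The following is a minimal sketch, not the report we actually built: it assumes boto3 and suitable credentials are available, the usage type group names are examples that should be checked against the values listed in your own Cost Explorer filter, and the dates are placeholders.

```python
from datetime import date


def build_transfer_query(start, end, usage_type_groups):
    """Build get_cost_and_usage arguments: daily usage per service,
    restricted to the given data transfer usage type groups."""
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "DAILY",
        "Metrics": ["UsageQuantity", "UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
        "Filter": {
            "Dimensions": {
                "Key": "USAGE_TYPE_GROUP",
                "Values": list(usage_type_groups),
            }
        },
    }


def fetch_transfer_by_service(query):
    """Run the query. Needs boto3 and credentials with Cost Explorer access.
    Pagination (NextPageToken) is ignored in this sketch."""
    import boto3  # imported lazily so building the query needs no AWS setup

    client = boto3.client("ce")
    return client.get_cost_and_usage(**query)["ResultsByTime"]


# Assumed group names; confirm them in the Cost Explorer filter list.
GROUPS = [
    "S3: Data Transfer - Internet (Out)",
    "EC2: Data Transfer - Internet (Out)",
]

# Placeholder dates; the End date is exclusive.
query = build_transfer_query(date(2024, 10, 1), date(2024, 11, 1), GROUPS)
```

Passing `query` to `fetch_transfer_by_service` returns one entry per day, each with a group per service, which is enough to see which service's transfer jumped and when.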
We had to find out which bucket was transferring all this data. This was easy as we only have one main public bucket where we store our media. We enabled data transfer metrics on the S3 bucket, but these only show the total data transfer from the bucket, including transfer to CloudFront, to EC2 and out to the internet. We wanted to narrow it down to data going out to the internet, as this is what we were being charged for.
To get greater insight into the data going out to the internet we enabled S3 Server Access Logging on the bucket and shipped the logs to Logz.io, a monitoring service built on the ELK (Elasticsearch, Logstash, Kibana) stack.
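Server access logging can be switched on in the console or through the API. As a rough sketch of the API route, assuming boto3 is available: the bucket names and prefix below are hypothetical, and the target bucket must already allow the S3 logging service to write to it. Shipping the logs on to Logz.io is not shown.

```python
def build_logging_request(source_bucket, target_bucket, target_prefix):
    """Build put_bucket_logging arguments that deliver access logs for
    source_bucket into target_bucket under target_prefix."""
    return {
        "Bucket": source_bucket,
        "BucketLoggingStatus": {
            "LoggingEnabled": {
                "TargetBucket": target_bucket,
                "TargetPrefix": target_prefix,
            }
        },
    }


def enable_access_logging(request):
    """Apply the logging configuration. Needs boto3 and S3 credentials."""
    import boto3  # imported lazily so building the request needs no AWS setup

    boto3.client("s3").put_bucket_logging(**request)


# Hypothetical names, for illustration only.
request = build_logging_request("example-media", "example-logs", "s3-access/")
```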
With S3 access logs now going to Logz.io we were able to see the top objects, IP addresses and user agents.
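The same kind of breakdown can be produced without a log service by parsing the access log lines directly. This is a simplified sketch rather than what Logz.io does internally: the regular expression covers only the leading fields of the S3 server access log format, and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Leading fields of an S3 server access log line: owner, bucket, [time],
# remote IP, requester, request ID, operation, key, "request URI", status,
# error code, bytes sent, object size, total time, turnaround time,
# "referer", "user agent". Anything after the user agent is ignored.
LOG_LINE = re.compile(
    r'^\S+ (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) \S+ \S+ '
    r'(?P<operation>\S+) (?P<key>\S+) (?:"[^"]*"|-) (?P<status>\S+) \S+ '
    r'(?P<bytes_sent>\S+) \S+ \S+ \S+ "[^"]*" "(?P<user_agent>[^"]*)"'
)


def bytes_by(lines, field):
    """Sum bytes sent per value of `field` ("key", "ip" or "user_agent")."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the expected layout
        sent = match.group("bytes_sent")
        # "-" means no body was sent, for example on a 304 response.
        totals[match.group(field)] += int(sent) if sent.isdigit() else 0
    return totals


YANDEX = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

# Made-up sample lines, for illustration only.
SAMPLE_LINES = [
    'owner1 example-media [06/Feb/2019:00:00:38 +0000] 203.0.113.5 - REQ1 '
    'REST.GET.OBJECT videos/a.mp4 "GET /videos/a.mp4 HTTP/1.1" 200 - 1000 '
    '1000 12 10 "-" "' + YANDEX + '" -',
    'owner1 example-media [06/Feb/2019:00:00:39 +0000] 203.0.113.5 - REQ2 '
    'REST.GET.OBJECT videos/b.mp4 "GET /videos/b.mp4 HTTP/1.1" 200 - 2000 '
    '2000 15 11 "-" "' + YANDEX + '" -',
    'owner1 example-media [06/Feb/2019:00:00:40 +0000] 198.51.100.7 - REQ3 '
    'REST.GET.OBJECT images/c.jpg "GET /images/c.jpg HTTP/1.1" 200 - 500 '
    '500 5 4 "-" "curl/8.0" -',
    'owner1 example-media [06/Feb/2019:00:00:41 +0000] 198.51.100.7 - REQ4 '
    'REST.GET.OBJECT images/c.jpg "GET /images/c.jpg HTTP/1.1" 304 - - '
    '500 3 - "-" "curl/8.0" -',
]

for agent, total in bytes_by(SAMPLE_LINES, "user_agent").most_common():
    print(total, agent)
```

Calling `bytes_by` with `"key"` or `"ip"` instead gives the top objects and top IP addresses in the same way.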
Most of the requests were coming from IP addresses belonging to our EC2 instances. This was fine as our S3 bucket and EC2 instances are in the same region. Data transfer between EC2 instances and S3 buckets in the same region is free.
What stood out was a large number of requests from the Yandex bot. This particular bot was crawling our videos aggressively, resulting in a large amount of data being transferred. We already had crawl delays for Yandex in our robots.txt, but this was not helping.
We ended up updating our S3 bucket policy to deny any Yandex user agent access to our video objects, and our outbound data transfer immediately dropped back to its previous level.
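A policy statement along these lines would have that effect. This is an illustrative sketch, not our exact policy: the bucket name, the `videos/*` key pattern and the `*Yandex*` match are assumptions. It relies on the `aws:UserAgent` condition key, which matches whatever the client sends, so it discourages a well-behaved crawler rather than acting as a security control.

```python
import json


def build_deny_user_agent_statement(bucket, key_pattern, agent_pattern):
    """Build a bucket policy statement that denies s3:GetObject on matching
    keys when the request's User-Agent header matches agent_pattern."""
    return {
        "Sid": "DenyCrawlerOnVideos",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/{key_pattern}",
        "Condition": {"StringLike": {"aws:UserAgent": agent_pattern}},
    }


def apply_policy(bucket, policy):
    """Replace the bucket policy. Needs boto3 and S3 credentials.
    put_bucket_policy overwrites the whole policy, so `policy` should
    also contain any statements the bucket already relies on."""
    import boto3  # imported lazily so building the policy needs no AWS setup

    boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))


# Hypothetical bucket name and key pattern, for illustration only.
statement = build_deny_user_agent_statement("example-media", "videos/*", "*Yandex*")
policy = {"Version": "2012-10-17", "Statement": [statement]}
print(json.dumps(policy, indent=2))
```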
Yandex just so happened to start aggressively crawling our videos at the same time we started migrating our sites to ECS.
AWS data transfer costs are quite high and can easily get out of control. We were lucky to have picked this up early, as it could have gone unnoticed and racked up a large cost.
This post was not sponsored by any services mentioned within the post.