Scaling Open edX on AWS

Hey everyone!

Could you share some info on how you’re handling big spikes of traffic on AWS?
There is an edX.org Deployment page, but it seems to date from 2016. A Slack post from @feanil mentioned that edX is using m4.xlarge instances with Auto Scaling groups, so there have probably been many changes since that document was created.

Currently we’re using Target Tracking scaling with two conditions:

  • As required to maintain Average CPU Utilization at 40%
  • As required to maintain Application Load Balancer Request Count Per Target at 400
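For reference, here is a rough sketch of how these two policies could be expressed as parameter sets for the boto3 `put_scaling_policy` call. The group name, policy names, and load balancer resource label are placeholders (assumptions), not our real values, and the function only builds the dictionaries without calling AWS.

```python
def target_tracking_policies(asg_name, alb_resource_label):
    """Build keyword arguments for two target tracking policies.

    asg_name and alb_resource_label are placeholders supplied by the caller.
    The resource label has the form
    app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>.
    """
    # Policy 1: keep average CPU utilization of the group at 40%.
    cpu_policy = {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "cpu-target-40",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": 40.0,
        },
    }
    # Policy 2: keep the ALB request count per target at 400.
    request_policy = {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "alb-requests-target-400",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                "ResourceLabel": alb_resource_label,
            },
            "TargetValue": 400.0,
        },
    }
    return [cpu_policy, request_policy]


policies = target_tracking_policies(
    "openedx-asg",  # placeholder group name
    "app/my-alb/0123456789abcdef/targetgroup/my-tg/0123456789abcdef",  # placeholder label
)
```

Each dictionary could then be applied with `boto3.client("autoscaling").put_scaling_policy(**policy)`, assuming credentials and the group already exist.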

We’re also monitoring HealthyHostCount and spawning additional instances if it falls below 2.
However, when a sudden spike of hundreds of users hits the page, the service becomes clogged, and it takes around 5 minutes for newly spawned instances to become healthy and take over the traffic (even after decreasing the cooldown period). Do you have any hints, recommendations, or example numbers for effective scaling based on your setup?
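To make the numbers concrete, here is a back-of-the-envelope sketch of how many instances a spike would need under the request-count target. It assumes the target of 400 is evaluated as requests per target per minute, and the spike sizes are hypothetical values, not measurements from our setup.

```python
import math


def instances_needed(requests_per_minute, target_per_instance=400, min_instances=2):
    """Estimate the capacity needed to keep each instance at or below the target.

    target_per_instance mirrors the request-count target above (assumed to be
    per minute); min_instances mirrors the HealthyHostCount floor of 2.
    """
    needed = math.ceil(requests_per_minute / target_per_instance)
    return max(min_instances, needed)


# Hypothetical spike sizes, for illustration only.
quiet = instances_needed(100)    # below the floor, so the minimum applies
spike = instances_needed(2000)   # 2000 / 400 = 5 instances
```

Since new instances take about 5 minutes to become healthy, the existing instances have to absorb the whole spike in the meantime, which is where we get stuck.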

@feanil, a friendly explicit ping, since you may have some context on this topic.