I ran into several issues while getting started with the Tutor AMI. I’m reporting my experience to find out whether it is surprising, since the documentation suggests that I shouldn’t have had these problems. I’d also like some reassurance that things will be stable going forward, or suggestions for how to avoid these problems in the future and how to recover from them without terminating the EC2 instance and starting over. Based on my experience so far, I’m a little worried about using this in production. Here’s a list of the issues I ran into:
4 GB RAM was not enough
- The build never got past the Django migrations.
- This is the memory usage on EC2 after launching Tutor:
tutor@ip-172-31-4-25:/root$ free -h
              total        used        free      shared  buff/cache   available
Mem:          7.7Gi       3.6Gi       1.1Gi       2.0Mi       3.0Gi       3.6Gi
Swap:            0B          0B          0B
Questions
- Is 4 GB sufficient to run Tutor but not enough to build it?
Default EBS volume size of 25 GB was not enough
- 20 GB of storage was used up after the build:
tutor@ip-172-31-4-25:/root$ df -H
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        67G   20G   48G  30% /
devtmpfs        4.2G     0  4.2G   0% /dev
tmpfs           4.2G     0  4.2G   0% /dev/shm
tmpfs           826M  2.0M  824M   1% /run
tmpfs           5.3M     0  5.3M   0% /run/lock
tmpfs           4.2G     0  4.2G   0% /sys/fs/cgroup
/dev/loop0       59M   59M     0 100% /snap/core18/1885
/dev/loop2       75M   75M     0 100% /snap/lxd/16922
/dev/loop3       30M   30M     0 100% /snap/amazon-ssm-agent/2012
/dev/loop4       59M   59M     0 100% /snap/core18/2785
/dev/loop5      123M  123M     0 100% /snap/core/14946
/dev/loop6       67M   67M     0 100% /snap/core20/1891
/dev/loop7       27M   27M     0 100% /snap/amazon-ssm-agent/6563
/dev/loop8       97M   97M     0 100% /snap/lxd/24061
/dev/loop1       67M   67M     0 100% /snap/core20/1950
tmpfs           826M     0  826M   0% /run/user/0
tutor@ip-172-31-4-25:/root$ docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          43        12        10.13GB   3.716GB (36%)
Containers      22        15        11.14MB   0B (0%)
Local Volumes   6         6         2.444GB   0B (0%)
Build Cache     0         0         0B        0B
- Running out of disk space on the default volume made the site unreachable from a web browser. I also saw the following errors after connecting to the EC2 instance over SSH:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
sudo: unable to resolve host ip-172-31-4-157: Temporary failure in name resolution
Job for docker.service failed because the control process exited with error code.
This was the state after:
- Doing an initial build
- Importing the demo course
- Trying to install the indigo theme and rebuild
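Both limits described above (RAM and disk) can be checked before starting a build. The following is only a rough sketch: the thresholds `MIN_MEM_GIB` and `MIN_FREE_GIB` are assumptions loosely based on what worked in this post (8 GB RAM, and more free space than the 20 GB the build consumed), not official Tutor requirements, and the function names are hypothetical.

```shell
#!/bin/sh
# Pre-flight resource check before a Tutor build (Linux only).
# Thresholds are assumptions taken from this post, not documented requirements.
MIN_MEM_GIB=${MIN_MEM_GIB:-8}
MIN_FREE_GIB=${MIN_FREE_GIB:-30}

# meets_minimum ACTUAL REQUIRED: succeeds when ACTUAL >= REQUIRED (integers).
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# Total RAM in GiB, rounded up, read from /proc/meminfo (values are in KiB).
total_mem_gib() {
    awk '/^MemTotal:/ { printf "%d\n", ($2 + 1048575) / 1048576 }' /proc/meminfo
}

# Free space on the root filesystem in GiB, rounded down (df reports 1K blocks).
root_free_gib() {
    df -Pk / | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Report every shortfall on stderr and return non-zero if any check fails.
preflight() {
    mem=$(total_mem_gib)
    free=$(root_free_gib)
    status=0
    if ! meets_minimum "$mem" "$MIN_MEM_GIB"; then
        echo "RAM: ${mem} GiB is below the assumed minimum of ${MIN_MEM_GIB} GiB" >&2
        status=1
    fi
    if ! meets_minimum "$free" "$MIN_FREE_GIB"; then
        echo "Disk: ${free} GiB free on / is below the assumed minimum of ${MIN_FREE_GIB} GiB" >&2
        status=1
    fi
    return "$status"
}
```

Calling `preflight` before the build would flag an undersized instance early, instead of failing partway through the migrations or filling the disk.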
Solutions
After realizing that my failed deployments were due to inadequate memory and storage, I took the following steps:
- I launched the AMI with a t3a.large instance (8 GB RAM).
- After that, I resized the EBS volume to 64 GB.
  - This also required following this guide to extend the filesystem: Extend a Linux file system after resizing a volume - Amazon Elastic Compute Cloud
  - I’m not sure if there’s a better way to handle this. When launching the AMI, it doesn’t look like there’s a way to select the volume size. This is unfortunate because it seems like the default volume size is inadequate.
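For reference, the filesystem extension step from the AWS guide usually comes down to two commands: `growpart` to grow the partition, then `resize2fs` to grow an ext4 filesystem. The sketch below only prints those commands so they can be reviewed first. The device name `/dev/nvme0n1`, the partition number `1`, and the ext4 filesystem are assumptions that should be confirmed with `lsblk` and `df -T` on the actual instance; the helper function names are hypothetical.

```shell
#!/bin/sh
# Sketch: grow the root partition and filesystem after enlarging an EBS volume.
# DISK and PART_NUM are assumptions; confirm the real names with `lsblk` first.
DISK=${DISK:-/dev/nvme0n1}   # assumed root disk name on a Nitro instance (e.g. t3a)
PART_NUM=${PART_NUM:-1}      # assumed root partition number

# Build the partition device path. Disks whose name ends in a digit use a "p"
# separator (/dev/nvme0n1p1); others do not (/dev/xvda1).
partition_path() {
    case "$1" in
        *[0-9]) printf '%sp%s\n' "$1" "$2" ;;
        *)      printf '%s%s\n' "$1" "$2" ;;
    esac
}

# Print the commands instead of running them, so they can be checked by hand.
print_resize_commands() {
    part=$(partition_path "$1" "$2")
    echo "sudo growpart $1 $2"     # extend the partition to fill the volume
    echo "sudo resize2fs $part"    # ext4 only; an XFS root would need xfs_growfs
}
```

Running `print_resize_commands "$DISK" "$PART_NUM"` shows the two commands for the assumed device. Afterwards, `df -H /` should report the new size.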