How to Optimize Course Images with JPEG Compression to Reduce MongoDB Storage Costs and Course Page Load Times

We wrote this application to go through all of our Open edX courses and optimize course images with JPEG compression, resize larger images down to a smaller width, and delete unused images. Feel free to use this application if you think it will help you out, but I would recommend running it against a test course first to see how it works.

After running this application on 1993 course runs, the logical data size of our production MongoDB openedx fs.chunks collection went from 202.8 GB down to 85.33 GB, a reduction of about 58%.

For courses that are live, I would recommend running this application during off-peak times. The application reloads each optimized course with the course import command, and running it when few learners are active helps ensure that you don’t disrupt the learning experience. The local ./process-course-ids.txt file can be adjusted to process any number of course_ids.

To limit CPU usage, I would recommend keeping the concurrency settings low. The default is two CPU cores running at a time (one core per course). This can be increased if necessary; however, I wouldn’t recommend going below these default settings.

The application calls the tutor command to export the course from the platform, optimizes the exported course images with JPEG compression, then calls the tutor command again to import the optimized course back into the platform. The application can process multiple courses at once.
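The export, optimize, and import flow with a two-course concurrency cap can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the function names are hypothetical, and the three steps are passed in as callables so that no particular tutor command line is assumed. Threads are used only to cap how many courses run at once; a process pool may be the better fit if the image work runs in-process.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

# Hypothetical type for one pipeline step: it receives a course ID.
Step = Callable[[str], None]


def read_course_ids(path: str = "./process-course-ids.txt") -> list[str]:
    """Return the non-empty lines of the course ID file, in order."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def process_course(course_id: str, export_course: Step,
                   optimize_images: Step, import_course: Step) -> str:
    """Run the three steps for one course, strictly in sequence."""
    export_course(course_id)    # e.g. wraps the tutor export call
    optimize_images(course_id)  # compress and resize the exported images
    import_course(course_id)    # e.g. wraps the tutor import call
    return course_id


def process_all(course_ids: list[str], export_course: Step,
                optimize_images: Step, import_course: Step,
                max_workers: int = 2) -> list[str]:
    """Process courses concurrently, at most max_workers at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(process_course, course_id,
                        export_course, optimize_images, import_course)
            for course_id in course_ids
        ]
        # result() re-raises any exception raised inside a step.
        return [future.result() for future in futures]
```

Keeping the steps injectable also makes it easy to dry-run the flow against a test course with stub functions before pointing it at production.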

The application performs the following operations. Details about how we optimized images with JPEG compression can be found here.

  • All JPEG and PNG images are converted to compressed JPEG format (e.g. 80% quality, 72 DPI for screens).

  • The original PNG files are removed after conversion because they are larger than the JPEG versions. Images with transparency have their background set to white.

  • Images wider than 1400 px are resized down to 1400 px width to match the upcoming Open edX platform courseware dimensions.

  • Images that are 1400 px wide or narrower are not resized. This accounts for drag-and-drop background images that needed to stay at 675 px width to avoid reworking the target zone locations for those problems.

Examples of Image Optimizations

Here are a couple of examples showing how course images changed after optimization. The example images below were used in our local copy of the course-v1:edX+DemoX+Demo_Course course and are only there to show extreme cases of image optimization and to vet this application.

We downloaded images from online image repositories, then made two different resolutions of each (e.g. > 1400 px width, < 1400 px width) and incorporated them into the following course pages. The original dimensions of each image were appended to its name after the @ symbol, as shown in the file name paths in the spreadsheet below.

The application keeps track of these optimization changes in an Excel spreadsheet called ./logs/image_optimization_stats.{APPLICATION_DATE}.{APPLICATION_TIME}.xlsx.

Large Images > 1400 px Width

We’re just showing the larger iguana picture, iguana-8084900@5257x3505.jpg, here and how it was reduced in size. The large butterfly image is also on this page, at the bottom. Notice that the large iguana image went from a 5257x3505, 4.28 MB JPEG to a 1400x933, 385.37 KB compressed JPEG, an overall savings of 3.91 MB. The width changed to 1400 px to accommodate the upcoming edX courseware dimensions. It takes about 133 ms to download this image over a 153 Mbps internet connection.
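The new dimensions follow from scaling the height by the same factor as the width. A small sketch of that arithmetic, assuming the height is rounded to the nearest pixel (the function name is hypothetical):

```python
def resized_dimensions(width: int, height: int,
                       max_width: int = 1400) -> tuple[int, int]:
    """Scale down to max_width, keeping the aspect ratio; never scale up."""
    if width <= max_width:
        return width, height
    return max_width, round(height * max_width / width)


# The large iguana image: 3505 * 1400 / 5257 is about 933.4.
print(resized_dimensions(5257, 3505))  # (1400, 933)
# The small butterfly image is already under the limit, so it is unchanged.
print(resized_dimensions(1280, 851))   # (1280, 851)
```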

We’re using the browser inspector to show new image dimensions by hovering over the HTML src attribute.

Small Images <= 1400 px Width

We’re just showing the smaller butterfly picture, png-2678397@1280x851.png, here and how it was reduced in size. Its dimensions stayed the same because its width was below the 1400 px limit. The small iguana image is also on this page, at the top. Notice that the small butterfly image went from a 1280x851, 573.02 KB PNG to a 1280x851, 172.63 KB compressed JPEG, an overall savings of 400.38 KB. The original image had transparency around the outside of the butterfly shape, which was filled in with a white background color. Also, the image type went from PNG to JPEG, and the URL path to the course image was updated to reflect this change. It takes about 92 ms to download this image over a 153 Mbps internet connection.
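Updating the image URL paths after a PNG to JPEG conversion might look something like the sketch below. It is illustrative only: the function name, the rename mapping, and the /static/ path in the usage example are assumptions, not details taken from the application.

```python
import re


def rewrite_image_references(content: str, renamed: dict[str, str]) -> str:
    """Replace old image file names in course content with their new names.

    renamed maps an old file name to its new one, for example
    {"png-2678397@1280x851.png": "png-2678397@1280x851.jpg"}.
    """
    for old_name, new_name in renamed.items():
        # Only match whole file names, so renaming "logo.png" does not
        # also touch "my-logo.png".
        pattern = re.compile(
            r"(?<![\w.@-])" + re.escape(old_name) + r"(?![\w.@-])"
        )
        # A function replacement avoids escape handling in new_name.
        content = pattern.sub(lambda match, new=new_name: new, content)
    return content


html = '<img src="/static/png-2678397@1280x851.png" alt="Butterfly">'
renamed = {"png-2678397@1280x851.png": "png-2678397@1280x851.jpg"}
print(rewrite_image_references(html, renamed))
```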

We’re using the browser inspector to show new image dimensions by hovering over the HTML src attribute.

Image Optimization Run Log and Course Backups

For every application run, we keep track of the individual course changes using a log file and back this up to S3 for the given run date and time. This allows us to keep track of the changes made and see if there were any errors that were triggered. The example below shows one log file and the changes being tracked.

Course backups before and after optimization for the given application run date and time are also stored on S3. This will allow us to recover the course images if necessary. The example below shows one course backup for a given application run date.

cc @dave @braden @kmccormick @regis

This is amazing!