How to remove the tar.gz files of the exported courses from the server?

My name is Gonzalo Zupo and I have a question about Open edX.

The context is as follows:

Currently, I am working with a Ginkgo.2 installation of Open edX, which was deployed in mid-2018.

Lately, our clients have been exporting their courses constantly, and we noticed that the server hosting Open edX has begun to run out of disk space.

We realized that the tar.gz files corresponding to the exported courses remain stored on the system.

Under the path /edx/var/edxapp/media/user_tasks/ there is a folder for each year in which a course has been exported, with paths similar to the following:

  • /edx/var/edxapp/media/user_tasks/2018
  • /edx/var/edxapp/media/user_tasks/2019
  • /edx/var/edxapp/media/user_tasks/2020
  • /edx/var/edxapp/media/user_tasks/2021
  • /edx/var/edxapp/media/user_tasks/2022

Each of these folders contains one folder per month in which courses were exported and, within those, one subfolder per specific date on which courses were exported. Full paths look similar to the following:

  • /edx/var/edxapp/media/user_tasks/2022/06/17

This path corresponds to the folder that stores the courses exported on June 17 of this year.
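To show how the space is distributed across this layout, here is a minimal Python sketch that adds up the size of the .tar.gz files under each year folder. The function name `export_sizes_by_year` is just an illustrative helper (it is not part of Open edX), and it assumes the year/month/day layout described above.

```python
import os
from collections import defaultdict

# Root folder taken from the post; adjust it if your installation differs.
USER_TASKS_ROOT = "/edx/var/edxapp/media/user_tasks"


def export_sizes_by_year(root=USER_TASKS_ROOT):
    """Return {year: total bytes of .tar.gz files} for a <root>/<year>/<month>/<day> tree."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if rel == os.curdir:
            continue  # files directly under the root belong to no year folder
        year = rel.split(os.sep)[0]  # first path component below the root
        for name in filenames:
            if name.endswith(".tar.gz"):
                totals[year] += os.path.getsize(os.path.join(dirpath, name))
    return dict(totals)


if __name__ == "__main__":
    for year, size in sorted(export_sizes_by_year().items()):
        print(f"{year}: {size / 1024 ** 3:.2f} GiB")
```

Run as a script on the server, it prints one line per year folder; it only reads file sizes and does not delete anything.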

The problem is the following:

All these compressed files from exported courses are consuming a lot of disk space (approximately 39 GB), so we want to remove them from the system to free that space.

We tried to delete these files but, oddly enough, they always “come back” a few seconds after being deleted. We have tried deleting them as the root user and changing the owner, group and permissions, but nothing works.

Deleting by folder does not work either, with the same result: the folder is deleted but, after a few seconds, it returns along with its files intact.
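One check that might help narrow this down is comparing a file's metadata before deleting it and again after it reappears. The sketch below is only an idea, and `snapshot` and `classify_reappearance` are hypothetical helper names. My guess is that an unchanged modification time and size would point to the same file being restored from somewhere, while different values would point to some process writing it again, but that interpretation is an assumption.

```python
import os


def snapshot(path):
    """Return (mtime in nanoseconds, size in bytes) for path, or None if it is missing."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_mtime_ns, st.st_size)


def classify_reappearance(before, after):
    """Compare a snapshot taken before deletion with one taken some seconds after it."""
    if after is None:
        return "gone"  # the deletion held, at least so far
    if before is None:
        return "new file"  # nothing existed at the first snapshot
    if after == before:
        return "restored unchanged"  # same mtime and size as before deletion
    return "rewritten"  # mtime or size differs from the original
```

The intended use would be: call `snapshot` on one tar.gz file, delete it, wait a few seconds, call `snapshot` again, and pass both results to `classify_reappearance`.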

Faced with this problem, we located two tables in the relational database that record the generated tar.gz files and the course export and import processes in Studio. These tables are:

  • user_tasks_usertaskartifact
  • user_tasks_usertaskstatus

We deleted the records for some specific tar.gz files and then deleted those files from the corresponding path on the server. We did this in the hope that, without the database record, the files would not “come back” after being deleted. However, the result was the same as without deleting the records from the database: the files “come back” a few seconds after being deleted.

The operating system where Open edX is installed is Ubuntu 16.04.4 LTS, and it is a devstack installation.

So my questions are:

  • What could be causing this problem with the files?

  • Can these files be removed from the system or is it not possible?

Thanks.