Devstack creation time reduction

Hi, everyone!

I worked on the Open edX platform recently and I find it cumbersome that it takes so long to create a new devstack. I’d like to discuss possible improvements with you.

First, it would be nice to have an out-of-the-box method to create a minimum devstack with just a few components (mostly LMS and CMS), which would reduce devstack creation time significantly. I have created Jinja2 templates out of the docker-compose*.yml files, with Open edX component-specific code enclosed in conditional blocks, and a config file with ACTIVATE_* variables in which the user selects the desired components, similar to what’s done here and here in Tutor. I have also created a Python script which renders the docker-compose* files during devstack provisioning. With this, I’m able to create a devstack in half the time. Would the community be interested in this? I could submit it upstream.

Another point: I see we use database dumps in the devstack. Couldn’t the Django migrations that currently run on every devstack creation be applied once beforehand and baked into these database dumps? With updated dumps, we wouldn’t have to run the migrations on every devstack creation.

Thank you in advance for your attention!

Hi Alan,

I 100% agree that provisioning Devstack takes way too long, and it’s something I’ve been working on addressing as a side project as well. I’m with you that, by default, Devstack should be a core set of services (probably LMS, CMS, Discovery, Forums, and MFE support), with all other services being opt-in.

While I appreciate the ingenuity in the linked templated docker-compose files, I’m reluctant about the level of complexity they introduce. A lesson edX learned from our configuration repository is that templated configuration files can get very difficult to reason about and maintain. And, even with those opinions aside, Devstack already has like three different ways of separating services, and I think we ought to simplify that down to one way before adding a new one.

Docker Compose on its own already provides a lot of the functionality one needs to use only a subset of services defined in docker-compose.yml, and we’ve been building off of that recently. For example, this should only pull images for, provision, and start services that are required by LMS/CMS:

make dev.pull.lms
make dev.provision.services.lms+discovery+forum
make dev.up.lms dev.up.studio

Now that’s not a perfect interface, and the README doesn’t make it obvious that you can do this. But by composing these commands into more user-friendly ones and fleshing out the documentation, I think we can move much closer to the goal of an “out-of-the-box method to create a minimum devstack”.

If you’re up for some code walking, we use those types of modular make commands to shard out CI tests by service in the .travis.yml file. I have a PR open to do more work in that direction.

If you’d like to help move Devstack towards being more focused and modular, feel free to open a PR and I can try to take a look. And if you have interest, I’ve been looking for a review on my PR as well!

Best,
Kyle

And regarding the DB dumps question – That would be a fantastic improvement! I would gladly review a PR on that.

I imagine this would involve a Jenkins job. I can provide more resources for that if anyone would find it helpful.

Thanks for the feedback!

Regardless of what the default setup should be, it’d be useful to be able to create a minimal devstack; AFAIK this currently consists of only LMS, CMS and the mongo/mysql databases. Certain small features can be developed and maintained on top of a minimal devstack with only these components, for instance updating a translated string or adding a field to the login screen.

I had looked into the devstack creation code before creating the template-based one and had seen this feature but, as it is implemented right now, it also installs the docker-compose service dependencies even if I haven’t asked for them. For instance, in the case of LMS, memcached, discovery, firefox and chrome are installed.

Looking at the Docker documentation more closely, I noticed that “docker-compose up” has a “--no-deps” parameter, which we could use to ignore the dependencies block and install only the components explicitly requested in “make dev.provision.services.[custom_component_list]”. I have tested it on my workstation and it worked. What do you think? We’d just need to decide how this would be enabled (e.g. an env variable).
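To make the idea concrete, here is a minimal sketch of how a provisioning helper might assemble the `docker-compose up` invocation with or without `--no-deps`. The function name `compose_up_command` is hypothetical and not part of devstack; the sketch also leaves out the multiple `-f docker-compose*.yml` arguments that devstack passes in practice.

```python
def compose_up_command(services, no_deps=False):
    """Build the argv for `docker-compose up` for the requested services.

    Hypothetical helper for illustration only. With no_deps=True, Docker
    Compose starts just the named services and ignores their depends_on lists.
    """
    command = ["docker-compose", "up", "-d"]
    if no_deps:
        # --no-deps is a documented flag of `docker-compose up`.
        command.append("--no-deps")
    command.extend(services)
    return command


# A bare-minimum LMS, listing every wanted service explicitly.
print(" ".join(compose_up_command(["lms", "mysql", "mongo"], no_deps=True)))
```

How `no_deps` gets switched on (an environment variable, a Make option, or something else) is exactly the open question raised above.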

Of course, this assumes that the developer knows which Open edX components are the bare minimum in a devstack. It’d be nice to add an optional “minimal” option somewhere which would decide that for them.

Another option would be to reduce the dependency lists in the docker-compose*.yml files to the bare minimum each component needs to run, and to decide in the code which other components to install, depending on the input options. However, I’m not sure we want the bare minimum installation as the default. It seems to me that we don’t.

As soon as we agree on how to make the devstack creation more modular, I can send a PR. I’ll also take a look at your PR as soon as I have some free time.

I thought of just checking out the master branch, creating a new devstack (which automatically applies the migrations to the MySQL dbs), generating MySQL db creation scripts from the edxapp and edxapp_csmh databases in the mysql Docker container, updating the corresponding .sql files in the devstack git repo’s root directory and sending a PR. Isn’t that enough? What do we need a Jenkins job for?
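As a rough sketch of the dump step described above, the helper below builds one `mysqldump` invocation per database, run inside the `mysql` Compose service. The helper name, the output file naming, and the absence of credentials are assumptions made for illustration; the real devstack setup may need a user, a password, or additional `mysqldump` options.

```python
import subprocess

# Database names taken from the post above.
DATABASES = ["edxapp", "edxapp_csmh"]


def dump_command(database, compose_service="mysql"):
    """Build the argv that dumps one database from the running container.

    Hypothetical helper: `-T` disables pseudo-TTY allocation so that the
    output can be redirected to a file, and `--databases` makes mysqldump
    emit the CREATE DATABASE and USE statements as well.
    """
    return [
        "docker-compose", "exec", "-T", compose_service,
        "mysqldump", "--databases", database,
    ]


def write_dump(database, output_path):
    """Run the dump and write the SQL to output_path (needs a running devstack)."""
    with open(output_path, "wb") as output_file:
        subprocess.run(dump_command(database), stdout=output_file, check=True)
```

Calling `write_dump(name, name + ".sql")` for each entry in `DATABASES`, from the devstack repository root after a fresh provision, would produce the files to commit, assuming the .sql files are named after the databases.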

Your first response has some good points and ideas. Let me circle back to it next week!

Oh yeah, something along those lines would work for a one-time update. I misunderstood your question as “can we automate these dumps?”, which would be cool but would probably involve a Jenkins job and more work. But doing a one-time update and documenting how it was done would definitely be helpful.

@Cory_Lee: If there were a PR to update the devstack database dump, would you mind reviewing it? Asking because it looks like you handled the last one.

Hi, Kyle! Any updates on this?

Agreed. If we advertised it well in the README, I’m thinking that this functionality could be built off of the DEFAULT_SERVICES option I’m adding.

LMS does have quite a lot of dependencies. Currently:

  • mysql: Necessary.

  • mongo: Necessary.

  • memcached: This should be optional, but I wouldn’t be surprised if some things in LMS require memcached. Might be worth looking into.

  • devpi: This feels optional. AFAIK, it’s there to speed up requirements upgrades and installs. I wonder how we could toggle whether LMS brings this up automatically without templating docker-compose.yml.

  • forum: Obviously, this only matters if one needs the forum. Again, I wonder how we could make this dependency optional.

  • discovery: This PR by @cpennington added this dependency. I wonder if it is still necessary.

  • firefox: Only necessary for bok_choy tests. This ADR describes our current experiment of disabling most Bok Choy tests. If the experiment goes well, I’d support removing this dependency.

  • chrome: Same as Firefox.

So, of the 8 deps, we only really need ~4 of those.

Good to know about docker-compose up --no-deps. Similar to the new make dev.pull.without-deps.$services target, a make dev.up.without-deps.$services target would make a lot of sense.

I’m confused by what you mean by “install only the components explicitly requested in make dev.provision.services.[custom_component_list]”. Can you elaborate?

I agree. At its simplest, that could be something like:

# Services that are pulled, provisioned, run, and checked by default.
# Separated by plus-signs.
# By default, we enable:
#         lms+studio+discovery+forum+ecommerce+credentials
# which will enable the majority of Open edX features.
# A more minimal value for DEFAULT_SERVICES, which will still enable
# a large set of Open edX's functionality, can be achieved with:
#         lms+studio
DEFAULT_SERVICES ?= lms+studio+discovery+forum+ecommerce+credentials

coupled with exposition in the README.
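For illustration, here is a small sketch of how the plus-separated value could be turned into a service list, with the full set as the fallback. The function names are hypothetical; in devstack itself this would presumably happen in Make rather than in Python.

```python
import os

# Default taken from the options snippet above.
FULL_DEFAULT_SERVICES = "lms+studio+discovery+forum+ecommerce+credentials"


def parse_services(value):
    """Split a plus-separated service string, ignoring empty segments."""
    return [name for name in value.split("+") if name]


def default_services(environ=None):
    """Return the services to pull, provision, run, and check by default."""
    environ = os.environ if environ is None else environ
    return parse_services(environ.get("DEFAULT_SERVICES", FULL_DEFAULT_SERVICES))


print(default_services({"DEFAULT_SERVICES": "lms+studio"}))
```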

A more complex version could be a MINIMAL_DEVSTACK flag that, when set, programmatically sets the value of DEFAULT_SERVICES. That could get complicated if implemented haphazardly.

“Bare minimum” gets subjective, and dependent on how the service is configured (for example, every service depends on LMS, unless it’s configured not to redirect to the LMS for auth, but you probably wouldn’t want to configure it that way for devstack, etc.). Regardless, if we notice superfluous dependency relationships, it’d be good to sever them.

Looking further into the future, it’d be slick if services could be easily run and debugged from within their own repository without needing to spin up any sort of multi-service devstack. Then, this devstack only becomes necessary when testing service-to-service interactions.

  • devpi: This feels optional. AFAIK, it’s there to speed up requirements upgrades and installs. I wonder how we could toggle whether LMS brings this up automatically without templating docker-compose.yml.
  • forum: Obviously, this only matters if one needs the forum. Again, I wonder how we could make this dependency optional.

Using the --no-deps parameter in docker-compose. Or do you mean how to make this dependency optional to the user in the command line interface?

So, of the 8 deps, we only really need ~4 of those.

Which 4 deps do we need?

I think it’s useful to differentiate the use cases here to make sure we are aligned. One use case is automating a minimal Open edX installation with essential features (e.g. LMS with login, creating and attending basic courses). I’m sure it would be very useful to developers who work constantly on basic features, as I have mentioned before. A second one is automating a minimal Open edX installation in which all features work. For lack of better terminology, I’ll call the first one “bare minimum” and the second one “minimum”.

First, both use cases would be supported if there were a devstack creation option which ignored all dependencies set in the docker-compose yml files, so that the user had to provide all the components he/she wants, e.g. “make dev.provision.services.lms+mysql+mongo”. That gives the user total flexibility, but the downside is that he/she has to know the dependencies and list them on every devstack provision. For convenience, we could add options which would not require the user to know the dependencies: bare_minimum, minimum, full.

Keep in mind that the docker-compose dependencies are only a convenience tool and we don’t really need them to implement any of these scenarios. For instance, we could:

  • keep the minimum dependencies we have today in the docker-compose files, add a list of the bare minimum dependencies to the provisioning shell script and use this list with the --no-deps param when a bare minimum installation is requested
  • keep only the bare minimum dependencies in the docker-compose yaml files, add a list of the minimum dependencies to the provisioning shell script and use it when a minimum installation is requested

The result is the same.

I’m confused by what you mean by “install only the components explicitly requested in make dev.provision.services.[custom_component_list]”. Can you elaborate?

As I explained above, the user could have the freedom to install only the components he/she wants. The provisioning code would not install the service dependencies currently listed in the docker-compose yaml files. Example:

make dev.provision.services.lms

would not install mysql, mongo, devpi, discovery, chrome, firefox, etc. This would allow the user to simply run:

make dev.provision.services.lms+mongo+mysql

if he/she wants a bare minimum installation.

it’d be slick if services could be easily run and debugged from within their own repository without needing to spin up any sort of multi-service devstack.

It’d be ideal, but very tricky to implement due to database dependencies. I suggest we focus on the more attainable goals we are discussing above.

I’m hoping to find ways to trim down dependencies without adding additional steps for a developer to get a reasonably-featured Open edX instance (whether or not that should include Forum and Discovery is outside of my expertise).

For every call of “Devstack should be more lightweight!”, there’s another call of “Devstack should just work out-of-the-box more often!” While they’re both valid concerns, sometimes they are at odds with one another. Removing the dependency relationships altogether seems to be a step against working out-of-the-box.

That being said, I’m coming around on a config option such as NO_SERVICE_DEPENDENCIES that defaults to false, which, when enabled, would trigger using --no-deps on everything.

MySQL and Mongo for sure. Firefox and Chrome will likely be removed as deps. Discovery, Forum, DevPI, and Memcached are all in a gray area. So, four dependencies plus or minus two.

What I’m not clear on is the meaning of install. In docker-compose, there is no notion of “install”. Even if we only provision the database for, say, LMS, docker-compose up lms will still bring up (unprovisioned) instances of Discovery and Forum. Conversely, even if you provisioned Discovery and Forum, docker-compose up --no-deps lms would only bring up LMS.
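The difference between the two behaviours can be modelled with a small sketch. The dependency list for LMS is the one enumerated earlier in this thread; treating every other service as having no dependencies of its own is a simplifying assumption, and `services_started` is a hypothetical name, not a Docker Compose API.

```python
# LMS dependencies as listed earlier in the thread. Other services are
# assumed to have no dependencies of their own in this sketch.
DEPENDS_ON = {
    "lms": ["mysql", "mongo", "memcached", "devpi",
            "forum", "discovery", "firefox", "chrome"],
}


def services_started(requested, depends_on, no_deps=False):
    """Model which containers `docker-compose up [--no-deps] <requested>` starts."""
    if no_deps:
        # With --no-deps, only the explicitly named services come up.
        return sorted(set(requested))
    started = set()
    pending = list(requested)
    while pending:
        service = pending.pop()
        if service in started:
            continue
        started.add(service)
        # Without --no-deps, depends_on entries are followed transitively.
        pending.extend(depends_on.get(service, []))
    return sorted(started)


print(services_started(["lms"], DEPENDS_ON))
print(services_started(["lms"], DEPENDS_ON, no_deps=True))
```

Note that neither call says anything about provisioning: which databases have been set up is tracked nowhere in this model, which is the gap in the word “install” pointed out above.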

Of course, we could roll our own notion of “install” by tracking whether a service has been “installed” in a YAML file or something, and then reading that “installation state” file in all the Make commands. Is that what you’re thinking? Or am I missing a piece?

FWIW, I am remembering that Docker Compose supports injection of environment variables into the compose files, which could allow us to make the dependency relationships in the Docker Compose files configurable. So, something like this is totally possible:

lms:
    depends_on: [ mysql, mongo, ${LMS_EXTRA_DEPENDENCIES} ]
    ...

where LMS_EXTRA_DEPENDENCIES could be:

  • passed in through the Make targets based on values in options.mk
  • defined explicitly in a .env or .env.overrides file.

I think this might give us the flexibility that the Jinja2 templates you originally linked had, without adding Jinja2 to Devstack. What do you think?

That makes sense, but we can implement what I suggested without any changes to the current command line user interface.

Current CLI examples which would continue working as they do today:

  • make dev.provision (creates a full devstack)
  • make dev.provision.services.lms (creates a devstack with LMS and all its dependencies currently set in docker-compose yaml files)

New command:

  • export INSTALL_DOCKER_COMPOSE_DEPENDENCIES=false; make dev.provision.services.lms+mysql+mongo (creates a devstack with a bare minimum LMS)

An alternative interface to the last command could be:

  • export INSTALL_ONLY_MINIMUM_DOCKER_COMPOSE_DEPENDENCIES=true; make dev.provision.services.lms (again, creates a devstack with a bare minimum LMS, but the implementation uses a hard-coded list of docker-compose dependencies instead of requiring that the user provides them)

I have not suggested removing these dependency relationships altogether, but rather adding a way so that the user could pick which components he/she wants. I don’t understand the problem in doing that.

One of the possible implementations which could be built on top of that would be to move the current list of non-essential dependencies from docker-compose yaml files to the provisioning shell script code (that list would still exist, just somewhere else). In that hypothetical implementation, the docker-compose yaml files would keep only the bare minimum dependencies. Again, all existing devstack creation features would be preserved.

That being said, I’m coming around on a config option such as NO_SERVICE_DEPENDENCIES that defaults to false, which, when enabled, would trigger using --no-deps on everything.

That’s exactly my initial proposal.

What I’m not clear on is the meaning of install. In docker-compose, there is no notion of “install”. Even if we only provision the database for, say, LMS, docker-compose up lms will still bring up (unprovisioned) instances of Discovery and Forum. Conversely, even if you provisioned Discovery and Forum, docker-compose up --no-deps lms would only bring up LMS.

I mentioned the word “install” from a user perspective, as I was talking about use cases. Regarding implementation, again I see two possible paths:

  • only support 1) the current devstack creation which installs all Docker Compose dependencies and 2) a custom installation in which the Docker Compose deps are totally ignored

  • for convenience, support a 3rd option with bare minimum dependencies. This would require customizing the list of inter-component dependencies somehow instead of having them hard-coded as they are now. This could be done by partially managing the dependencies ourselves, templatizing the docker-compose yaml files or using env vars in docker-compose yaml files as you suggested.

Anyway, this 3rd option only adds convenience and the feature we should focus on right away IMHO is providing some way to create a devstack with a custom list of components, which would automatically allow the user to create a bare minimum devstack. Currently, the only way to create such a devstack is commenting out a lot of lines in docker-compose yaml files, which is very cumbersome.

The second bullet point isn’t the case. make dev.provision.services.lms will only provision LMS. Provisioning does not include dependencies.

make dev.provision.services.lms achieves this. MySQL and Mongo are not provisioning targets; they can be brought up as-is.

With the merging of my PR that you reviewed, I think your 2nd option can be mostly achieved by making an options.local.mk file and setting DEFAULT_SERVICES to something more minimal, like DEFAULT_SERVICES=lms+mongo+mysql.

So, it seems to me that the only remaining steps to realize option (2) are:

  1. Implement your proposed NO_SERVICE_DEPENDENCIES option, which could essentially alias:
    • make dev.pull.$services -> make dev.pull.without-deps.$services, and
    • make dev.up.$services -> make dev.up.without-deps.$services
  2. Document the options.local.mk and the DEFAULT_SERVICES variable in the README.
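The aliasing in step 1 could be sketched as follows. This is only a model of the target rewriting, written in Python for readability; `resolve_target` is a hypothetical name, and the real change would live in the Makefile.

```python
# Target families that the proposed option would redirect.
ALIASED_PREFIXES = ("dev.pull.", "dev.up.")


def resolve_target(target, no_service_dependencies=False):
    """Map a make target to its without-deps variant when the option is set."""
    if not no_service_dependencies:
        return target
    for prefix in ALIASED_PREFIXES:
        if target.startswith(prefix):
            services = target[len(prefix):]
            if services.startswith("without-deps."):
                # Already the without-deps form: leave it alone.
                return target
            return f"{prefix}without-deps.{services}"
    # Other targets, such as provisioning ones, are not affected.
    return target


print(resolve_target("dev.up.lms+studio", no_service_dependencies=True))
```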

Do you agree?