A proposal for BTR: Tag point releases whenever they're needed

In particular, why not tag release/ulmo.2 right now?

Historically, Open edX point releases go out on a 2-month cadence, something like this:

  • Dec 9: release/zebrawood.1
  • Feb 9: release/zebrawood.2
  • Apr 9: release/zebrawood.3

and when the .1 release is delayed, the point releases are delayed too. So in the case of Ulmo, we’d have:

  • Jan 18: release/ulmo.1
  • Mar 18: release/ulmo.2
  • May 18: release/ulmo.3

But here’s the thing. Three important fixes are sitting on the tip of release/ulmo right now: a critical bugfix for the Catalog MFE, a Django security fix, and a critical fix to the openedx image build. We need to get these into a Tutor patch release ASAP, so that Tutor v20 (Ulmo) users can have a secure and functioning platform. But, due to the delay of the ulmo.1 release, we’re still over a month from the planned ulmo.2 tagging date.

When this has happened in past releases, the strategy has been to cherry-pick the critical fixes into the Tutor Dockerfile, and then do a Tutor patch release. For example, in Quince, Tutor had to do a v17.0.5 release to cherry-pick in a privilege escalation fix which merged into the quince branch but missed the quince.3 tag. Yes, we could do this now for the three Ulmo fixes I listed (and that’s what I’ll do tomorrow if this proposal is rejected :).

But, it seems backwards to me that we’re relying on Tutor to cherry-pick in critical fixes rather than doing our own Open edX patch releases. Plus, as a Tutor maintainer, I can vouch that these Dockerfile cherry-picks are a pain in the neck to keep track of, especially across the main and release branches :face_with_tongue:

My immediate proposal for BTR

Release ulmo.2 now so that we can push these important fixes out without modifying Tutor’s Dockerfile. We’ll then cut a Tutor patch release which simply updates OPENEDX_COMMON_VERSION from ulmo.1 to ulmo.2.

My long-term proposal for BTR

:stop_sign: Stop: tagging .1, .2, .3 on a strict schedule.
:stop_sign: Stop: cherry picking into Tutor’s Dockerfile

:sparkles: Start: the following process…

  • .1 is tagged according to the schedule
  • .2, .3, .4, etc. can be tagged when either of the following is true:
    • the release manager deems it appropriate to push out fix(es)
    • approximately 2 months have gone by since the last point release.
  • The final point release goes out concurrently with .1 of the next named release, thus ensuring that there are no dangling “unreleased” commits on a named release at the point where it goes out of support.
  • Every named release will have at least four point releases (.1, .2, .3, .4), but it may have many more if necessary.
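
To make the rules above concrete, here is a minimal sketch of the tagging decision. It is not part of any existing release tooling: the function name, its parameters, and the choice to model "approximately 2 months" as 60 days are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: "approximately 2 months" is modelled as 60 days.
SCHEDULED_GAP = timedelta(days=60)


def reason_to_tag(today, last_point_release, manager_wants_release,
                  next_named_release_day):
    """Return why a point release should be tagged today, or None.

    Hypothetical helper: it only encodes the three triggers from the list
    above (final release, release manager's call, time elapsed).
    """
    if today == next_named_release_day:
        return "final point release, concurrent with the next named release"
    if manager_wants_release:
        return "release manager wants to push out fix(es)"
    if today - last_point_release >= SCHEDULED_GAP:
        return "about 2 months since the last point release"
    return None
```

The year is irrelevant to the rule, so any year works when trying it out, e.g. `reason_to_tag(date(2026, 6, 19), date(2026, 6, 9), True, date(2026, 12, 9))`.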

:eye: Example:

  • yucca.1 goes out on Jun 9 according to the schedule.
  • yucca.2 goes out on Jun 19 because a critical bug is found and fixed
  • yucca.3 goes out on Aug 19 because 2 months have passed
  • yucca.4 goes out on Aug 30 to fix a Django security bug
  • yucca.5 goes out on Aug 31 to fix another security issue
  • yucca.6 goes out on Oct 31 because 2 months have passed
  • yucca.7 goes out on Dec 9, which is when zebrawood.1 goes out. Yucca is now unsupported.

:plus: Benefits: Fixes are officially released and blessed by the Open edX project rather than relying on Tutor to decide what’s critical enough to patch. Tutor gets simpler: no more patch conflicts between the main and release branches. Critical fixes no longer get “lost” just because they merged after .3 but before the next release was tagged.

:minus: Drawbacks: The release manager will need to run the tagging script more often. They need to tag every released repo to do a point release, not just the affected ones. This will be some added work, but I’m happy to help improve the automation if this becomes a point of friction.

Thoughts?

Not that I necessarily know all the inner workings and what goes into doing all this, but in my opinion this makes a lot of sense. Delaying releases just to fit a calendar schedule somewhat undermines the importance of the fixes/updates waiting to be added.

Strong +1 from the Tutor maintainer perspective.

This proposal addresses a real pain point we’ve been dealing with. Cherry-picking fixes into Tutor’s Dockerfile across main and release branches is error-prone and creates unnecessary maintenance burden. Having official point releases cut when needed (especially for security fixes) would be much cleaner.

Those three fixes (Catalog MFE bug, Django security patch, and image build fix) are critical enough to warrant an immediate release rather than waiting until mid-March.

The only consideration: we should document the new cadence clearly so operators understand point releases may come more frequently than every 2 months.

Thanks for the comments so far.

As an aside, I’ve also opened a Tutor PR to cherry-pick the ulmo.1 fixes into the Dockerfile (the old way), just so they aren’t blocked while we wait to reach consensus on when to release ulmo.2.

Overall this proposal sounds great!

There is one scenario I want to think through though.

Scenario: Non-critical fix in review, critical fix lands.

For this example let’s assume the last point release was 1 month ago, on Jan 10.

Current (fully scheduled)

  • Feb 5: Non-critical fix PR opened
  • Feb 10: Critical fix lands
    • Tutor cherry-pick hotfix lands shortly after
  • Feb 15: Non-critical fix lands
  • Mar 10: New point release goes out, includes both fixes

Proposed (no delays for critical fixes)

  • Feb 5: Non-critical fix PR opened
  • Feb 10: Critical fix lands
  • Feb 13: New point release goes out
  • Feb 15: Non-critical fix lands
  • Apr 13: New point release goes out

In this scenario the release of the non-critical fix (which may still be quite important for some site operators) would be delayed by over a month compared to the current schedule.

I don’t have a strong opinion on how to best handle this scenario, but one possible option that comes to mind would be to shorten the post-critical-point-release window from 2 months to 1 month.

Updated yucca example:

  • yucca.1 goes out on Jun 9 according to the schedule.
  • yucca.2 goes out on Jun 19 because a critical bug is found and fixed
  • yucca.3 goes out on Jul 19 because 1 month has passed since the previous critical-fix release
  • yucca.4 goes out on Aug 29 to fix a Django security bug
  • yucca.5 goes out on Aug 30 to fix another security issue
  • yucca.6 goes out on Sep 30 because 1 month has passed since the previous critical-fix release
  • yucca.7 goes out on Nov 30 because 2 months have passed
  • yucca.8 goes out on Dec 9, which is when zebrawood.1 goes out. Yucca is now unsupported.
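
For anyone who wants to play with the window lengths, here is a rough sketch that replays the updated example. Everything in it is an assumption for illustration, not existing tooling: the function names, the year 2026 (the thread gives no year), and the reading that the wait is 1 month after a critical-fix release and 2 months otherwise.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def point_release_dates(first, critical_fix_days, end_of_support,
                        months_after_scheduled=2, months_after_critical=1):
    """Return [(date, reason), ...] for .1, .2, ... of one named release."""
    releases = [(first, "scheduled .1")]
    pending = sorted(critical_fix_days)
    last, last_was_critical = first, False
    while True:
        # The wait before the next time-based release depends on what the
        # previous release was.
        gap = months_after_critical if last_was_critical else months_after_scheduled
        due = add_months(last, gap)
        if pending and pending[0] <= due and pending[0] < end_of_support:
            last, last_was_critical = pending.pop(0), True
            releases.append((last, "critical fix"))
        elif due < end_of_support:
            last, last_was_critical = due, False
            releases.append((last, "time-based"))
        else:
            # Final point release, concurrent with .1 of the next named release.
            releases.append((end_of_support, "final"))
            return releases


# Replay the updated yucca example (year chosen arbitrarily).
timeline = point_release_dates(
    first=date(2026, 6, 9),
    critical_fix_days=[date(2026, 6, 19), date(2026, 8, 29), date(2026, 8, 30)],
    end_of_support=date(2026, 12, 9),
)
for number, (day, reason) in enumerate(timeline, start=1):
    print(f"yucca.{number}: {day:%b %d} ({reason})")
```

Passing `months_after_critical=2` with critical fixes on Jun 19, Aug 30 and Aug 31 replays the seven-release example from the original proposal instead.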

Not a maintainer here, but may I suggest adopting some kind of semantic versioning (https://semver.org/) ?

ulmo/verawood/yucca are the main version (breaking changes)

.1, .2, .3, .4, are feature additions

.X.1, .X.2, .X.3 are bug fixes

That way,

ulmo.2 can remain in the fixed calendar

ulmo.1.1 can be released now as a bug fix of ulmo.1.
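
A quick sketch of how tags would order under this scheme. The three-part tag format `release/<name>.X.Y` is only what I am suggesting here, not something that exists today, and `parse_tag` is a made-up helper.

```python
import re

# Hypothetical tag format: release/<name>.<feature>[.<bugfix>]
TAG = re.compile(r"^release/([a-z]+)\.(\d+)(?:\.(\d+))?$")


def parse_tag(tag):
    """Split a tag into (name, feature number, bugfix number)."""
    match = TAG.match(tag)
    if match is None:
        raise ValueError(f"unrecognised tag: {tag}")
    name, feature, bugfix = match.groups()
    # A missing bugfix component counts as 0, so ulmo.1 sorts before ulmo.1.1.
    return name, int(feature), int(bugfix or 0)


tags = ["release/ulmo.2", "release/ulmo.1.1", "release/ulmo.1"]
ordered = sorted(tags, key=parse_tag)
print(ordered)
```

With this ordering, `release/ulmo.1.1` slots in between `release/ulmo.1` and `release/ulmo.2`.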

In any case, I strongly support releasing a new version of tutor ASAP to include those critical bug fixes.

I’m generally in favor of this proposal, but I want to focus on this:

I don’t think this is a drawback. The current release script is in dire need of a refactor, primarily to make it actually automatic: faster and less prone to failure. It should be part of a GitHub workflow so that the release manager need only press a button. They shouldn’t even need to keep tabs on the log, and instead be notified only in case of failure.

The flip side: we need to make sure new repositories follow the rules, so the script doesn’t fail due to permissions issues. This accounts for 90% of the failures we run into, time and again.

I’m in favor of the proposal. Critical platform bug fixes shouldn’t rely on Tutor cherry-picking.

@mboisson: I get the appeal of semantic release, but I’d actually push for keeping incremental feature updates out of the point releases altogether (i.e. only have bug fixes). Even the scheduled point releases don’t get nearly the kind of testing scrutiny that they should for feature additions.

I 100% agree with both: additional tagging when needed, and no longer using the git patches in the Dockerfile. The patches approach seems like it was from a time when Tutor wasn’t official, so landing upstream changes was harder.

In this scenario the release of the non-critical fix (which may still be quite important for some site operators) would be delayed by over a month compared to the current schedule.

I think we can keep the scheduled releases the same and add the additional releases in-between. On March 18 we simply tag release/ulmo.3 instead of release/ulmo.2. Worst case scenario, ulmo.3 and ulmo.2 point to the same commit, which wouldn’t be the first time it happened.