Changelogs and reducing developer busywork

Hey Arch-BOM team @ 2U/edX –

(or anyone else who is interested, I’m just tagging them since they own the cookiecutters currently)

I see in edx-cookiecutters that we are now adding a Changelog to every new repository. I don’t see an ADR, but I also don’t need to be convinced that Changelogs are a good idea :slight_smile: That being said, I worry that we are asking developers to write similar things too many times in too many places, in particular:

  1. GitHub releases
  2. Conventional Commits (OEP-51)
  3. PR templates
  4. Changelogs

I know one could argue that each of these serves a different purpose, but do we really need all four? I worry that we have a limited “busywork budget” with developers, and the more repetitive things we ask them to do, the less likely they’re going to do all of them well.

So, if we’re going to start taking Changelogs seriously, what would you think about dropping one or two of our other asks?

  • In the past, we’ve used GitHub Releases both to (a) capture release notes and (b) trigger a publish to PyPI. Since (a) will now be done in a Changelog, how do folks feel about replacing (b) with a GitHub Action that’d automatically push a tag and publish to PyPI whenever the version number changes?
  • I see that the PR templates are barely being used. Just take a look through PRs in different repositories. What do you think about either:
    • deleting the template from the cookiecutter and/or edx-platform, or
    • simplifying the template, particularly so that it doesn’t duplicate information that is already in the conventional commit messages? For example, the template could just have space for Test Instructions and Review/Merge Notes, and then say “See conventional commit messages for change details”.
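
To make the "publish whenever the version number changes" idea concrete, here is a rough sketch of the check such an automation would need to run. It is only an illustration: the `__version__ = "..."` convention, the `v` tag prefix, and the function names are assumptions, not something our repos are guaranteed to follow. A real workflow would still need to push the tag and upload to PyPI.

```python
import re
from typing import Optional

# Assumed convention: the package declares its version in one file as
# `__version__ = "1.2.3"`. Repos that store the version elsewhere would
# need a different pattern.
VERSION_RE = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def read_version(source: str) -> Optional[str]:
    """Return the version string declared in the file contents, or None."""
    match = VERSION_RE.search(source)
    return match.group(1) if match else None


def tag_for_version_bump(old_source: str, new_source: str) -> Optional[str]:
    """Return the tag to push (e.g. "v1.3.0") if the version changed, else None.

    `old_source` and `new_source` are the contents of the version file
    before and after the merge.
    """
    old_version = read_version(old_source)
    new_version = read_version(new_source)
    if new_version is None or new_version == old_version:
        return None
    # The "v" prefix is a hypothetical tag naming choice.
    return f"v{new_version}"
```

With something like this, a merge that leaves the version untouched produces no tag and no publish, so developers would only need to bump the version and update the Changelog.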
4 Likes
  1. I think we’ve had a Changelog in every new repository since we refactored all the cookiecutters into one repo: Adding python library cookiecutter by jinder1s · Pull Request #1 · openedx/edx-cookiecutters · GitHub. We’ve had discussions about how to turn our conventional commits into Changelog entries, but I don’t think we’ve made strong moves in that direction yet.
  2. The PR template issue was noticed by us, and there was a proposed fix for it, but it hasn’t merged (yet?): feat: Update PR template to something people might actually use by timmc-edx · Pull Request #243 · openedx/edx-cookiecutters · GitHub
1 Like

As long as your branch contains a single commit when the PR is made, the PR title and body will be pre-filled with the commit message, so that doesn’t feel like busywork to me. I usually copy the commit message from the changelog (or vice versa) when making the branch and then tweak it slightly, which again doesn’t take more than a couple seconds. (Sometimes there’s a markdown vs. rST format change to make, or a piece of information to elide for the changelog or extra context to supply for the commit.)

If I have multiple commits that separate out refactors from fixes/features I usually just give a quick summary in the PR and then say “best reviewed commit-by-commit” and include the important stuff in the commit messages. But this is rare.

The changelog and the Releases definitely have duplication, and are also in different formats so just copy/pasting isn’t enough. But what’s even worse is that there’s redundancy that makes it hard to know which you should look at. I’d be very happy to have us stop using Release bodies, and just rely on them as a release mechanism. We could even forgo Releases entirely and use tags, since PyPI workflows can be readily triggered from either one.

TL;DR: I would like to stop filling out Releases (use them only as a tagging UI) or even just switch from Releases to tags.

1 Like

I have thoughts about this:

  1. We certainly don’t need GitHub releases for Python projects. I guess JavaScript projects have a culture of Releases being the go-to place for information, but maybe I am wrong about that.
  2. I have a side project for managing changelogs that I started as an exploration of how to help with some of the problems maintaining changelogs: GitHub - nedbat/scriv: Changelog management tool . I would be more than happy to add to it if needed to make it useful in our repos (I think it probably already is).
  3. One of the things scriv can do is populate a GitHub release from the text in a changelog file.
  4. Using commit messages for the PR description would be great if commits were written with long detailed commit messages. I don’t think we are there yet.

Regarding redundancy / non-obviousness of the authoritative source of information between the changelog and Releases - maybe the Release body could just be a link to the appropriate section of the changelog? That would make the information available from both places, make it clear which one is authoritative, and avoid the need to keep two blocks of text synchronized (in case later edits of the text are needed).

1 Like

While I’m for making developers’ lives easier, I think we should be careful here. Sure, let’s make releases zero-input (I like @Tim_McCormack’s idea of either switching to tags or not using release bodies, or alternatively @nedbat’s scriv github-release that creates release bodies from the Changelog), but:

  • Conventional commits are great, because they make commits somewhat parseable. That makes authoring Changelogs easier, for example, particularly if you didn’t author the commits yourself.
  • PR templates: as I see it, the purpose of a good PR description is to make it easy for reviewers to discuss and test that PR. If we use templates as a way to communicate this, let’s please not lose that.
  • Changelogs should definitely stay. Their primary benefit is the ability to easily report on the important changes to that repo after the fact (say, for Open edX Release docs).

On PR descriptions in particular, allow me to quibble with this point:

I think a good PR description can also be seen as a proposal, a large chunk of which will not necessarily end up in the commit message. Sure, we have ADRs for that, but some changes fall in between requiring an ADR and being a one-line dependency update. Plus, if you make reviewers have to go into commit messages, that’s one more hurdle to discourage them.

I guess what I’m trying to say with all this is: I think it’s a developer’s responsibility to communicate the purpose of a particular PR as clearly as possible, as opposed to asking the reviewer, Changelog author, or Release Notes author to go digging. This includes jumping through a few hurdles, but they’re all necessary, and are more easily cleared by the developer than by anybody else.

1 Like

I just want to +1 the feeling of existential dread at having to copy and link information everywhere, particularly in the development of Python libraries that are installed as dependencies into edx-platform (we actually need GitHub releases to publish library versions to PyPI in our current workflow). I rely on commit bodies becoming PR descriptions and using URLs instead of copy/paste where applicable, which saves time, of course. But the sense of writing the same thing in N different places is strong and real. Heavy TPS report vibes. Thank you for raising this issue.

3 Likes

Agreed. I like all of these, and I find it reasonably simple to write each the way I want them, except Release Notes. And for Release Notes, I’ve just been adding a link to the changelog. See Release Replace rest_condition · openedx/edx-drf-extensions · GitHub as an example.

It seems like there’s a consensus in this thread that making release bodies zero-input would be a good change :+1: When I find the time I will put forward a more specific proposal to move towards such a workflow for all repositories (of course, anyone else is welcome to beat me to this!)

I have heard it said in several places that conventional commits are good because they can be parsed & used in changelogs, but I have also heard that commit messages make poor changelog entries because they’re targeted at developers of the codebase, not consumers of the codebase.

Do folks have thoughts on this? If conventional commits do in fact make good changelogs, then heck, why aren’t we auto-generating changelogs?

I have to politely disagree here. I agree with OEP-51 that a thorough technical description and rationale for any change belongs in the commit message. What’s left over, in my mind, is just reviewer-facing logistics: how do we test this, does it depend on another PR, how soon does this need to merge, etc.

When it comes to pasting the commit message into the PR description versus saying “See commit messages”, I figure that can be a matter of the author’s personal taste.

Fully agreed that developers are responsible for communicating their changes!

Still, I feel like we have so many rules about communication. Some are written down formally (OEP-51), others not so much (Changelogs, GitHub releases, PR descriptions, when to use Slack vs Discourse vs GH issues). We expect developers to read our minds about which rules are real, and then be proactive about following those rules.

(I challenge everyone in this thread to be empathetic. Many of us are experienced Open edX developers who enjoy reading & writing prose and whose teams focus on improving the platform as a whole. Our rules need to work for everyone… new developers, people who don’t write as much, folks whose teams’ priorities are feature development or platform customization, and so on.)

I want to make this easier. I want to get rid of the rules that we care less about, and then be explicit about the rules that we really do care about. That’ll make it easier for folks to follow the rules, easier for new folks to feel confident as Open edX developers, and easier for us to educate folks on our best practices.

1 Like

I wonder whether we need GitHub releases at all. At best, they duplicate information that is already in the repo and in the package repository. Can we just decide they are unnecessary?

Even if we keep them somehow, we should trigger publication of a component by creating a git tag, not by creating a GitHub release.

This could be a delicate balance between addressing the developers and addressing the users. I think auto-generated changelogs from commit messages tend to be poor:

  • If all commits are included, then the changelog includes “refactored/style/chore/etc” commits, which is silly. At the very least, parsing commit messages should use the conventional commit type to only include commits that matter to users, and then categorize them by type.
  • The commits are listed in the changelog in chronological order within their types, rather than order of importance.
  • Most importantly, auto-generated changelogs are only as good as their commits. I find an editing pass over the changelog during a release finds obvious improvements.
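
For what it’s worth, the “only include commits that matter to users, and then categorize them by type” step could look roughly like the sketch below. The mapping from commit types to changelog headings and the function name are assumptions for illustration, not an agreed convention, and the output is meant as a draft for the editing pass rather than a finished changelog.

```python
import re
from collections import defaultdict

# Assumption: only these commit types are interesting to consumers of the
# package, and these are the headings they would be filed under.
USER_FACING_TYPES = {"feat": "Added", "fix": "Fixed", "perf": "Changed"}

# Matches subjects such as "fix(api)!: reject empty course keys".
SUBJECT_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<summary>.+)$"
)


def draft_changelog(subjects):
    """Group user-facing commit subjects under changelog headings.

    Subjects that are not conventional commits, or whose type is not in
    USER_FACING_TYPES (refactor, style, chore, ...), are skipped.
    """
    sections = defaultdict(list)
    for subject in subjects:
        match = SUBJECT_RE.match(subject)
        if not match:
            continue
        heading = USER_FACING_TYPES.get(match["type"])
        if heading is None:
            continue
        summary = match["summary"]
        if match["breaking"]:
            summary = f"BREAKING: {summary}"
        sections[heading].append(summary)
    return dict(sections)
```

Note that entries within each heading still come out in commit order, not order of importance, so this does nothing about the second and third problems above.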

Also, we are trying to coordinate among contributors and maintainers to get proposals before getting pull requests. Sometimes a change can be made with just a pull request, but often more coordination up-front would produce a better result, and ease the review and merging down the line. It might be that if you feel like you need to write a proposal in the pull request description, then it should have happened earlier, in another channel.

Absolutely.

One caveat to the release note body replacement is that changelogs aren’t (yet) required. I tried creating an OEP for them once upon a time, but I was told the timing wasn’t right given too many other types of changes engineers were dealing with.

In an imagined world, I can see the aggregation between the 4 listed places as follows:

  • 2 commits form 1 PR
  • 2 PRs form 1 changelog entry
  • 2 changelog entries form 1 github release

However, the hierarchical view above becomes a hindrance if I just have 1 commit I want to release. I think that’s the cause of this topic.
Unfortunately, we do something analogous in edX content construction, where we ask Studio authors to write titles for each chapter, sequence, vertical, and XBlock, even if the course is just 1 XBlock big.
So, the solution requires an automated method that aggregates while bubbling up the hierarchy. GitHub automatically does this with commits → PR description → GitHub releases. The changelog is the odd item out because that’s an in-file construct.
My opinion is: we should promote awareness of this set of information. In the GitHub repo README, the maintainer or owner should specify how they want change information maintained (i.e. expectations for all 4 areas where devs describe changes). However, I can see that an attempt to standardize will require a critical mass of repositories to conform. That requires work we are not prepared to take on.

Adding changelogs to the cookiecutter but not technically requiring them is exactly the sort of “soft ask” I’d like to stop doing. Whether or not there’s an OEP, some developers in new repos will see this Changelog.rst file and wonder whether or not they have to do anything about it. The most conscientious developers will try to do the right thing by taking the extra time to fill out the changelog, but if they’re the only ones doing so, that’s totally unfair to them and a waste of their time!

Anyway, it’s a bummer we had to table your Changelog OEP @robrap . It did a good job explaining the difference between commit messages and changelog entries, and it hit on some of the same GitHub release stuff we’re talking about now (“It may be simplest to avoid [GitHub releases] altogether”). I recognize that there are, uhh, still a lot of changes going on for Open edX developers right now :sweat_smile: But perhaps when we’re through this most recent wave of changes, it’d be a ripe time to reintroduce the changelogs OEP.

100% agreed.

If we can get to a place where we’re publishing PyPI/NPM releases off of git tags, then I think it just becomes an implementation detail whether we want to mirror the releases on GitHub too. Whatever the case is, developers could stop worrying about GH releases entirely.

Regarding conventional commits and changelog entries:

I agree, and I am a little torn. Like, we are putting in the work to label all our commits with types now, and I want that effort to be useful, so I want to use them to help auto-populate changelog entries… but it is clear that a good commit message and a good changelog entry are not the same thing :face_with_diagonal_mouth:

I like this framing. I do think each layer in the hierarchy has a purpose:

  • commit messages are for other devs in the same repo
  • PR descriptions are for your reviewers
  • changelog entries are for people who use your repo
  • release notes are for people who use your repo

Even for the single-commit case, these layers matter. As you mention, though, we could and should use tooling to make this simpler – particularly, we could combine the “changelog” and “release notes” steps together with some basic tooling so that the developer only has to write those notes once.
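
As a rough illustration of that “basic tooling”, the sketch below pulls one version’s section out of a changelog so it could be reused as the release notes. It assumes a changelog whose version headings look like `[1.3.0] - 2022-05-01` followed by an rST underline; that layout and the function name are assumptions, and a tool like scriv would be the more robust option.

```python
import re

# Assumed heading style: "[1.3.0] - 2022-05-01" at the start of a line.
HEADING_RE = re.compile(r"^\[(?P<version>[^\]]+)\]")


def release_notes(changelog: str, version: str) -> str:
    """Return the changelog text for `version`, or "" if it has no section."""
    collected = []
    in_section = False
    for line in changelog.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            if in_section:
                break  # reached the next version's heading
            in_section = heading["version"] == version
            continue
        if in_section:
            collected.append(line)
    # Drop the rST underline that follows the heading, if present.
    if collected and collected[0] and set(collected[0]) <= set("~=-"):
        collected = collected[1:]
    return "\n".join(collected).strip()
```

The developer writes the entry once, in the changelog, and whatever publishes the release reads it from there.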

Agreed that we should promote awareness. I’m not sure I like the idea of leaving this workflow up to maintainers, though. If every repo wants contributors to describe changes in a different way, that’s a lot of extra mental overhead for developers. Within reason, I’d like the basic contribution workflow across our 100+ repositories to be identical.

Getting every repo to conform to a new standard would definitely take time, but it’s not without precedent: migrating PR checks from Jenkins to GitHub Actions, moving to Conventional Commits, upgrading Django… we’re capable of org-wide changes!

I just realized that an OEP about releases had been started, with a similar discussion about Changelogs and the necessity of GitHub Releases: OEP-53 Package Release process by Jawayria · Pull Request #201 · openedx/open-edx-proposals · GitHub

  • PR templates as provided by the cookiecutter have way too many pieces which hide the meat of the PR in checklists and so on. But making people go to the commits is not helpful in the github UI. Important information needs to be pulled out of the commit messages. Tim’s PR on this is great!
  • Changelogs should definitely be at a higher granularity than the individual commits. If we autopopulate them from commit messages, they will tend to stay as autopopulated because that’s easier than editing.
  • Triggering releases off the changelog would be a good way to make the changelog part of the flow. It could also standardize how releases happen, since some repos use GitHub releases, some use tags, and some just release automatically on every merge, inferring the version number from commit messages.
1 Like

Whoa! I love the idea of making the changelog trigger tagging and publishing. :smiley:

1 Like