Improving discoverability of core contributors

One issue I’ve been looking into recently has been improving the discoverability of core contributors in order to help find reviewers from the pool of core contributors. The current wiki lists all core contributors, however while it maps core contributors to the repos they have access to, it doesn’t map the other way which makes this harder than needed.

I think there are a couple of ways that we can simplify this:

  1. GitHub has a rich API that will allow us to query the list of people with write-access to a repo. This is essentially close to a core contributor list and can made discoverable. Using this will ensure that we always have an up-to-date list as contributors are added or removed.
  2. We can create a GitHub action that is run periodically, and uses information from the above API to dynamically populate a project board that will list every repo and every core contributor and potentially even more information. This board can be easily queried and filtered and support multiple views into this data.
  3. We can also create some automation around automatically assigning new PRs to random core contributors to that repo, potentially taking into account their existing load.

I’d love to get feedback from others in the community, particularly those in the Maintenance Working Group, about some of these ideas. If they are suitable I or someone from OpenCraft can help develop the functionality.


Original discussions:

1 Like

I really like steps 1 & 2. I’m not certain about step 3 - random assignment may mean things are missed if say, someone is on holiday or on bigger repos such as edx-platform where each CC to the repo may only feel comfortable in certain areas.

Would it be possible to marry this with maintainer data from openedx.yaml?

According to OEP-55, maintainers should be the ones who:

promptly triage incoming requests that propose changes to or extensions of the component, assessing their appropriateness and/or routing them to proper reviewers

So I think #1 is a good idea, but instead of #2 and #3, I would suggest it could be helpful to have a bot that posts a comment like this on each OSPR or bug report:

Thanks for this contribution, @contributor. This package is maintained by: @Amy and @Daryl. They will aim to triage this within three working days by assigning it a priority and a reviewer. (Current potential reviewers include: cc1, cc2, cc2, cc4.)

I don’t think this approach would work for edx-platform, but it would work for other repos.

I’ve seen similar things used successfully on other open source projects, e.g. here’s the bot from DefinitelyTyped:

In their case, I like how the bot is scheduled to “escalate” to more senior reviewers after a while, until someone is able to give it attention. First it routes to the maintainer of the sub-project I contributed to, and then it escalates to an overall project maintainer after a week.

1 Like

Leveraging the information from the API is a great idea. One of the things that could be done with that is to populate the info in the catalog-info.yaml file so that backstage.openedx.org is always updated.

I agree, I started writing a longer response with mechanisms that can allow reassignment etc, but in the end I think there are just too many edge cases so it’s based to leave this to a human.

As part of a previous PR we now already display the maintainer name in the bot’s PR message. However it is explicitly put in backticks to avoid the maintainer being pinged. Perhaps instead they should be pinged?

GitHub has support for a CODEOWNERS file that can map parts of the code to who owns it. I don’t know if this information is available via the API, but there might be library to parse it which can be used to narrow down the list of potential reviewers.

I don’t know enough about backstage, but would it be better to perhaps update it to directly read this info from github rather than pushing the list to the catagog-info.yaml file and then from there to backstage?

Perhaps putting it in the catalog file is a good way to maintain a historical record if that’s valuable. It shouldn’t be too hard to have a bot create a PR to update this file when a new CC is added or one is removed.

Respectfully speaking, I think the new message is far too long. The first few times I saw it my eyes just kinda glazed over. The screenshot @braden posted is nice and to the point.

Backstage is a 3rd party tool and it reads from cataloginfo.yaml. One reason to keep things in catalog_info is that you need to be a part of the Open edX GH org to see backstage.openedx.org. Now, we’re happy to add community members as triage users, but you have to ask to do that. catalog_info.yaml is visible by anyone.

I have to second this. I also didn’t notice it said who the maintainer was and I’ve also never read it through, because it seems like it’s oriented to first-time contributors. It also takes up a ton of space on the screen (more vertical space than my browser can show at once), and if I delete it, it immediately gets posted again which is annoying. So I’m glad it has the maintainer listed, but I suspect many people haven’t noticed that either.

Would it make sense to move the static aspects of that message to a wiki or other external document and make the message briefer? We could just copy-paste it almost exactly and highlight key branches.

Or, alternatively, we already know if the person is a first-time contributor and we and post this longer message in that case and just post a condensed message with key pieces of information.

CC: @itsjeyd

I might be missing something but I don’t see any way to define a set of contributors in the catalog-info spec, however it does seem that you can add custom annotations. We could also create a plugin for backstage to consume this and show it in backstage.

@braden To circle back to the discoverability of core contributors, though, it would be nice to have a place where you can easily look up the core contributors for a repo so that someone who is triaging PRs can ping the relevant people. I think it would be too noisy to ping all CCs for each PR, we can definitely ping the maintainer but at that point it would be useful to them who else they can reach out to for review, which is where a mapping of repo to CCs comes in.

That makes sense to me.

1 Like

@sarina @braden @xitij2000 Regarding the length and content of the OSPR message, it’s worth noting that it currently serves additional purposes besides naming the maintainer(s) of the parent repo:

It’s supposed to make the OSPR review process fully transparent (and relevant documentation easily discoverable) to PR authors, enabling them to step through it without any help from OSPR managers.

Side note: While the OSPR review process has been relatively stable for a while now, the product review process is still being actively refined, and from what I’ve seen, even long-term members of the Open edX community aren’t always aware that (a) their PRs require product approval and/or (b) that they’ll need to complete the product review process before their changes will go to engineering review. So there seems to be some utility to making information about the steps involved in getting a PR merged easily accessible to everyone, not just first-time contributors.

In addition to helping PR authors, having the bot post detailed instructions about the OSPR review process is supposed to reduce the amount of time that OSPR managers need to spend on each PR, allowing us to process more PRs and avoid having to manually post the same pointers/instructions on different PRs over and over again.

That said I’m not opposed to changing the message, but it would be worth keeping the points/requirements above in mind with any adjustments that you decide to make.

Also curious to see what @Michelle_Philbrick thinks.

Agreed with @itsjeyd - the message was updated with things with things that were making the process more transparent for contributors, and getting them information right up front.

Regarding the product review process, as @itsjeyd is right that it’s still being actively refined, and there’s still OSPRs coming through that need to be sent back to product to look at first. I would not take out messaging related to that.

Maybe it’s not so much an issue of length, but format / styling? We could maybe combine some bullets and edit wording a bit to make the actual layout smaller, but I think the content is important.

@itsjeyd @Michelle_Philbrick The way I understood what @xitij2000 suggested, you would keep the benefits of the longer message - it would just be moved to a page with all the details, which the shorter message would incite people to go read.

If anything, it might allow to put more emphasis on the fact that a PR author is responsible for getting their PR proactively through the review process, by highlighting that remark in a shorter message - while directing all contributors to go read the longer message?

1 Like

Yes - I really like the contents of the new message, and I think it’s very helpful for first time contributors. But as someone who opens a lot of PRs, I find the format distracting - a huge comment right in the middle of my PR that I (as an experienced contributor) don’t want and can’t get rid of. I’d prefer to move the contents elsewhere so that people can read it by clicking a link.

Alternately: GitHub already has some way of identifying first time contributors. Maybe in that case we can include the full message, and in other cases just a link to the “What to expect” docs.

1 Like

100% agree my comment was entirely about the length. Agree with Xavier & Braden about dramatically shortening the text and directing people to other pages either wiki or on docs.openedx.org. eg,

Hi and welcome!
We have a process for PRs that you can read about here!
I’m pinging some maintainers for you @maintainer!

That’s so short and readable. I think people’s eyes glaze over at a wall of text that’s longer than their screen viewport, and such a message is intimidating to get in an email.

I hope I have a compromise solution that can fulfil both needs somewhat.

GitHub supports collapsible sections in Markdown, and we can take advantage of this to make the message sort, but also provide relevant information at the same time.

I have a simple version of this here. It front-loads the simple message that has been requested here, while keeping all the details of the current message, but collapsed, so it takes less vertical space. The headings give people a general indication of the steps required, so if they don’t know about the product approval aspect, for instance, they can expand that section and read up.

The exact message can perhaps be tightened up a bit more, but does this general approach sound good?


On the matter of the core contributor discoverability. My understanding is that it would be a welcome change to have it auto-generated and posted in a project board, or better to have it automatically injected into the catalog-info.yaml file for each repo. Once that is done, it can be consumed by Backstage.

I can create a task for a future OpenCraft sprint to add GitHub automation to automatically create a PR to add the list of CCs to the catalog-info.yaml file as custom annotations. This automation would need to run manually or on schedule since AFAIK you can’t use the addition of a team member as a trigger.

@itsjeyd @antoviaque Would having this in the catalog-info.yaml file be sufficient? It will allow seeing the list of CCs for a repo by opening that file. Would that be sufficient? Or should we look into creating a plugin for Backstage to consume this information and present it via that UI. Or, to also post to a project board in GitHub.

If posting to a project board, does it make sense to have a project board for each repo that has a list of the CCs or a global project board with CCs across repos (this will allso filtering for repo or CC).

2 Likes

@xitij2000 Collapsible markdown sections look like a great compromise to me - but I’ll defer to others who brought up the issue.

This also sounds good to me :+1: But @itsjeyd and @feanil as the chair of the maintainers working group might have preferences on this.

That LGTM :+1: Great idea!

Having CC info in catalog-info.yaml would definitely be good first step.

I haven’t really gotten into the habit of using Backstage (some of the info that it shows is not up-to-date), so don’t have a strong opinion on whether/how the info should be presented in its UI.

As for project boards, maybe having project-specific ones listing CCs would be helpful to maintainers? I’m not sure.

From the perspective of OSPR management, the main tool we use for triage is the Contributions board. It’s already a global board that lets us do all kinds of filtering across repos. If we could add the ability to list CCs for each repo to that board, that would be *immensely* helpful. :slight_smile:

CC @Michelle_Philbrick

1 Like