I always respect the effort it takes to put these things together. I'd like to add a little color to some of the stats to give more context on the edx-platform commit numbers.
Current git practices have led to fewer commits
If you look at a random early PR, you'll see that most of us weren't squashing our commits. That significantly inflates the commit counts of those early contributors, particularly for new feature development.
The impact of this was pretty drastic. On March 28th, 2013, Victor Shnayder (a developer and later a product owner at edX) wrote:
A quick thought on our git process: we often have logically-related changes split into dozens or even hundreds of commits, which makes looking at release diffs quite difficult. E.g. today’s release has 219 commits, for probably ~30 actual changes.
Proposal: let’s start to use interactive rebase to collapse commits into logical units with complete diffs and commit messages that will be useful to not-just-you before making pull requests and/or before pushing work you’ve done.
(Note that releases were every week or two back then, not multiple times a day like they are now.)
It took a while before this became the predominant way of doing things, and I don't remember exactly how long that transition took (at least months, maybe more). But that's part of why the number of commits starts to drop in 2014, even though the number of people working on the codebase grew during that time.
edx-platform used to be the marketing site
In the early days, edx.org didn't have a separate marketing site. It was just the LMS, which meant that every blog post, every copy change on the front page, and even Anant's bio pic required a code change.
Every metric is problematic when you're trying to do an analysis like this. Total lines of code changed gets skewed by the folks who merge in large vendor libraries. The introduction of the translations workflows skews things in a different way, and upgrade automation and the various release practices skew them in still other ways. Nothing is perfect, and I think an analysis of commits is still valid. But if you're treating all commits as equal and want to measure who is doing what in the code, you might want to limit the time range to after 2015 or so.
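If you want to try that cutoff yourself, here's a minimal sketch in Python, assuming a local clone of edx-platform and git on your PATH. The function names are hypothetical, and it still counts every commit as equal, so the squashing and marketing-site caveats above apply to whatever it reports.

```python
import subprocess
from collections import Counter
from datetime import date


def read_git_log(repo_path: str) -> str:
    """Return one 'author<TAB>YYYY-MM-DD' line per commit (hypothetical helper)."""
    # %an is the author name, %x09 is a tab, %ad is the author date;
    # --date=short formats %ad as YYYY-MM-DD.
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%an%x09%ad", "--date=short"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def count_commits_by_author(log_text: str, since: date) -> Counter:
    """Count commits per author, keeping only commits dated on or after `since`."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        author, _, day = line.rpartition("\t")
        if date.fromisoformat(day.strip()) >= since:
            counts[author] += 1
    return counts


if __name__ == "__main__":
    sample = "Alice\t2013-03-28\nBob\t2016-01-05\nAlice\t2017-07-01\n"
    print(count_commits_by_author(sample, date(2015, 1, 1)))
```

Against a real clone you'd call `count_commits_by_author(read_git_log("path/to/edx-platform"), date(2015, 1, 1))`. You could also push the filter into git with `--since`, and adding `--no-merges` keeps merge commits from counting twice for the same change.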
Take care.