[Blog] A brief history of the Open edX

Today I published the first article on my personal site. I chose to write about this project since it's what I have mostly been involved in over the last few months.

The article still needs some final touches, but I think you may still find it useful. Your feedback is welcome, especially if I have made a wrong statement.


I always respect the effort that it takes to put these things together. I would like to add a little color to some of the stats, to give a little more context on the edx-platform commit numbers.

Current git practices have led to fewer commits

If you look at a random early PR, you’ll see that most of us weren’t squashing our commits. That significantly inflates the number of commits by those early individuals, particularly for new feature development.

The impact of this was pretty drastic. On March 28th, 2013, Victor Shnayder (a developer and later a product owner at edX) wrote:

A quick thought on our git process: we often have logically-related changes split into dozens or even hundreds of commits, which makes looking at release diffs quite difficult. E.g. today’s release has 219 commits, for probably ~30 actual changes.

Proposal: let’s start to use interactive rebase to collapse commits into logical units with complete diffs and commit messages that will be useful to not-just-you before making pull requests and/or before pushing work you’ve done.

(Note that releases were every week or two back then, not multiple-times-a-day like it is now.)

It took a while before this became the predominant way of doing things, and I don’t remember exactly how long that transition took (at least months, maybe more). But that’s part of why the number of commits starts going down during 2014, even though the number of people working on the codebase went up during that time.
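To check this trend against the repository itself, one can count, for each calendar year, the non-merge commits and the distinct author names. The sketch below is one way to do that; `yearly_stats` is a hypothetical helper, not part of any existing edX tooling. It assumes it is run inside a clone of edx-platform, cuts years on commit date (which is what `--since`/`--until` filter on), and counts authors by name the way `git shortlog` groups them, so one person under two spellings is counted twice unless the repository's `.mailmap` merges them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one tab-separated line per calendar year with
#   year, non-merge commit count, distinct author-name count
# for the repository in the current directory.
# Assumption: years are cut on commit date (what --since/--until filter on).
yearly_stats() {
  local year=$1 last=$2 next start end commits authors
  while [ "$year" -le "$last" ]; do
    next=$((year + 1))
    # Explicit times avoid git filling in the current time of day.
    start="$year-01-01 00:00:00"
    end="$next-01-01 00:00:00"
    commits=$(git rev-list --no-merges --count --since="$start" --until="$end" HEAD)
    # HEAD is passed explicitly so shortlog never falls back to reading stdin;
    # each output line is one author name, so counting lines counts authors.
    authors=$(git shortlog -sn --no-merges --since="$start" --until="$end" HEAD | wc -l | tr -d ' ')
    printf '%s\t%s\t%s\n' "$year" "$commits" "$authors"
    year=$next
  done
}
```

For example, `yearly_stats 2012 2016` prints one line per year. Reading the commit column next to the author column shows whether commits per author dropped while the number of authors grew; the script only measures that, it does not explain why.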

edx-platform used to be the marketing site

In the early days, edx.org didn’t have a separate marketing site. It was just the LMS, which meant that every blog post, every copy change on the front page, and even Anant’s bio pic needed a code change.

Every metric is problematic when you’re trying to do an analysis like this. Total lines of code changed gets skewed by the folks who merge in large vendor libraries. The introduction of the translation workflows skews things in a different way, and so do the upgrade automation and the various release practices. Nothing is perfect, and I think that an analysis of commits is valid. But if you’re treating all commits as equal and want to measure who is doing what in the code, you might want to limit the time range to after 2015 or so.

Take care. :slight_smile:


Also, be careful about interpreting large deleted line counts as focused refactoring efforts. We were pretty fast and loose about copying vendor JS files into the repo in the early days (like all of MathJax). So some of those big drops can be removing those vendor files from the repo in favor of using npm.

I mean, it’s still refactoring in a sense. But it doesn’t necessarily indicate a focused effort on refactoring.

Thank you very much, @dave, for the valuable input. Given your input, I think I will do another round of review on the article.

Regarding your suggestion to limit the time range to after 2015:

I just did a quick check, counting only commits made since 1 January 2016.
Committers:
git shortlog -sn --no-merges --since 2016-01-01 HEAD > committers_since_2016.txt
Top committers:
head committers_since_2016.txt

   583	Calen Pennington
   430	Feanil Patel
   385	Nimisha Asthagiri
   357	edX requirements bot
   336	Ned Batchelder
   307	Robert Raposa
   269	Awais Qureshi
   267	Jeremy Bowman
   244	edX Transifex Bot
   238	John Eskew

Total committers:
wc -l committers_since_2016.txt: 640

Total commits:
git rev-list --no-merges --count HEAD --after 1.1.2016: 16783

Given the above, the distribution chart would be:

For comparison, here is the graph that counts from 2011:


Unlike the chart that starts from 2011, this one shows committers with 10 to 100 commits having a larger share, almost equal to that of committers with more than 100 commits. Is that what you were referring to?
To be even more accurate and fair, I guess I would also have to remove the bot accounts, which would increase the share of the 10-to-100 group a bit.

Edit: Please read the “At least 9 commits” label in the 2011 graph as “1 to 9”.
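To redo the distribution without the bots, the `git shortlog -sn` output can be bucketed directly. This is a minimal sketch: `bucket_counts` is a hypothetical helper name, and it assumes any author name ending in the word "bot" is an automated account (true for the two bot accounts in the top-10 list above, but worth checking against the full file). For each range (1 to 9, 10 to 100, and more than 100 commits) it prints the number of committers and the number of commits they made.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `git shortlog -sn` output ("<count>\t<name>")
# on stdin and print, per range, "<range>\t<committers>\t<commits>".
# Assumption: names ending in the word "bot" are automated and are skipped.
bucket_counts() {
  awk -F'\t' '
    tolower($2) ~ / bot$/ { next }      # skip bot accounts (assumed naming)
    {
      n = $1 + 0                        # "+ 0" drops the padding shortlog adds
      b = (n < 10) ? "1-9" : (n <= 100) ? "10-100" : ">100"
      people[b]++
      commits[b] += n
    }
    END {
      split("1-9 10-100 >100", order, " ")
      for (i = 1; i <= 3; i++)
        printf "%s\t%d\t%d\n", order[i], people[order[i]], commits[order[i]]
    }'
}
```

Usage: `bucket_counts < committers_since_2016.txt`. Matching on the whole word "bot" rather than a bare "bot" suffix keeps a human name such as "Talbot" in the counts.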

Yeah, I figured it would significantly change this graph, though I didn’t know exactly how much. Thank you again for doing this analysis. I’m curious to see how this distribution of contributions continues to evolve over time.

True, though there’s always “one more thing” to do in terms of improving accuracy on stuff like this. :stuck_out_tongue: