Open edX Analytics Pipeline for Tutor (Insights)

Hi :blush:

I wanted to start a conversation about the analytics pipeline that was used for Insights.

Tutor is now the official, community-supported distribution of Open edX for production. Tutor doesn’t have anything for Insights, but it does have Cairn (not open source).

I think it is important that, as a community, we agree on whether to do data extraction and transformation (the E and the T of ETL) in a similar way.

Would you be interested in looking for an alternative to the Open edX analytics pipeline for Tutor? Do you have any ideas or requirements?


Open edX’s analytics pipeline was processing logs.

The tech stack was Hadoop, Hive, Luigi tasks, etc.

Setting up analytics is so complex, and has to be done perfectly to get it working, so that is one area where we could work towards simplifying installation and setup.

As for the data, I think people just want to see some basic things:

  • How many users are enrolled in a course?
  • Who are the users?
  • How are they engaging with the content?
  • How are they engaging with the videos?
  • Which countries are they coming from?
  • What age groups do the users fall into?
  • What is their progress, and how are they performing on the questions and answers?

That sort of thing. I also think there should be a way to develop on top of it. For example, developing anything in edX Insights requires knowledge of various things (hell, even its setup requires knowledge of Ansible, Ubuntu systems and various other things).

Some of that data is only found in logs, and rightly so: those events would be huge if stored in MySQL.
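To make that concrete, here is a rough sketch of how a couple of the basic questions above (enrollments per course, video plays per course) might be answered straight from tracking logs with plain Python, no Hadoop or Hive. It assumes the logs are JSON lines with an `event_type` field and a `context.course_id` field, and that the event names `edx.course.enrollment.activated` and `play_video` are the ones you care about; please check those against the tracking log documentation for your release. The function name and the sample course IDs are made up for illustration.

```python
import json
from collections import Counter


def count_events_per_course(lines, event_type):
    """Count tracking-log events of one type, grouped by course ID.

    Assumes each line is a JSON object with "event_type" and
    "context": {"course_id": ...}; verify this against your own logs.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of failing the whole run
        if not isinstance(record, dict):
            continue
        if record.get("event_type") != event_type:
            continue
        course_id = (record.get("context") or {}).get("course_id")
        if course_id:
            counts[course_id] += 1
    return counts


# Hypothetical sample data; real logs carry many more fields per event.
sample_lines = [
    json.dumps({"event_type": "edx.course.enrollment.activated",
                "context": {"course_id": "course-v1:Demo+A+2022"}}),
    json.dumps({"event_type": "edx.course.enrollment.activated",
                "context": {"course_id": "course-v1:Demo+A+2022"}}),
    json.dumps({"event_type": "edx.course.enrollment.activated",
                "context": {"course_id": "course-v1:Demo+B+2022"}}),
    json.dumps({"event_type": "play_video",
                "context": {"course_id": "course-v1:Demo+A+2022"}}),
    "this line is not JSON",
]

enrollments = count_events_per_course(sample_lines, "edx.course.enrollment.activated")
video_plays = count_events_per_course(sample_lines, "play_video")
print(dict(enrollments))
print(dict(video_plays))
```

In practice you would pass an open log file instead of `sample_lines`. This counts events, not distinct users, so it ignores unenrollments and repeated plays; it is only meant to show that the basic numbers do not necessarily need the full pipeline.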

I don’t have any special requirements as such. I set up Insights, use it, and have come to appreciate the complex piece of software that is available. These are just my 2 cents, based on what I’ve felt using Insights.

I am thinking of developing a custom solution for it, but I don’t have the bandwidth to do so. I think there are already some talks on Tutor logs in this direction, but again, I haven’t had time to look into it deeply.


I hear that Figures is being updated for the latest release. Or maybe just Maple. Might be worth asking on #figures.


Yes, Figures is being updated to Maple, and we’re making changes to Figures so it’s easier to upgrade in the future, by removing some fork dependencies we had in version 4 (Juniper) and earlier versions.


Thanks for confirming, @omar.
Is Figures going to support Lilac too, or will Figures jump from Juniper to Maple?

We’ve had trouble prioritizing release upgrades for Figures, but here’s what we’re going to work on next:

If you’d like to contribute to Lilac support, it would be great to coordinate with my colleague John Baldwin via the Slack #figures channel.

According to the web page, Cairn is actually open source (AGPLv3). It’s just not free; you have to buy a Tutor license. So it could be a nice option for some people.

Here are a couple of older comments on this subject that I think are still relevant: