Open edX Analytics Pipeline for Tutor (Insights)

mafermazu · April 4, 2022, 9:11pm

Hi,

I'd like to start a conversation about the analytics pipeline that was used for Insights.

Tutor is now the official, community-supported distribution of Open edX for production, but it has no equivalent of Insights; instead, there is Cairn (not open source).

I think it is important that, as a community, we agree on whether to handle data extraction and transformation (the E and the T of ETL) in a common way.

Would you be interested in looking for an alternative to the Open edX analytics pipeline for Tutor? Do you have any ideas or requirements?

chintan · April 5, 2022, 6:10am

Open edX’s analytics pipeline processed logs.

The tech stack was Hadoop, Hive, Luigi tasks, etc.

Setting up analytics is so complex, and has to be done so precisely to get it working, that simplifying installation and setup is one area we could work on.

As for the data, I think people just want to see some basic things:

How many users are enrolled in a course?
Who are the users?
How are they engaging with the content?
How are they engaging with the videos?
Which countries are they coming from?
What age groups are the users in?
What progress are they making, and how are they performing on problems?
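To make the first and fourth questions more concrete, here is a minimal sketch of the "E and T" step over Open edX tracking logs. It assumes one JSON event per line with top-level `event_type`, `username`, and `context.course_id` fields, and that lines are fed in chronological order. The event names `edx.course.enrollment.activated`, `edx.course.enrollment.deactivated`, and `play_video` are the ones I believe the LMS emits, but check them against your own logs. The function name `summarize_tracking_log` is hypothetical, and the top-level `username` on an enrollment event may be the acting user (e.g. staff) rather than the learner, so treat the counts as approximate.

```python
import json
from collections import defaultdict


def summarize_tracking_log(lines):
    """Count current enrollments and video plays per course from JSON tracking-log lines.

    Hypothetical helper: assumes one JSON object per line, in chronological order,
    with top-level "event_type", "username" and "context"."course_id" fields.
    """
    enrolled = defaultdict(set)     # course_id -> usernames currently enrolled
    video_plays = defaultdict(int)  # course_id -> number of play_video events

    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of failing the whole batch
        if not isinstance(event, dict):
            continue

        event_type = event.get("event_type", "")
        course_id = (event.get("context") or {}).get("course_id")
        username = event.get("username")
        if not course_id:
            continue  # events outside a course context are ignored here

        if event_type == "edx.course.enrollment.activated":
            enrolled[course_id].add(username)
        elif event_type == "edx.course.enrollment.deactivated":
            enrolled[course_id].discard(username)
        elif event_type == "play_video":
            video_plays[course_id] += 1

    return {
        "enrollments": {course: len(users) for course, users in enrolled.items()},
        "video_plays": dict(video_plays),
    }
```

Keeping only aggregates (sets of usernames and counters) means the raw events never need to be loaded into MySQL; a real pipeline would also need to handle log rotation and events arriving out of order.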

That sort of thing. I also think there should be a way to build on top of it. For example, developing anything in edX Insights requires knowledge of various things (hell, even its setup requires knowledge of Ansible, Ubuntu system administration, and more).

Some of this data is only found in logs, and rightly so: those events would take up a huge amount of space if stored in MySQL.

I don’t have any special requirements of my own. I set up Insights and use it, and I have come to appreciate this complex piece of software. These are just my 2 cents based on my experience using Insights.

I am thinking of developing a custom solution for it, but I don’t have the bandwidth to do so. I think there are already some discussions about Tutor logs in this direction, but again, I haven’t had time to look into them deeply.

arbrandes · April 6, 2022, 4:28pm

I hear that Figures is being updated for the latest release, or maybe just for Maple. It might be worth asking in #figures.

omar · April 7, 2022, 12:24pm

Yes, Figures is being updated for Maple, and we’re making changes so that it’s easier to upgrade in the future, by removing some fork dependencies we had in version 4 (Juniper) and earlier versions.

mafermazu · April 7, 2022, 1:44pm

Thanks for confirming, @omar.
Is Figures going to support Lilac too, or will it jump straight from Juniper to Maple?

omar · April 7, 2022, 3:00pm

We’ve had trouble prioritizing release upgrades for Figures, but here’s what we’re going to work on next:

Koa (for our internal needs), which will also be released as Figures version 0.5.x
Maple (mostly driven by community needs), which will be released as Figures version 0.10.x. For more information, please take a look at this issue: Document Figures and Open edX versions compatibility table · Issue #436 · appsembler/figures · GitHub

If you’d like to contribute Lilac support, it would be great to coordinate with my colleague John Baldwin via the Slack #figures channel.

braden · April 8, 2022, 4:19am

According to the web page, Cairn is actually open source (AGPLv3). It’s just not free; you have to buy a Tutor license. So it could be a good option for some people.

Here are a couple of older comments on this subject that I think are still relevant:

Analytics in Maple (Cairn vs. ...nothing?) · Site Operators · 2 replies · 1471 views · March 2, 2022
Attempting to setup Analytics Pipeline on devstack master · Development · 7 replies · 695 views · November 9, 2022
Is Insights / analytics-pipeline going to be replaced? · Site Operators, analytics · 6 replies · 1417 views · September 28, 2021
What are the temporary suggested replacement for analytics pipeline · Development, analytics · 1 reply · 396 views · November 15, 2022
Insights installation in OpenedX Juniper.3 · Site Operations Help, analytics · 1 reply · 731 views · October 20, 2020
