Is Insights / analytics-pipeline going to be replaced?

pongtawat · March 29, 2021, 9:25am

Hello,

Now that analytics-pipeline has been removed from devstack via DEPR-119, I noticed a comment there about the “edx-analytics-pipeline repo eventually being archived”.

Does this mean the edX team is developing something to replace the current analytics-pipeline? Will the Insights app be replaced as well? Is there a timeline for the new Insights/analytics?

I’m just wondering whether we should deploy Insights/analytics-pipeline now if they are going to be removed or replaced soon.

Thank you,
Pongtawat

sambapete · March 29, 2021, 2:08pm

I had asked a similar question in the analytics channel of the Open edX’s Slack.

My original question was the following:

And what is the status of Insights in the mid to long term? I saw at least one DEPR ticket (DEPR-119) about Insights and Devstack. What about the future and the native/community installation? @Nimisha Asthagiri, I am tagging you to get the architectural perspective and try to understand where analytics is going with regard to Open edX.

Here is @nimisha’s answer from January 27th:

Insights Ownership and Future
Insights, as you know, has not received as much care as we would like. At edX, its ‘ownership’ has been tossed about. The good news is that, finally, there is a product-delivery team that has taken official ownership of Insights now. The meh news is that they have not yet made a decision on its future, but they have a goal of doing so in this coming quarter.

Analytics Architecture Future
From an architectural perspective for Open edX, here are 2 tickets that provide info:

Frontend Data Visualization modules

Backend Data Warehouse boundaries

nimisha · March 29, 2021, 4:17pm

@schenedx may have more updated information on this, at this point.

schenedx · March 29, 2021, 4:41pm

@nimisha Thank you for the pointer.

I don’t believe I have anything more concrete to share. All I can say is we are still evaluating.
Here is what I know:

My product-delivery team took over the technical ownership of Insights only 2 weeks ago. That ownership covers only the Frontend Data Visualization part; the backend data warehouse is not ours.
The Insights data visualization component is being researched for next steps. However, the priority of those next steps needs to be decided within the next 2 months.
The Insights data translation and aggregation pipeline uses EMR, which is older infrastructure compared to the rest of the data infrastructure. As time goes on, this area will require a more urgent decision.

This is a feature/product that is on my mind. Whatever you choose to do in this area, I am interested to hear your thoughts and feedback.

braden · March 29, 2021, 6:03pm

FWIW I discussed this with Brian Beggs last summer, and while I imagine a lot may have changed, here’s an excerpt from the internal note I shared with my team afterward:

edX has been focused on building up their own BI (business intelligence) tooling, and it’s very focused on their own needs and the sort of concerns and limitations that apply at edX.org scale. …

edx-analytics-pipeline is based on the ETL model (Extract raw data, Transform it, Load it into the separate analytics database), and on Hadoop Map-Reduce, Hive, Luigi, etc. But edX and [others] are moving away from an ETL approach and toward ELT instead: Extract raw data, Load it into your data warehouse, and then Transform it as needed when you need to run reports/queries. For this purpose, edX has been using an open-core tool called dbt and I was told that the team loves dbt and it’s made their analytics way better, more flexible, easier to code, etc. The main difference is that before, to update edx-analytics-pipeline for some new report, one had to know python, Hive, Hadoop, Luigi, SQL, Jenkins DSL, and more. With dbt, writing a new report only requires knowing SQL.

So where does this leave us?

I think that if customers ask for analytics in the future, we should try to leverage that to create an open source “Open edX Data Warehouse” built on dbt. This could be built out fairly quickly, would rapidly surpass Insights in functionality, and would scale to instances of any size. The edX dbt code is not open source, but Brian mentioned they’re open to open-sourcing the parts of it that would be common to a community approach. For customers…that have their own BI tools, they can ingest data from dbt; for the community in general, we can create some sample BI reports using Metabase. This would be so much simpler and cheaper than edX Insights…

We all liked this general approach but have not done any work toward it yet.
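To make the ETL-versus-ELT distinction in the excerpt concrete, here is a minimal sketch of the ELT pattern in Python, using the standard-library `sqlite3` module as a stand-in for a data warehouse. It is not edX’s dbt code (which, per the excerpt, is not open source); the table name `raw_events`, the view name `active_learners_by_course`, and the sample events are all hypothetical. Raw rows are loaded untouched, and the Transform step is a SQL view defined afterwards, roughly the role a dbt model plays.

```python
import sqlite3

# Hypothetical raw tracking events (username, course_id, event_type),
# standing in for whatever the Extract step pulls from the platform.
RAW_EVENTS = [
    ("alice", "course-v1:Demo+101+2021", "play_video"),
    ("alice", "course-v1:Demo+101+2021", "problem_check"),
    ("bob", "course-v1:Demo+101+2021", "play_video"),
    ("carol", "course-v1:Demo+202+2021", "play_video"),
]

# Transform step, written only in SQL and applied inside the warehouse
# after loading. The view name is illustrative, not an edX model name.
ACTIVE_LEARNERS_MODEL = """
CREATE VIEW active_learners_by_course AS
SELECT course_id, COUNT(DISTINCT username) AS active_learners
FROM raw_events
GROUP BY course_id
"""


def load_raw(conn: sqlite3.Connection, events) -> None:
    """Load step: copy raw rows into the warehouse without reshaping them."""
    conn.execute(
        "CREATE TABLE raw_events (username TEXT, course_id TEXT, event_type TEXT)"
    )
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", events)


def build_warehouse(events) -> sqlite3.Connection:
    """Run Load first, then Transform, and return the open connection."""
    conn = sqlite3.connect(":memory:")
    load_raw(conn, events)
    conn.execute(ACTIVE_LEARNERS_MODEL)
    return conn


if __name__ == "__main__":
    conn = build_warehouse(RAW_EVENTS)
    report = conn.execute(
        "SELECT course_id, active_learners FROM active_learners_by_course "
        "ORDER BY course_id"
    ).fetchall()
    print(report)
```

In this shape, a new report is just another SQL view over `raw_events`; nothing upstream has to be re-extracted or rewritten, which is the kind of flexibility the excerpt attributes to working in SQL with dbt.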

nimisha · March 29, 2021, 9:09pm

@braden Thanks for the readout from your meeting with @bbeggs.

cc: @natea and @e0d who have expressed interest in being actively involved in this.

natea · September 28, 2021, 2:22pm

Thanks Nimisha and Braden for your comments on the analytics story at edX. We discovered Metabase and have been finding it super powerful and easy to deliver custom reports to our customers.

Our non-technical people can easily build reports without needing to know SQL, which I think is one of the biggest strengths of Metabase, but I’ve also heard very good things about dbt for doing more sophisticated analytics.

One of the downsides of Metabase is that you can’t do cross-database joins, so combining the data in MySQL with data captured from the event tracking stream is not possible unless you push all of that data into a single data warehouse.

For this purpose, we’re looking at tools like Dremio and Snowflake.
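The single-warehouse workaround described above can be sketched in a few lines: once LMS rows (normally in MySQL) and tracking-stream events sit in the same database, one ordinary SQL join answers a question that spans both sources. This Python sketch uses an in-memory SQLite database as a stand-in for a warehouse such as Snowflake; the table names, columns, and sample rows are hypothetical and do not reflect the real Open edX schema.

```python
import sqlite3

# Hypothetical rows copied from the LMS MySQL database.
ENROLLMENTS = [
    ("alice", "course-v1:Demo+101+2021"),
    ("bob", "course-v1:Demo+101+2021"),
    ("dave", "course-v1:Demo+101+2021"),
]

# Hypothetical rows copied from the event tracking stream.
EVENTS = [
    ("alice", "course-v1:Demo+101+2021", "play_video"),
    ("bob", "course-v1:Demo+101+2021", "problem_check"),
]

# Enrolled learners with no tracked activity; answering this needs both
# sources at once. The LEFT JOIN keeps every enrollment, and the IS NULL
# filter keeps only those with no matching event.
INACTIVE_SQL = """
SELECT e.username
FROM enrollments AS e
LEFT JOIN events AS ev
  ON ev.username = e.username AND ev.course_id = e.course_id
WHERE e.course_id = ? AND ev.username IS NULL
ORDER BY e.username
"""


def consolidate(enrollments, events) -> sqlite3.Connection:
    """Push both sources into one database so they can be joined."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE enrollments (username TEXT, course_id TEXT)")
    conn.execute(
        "CREATE TABLE events (username TEXT, course_id TEXT, event_type TEXT)"
    )
    conn.executemany("INSERT INTO enrollments VALUES (?, ?)", enrollments)
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    return conn


def inactive_learners(conn: sqlite3.Connection, course_id: str) -> list:
    """Return usernames enrolled in course_id with no tracked events."""
    return [row[0] for row in conn.execute(INACTIVE_SQL, (course_id,))]


if __name__ == "__main__":
    conn = consolidate(ENROLLMENTS, EVENTS)
    print(inactive_learners(conn, "course-v1:Demo+101+2021"))
```

A reporting tool such as Metabase could then point at that one database, so the join happens inside a single connection rather than across databases.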
