Hi all,
Jill and I asked folks to submit questions and suggestions during our talk yesterday, and many were kind enough to respond! We’ve captured all of the questions and given the answers we have below (ordered by number of votes, most to least). Please feel free to keep the conversation going in the comments below!
Realtime analytics.
This is at the core of what we hope to provide! At the moment, the time from a user action to it showing up in a report is under 2 seconds on a local Tutor build of Olive.
Easy to use interface for non-technical instructors
We’re hoping that the Superset interfaces will meet the instructor needs, but would love your feedback!
Will we be able to add our own events? For example, can I fire a custom event from a plugin and have it show up in OARS?
Not yet, but it’s a high priority feature! We’ve created a stub ticket to track the work here: Open edX Data Working Group · GitHub
One-click install
That’s the dream! Right now, getting started means installing and enabling a few Tutor plugins, then running an initialization script. It’s a tradeoff between simplicity and configurability, but we hope to make it as easy as possible to get running with reasonable defaults. Detailed instructions are currently here while we develop, and we’ll be adding proper documentation as time goes by.
Filtering results by category (like demographics)
We have the ability to do this for filtering by course or date, but for learner privacy purposes we need to be careful about the ways in which we surface demographic data. I’d love to learn more about what criteria people would like to be able to filter by as each one may come with a certain cost in terms of performance.
Legacy data can merge into new analytic systems easily for full historical tracking.
One of the pieces of feedback we are taking away from the talk is that, in addition to the management command we had planned for replaying tracking log files, we need a bulk load solution for historical data. Follow along on the ticket here: Create a bulk load tool for replaying tracking logs · Issue #48 · openedx/openedx-oars · GitHub
Data separation and metric reporting needed for both Platform Administrators and Institution Admins/Instructors.
This is definitely a core requirement in v1 and is significantly implemented already. There are some additional tweaks we would like to add so that new classes of user can be added. The current status of this is:
Student: No access
Instructor: Read access to data from their classes, instructor dashboard with filterable charts, ability to create new charts / query data that they have access to.
Platform / Institutional Admins: Full access to all data, ability to create new charts / dashboards, ability to create new data sources and change data permissions.
In the future we would also like to add:
Platform / Institutional Data Analyst: Full access to all data, ability to create new charts / dashboards. No ability to create new data sources or change permissions.
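As a rough illustration of the role breakdown above, here is a minimal Python sketch of the permission matrix. It is not how OARS or Superset actually store roles; the role keys, the `Permission` names, and the `can` helper are all hypothetical and only mirror the list in this post.

```python
from enum import Enum, auto


class Permission(Enum):
    """Hypothetical permission names mirroring the roles described above."""
    READ_OWN_COURSES = auto()
    READ_ALL_DATA = auto()
    CREATE_CHARTS = auto()
    CREATE_DASHBOARDS = auto()
    MANAGE_DATA_SOURCES = auto()
    MANAGE_PERMISSIONS = auto()


# Illustrative mapping only; the real roles are configured in Superset.
ROLE_PERMISSIONS = {
    "student": frozenset(),
    "instructor": frozenset({
        Permission.READ_OWN_COURSES,
        Permission.CREATE_CHARTS,
    }),
    "admin": frozenset(Permission),  # full access to everything
    # Planned role: same as admin minus data source / permission management.
    "data_analyst": frozenset({
        Permission.READ_ALL_DATA,
        Permission.CREATE_CHARTS,
        Permission.CREATE_DASHBOARDS,
    }),
}


def can(role, permission):
    """Return True if the given role includes the given permission."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```

For example, `can("data_analyst", Permission.MANAGE_PERMISSIONS)` is `False`, which is the one difference between the planned analyst role and the admin role.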
Resiliency to changes in upstream event contents and/or some way of knowing when upstream event schemas change?
As the xAPI events are based on tracking logs our first step here is to tighten the versioning and changes to the tracking log events. We’re working on that later this year under this epic: Open edX Data Working Group · GitHub
The xAPI events will also be versioned, as will the transformer (event-routing-backends), so that we can know when things change and make appropriate downstream updates.
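To make the idea of "knowing when things change" concrete, here is a small Python sketch of a downstream version check. How versions will actually be carried on events is not settled in this post, so the component names, version strings, and the `find_version_drift` helper are all hypothetical.

```python
# Versions the downstream reports were last validated against.
# Component names and version strings are hypothetical placeholders.
EXPECTED_VERSIONS = {"transformer": "1.0", "event": "1.0"}


def find_version_drift(event_versions, expected=EXPECTED_VERSIONS):
    """Return {component: (expected, seen)} for each component that differs.

    A component missing from event_versions is reported with seen=None.
    """
    return {
        name: (expected_version, event_versions.get(name))
        for name, expected_version in expected.items()
        if event_versions.get(name) != expected_version
    }
```

An empty result means the event matches what the reports expect; anything else is a signal that downstream queries may need a review.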
Data Export Features
Within certain limitations, data can be exported directly from Superset via the user interface. What data and how much can vary based on the source query, so bulk export from ClickHouse may be a feature we need to develop. Follow along and add your thoughts here: Investigate bulk export · Issue #49 · openedx/openedx-oars · GitHub
Can we flip some analytics to display for learners?
This is a use case we’ve talked about and is supported by our SSO + Superset integration. It definitely raises more concerns about privacy issues (e.g. demographic data like location). What kinds of data are you interested in surfacing to learners?
Time learners spent in the course
This is on the roadmap once we have the page load events!
Course vs course comparison
This is a great use case that came up in the talk! We can make something up, but are actively looking for what metrics folks would like to compare! This post will be linked from our feedback ticket here, feel free to add more information there: Sort through community input · Issue #32 · openedx/openedx-oars · GitHub
Can we set up a shared anonymized data pool (cross providers) to use for crafting/ inventing new analytics algorithms?
This sounds similar to the edX “Research Data Package”, and something I’d love to help facilitate if we can ensure robust learner privacy protections. I would love to hear from organizations that may want to participate in this or offer input.
Clients seek marketing data (email opt-in)
There should be the ability to generate email opt-in reports from the CMS; I’ll look into how one would do that.
How to run custom queries?
Custom queries are a feature in Superset called “SQL Lab”. Users should be limited to only the data sources and course data that they normally have access to in this view. You should also be able to save queries, or use queries from the SQL Lab in charts. This is one easy way for users to share new queries or charts with peers.
Note: currently access to SQL Lab in OARS is limited to superusers only as we are still standardizing and securing access to data in datasets, and need to ensure that all our access restrictions persist into all uses of SQL Lab.
Average correct on first attempt, average correct on final attempt and average number of attempts for every problem type including custom and jsinput
We have these for all problem types that emit the appropriate tracking log event. I’m not sure offhand what coverage we have for “all” problem types, especially for custom XBlocks, but I will investigate and post back.
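For anyone wondering how these three metrics are defined, here is a minimal Python sketch that computes them from a flat list of attempts. It is not the OARS query (the real reports are built in SQL on top of ClickHouse); the tuple layout and the `problem_stats` name are assumptions made for illustration.

```python
from collections import defaultdict


def problem_stats(attempts):
    """Compute per-problem attempt metrics.

    attempts: iterable of (learner_id, problem_id, attempt_number, is_correct)
    tuples. This layout is a hypothetical simplification of the event data.
    """
    # Group attempts by (problem, learner) so first/final can be identified.
    by_pair = defaultdict(list)
    for learner_id, problem_id, attempt_number, is_correct in attempts:
        by_pair[(problem_id, learner_id)].append((attempt_number, is_correct))

    per_problem = defaultdict(lambda: {"first": [], "final": [], "count": []})
    for (problem_id, _learner_id), rows in by_pair.items():
        rows.sort()  # order by attempt number
        per_problem[problem_id]["first"].append(rows[0][1])
        per_problem[problem_id]["final"].append(rows[-1][1])
        per_problem[problem_id]["count"].append(len(rows))

    def mean(values):
        return sum(values) / len(values)

    return {
        problem_id: {
            "first_attempt_correct": mean(data["first"]),
            "final_attempt_correct": mean(data["final"]),
            "average_attempts": mean(data["count"]),
        }
        for problem_id, data in per_problem.items()
    }


sample = [
    ("learner_a", "problem_1", 1, False),
    ("learner_a", "problem_1", 2, True),
    ("learner_b", "problem_1", 1, True),
]
stats = problem_stats(sample)
```

In the sample, half of the learners are correct on the first attempt, all are correct on their final attempt, and the average number of attempts is 1.5.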
How will the implementation of the event-bus affect OARS?
We’ve got high hopes that we will be able to use the event bus to make sending tracking log events more resilient to disruption than the current Celery-based model. The big concern is the volume of events and making sure we can keep the systems performant and affordable. Investigation for this is scheduled for v2 later this year in this ticket: Open edX Data Working Group · GitHub
Where should a course be revised to promote engagement or performance?
This is core to the reports we would like to provide. If there are particular metrics that would be helpful to make these determinations, please let us know!
Can I use some of OARS but not all of OARS?
Absolutely. However, the further up the stack you make changes, the more it limits what you can use downstream. For example, it would be fairly easy to swap out Superset for Tableau, but changing xAPI to Caliper would require changes all the way down the stack that would basically force a rewrite of the system downstream of event-routing-backends (different LRS, which may mean different database, different SQL to get the same reports, etc).
We’re hoping that people who deviate from the recommended settings / technologies will return their recipes back to the project for others to learn from and follow. If community momentum moves away from a particular set of features or technology, we would consider making that the “new default” in a following Open edX release. OARS is intended to be a well-maintained, living system that grows along with our community needs!