OARS talk questions

Hi all,

Jill and I asked folks to share questions and suggestions during our talk yesterday, and many of you were kind enough to do so! We’ve captured all of the questions below, with the answers we have so far (ordered by number of votes, most to least). Please feel free to keep the conversation going in the comments below!

Realtime analytics.

This is at the core of what we hope to provide! At the moment the time from user action to it showing up in a report is under 2 seconds on an Olive Tutor local build.

Easy-to-use interface for non-technical instructors

We’re hoping that the Superset interfaces will meet the instructor needs, but would love your feedback!

Will we be able to add our own events? For example, can I fire a custom event from a plugin and have it show up in OARS?

Not yet, but it’s a high priority feature! We’ve created a stub ticket to track the work here: Data Working Group · GitHub
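For a sense of what a custom event might look like once this lands, here is a minimal Python sketch of assembling an xAPI-style statement. The verb and activity IRIs below are placeholders I made up, not real OARS identifiers, and the eventual plugin API will almost certainly differ:

```python
from datetime import datetime, timezone

def build_custom_statement(actor_id, verb_id, object_id):
    """Assemble a minimal xAPI statement dict for a hypothetical custom event.

    The shape follows the xAPI statement structure (actor, verb, object,
    timestamp); the IRIs passed in are placeholders, not real OARS verbs.
    """
    return {
        "actor": {"account": {"homePage": "https://lms.example.com", "name": actor_id}},
        "verb": {"id": verb_id},
        "object": {"id": object_id, "objectType": "Activity"},
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

stmt = build_custom_statement(
    "user-42",
    "https://example.com/xapi/verbs/highlighted",   # invented verb IRI
    "https://lms.example.com/xblock/some-block-id", # invented activity id
)
```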

One-click install

That’s the dream! Right now, getting started means installing a few Tutor plugins, enabling them, and then running an initialization script. It’s a tradeoff between simplicity and configurability, but we hope to make it as easy as possible to get running with reasonable defaults. Detailed instructions are available here while we develop, and we’ll be adding proper documentation as time goes by.

Filtering results by category (like demographics)

We have the ability to do this for filtering by course or date, but for learner privacy purposes we need to be careful about the ways in which we surface demographic data. I’d love to learn more about what criteria people would like to be able to filter by as each one may come with a certain cost in terms of performance.
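To make the current capability concrete, here is a toy Python sketch of the course and date filtering the dashboards support today. The field names are illustrative, not the actual OARS schema:

```python
from datetime import date

# Toy event records standing in for rows in the reporting database;
# "course_id" and "emitted" are illustrative field names.
events = [
    {"course_id": "course-v1:Demo+T1", "emitted": date(2023, 5, 1)},
    {"course_id": "course-v1:Demo+T1", "emitted": date(2023, 6, 15)},
    {"course_id": "course-v1:Other+T2", "emitted": date(2023, 6, 1)},
]

def filter_events(rows, course_id=None, start=None, end=None):
    """Apply course and date-range filters, the two axes supported today."""
    out = rows
    if course_id is not None:
        out = [r for r in out if r["course_id"] == course_id]
    if start is not None:
        out = [r for r in out if r["emitted"] >= start]
    if end is not None:
        out = [r for r in out if r["emitted"] <= end]
    return out
```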

Legacy data can merge into new analytics systems easily for full historical tracking.

One of the pieces of feedback we are taking away from the talk is that in addition to the management command we had planned to replay tracking log files, we need a bulk load solution for historical data. Follow along on the ticket here: Create a bulk load tool for replaying tracking logs · Issue #48 · openedx/openedx-aspects · GitHub
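At its core, a replay tool walks tracking logs line by line, since each line is one JSON event. A minimal, hypothetical sketch (the real management command will also need logging, batching, and error counts):

```python
import json

def replay_tracking_log(lines):
    """Parse tracking-log lines (one JSON object per line) and yield the
    events worth replaying, skipping blank or malformed lines.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # a real tool would log and count these instead

# Tiny fabricated log for illustration:
log = [
    '{"event_type": "problem_check", "time": "2023-05-01T12:00:00Z"}',
    'not json at all',
    '',
    '{"event_type": "play_video", "time": "2023-05-01T12:01:00Z"}',
]
events = list(replay_tracking_log(log))
```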

Data separation and metric reporting needed for both Platform Administrators and Institution Admins/Instructors.

This is definitely a core requirement in v1 and is significantly implemented already. There are some additional tweaks we would like to add so that new classes of user can be added. The current status of this is:

Student: No access

Instructor: Read access to data from their classes, instructor dashboard with filterable charts, ability to create new charts / query data that they have access to.

Platform / Institutional Admins: Full access to all data, ability to create new charts / dashboards, ability to create new data sources and change data permissions.

In the future we would also like to add:

Platform / Institutional Data Analyst: Full access to all data, ability to create new charts / dashboards. No ability to create new data sources or change permissions.
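The access model above can be summarized as a simple permission matrix. This sketch is purely illustrative — the real enforcement happens through Superset roles and row-level security, and the permission names here are invented:

```python
# Illustrative permission matrix matching the roles described above.
PERMISSIONS = {
    "student": set(),
    "instructor": {"read_own_courses", "create_charts"},
    "admin": {"read_all", "create_charts", "create_dashboards",
              "create_datasources", "change_permissions"},
    "data_analyst": {"read_all", "create_charts", "create_dashboards"},
}

def can(role, permission):
    """Check whether a role grants a permission; unknown roles get nothing."""
    return permission in PERMISSIONS.get(role, set())
```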

Resiliency to changes in upstream event contents and/or some way of knowing when upstream event schemas change?

As the xAPI events are based on tracking logs, our first step here is to tighten the versioning of, and changes to, the tracking log events. We’re working on that later this year under this epic: Data Working Group · GitHub

The xAPI events will also be versioned, as well as the version of the transformer (event-routing-backends) so that we can know when things change and make appropriate downstream updates.
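As a sketch of how a downstream consumer might detect drift, suppose the transformer version were carried in each statement’s context extensions — the extension IRI and pinned version below are placeholders, not decided keys:

```python
EXPECTED_TRANSFORMER_VERSION = "1.2"  # hypothetical pinned version
VERSION_EXTENSION = "https://example.com/extensions/transformer-version"  # placeholder IRI

def statement_version_matches(statement, expected=EXPECTED_TRANSFORMER_VERSION):
    """Return True if a statement carries the transformer version the
    downstream reports were built against, so schema drift is caught early.
    """
    ext = statement.get("context", {}).get("extensions", {})
    return ext.get(VERSION_EXTENSION) == expected

current = {"context": {"extensions": {VERSION_EXTENSION: "1.2"}}}
drifted = {"context": {"extensions": {VERSION_EXTENSION: "2.0"}}}
```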

Data Export Features

Within certain limitations, data can be exported directly from Superset via the user interface. What data and how much can vary based on the source query, so bulk export from ClickHouse may be a feature we need to develop. Follow along and add your thoughts here: Investigate bulk export · Issue #49 · openedx/openedx-aspects · GitHub

Can we flip some analytics to display for learners?

This is a use case we’ve talked about, and it is supported by our SSO + Superset integration. It definitely raises more concerns about privacy issues (i.e. demographic data like location). What kinds of data are you interested in surfacing to learners?

Time learners spent in the course

This is on the roadmap once we have the page load events!
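One common heuristic for this metric, once page load events exist, is to sum the gaps between consecutive page loads and discard gaps longer than an idle cutoff. This is just one possible approach, not a committed OARS algorithm:

```python
from datetime import datetime, timedelta

def time_in_course(page_loads, idle_cutoff=timedelta(minutes=30)):
    """Estimate time spent from a sorted list of page-load timestamps.

    Gaps longer than `idle_cutoff` are treated as the learner having
    left, so they contribute nothing to the total.
    """
    total = timedelta()
    for prev, cur in zip(page_loads, page_loads[1:]):
        gap = cur - prev
        if gap <= idle_cutoff:
            total += gap
    return total

loads = [
    datetime(2023, 5, 1, 9, 0),
    datetime(2023, 5, 1, 9, 10),
    datetime(2023, 5, 1, 9, 15),
    datetime(2023, 5, 1, 13, 0),  # long gap: learner left for the morning
    datetime(2023, 5, 1, 13, 5),
]
```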

Course vs course comparison

This is a great use case that came up in the talk! We can mock something up, but we are actively looking for the metrics folks would like to compare! This post will be linked from our feedback ticket here, feel free to add more information there: Sort through community input · Issue #32 · openedx/openedx-aspects · GitHub

Can we set up a shared anonymized data pool (cross providers) to use for crafting/ inventing new analytics algorithms?

This sounds similar to the edX “Research Data Package”, and something I’d love to help facilitate if we can ensure robust learner privacy protections. I would love to hear from organizations that may want to participate in this or offer input.

Clients seek marketing data (email opt-in)

There should be the ability to generate email opt-in reports from the CMS, I’ll look into how one would do that.

How to run custom queries?

Custom queries are a feature in Superset called “SQL Lab”. Users should be limited to only the data sources and course data that they normally have access to in this view. You should also be able to save queries, or use queries from the SQL Lab in charts. This is one easy way for users to share new queries or charts with peers.

Note: currently access to SQL Lab in OARS is limited to superusers only as we are still standardizing and securing access to data in datasets, and need to ensure that all our access restrictions persist into all uses of SQL Lab.
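The effect of those data restrictions can be pictured as row-level filtering of query results. Superset implements this with row-level security filters on datasets; the Python below only illustrates the behavior, with invented field names:

```python
def restrict_rows(rows, allowed_courses):
    """Drop result rows for courses the user cannot see — the effect of
    Superset's row-level security, expressed in plain Python.
    """
    return [r for r in rows if r["course_id"] in allowed_courses]

rows = [
    {"course_id": "course-v1:Demo+T1", "passed": 14},
    {"course_id": "course-v1:Other+T2", "passed": 9},
]
visible = restrict_rows(rows, {"course-v1:Demo+T1"})
```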

Average correct on first attempt, average correct on final attempt, and average number of attempts for every problem type, including custom and jsinput

We have these for all problem types that emit the appropriate tracking log event. I’m not sure off hand what coverage we have for “all” problem types, especially for custom XBlocks, but I will investigate it and post back.
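For the problem types we do cover, the three metrics reduce to a simple aggregation over (user, attempt number, correct) records. An illustrative sketch with made-up field shapes:

```python
def attempt_stats(attempts):
    """Compute first-attempt correct rate, final-attempt correct rate, and
    average number of attempts from (user, attempt_no, correct) tuples.
    """
    by_user = {}
    for user, attempt_no, correct in attempts:
        by_user.setdefault(user, []).append((attempt_no, correct))
    firsts, finals, counts = [], [], []
    for tries in by_user.values():
        tries.sort()  # order by attempt number
        firsts.append(tries[0][1])
        finals.append(tries[-1][1])
        counts.append(len(tries))
    n = len(by_user)
    return {
        "first_attempt_correct": sum(firsts) / n,
        "final_attempt_correct": sum(finals) / n,
        "avg_attempts": sum(counts) / n,
    }

stats = attempt_stats([
    ("a", 1, False), ("a", 2, True),  # correct on second try
    ("b", 1, True),                   # correct on first try
])
```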

How will the implementation of the event bus affect OARS?

We’ve got high hopes that we will be able to use the event bus to make sending tracking log events more resilient to disruption than the current Celery-based model. The big concern is the volume of events and making sure we can keep the systems performant and affordable. Investigation for this is scheduled for v2 later this year in this ticket: Data Working Group · GitHub

Where should a course be revised to promote engagement or performance?

This is core to the reports we would like to provide. If there are particular metrics that would be helpful to make these determinations, please let us know!

Can I use some of OARS but not all of OARS?

Absolutely. However, the further up the stack you make changes, the more it limits what you can use downstream. For example, it would be fairly easy to swap out Superset for Tableau, but changing xAPI to Caliper would require changes all the way down the stack, essentially forcing a rewrite of the system downstream of event-routing-backends (a different LRS, which may mean a different database, different SQL to get the same reports, etc.).

We’re hoping that people who deviate from the recommended settings / technologies will return their recipes back to the project for others to learn from and follow. If community momentum moves away from a particular set of features or technology, we would consider making that the “new default” in a following Open edX release. OARS is intended to be a well-maintained, living system that grows along with our community needs!


Notes from the Data-wg meet-up:

What are the next steps for charts and reports?

We’ll be leaning pretty heavily on the Product team to assemble and refine the use cases to guide chart creation. Axim will likely start a funded contribution to employ a data expert to create these charts in Superset.

How will we know what charts and dashboards people are creating/using?

We can ask people to export and share their charts/dashboards, but we aren’t planning to monitor this in people’s individual instances. edX/2U have shown willingness to share their use cases and perhaps deploy OARS on edx.org, which would be a good source of data from large numbers of users.

Can researchers craft their own queries? Will OARS provide some standard joins/views to help craft these queries?

Yes, and yes.

Cairn creates a view/abstraction layer on top of MySQL to provide some data from the LMS.

We are talking about sharing dbt packages to construct more complex queries across the xAPI data in ClickHouse, or whatever other databases people end up using to feed the reports.

Maintenance of the data schemas is a concern, and one approach to populating the databases behind dbt would be to create analytics data abstraction APIs.

And performance of these abstraction layers is also a critical concern.

Does OARS have an opinion about how to use course content data?

Course graph data will be automatically pulled into OARS.

How do we ensure that new Open edX features also take data analytics into account?

We want to work closely with the Product team when they’re developing new features to ensure that data events are part of the planning process there.

We are also working on providing guidance and implementation patterns for adding new events that get propagated through to OARS.

How are the data events documented, and how will this be maintained through schema changes?

We are planning a tracking log cleanup, which also involves documentation. New events will need to be documented in a similar way.

There are also things to note beyond the basic schemas and data fields in an event – e.g. many unenrollment events happen simply because learners no longer want to see a course on their dashboard. So we will also need guidelines for Data Analysts.

Will content tags be included in the OARS data?

Short answer: yes!
Long answer: Some taxonomies the tagging project is discussing support for are licensed, and so we need to be careful to respect these licenses when shuffling data around.

How can we percolate/display Superset data back in the CMS/LMS?

We’re going to try using the Superset API and visualization packages to show Superset data inside the CMS/LMS so that instructors (and hopefully students!) can see useful data in context.

e.g. a student completing a course gets motivational messages like, “you’re doing better than 95% of the learners in this course!”

e.g. educators: “Learners are watching these segments of your video.”
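The learner-facing message in the first example boils down to a percentile computation. A toy sketch (how the Superset API would actually deliver peer scores is still an open question):

```python
def better_than_message(score, peer_scores):
    """Render the motivational message from a learner's score and their
    peers' scores — illustrative of the kind of in-context stat the
    Superset API could feed back into the LMS.
    """
    if not peer_scores:
        return "You're the first learner here!"
    pct = 100 * sum(1 for s in peer_scores if s < score) / len(peer_scores)
    return f"You're doing better than {pct:.0f}% of the learners in this course!"

msg = better_than_message(75, [50, 60, 70, 80])
```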

Can we deliver specific dashboards to different roles?

Yes, we can. We’re currently focused on restricting access to data (by course role), but Superset does support restricting access to charts and dashboards too, and we can create separate dashboards for different types of users. However, “restricting” might be too constraining… we may just show users all available dashboards and rely on them to favorite the ones they’re most interested in.

Who will decide what goes in the supported dashboard(s)?

The Product working group is sifting through use cases and will be making recommendations. Get involved if you want to steer this conversation!

What are the charts we know everyone wants to see?

  • registrations / enrollments / unenrollments
  • time on page
  • time on problem
  • video data – repeat views of particular segments, skipped segments/too long videos, measuring engagement, transcript downloads (because they may be skipping watching the video entirely)
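As an example of how the video metrics might be derived, repeat views of particular segments can be estimated by binning watched (start, end) second ranges into fixed-length segments. A simple binning heuristic, not the final OARS metric:

```python
from collections import Counter

def segment_view_counts(watch_ranges, segment_seconds=5):
    """Count how many times each fixed-length segment of a video was watched,
    given (start, end) second ranges from play/seek events. Heavily rewatched
    segments show high counts; never-watched segments are simply absent.
    """
    counts = Counter()
    for start, end in watch_ranges:
        first = int(start) // segment_seconds
        last = (int(end) - 1) // segment_seconds
        for seg in range(first, last + 1):
            counts[seg] += 1
    return counts

# Three fabricated watch sessions over a short video:
views = segment_view_counts([(0, 10), (5, 15), (5, 10)])
```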