OpenTelemetry Integration Coming Soon!

Hey there, fantastic Open edX community!

We’re thrilled to share some exciting news with you today. We’ve got big plans in the pipeline: we’re integrating OpenTelemetry into our project!

Why OpenTelemetry?

We want to take our monitoring game to the next level. OpenTelemetry (OTel), with its strong community support, lets us generate valuable telemetry data without breaking a sweat. No OpenTelemetry Collector is needed for now; we’re focusing on how the data gets generated.

Our Current Setup:

Right now, we’re using the trusty edx-django-utils package/repo for our monitoring module. You can find it here. This module is our go-to for monitoring, and it defaults to using New Relic. It’s our gateway to third-party monitoring libraries like newrelic.agent. It’s packed with middleware and utility methods that make it a breeze to add custom attributes and keep an eye on memory consumption. For a full rundown of what’s in our public API, check out our __init__.py file.

What’s Next?

Our plan is to write middleware classes that implement OpenTelemetry for various monitoring purposes: cache monitoring, deployment monitoring, cookie monitoring, code owner monitoring, and memory monitoring. We’re taking it one step at a time. Our initial move is to put OpenTelemetry to work for cache monitoring. We’re still deciding whether to use the existing abstraction layer or define our middleware classes separately and import them here.

We’re already using middleware classes to tap into New Relic’s monitoring capabilities. You can see an example of how the edx-django-utils package is used for monitoring here.

The Key Point:

We’re not bidding farewell to our New Relic setup. Instead, we’re getting clever with it. We’ll introduce environment variables or feature flags to enable the OpenTelemetry-based middleware classes where needed. This means no changes in the edx-django-utils package; we’ll take care of the conditions on our end.
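
To make the toggle idea concrete, here is a minimal sketch of how a deployment's Django settings could pick middleware based on an environment variable. It is only an illustration under assumptions: the variable name `ENABLE_OTEL_MONITORING` and the `my_otel_monitoring...` dotted path are hypothetical, and the edx-django-utils entry is just one example of existing middleware.

```python
import os

# Existing New Relic-backed middleware (one illustrative entry).
BASE_MIDDLEWARE = ["edx_django_utils.monitoring.DeploymentMonitoringMiddleware"]

# Hypothetical dotted path for an OpenTelemetry-based equivalent.
OTEL_MIDDLEWARE = ["my_otel_monitoring.middleware.OtelDeploymentMonitoringMiddleware"]

TRUTHY = {"1", "true", "yes", "on"}


def build_middleware(environ=None):
    """Return the middleware list, adding OTel classes only when the flag is on."""
    environ = os.environ if environ is None else environ
    # ENABLE_OTEL_MONITORING is an assumed name, not an existing setting.
    flag = environ.get("ENABLE_OTEL_MONITORING", "").strip().lower()
    middleware = list(BASE_MIDDLEWARE)
    if flag in TRUTHY:
        middleware.extend(OTEL_MIDDLEWARE)
    return middleware


# In a settings module this would feed Django's MIDDLEWARE setting.
MIDDLEWARE = build_middleware()
```

With the flag unset, the list is unchanged, so the New Relic setup keeps working as before.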

Looking Ahead:

Once OpenTelemetry is integrated, we plan to export traces to Grafana. We’ll also explore other third-party tools to enhance our tracing capabilities further if required.

We’re super excited about this journey, and we want you to be a part of it. Your insights, feedback, and contributions are always welcome. Let’s make monitoring better together!

Stay tuned for updates, and feel free to drop any questions or suggestions in the comments. Here’s to a brighter, more open future for monitoring!

Best,

@shahbazshabbir

@shahbazshabbir that’s really exciting to hear! If someone wanted to follow the progress, what’s the best way to do that? Monitor the edx-django-utils repo for PRs? Is there a GitHub project board with tickets showing what work is happening when?

Thanks @shahbazshabbir! I think this will be helpful to unlock work related to the discussion about alternatives to New Relic for the storage and querying as well. Application Performance Monitoring - Architecture and Engineering - Open edX Community Wiki

I’ll share the update here once it’s ready. Yes, it will be in the edx-django-utils repository.

This is exciting. I’ll be watching this ticket, so it would be great to get updates here. Also, please reach out if you have any questions along the way.

Hi @shahbazshabbir, have there been any updates on this work that could be shared and have you been collaborating with @arbrandes or the frontend working group (Find and recommend an alternative to New Relic for Application Performance Monitoring · Issue #134 · openedx/wg-frontend · GitHub)?

Hi @adzuci,

Rather than directly integrating OTEL into edx-django-utils, an alternative approach has been taken. A dedicated plugin has been created, following the open-edx-plugin structure, and released under the MIT license. Currently, it is in the testing phase, and once complete, the plugin will be available for use and review.

I believe this method provides more flexibility and makes it easier to integrate OTEL.

For more details about the plugin and its implementations, please refer to the README file. I’ll keep you updated on the progress, and any feedback or suggestions during this testing phase would be highly appreciated.

Thanks

Cool to see movement towards OpenTelemetry!

I was looking at implementing OTel in edx-django-utils – I think we could just change e.g. set_custom_attribute to call both New Relic and the appropriate OTel API call, with config to turn either one on or off. This would allow all of the existing IDAs to start reporting to OTel collectors right away, since we already have everything instrumented with edx-django-utils.
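
As a rough sketch of that idea (not the actual edx-django-utils code), the dual dispatch could look something like this. The flag names `MONITORING_NEW_RELIC_ENABLED` and `MONITORING_OTEL_ENABLED` and the extra `environ`/`backends` keyword arguments are hypothetical and only there for illustration and testing; the two library calls are `newrelic.agent.add_custom_attribute` and `Span.set_attribute` on the current OTel span.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}


def _send_to_new_relic(key, value):
    """Record the attribute on the current New Relic transaction, if available."""
    try:
        import newrelic.agent
    except ImportError:
        return False
    newrelic.agent.add_custom_attribute(key, value)
    return True


def _send_to_otel(key, value):
    """Record the attribute on the current OpenTelemetry span, if available."""
    try:
        from opentelemetry import trace
    except ImportError:
        return False
    # With no active span this is a no-op on a non-recording span.
    trace.get_current_span().set_attribute(key, value)
    return True


# Hypothetical flag names; both backends default to enabled in this sketch.
BACKENDS = {
    "MONITORING_NEW_RELIC_ENABLED": _send_to_new_relic,
    "MONITORING_OTEL_ENABLED": _send_to_otel,
}


def set_custom_attribute(key, value, environ=None, backends=None):
    """Send one custom attribute to every backend whose flag is not turned off."""
    environ = os.environ if environ is None else environ
    backends = BACKENDS if backends is None else backends
    for flag, send in backends.items():
        if environ.get(flag, "true").strip().lower() in TRUTHY:
            send(key, value)
```

Existing callers would keep calling `set_custom_attribute(key, value)` unchanged, which is what would let already-instrumented IDAs report to both systems.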

I don’t think that would be duplicated effort with the plugin you’ve written, but I wanted to check in. My understanding is that the plugin does two things:

  • Adds instrumentation for some Django-specific things
  • Adds configuration for directing data collection to various endpoints

I’d also be interested in hearing more about how the Django-settings-based configuration compares to the environment-variable-based config that OpenTelemetry provides. Is the latter missing some features that you needed, or is it more about wanting to use Django settings instead of env vars, or something else?

Hi Tim,

Thanks for the work you’re doing to add OTel support in edx-django-utils. This is great to see, and I hope that we can have a corresponding effort in the frontend-platform repo.

In terms of duplicative effort, I agree that there isn’t complete overlap, and that your work will have a wider impact. I imagine that once your work lands we will modify our plugin to focus only on the generalized tracing setup in edx-platform.

As to the configuration, I think that the plugin defaults to using the Django settings because that is what the developer was familiar and comfortable with. Thank you for pointing out the built-in support for environment configuration. I think that we will likely deprecate those explicit settings and take advantage of the built-in options.
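
For what a deprecation path might look like, here is a small sketch that prefers the standard OpenTelemetry environment variables and falls back to a Django setting with a warning. `OTEL_SERVICE_NAME` and `OTEL_EXPORTER_OTLP_ENDPOINT` are standard OTel variables; the Django setting names on the right are hypothetical stand-ins, not the plugin's real settings.

```python
import os
import warnings

# Standard OpenTelemetry env var -> hypothetical legacy Django setting name.
LEGACY_SETTINGS = {
    "OTEL_SERVICE_NAME": "OPENTELEMETRY_SERVICE_NAME",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "OPENTELEMETRY_OTLP_ENDPOINT",
}


def resolve_option(env_name, settings, environ=None):
    """Return the env var value if set, else the legacy setting (with a warning)."""
    environ = os.environ if environ is None else environ
    if env_name in environ:
        return environ[env_name]
    legacy_name = LEGACY_SETTINGS[env_name]
    value = getattr(settings, legacy_name, None)
    if value is not None:
        warnings.warn(
            f"{legacy_name} is deprecated; set {env_name} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    return value
```

Once the explicit settings are gone, a helper like this would not be needed at all, since the OTel SDK reads those environment variables itself.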

I should add a few things I’ve learned since I last posted:

  • The wrapper command opentelemetry-instrument is not compatible with gunicorn’s forking model: Working With Fork Process Models — OpenTelemetry Python documentation – which might mean that the automatic environment variable configuration I linked to won’t work either. This may not be relevant to everyone’s deploys, though.
  • If you do use that wrapper, you also need to pass DJANGO_SETTINGS_MODULE in the environment, at least for edxapp.
  • If you get a “Couldn’t build proto file into descriptor pool: duplicate file name opentelemetry/proto/common/v1/common.proto” error on startup, adding PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python seems to fix it (although I haven’t looked into why).
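
For anyone hitting the gunicorn issue, the linked fork-models page suggests configuring tracing in a `post_fork` hook instead of using the wrapper. Below is a minimal sketch of a gunicorn config file along those lines. The service name `edxapp` is a placeholder, the exporter relies on its default endpoint or `OTEL_EXPORTER_OTLP_ENDPOINT`, and setting the protobuf variable here is an assumption that only helps if protobuf has not been imported yet.

```python
# gunicorn.conf.py (sketch)
import os

# Workaround from the list above; it only takes effect if set before
# protobuf is first imported in this process.
os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")


def post_fork(server, worker):
    """Gunicorn server hook: runs in each worker right after it is forked."""
    # Imported lazily so the SDK and its background export thread are
    # created in the worker process rather than inherited from the master.
    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    # "edxapp" is a placeholder; an explicit value here overrides OTEL_SERVICE_NAME.
    resource = Resource.create({"service.name": "edxapp"})
    provider = TracerProvider(resource=resource)
    # With no arguments the exporter uses OTEL_EXPORTER_OTLP_ENDPOINT or its default.
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(provider)
    server.log.info("OpenTelemetry tracing configured in worker pid %s", worker.pid)
```

This would be passed to gunicorn with `-c gunicorn.conf.py`, and it assumes the `opentelemetry-sdk` and OTLP gRPC exporter packages are installed.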