Plugin slots vs configuration

Ever since frontend-plugin-framework introduced the concept of a <PluginSlot />, whenever I run into PRs that add configuration to control UI I’ve been asking: wouldn’t it be better to make it a slot instead?

There have been a few examples of this recently, and I’m not even counting the several instances of business-specific code we’ve asked to be made into plugins.

What I’d like at this point is a temperature check on my stance, which is as follows:

1. We have too many toggles and configuration variables already

We’re adding over 20 individual variables in Redwood, most of them just to control Studio UI behavior. This adds to the hundreds that already exist, some documented, some not.

The point is that the more toggles you add, the harder you make life for operators, whether they’re upgrading from one release to the next or just starting from scratch. We should treat each new toggle as a detriment to adoption, to be weighed carefully against the optionality it brings to the table.

2. We don’t have a unified guideline on when, where, and how configuration should be created

Some are feature flags in settings.py, some are waffle flags in the database, some are MFE-specific environment variables (which can depend on other backend configuration themselves). The latter, in turn, can either be configured via .env files, via MFE_CONFIG in settings.py, or via site_configuration in the database… some of which work with actual booleans, some only with string values. Oh, and now we have env.config.js, of course, which adds yet another configuration vehicle.

In short, how a toggle gets implemented is usually up to the developer adding it, who also decides whether or not to document it correctly. All of this compounds the adoption problem described in 1.
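To make the boolean-versus-string mismatch concrete, here is a minimal sketch of a normalizer that accepts either form. The name `parse_toggle_value` and the accepted spellings are assumptions for illustration; this is not an existing Open edX helper.

```python
# Hypothetical helper: accepted spellings are an assumption, not a standard.
TRUE_STRINGS = {"true", "1", "yes", "on"}
FALSE_STRINGS = {"false", "0", "no", "off", ""}


def parse_toggle_value(raw, default=False):
    """Coerce a toggle value from any configuration source to a bool.

    .env files can only carry strings, while settings.py can carry real
    booleans, so both forms are accepted here.
    """
    if raw is None:
        return default
    if isinstance(raw, bool):
        return raw
    if isinstance(raw, str):
        text = raw.strip().lower()
        if text in TRUE_STRINGS:
            return True
        if text in FALSE_STRINGS:
            return False
    # Anything else is rejected loudly instead of being guessed at.
    raise ValueError(f"Unrecognized toggle value: {raw!r}")
```

The pitfall this guards against is that `bool("false")` is `True` in Python, so a string-only vehicle can silently enable a feature that the operator meant to disable.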

3. Config variables are APIs we have to support

Once a toggle exists in a release, removing it means it has to go through the DEPR process. This means that with each new configuration item we’re introducing complexity in the codebase that can’t be easily refactored away: both sides of that if() have to be maintained for the lifetime of the toggle.

4. Some configuration strategies can make UX objectively worse

From making bundle sizes bigger (the else branch needs to get bundled, too!) to requiring new requests to the server and throwing a wrench in react-router soft navigation (looking at you, course-authoring!), sometimes merely resorting to configuration can make the user experience worse than it would otherwise be.

5. Why add configuration when you can let the user create a plugin?

I don’t mean to say with 1-4 that configuration variables are always bad, or even avoidable. Sometimes, it’s just necessary to allow users to flip switches easily. But - and here’s my point - many other times a single config var is not enough for the user to achieve the level of customization they need. This is the worst possible situation, in that we as maintainers of the platform have to deal with the drawbacks of configuration, and the user still has to fork.

Enter plugins. Can they suffer from some of the same drawbacks as config vars (especially if we do them wrong - which we’re trying to avoid)? Yes. But the flip side is that they give users much more control. So much so that whenever we find that optionality is necessary, we should avoid relying on a regular configuration variable (whether feature flag, waffle flag, or environment variable), and instead attempt to present it as something a plugin can modify.
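As a rough illustration of why a slot gives more control than a boolean, here is a language-neutral sketch written in Python so it can run on its own. It is not the frontend-plugin-framework API (which is React); `Slot`, `hide`, `insert`, and the widget names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """A named extension point with default content (hypothetical)."""
    slot_id: str
    default_widgets: list
    operations: list = field(default_factory=list)

    def render(self):
        # Start from the defaults and apply each plugin operation in order.
        widgets = list(self.default_widgets)
        for operation in self.operations:
            widgets = operation(widgets)
        return widgets


def hide(widget_id):
    """Roughly what a boolean toggle set to 'off' achieves."""
    return lambda widgets: [w for w in widgets if w != widget_id]


def insert(widget_id):
    """Something a single boolean toggle cannot express."""
    return lambda widgets: widgets + [widget_id]


footer = Slot("footer", ["default_links", "language_selector"])
footer.operations = [hide("language_selector"), insert("custom_banner")]
print(footer.render())  # ['default_links', 'custom_banner']
```

With no operations registered the slot renders its defaults, so operators who don't care about the customization have nothing to configure.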

Thoughts? Objections? At some point I intend to extract an OEP, or at least an ADR, out of the conversation. Or maybe more than one: say, one to define when-where-how to create configuration, another for plugins, and maybe even a plan to deprecate and remove old flags that have no business still existing.

Still thinking about all of the above, but have a few thoughts that maybe help expand or frame the conversation.

  1. One aspect of it that I don’t see accounted for is that with all of these options, we still don’t have a good or reasonable way of fully sharing a toggle between the backend and an MFE without letting it ride to the front on some API request. I feel like whatever approach we converge on (and I think we are likely asking to converge on some solution?), it should support defining a toggle somewhere and making it easily available to the backend and frontend together. There’s a lot of power in WaffleFlag and the subclasses that extend it, for instance, that we completely lack on the frontend. Want to roll a feature out for a course? Some particular users? Tough luck without doing a bunch of bespoke config sharing or duplicating the flag’s config.

  2. A while back we briefly talked about sharing flags via the MFE config API, but I don’t think it ever went anywhere.

  3. Another thought alluded to above - I do think it’s useful to think of this in terms of architectural “expanding” and “contracting” as we move toward a better config/customization system. We’ve been adding new configuration/customization options for a while, expanding without getting rid of the old, and at some point it should “contract” and become simpler. This post is kind of flagging that we’ve added a ton of complexity and maybe it’s time we started cleaning it up and converging on a few good, powerful, expressive, and understandable mechanisms. For real.
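To picture the "define once, use on both sides" idea from points 1 and 2, here is a minimal sketch in which the backend evaluates toggles for a given context and serializes only the ones marked for the frontend. `SharedToggle`, `frontend_toggle_payload`, and the `expose_to_frontend` field are hypothetical names; nothing here is an existing edx-platform or MFE config API.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SharedToggle:
    """A toggle defined once, with an explicit opt-in for frontend use."""
    name: str
    is_enabled: Callable[[dict], bool]
    expose_to_frontend: bool = False


def frontend_toggle_payload(toggles, context):
    """Evaluate exposed toggles on the backend and serialize them as JSON."""
    values = {
        toggle.name: toggle.is_enabled(context)
        for toggle in toggles
        if toggle.expose_to_frontend
    }
    return json.dumps(values, sort_keys=True)


# Hypothetical toggles: one rolled out per course, one backend-only.
TOGGLES = [
    SharedToggle(
        "new_editor",
        lambda ctx: ctx.get("course_id") in {"course-v1:Demo+X+2024"},
        expose_to_frontend=True,
    ),
    SharedToggle("internal_only", lambda ctx: True),
]

print(frontend_toggle_payload(TOGGLES, {"course_id": "course-v1:Demo+X+2024"}))
# {"new_editor": true}
```

The per-course logic stays on the backend, and the frontend only ever sees resolved booleans, so the flag's configuration is not duplicated.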

To add a little fuel to the fire, I think it’s worth re-reading the now 6-year-old OEP-17: Feature Toggles and reflecting on whether the motivation and use cases as stated there are as important (or even, still valid) for the project in general. Notably, how important are incremental releases, timed releases, 99.99% uptime, monitored rollouts, and beta testing to the community at large?

It’s also interesting to note that we’re running into the very pitfalls that the OEP warns against: additional code complexity, “explosion” in testing permutations, “forgotten latent unused code paths”, and, which is what drove me to write the top post, “using different standards, strategies, testing procedures [for feature toggling] leads to confusion, production failures, and long-term maintenance issues”.

I think that the benefits of feature toggles are still relevant; the problem is that they have started to sprawl and they are not easy to discover without intimate knowledge of the entire scope of development. Waffle flags also suffer from the problem that you noted of not being reusable across process boundaries (read: edx-platform and MFEs, etc.). I think that if, as a community, we can migrate all toggle usage to a common interface that supports a centralized management plane for those toggles, then it simplifies the enablement of features that span process boundaries, which is increasingly the case. The OpenFeature project, which is incubating in the CNCF, is a good target because it allows the system operator to decide which backend to use without having to do any customization in the edX code. As a community-supported option, the Flagsmith system is worth investigating, given that it is open source and implemented in Django.
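The "common interface, swappable backend" idea can be sketched as follows. This is only an illustration of the pattern and deliberately does not use the OpenFeature SDK; `ToggleProvider`, `InMemoryProvider`, `PerCourseProvider`, and `ToggleClient` are hypothetical names.

```python
from typing import Protocol


class ToggleProvider(Protocol):
    """What every backend must implement (hypothetical interface)."""

    def is_enabled(self, name: str, context: dict) -> bool: ...


class InMemoryProvider:
    """Simplest backend: a static mapping, e.g. loaded from settings."""

    def __init__(self, flags):
        self._flags = dict(flags)

    def is_enabled(self, name, context):
        return bool(self._flags.get(name, False))


class PerCourseProvider:
    """A backend that enables a flag only for listed courses."""

    def __init__(self, enabled_courses):
        self._enabled = {
            name: set(courses) for name, courses in enabled_courses.items()
        }

    def is_enabled(self, name, context):
        return context.get("course_id") in self._enabled.get(name, set())


class ToggleClient:
    """Application code talks only to this; the operator picks the provider."""

    def __init__(self, provider: ToggleProvider):
        self._provider = provider

    def is_enabled(self, name, **context):
        return self._provider.is_enabled(name, context)
```

Call sites such as `client.is_enabled("new_editor", course_id=...)` stay the same whichever provider the operator configures, which is the property the centralized management plane relies on.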

As to your point on the <PluginSlot/>, I do think that having well-documented and well-supported plugin interfaces is a valuable extension mechanism. I think that part of the problem in the edX space is that the level of support and documentation for those plugin interfaces has not always been evenly applied, and there is starting to be a proliferation of interfaces, similar to the problem we are facing in the feature toggles space. If we can work as a community to specify exactly which interfaces are actively used and supported, and what capabilities those interfaces provide, then I think we will be in a better spot going forward.

(Note: I wrote this before reading anyone else’s reply)

I don’t think adding toggles is the problem; removing them is. Specifically, we’re not doing it like we planned to. In contentstore/toggles.py alone, there are 24 toggles with a “toggle_target_removal_date” in the past. One of them was supposed to be removed exactly three years ago!
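A count like that can be automated. Below is a minimal sketch that scans source text for overdue removal dates. It assumes the annotations are comments of the form `# .. toggle_name: ...` and `# .. toggle_target_removal_date: YYYY-MM-DD`; that format, the regexes, and the function name `find_overdue_toggles` are assumptions to check against the real files.

```python
import re
from datetime import date

# Assumed annotation format; adjust the patterns to the real comments.
NAME_RE = re.compile(r"#\s*\.\.\s*toggle_name:\s*(\S+)")
REMOVAL_RE = re.compile(
    r"#\s*\.\.\s*toggle_target_removal_date:\s*(\d{4})-(\d{1,2})-(\d{1,2})"
)


def find_overdue_toggles(source, today):
    """Return (toggle_name, removal_date) pairs whose date is before today."""
    overdue = []
    current_name = None
    for line in source.splitlines():
        name_match = NAME_RE.search(line)
        if name_match:
            current_name = name_match.group(1)
            continue
        date_match = REMOVAL_RE.search(line)
        if date_match and current_name:
            year, month, day = map(int, date_match.groups())
            try:
                removal = date(year, month, day)
            except ValueError:
                continue  # skip impossible dates instead of guessing
            if removal < today:
                overdue.append((current_name, removal))
    return overdue
```

Run in CI against each toggles.py, a non-empty result could fail the build or open a reminder issue. Annotations without a parseable date (for example `None`) are ignored here.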

Perhaps we should consider replacing the “toggle_target_removal_date” comment with a toggle_expiry_date that waffle takes into account, so that the toggle automatically turns on once that date is reached, and it’s obvious to everyone that it can/should then be removed.
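The expiry behavior could look roughly like this. It is a standalone sketch of the proposal, not something waffle supports today as far as this thread indicates; `ExpiringToggle` and its fields are hypothetical.

```python
from datetime import date


class ExpiringToggle:
    """A toggle that is forced on once its expiry date is reached."""

    def __init__(self, name, expiry_date, stored_value=False):
        self.name = name
        self.expiry_date = expiry_date
        # Stands in for whatever the database or settings currently say.
        self.stored_value = stored_value

    def is_expired(self, today=None):
        today = today or date.today()
        return today >= self.expiry_date

    def is_enabled(self, today=None):
        # After expiry the stored value no longer matters: the feature is on.
        if self.is_expired(today):
            return True
        return self.stored_value
```

Once `is_expired()` is true everywhere, the "off" branch is dead code by construction, which is what makes the removal uncontroversial.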

On point 2: 100% agreed this is a big problem. Though for rolling out new features, I think it’s pretty clear that waffle flags are the way to go because of how much control they give over roll-outs (i.e. on a per-user or per-course basis), except that waffle flags can’t directly affect MFE behavior, which leads to a whole set of problematic workarounds, as you mention.

On point 3: that holds unless it’s a rollout flag/toggle that had a clearly defined “target removal date” from the beginning, in which case it can just be removed.

On point 5: because currently our plugins are not that easy to install or configure? (see below)

6. Why don’t we have an admin panel?

I believe some of these problems would be resolved if we had a proper “Open edX Admin” panel, where admins of various levels could log in and configure every aspect of Open edX from basic settings, policies, theme, plugins, etc.

Yes, we have the Django admin, but it will happily let you shoot yourself in the foot if you want to. It’s really only suitable for admins who are very careful about what they’re doing. It also exposes a lot of student PII to anyone authorized to use it. It also can’t do things like install plugins or change themes. I mean a new admin area that warns/prevents you from doing anything destructive, but which guides you in configuring things you want to configure, exposes all the possible settings, and lets you install themes/plugins from the community. It should of course be a very simple shell with a plugin architecture.

Imagine if you could browse the list of community plugins and then just check a box to enable that plugin for your system. Or click “Advanced…” and configure its settings.

There’s also no need to reinvent the wheel - react-admin is already an extremely nice framework for building such a thing, and it mostly uses the same technologies as [we want to in] our MFEs (React, React Router, React Query, TypeScript), except that it uses Material UI instead of Paragon. And WordPress provides great examples of how to do all of this stuff.

Commenting to bring this back to life.