Hi all,
Thank you @nimisha for organizing this meeting. I wish I could have attended, but I have been stuck in bed with a fever since Thursday, and it was only yesterday that I started to get better and managed to stare at a screen again. It’s not covid-19, although it is related to the anxiety caused by the outbreak.
Anyway, I really enjoyed watching the recording of the meeting, where everyone was very insightful. It really helped clarify some of the technical decisions that were made, for instance on feature toggle annotations. I was interested to learn about the rejected alternatives in OEP30.
I still don’t have a very clear view of what kind of tooling the feature toggle annotations are supposed to enable. Also, I don’t really have a dog in that fight. I mean: yes, it’s important to document/annotate the feature toggles, but it’s less of a priority to me, right now, than producing a reference manual for Open edX. So I don’t have much of interest to say about how edX should implement feature toggle annotations.
Concerning the reference manual, I noted that the documentation we produce right now should be more or less independent of the technical implementation we choose to actually generate the docs. So I can happily keep writing unstructured docs for the settings and operations, and it shouldn’t affect the strategy we choose later.
Thus, I am not too concerned right now about the tool we will use to parse the feature toggle annotations and generate the reference manuals. However, this does not prevent me from having a strong opinion on it. It’s one of those subjects where I’m unlikely to change my mind, but it’s OK if things don’t go the way I think they should. (I realize that more and more questions fall into that category as I get older…)
Basically, we need to decide between two options:
- Use standard Python/Django tooling, such as Sphinx
- Use tooling that is custom-built for edx-platform/Open edX
This is not a binary choice though, and a few alternatives have been mentioned during the talk that borrow from both options.
My “not too concerned but very strong” opinion is that we should rely more on standard than custom-built tooling. This is what I tried to achieve by relying on autodoc-supported docstrings in my reference documentation pull request. Let me defend this proposition against the arguments raised in the meeting:
Sphinx requires module imports
@jmbowman You mentioned that the problem with docstrings is that Sphinx needs to import the Python modules in order to parse the docstrings. As such, a complete virtual environment with Python dependencies is needed to generate the docs. I don’t see why this is a problem, in particular with Devstack or other Open edX Docker images. After all, we all have a working development environment on our laptops and CI machines, and it’s relatively easy to pip install all the dependencies of an Open edX project, even on a host machine (provided it’s UNIX-based and reasonably recent).
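For what it’s worth, the autodoc setup on the standard-tooling side is small. Here is a minimal conf.py sketch, assuming a Django project; the paths and the settings module name are placeholders, not the actual edx-platform configuration:

# conf.py -- minimal Sphinx configuration sketch (paths are placeholders).
import os
import sys

import django

# Make the project importable by autodoc.
sys.path.insert(0, os.path.abspath(".."))

# Django code often cannot be imported before settings are configured;
# point at whichever settings module the docs build should use.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "lms.envs.common")
django.setup()

extensions = [
    "sphinx.ext.autodoc",  # import modules and pull docs from docstrings
]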
On the other hand, if we need to switch to a different, custom kind of tooling, then the environment used to generate the docs will have to be the same for all Open edX projects: we will have to share the same “docs” virtualenv across all Open edX repos. How will that work with the devstack? Does that mean we need to create a new virtualenv inside each image, or create a dedicated image that mounts all repos inside the container at runtime?
Module imports cause bad code to break
It was also mentioned that importing some Python modules may cause issues, because they might be doing ungodly things, such as parsing sys.argv and whatnot. This is true: bad code will cause the documentation generation to fail. And this is a good thing. It gives us an opportunity to detect and improve bad code in the Open edX codebase.
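To make this concrete, here is a sketch of the kind of import-time side effect that breaks doc builds, along with the usual fix; the module and flag names are made up:

# hypothetical_cli_module.py -- import-time side effects (names are made up).
import sys

# Anti-pattern: this runs when the module is imported, so a Sphinx build
# (or any other importer) sees Sphinx's own command line, not ours.
VERBOSE = "--verbose" in sys.argv

# Better: defer the work until it is explicitly requested.
def is_verbose(argv=None):
    """Return True if --verbose was passed on the command line."""
    args = sys.argv[1:] if argv is None else argv
    return "--verbose" in args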
Call me an idealist: I believe that collective effort can help improve the overall quality of the Open edX codebase.
And anyway, what’s the alternative? Developing a custom, complex tool to work around the difficulties of the edx-platform code base? The code-annotations repo currently clocks in at 2k+ lines of code. It’s probably all very good code, but it’s yet another tool that Open edX developers need to master in order to understand what goes wrong when building docs. The effort that has gone into developing that tool could have been directed toward improving the rest of the Open edX codebase.
This is a missed opportunity.
This documentation effort could be the right excuse to make edx-platform more like other Django projects, which is a good thing, because it makes the project more approachable and understandable to the rest of the Django community. The way to do this is to remove, one by one, the weird bits of code that lie around and force us to jump through non-convex-shaped hoops all the time.
Some modules are only importable under certain conditions
Importing certain modules may break because these modules will only work with either the LMS or the CMS, or when certain stars are aligned. I don’t have an example in mind of when that might occur, but that’s OK. @robrap provided an excellent answer to that problem: refactor the problematic module so that the feature toggle annotation is moved to an independent module. For instance:
myweirdmodule.py:

if os.environ["cute_flag"] == "cutifyallthethings":
    # Feature flag to accelerate overall cuteness.
    # .. toggle_name: overall.cuteness
    # ...
    MY_CUTE_FEATURE_IS_ENABLED = True

becomes:

if os.environ["cute_flag"] == "cutifyallthethings":
    from .cutefeature import *

with cutefeature.py:

# Feature flag to accelerate overall cuteness.
# .. toggle_name: overall.cuteness
# ...
MY_CUTE_FEATURE_IS_ENABLED = True
In a way, this is the same issue as the previous one: writing the docs gives us the opportunity to refactor the code in a cleaner, more modular way.
Docstrings cannot document blocks of multiple settings
This is something that @nedbat mentioned. The given example was the documentation of the following AWS_* settings: https://github.com/edx/edx-platform/blob/fdc98696b15f8f2e595c056db76526e208e37140/lms/envs/common.py#L1337
#: S3BotoStorage insists on a timeout for uploaded assets. We should make it
#: permanent instead, and yada yada yada.
AWS_QUERYSTRING_EXPIRE = 10 * 365 * 24 * 60 * 60 # 10 years
AWS_SES_REGION_NAME = 'us-east-1'
AWS_SES_REGION_ENDPOINT = 'email.us-east-1.amazonaws.com'
...
Here, the docstring will be used to generate the docs only for AWS_QUERYSTRING_EXPIRE, and not for the other settings. This might look like a problem, but it isn’t. We might be tempted to solve it by bundling these settings in a dict, but we shouldn’t.
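To make the temptation concrete, the bundled version would look something like this (a hypothetical sketch, not a proposal):

# Hypothetical anti-pattern: one dict, one comment, for all related settings.
#: Everything about S3/SES in one place, covered by a single docstring.
AWS_CONFIG = {
    "QUERYSTRING_EXPIRE": 10 * 365 * 24 * 60 * 60,  # 10 years
    "SES_REGION_NAME": "us-east-1",
    "SES_REGION_ENDPOINT": "email.us-east-1.amazonaws.com",
}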
To understand why, let’s look at how other people in the world address this problem. For instance, Django has many EMAIL_* settings: https://docs.djangoproject.com/en/3.0/ref/settings/#std:setting-EMAIL_BACKEND

Because Django does not bundle these settings in a dict, we can’t do it either. (And by the way, we have the same problem with boto3 and social-auth.)
But that’s OK! It’s actually better if we document each setting individually. Then, we can link the docs for each setting to a reference setting where the bulk of the explanations will be given, using standard Sphinx tooling. Something like:
#: Activate the yadayadiness for students. See :ref:`lms.envs.ENABLE_YADA` for details.
ENABLE_YADA_FOR_STUDENTS = True
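The reference setting would then carry the bulk of the explanations. Here is a sketch of what that anchor might look like, with a made-up ENABLE_YADA setting and an explicit Sphinx label serving as the :ref: target:

#: .. _lms.envs.ENABLE_YADA:
#:
#: Main switch for the yada feature. The full story of yadayadiness lives
#: here; related settings such as ENABLE_YADA_FOR_STUDENTS link back to
#: this entry.
ENABLE_YADA = True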
Conclusions?
As I (and others) have mentioned, code annotations and docstrings are not incompatible. Also, it shouldn’t be difficult to switch from one approach to the other further down the road. That means we don’t have to make a hard technical decision right now. And whatever technical decision we end up making, I’ll be happy to go along with it. However, I do not want to invest my time in the BTR working group developing a custom tool for generating docs or annotations – at least, not right now. So I’ll keep working on producing unstructured docs for settings, operations, and maybe API endpoints, knowing that we may have to migrate them in the future.