Sandboxing Instructor Code with WebAssembly (instead of codejail)

Last week I was trying to debug an issue in ProblemBlock that occurred while it was executing instructor Python code (customresponse). Normally this code runs through safe_exec and codejail, which Tutor supports via a plugin. I had trouble getting that running properly on my Mac, though, so I ended up hacking something together on my local machine (using unsafe execution) to diagnose my original issue.

Coincidentally, I had also recently read this interesting article on Python WASI support. Python 3.11 added support for building to wasm32-wasi (though only at tier 3), which means we can invoke it from a WebAssembly runtime like wasmtime.

So with a quick bit of late-night hacking, I came up with this proof of concept that uses WebAssembly to execute instructor code instead of codejail:

(WARNING!!! THIS IS TERRIBLY HACKY PROOF-OF-CONCEPT CODE THAT YOU SHOULD NEVER, EVER USE IN PRODUCTION!!!)

What can it do?

It allows customresponse code to access context passed in from the ProblemBlock, like the anonymous_student_id. Variables you set in your customresponse Python code are also returned to the problem and can be used in the HTML presented to the student. Grading also works, and all of it executes inside a WebAssembly sandbox.

All code execution happens on the server side; no wasm code is sent to the browser. Just think of it as a WebAssembly version of codejail.

It doesn't support the code prolog, though, and there are no libraries like scipy installed. It also doesn't do the trick we use with random2 to emulate the Python 2.7 random module's behavior, so the values you get from randomization won't match what codejail produces.
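As a concrete illustration of the kind of instructor code described above, here is a hypothetical customresponse script. It assumes the usual ProblemBlock conventions as I understand them: context such as `anonymous_student_id` arrives as a global, top-level variables like `a` and `b` can be referenced in the problem HTML (e.g. `$a`), and the function named in `cfn` takes `(expect, ans)` and returns a dict with `ok` and `msg`. The problem, the function name `check_sum`, and the messages are made up. Per-student values come from a hash of the ID rather than `random`, which sidesteps the random2 difference entirely.

```python
import hashlib

# ProblemBlock injects context such as anonymous_student_id into the globals.
# The fallback only exists so this snippet also runs standalone.
anonymous_student_id = globals().get("anonymous_student_id", "local-test-student")

# Derive stable per-student numbers from a hash instead of the random module.
digest = int(hashlib.sha256(anonymous_student_id.encode("utf-8")).hexdigest(), 16)
a = digest % 10 + 1
b = (digest // 10) % 10 + 1


def check_sum(expect, ans):
    """Grader for a hypothetical <customresponse cfn="check_sum"> asking for a + b."""
    try:
        correct = int(ans) == a + b
    except ValueError:
        return {"ok": False, "msg": "Please enter a whole number."}
    return {"ok": correct, "msg": "" if correct else "Not quite, try again."}
```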

How does it work?

It puts an entire wasm32-compiled Python 3.11 binary and its lib directory in a new directory in edx-platform. We construct the script that's going to run in a special folder, write the globals the sandboxed script is supposed to see to a JSON file, execute the script through wasmtime, and then read another JSON file for the values to pass back.
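A minimal sketch of that JSON round trip, assuming nothing about the proof of concept's actual code: the file names (`wrapper.py`, `user_code.py`, `globals.json`, `results.json`), the `/sandbox` guest path, and all helper names are hypothetical, and it uses a throwaway temp dir per run rather than a fixed folder. The wasmtime command line is an assumption too: recent wasmtime releases map a host directory into the guest with `--dir HOST::GUEST`, older ones used `--mapdir GUEST::HOST`, so check `wasmtime run --help` for your version. Mapping python.wasm's own standard library directory is left out.

```python
from __future__ import annotations

import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Guest-side wrapper: runs *inside* the sandboxed interpreter. It loads the
# context, runs the instructor code, and writes back the JSON-safe globals.
WRAPPER = '''
import json, sys
from pathlib import Path

workdir = Path(sys.argv[1])
scope = json.loads((workdir / "globals.json").read_text())
exec((workdir / "user_code.py").read_text(), scope)
results = {}
for key, value in scope.items():
    if key.startswith("__"):
        continue
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        continue  # modules, functions, etc. can't cross the boundary
    results[key] = value
(workdir / "results.json").write_text(json.dumps(results))
'''


def wasmtime_argv(python_wasm: str):
    """Return a function that builds the wasmtime command for a host work dir.

    ASSUMPTION: recent wasmtime CLI syntax (`--dir HOST::GUEST`).
    """
    def make(host_dir: Path) -> list[str]:
        return ["wasmtime", "run", "--dir", f"{host_dir}::/sandbox",
                python_wasm, "/sandbox/wrapper.py", "/sandbox"]
    return make


def host_python_argv(host_dir: Path) -> list[str]:
    """UNSAFE stand-in: runs the wrapper on the host CPython (local testing only)."""
    return [sys.executable, str(host_dir / "wrapper.py"), str(host_dir)]


def run_sandboxed(code: str, context: dict, make_argv, timeout: float = 10.0) -> dict:
    """Write code and context to a temp dir, run the interpreter, read results back."""
    with tempfile.TemporaryDirectory() as tmp:
        host_dir = Path(tmp)
        (host_dir / "wrapper.py").write_text(WRAPPER)
        (host_dir / "user_code.py").write_text(code)
        (host_dir / "globals.json").write_text(json.dumps(context))
        subprocess.run(make_argv(host_dir), check=True, timeout=timeout,
                       capture_output=True)
        return json.loads((host_dir / "results.json").read_text())
```

`host_python_argv` runs the same wrapper with no sandboxing at all, much like the unsafe local hack mentioned at the top; it is only there so the plumbing can be exercised without a python.wasm build. Swapping in `wasmtime_argv("python.wasm")` is the step that would actually put the code behind the WebAssembly boundary.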

FWIW, I coded it this way only to make debugging easier; any real solution would use dynamically generated temp dirs, possibly an environment-variable-based mechanism for passing values, or maybe even interface types, whenever those stabilize and are widely adopted. Any bundled Python images would also live somewhere else. Again, this was a quick hack just to prove the concept could work.

Why is this exciting?

I'm excited by the idea of WebAssembly for a number of reasons:

Ops/Deployment

  • This should simplify deployment. Whether it's running in the LMS process or in a separate service, it's not going to require AppArmor, which should make it easier to set up and run on non-Linux platforms.
  • We can potentially remove scientific libraries from edx-platform's dependencies.
  • The wasmtime runtime has a notion of "fuel" that you can give to an invocation, which is a much more deterministic count of instructions executed than the time-based mechanisms we use for codejail. We'd still have to guard against malicious sleep() calls and the like, but doing most of our accounting with fuel credits means we won't be susceptible to sporadic failures for really intensive problems that happen to be executed during server restarts or other times when the system is under unusual CPU load.
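wasmtime's fuel is counted inside the runtime (per wasm instruction), so it can't be shown without the runtime and a python.wasm build. The sketch below swaps in a rough pure-Python analogue to illustrate the accounting idea: `sys.settrace` charges one unit per executed line, and execution aborts when the budget runs out. The function and exception names are hypothetical, and this is not a security boundary (traced code could simply call `sys.settrace(None)`). It does show both points from the bullet above: the count is identical however busy the machine is, and a `sleep()` call costs almost nothing, which is why a wall-clock guard is still needed.

```python
from __future__ import annotations

import sys


class OutOfFuel(Exception):
    """Raised when the executed code exhausts its budget."""


def run_with_fuel(code: str, fuel: int, env: dict | None = None) -> int:
    """Exec `code`, charging one unit per 'line' trace event; return units used.

    Rough analogue of wasmtime fuel, which counts wasm instructions instead.
    """
    remaining = fuel
    compiled = compile(code, "<instructor-code>", "exec")

    def tracer(frame, event, arg):
        nonlocal remaining
        if event == "line":
            remaining -= 1
            if remaining < 0:
                raise OutOfFuel(f"exceeded budget of {fuel} units")
        return tracer  # keep receiving line events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(compiled, {} if env is None else env)
    finally:
        sys.settrace(previous)
    return fuel - remaining
```

Running the same loop twice yields the same unit count regardless of CPU load, whereas a `while True` loop is cut off after a fixed amount of work rather than after a fixed number of seconds.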

Instructional Possibilities

Open edX has always been a go-to platform when you really have a particular learning experience you want to deliver and are willing to roll up your sleeves and code it yourself. Embracing WebAssembly could supercharge the things we already do well in this department, and serve as a springboard for new innovation. Just a few examples:

  • We can offer course teams multiple versions of a given set of scientific libraries for their course, along with a richer set of libraries for their grading. We could eventually reach a point where we don't have to choose between breaking content and updating a library to a non-ancient version.
  • If we're offering multiple versions/options (e.g. via backwards-compatible, opt-in attributes on the <script> tag in ProblemBlock), we might be able to offer all kinds of other options as well: Rust or JavaScript, for instance, or even variants of Python like MicroPython, which is much smaller in size and memory usage than CPython and may be better suited to more dynamic grading that doesn't need scientific libraries.
  • Could we trust this enough to execute student code?
  • We could potentially open up many more grading and async grading possibilities, and the sharing of graders through libraries. The line between "new XBlock" and "JS frontend + backend grader content" gets a lot blurrier.
  • We could contemplate running course team code in places we've never seriously entertained it before, like analyzing events pulled from OARS.
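The opt-in `<script>` attribute idea can be sketched as follows. Everything here is hypothetical: the `runtime` attribute, the runtime names, and the function name do not exist in ProblemBlock. The point is the backwards-compatibility rule: a script without the attribute keeps today's codejail behavior, so existing content is unaffected. The two script types checked are the Python ones I believe capa recognizes.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# HYPOTHETICAL runtime names; nothing like this exists in ProblemBlock today.
KNOWN_RUNTIMES = {"codejail", "python3.11-wasi", "micropython-wasi"}
DEFAULT_RUNTIME = "codejail"  # no attribute => today's behavior
PYTHON_SCRIPT_TYPES = {"loncapa/python", "text/python"}


def select_runtimes(problem_olx: str) -> list[str]:
    """Return the runtime each Python <script> block in a problem asks for."""
    root = ET.fromstring(problem_olx)
    runtimes = []
    for script in root.iter("script"):
        if script.get("type") not in PYTHON_SCRIPT_TYPES:
            continue
        # Hypothetical opt-in attribute, e.g. <script type="loncapa/python" runtime="...">
        name = script.get("runtime", DEFAULT_RUNTIME)
        if name not in KNOWN_RUNTIMES:
            raise ValueError(f"unsupported sandbox runtime {name!r}")
        runtimes.append(name)
    return runtimes
```

Rejecting unknown names, rather than silently falling back, would make a problem copied into a platform without that runtime fail loudly instead of grading differently.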

It's not that we couldn't do these things before, necessarily. But I expect they will become a lot easier to implement in the next year or two, and that ease will enable a lot of innovation.

What are the challenges?

This space is rapidly developing, but there are still rough edges. Even if Python compiles to wasm now, many interesting libraries are not yet available.

For instance, one major hurdle would be getting the full scientific stack running, because wasm support is lacking in Fortran compilers. Pyodide gets around this by translating Fortran to C and then compiling that, but that approach only supports older versions of Fortran, and they can't update to even the relatively old versions of those libraries that edx-platform uses. That being said, LFortran looks like it's really focusing on SciPy support (see "LFortran can now parse all of SciPy to AST" on the Fortran Discourse, and "LFortran Breakthrough: Now Building Legacy and Modern Minpack"), so I'm hopeful that will work itself out in the coming months.

What's next?

All these possibilities sound like fun, but I think the first step is to seriously evaluate the feasibility of creating a backwards-compatible codejail alternative using WebAssembly, and to make safe_exec optionally use it. That will mean prototyping, figuring out a rough plan for getting the versions of the libraries we currently support, analyzing the security story, measuring performance and memory usage, determining tradeoffs between various wasm runtimes, etc.
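One way to picture "make safe_exec optionally use it" is a pluggable backend plus a differential check for backwards compatibility. This is a sketch under stated assumptions, not edx-platform's actual `safe_exec` signature: `Backend`, `BACKENDS`, `safe_exec_with`, and `diff_backends` are hypothetical names. The in-place update of the globals dict with JSON-safe results mirrors codejail's convention as I understand it. The `unsafe` backend is plain host `exec` (no sandbox) and only stands in so the dispatch can be exercised.

```python
from __future__ import annotations

import json
from typing import Callable, Dict

# A backend runs `code` and updates `globals_dict` in place with JSON-safe results.
Backend = Callable[[str, dict], None]


def json_safe(scope: dict) -> dict:
    """Keep non-dunder values that survive a JSON round trip."""
    out = {}
    for key, value in scope.items():
        if key.startswith("__"):
            continue
        try:
            out[key] = json.loads(json.dumps(value))
        except (TypeError, ValueError):
            continue
    return out


def unsafe_backend(code: str, globals_dict: dict) -> None:
    """NOT A SANDBOX: host exec, for local testing only."""
    scope = json.loads(json.dumps(globals_dict))
    exec(code, scope)
    globals_dict.update(json_safe(scope))


BACKENDS: Dict[str, Backend] = {"unsafe": unsafe_backend}


def safe_exec_with(backend_name: str, code: str, globals_dict: dict) -> None:
    """Dispatch to the configured backend (e.g. codejail today, wasm later)."""
    try:
        backend = BACKENDS[backend_name]
    except KeyError:
        raise ValueError(f"unknown safe_exec backend {backend_name!r}") from None
    backend(code, globals_dict)


def diff_backends(code: str, globals_dict: dict, a: str, b: str) -> dict:
    """Run the same code on two backends; return {key: (a_value, b_value)} where they differ."""
    ga, gb = dict(globals_dict), dict(globals_dict)
    safe_exec_with(a, code, ga)
    safe_exec_with(b, code, gb)
    return {k: (ga.get(k), gb.get(k)) for k in ga.keys() | gb.keys() if ga.get(k) != gb.get(k)}
```

Running `diff_backends` over a corpus of existing problems, with codejail on one side and a wasm backend on the other, would be one way to measure how backwards compatible the new path really is.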

Once we have a decent start on that, I think we can take the lessons learned and start having more forward-looking conversations about what else we could do with this in the platform.

More to follow on this, but I'd love to get thoughts from folks. Also, if anyone's already been working on this, please let me know! I'd love to pick their brains. :stuck_out_tongue:


Note that shifting codejail to WebAssembly and moving it to a service separate from edx-platform are related but separate efforts. There's a ticket for the latter with some recent activity: "Move codejail to its own new service" (openedx/edx-platform issue #31517 on GitHub).


This is really cool @dave - thanks for sharing!

As you can probably guess, I love this idea and am very excited by it!

I previously explored whether it would make sense to author the logic of XBlocks themselves in WebAssembly rather than Python, but concluded that JavaScript made more sense than WebAssembly for that use case, because until wasm gets garbage collection, you can't write code in nice high-level languages. I still think we could gain hugely from letting course authors write mini XBlocks with their own JS code and have it run in a sandbox; that's trivial to do with JS but difficult with Python (unless you go this wasm route, which is great but has way too much overhead for the XBlock logic use case).

Anyhow, for this use case, I think wasm makes a ton of sense.

I think this approach is stronger from a security perspective. Python doesn't have the security primitives you need for sandboxing; codejail works by taking a dangerously permissive Python runtime and applying a bunch of OS rules to lock it down. If there is any mistake in how it's locked down, you have a security issue. On the other hand, running WebAssembly in this way (from a non-JS runtime) inherently creates a sandbox, and there aren't even APIs for the code to access things like the host file system. (A fake in-memory filesystem is presented to Python if needed.) Plus, as you mentioned, the use of "fuel" is great for managing load.

Yes! This would also solve the potential problem we found in the copy/paste work, where people copy a problem block then paste it into a course with an incompatible python_lib.zip - instead, let each individual problem select its own runtime.

Is there a list of these packages somewhere, or is that part of what we need to figure out?

I'd be happy to help with this effort.

I believe it's the requirements files here:

I'm hoping the pure Python ones just work out of the box. I think we'll need to wait until LFortran improves for the Fortran-based ones. The ones that are partly C/Rust are a big question mark to me.
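To sort a requirements file into those buckets, a quick first pass could check which installed distributions ship compiled extension files. This is a hypothetical helper, not an existing edx-platform script; it relies on `importlib.metadata` (Python 3.8+) and only inspects packages installed in the current environment. It can tell pure Python from compiled, but it cannot tell C from Rust from Fortran; that still needs a human look at each compiled package.

```python
from __future__ import annotations

import re
from importlib import metadata

# File suffixes that indicate compiled extension modules or shared libraries.
NATIVE_SUFFIXES = (".so", ".pyd", ".dylib")


def has_native_code(filenames) -> bool:
    """True if any installed file looks like a compiled extension."""
    return any(str(name).endswith(NATIVE_SUFFIXES) for name in filenames)


def classify(dist_name: str) -> str:
    """Label an installed distribution as pure Python or compiled."""
    try:
        files = metadata.distribution(dist_name).files
    except metadata.PackageNotFoundError:
        return "not installed"
    if files is None:  # no RECORD / installed-files metadata to inspect
        return "unknown"
    return "compiled" if has_native_code(files) else "pure Python"


def classify_requirements(lines) -> dict:
    """Classify each package named in requirements-file lines."""
    result = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):  # skip blanks and -r/-e/-c options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            result[match.group(0)] = classify(match.group(0))
    return result
```

Pointed at the sandbox requirements file(s) linked above, inside an environment where they are installed, this would give a first cut of the "probably just works" list versus the ones needing a wasm build.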

It looks like the Pyodide folks got the latest cryptography versions building recently, though thatā€™s a slightly different target since they use Emscripten.

I love the concept of WebAssembly, and seeing this proposed use of it makes me warm and fuzzy. Cheers!

Allow me to ask a dumb question, though. Would it be a horrible idea to allow operators/instructors to optionally run the wasm code in the browser? Potentially with some cryptographic signing to prevent tampering with the binary. (Yeah, as hand-wavy as it gets, I know. Hence the question.)

That's not what I'm seeing? It looks like the versions are broadly compatible to me, and mostly identical. I think Pyodide may get us what we need already.

Package            Open edX version   Pyodide version   Note
cffi               1.15.1             1.15.1
chem               1.2.0              -
click              8.1.3              8.1.3
codejail-includes  1.0.0              -
cryptography       38.0.4             39.0.2            newer in Pyodide
cycler             0.11.0             0.11.0
joblib             1.2.0              1.2.0
kiwisolver         1.4.4              1.4.4
lxml               4.9.2              4.9.2
markupsafe         2.1.2              2.1.2
matplotlib         3.3.4              3.5.2             newer in Pyodide
mpmath             1.3.0              1.3.0
networkx           3.1                3.0               older in Pyodide
nltk               3.8.1              3.8.1
numpy              1.22.4             1.24.2            newer in Pyodide
openedx-calc       3.0.1              -
pillow             9.5.0              9.1.1             older in Pyodide
pycparser          2.21               2.21
pyparsing          3.0.9              3.0.9
python-dateutil    2.8.2              2.8.2
random2            1.0.1              -
regex              2023.5.5           2023.3.23         older in Pyodide
scipy              1.7.3              1.9.3             newer in Pyodide
six                1.16.0             1.16.0
sympy              1.12               1.11.1            older in Pyodide
tqdm               4.65.0             4.65.0
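The newer/older column can be regenerated mechanically. A small stdlib-only sketch, using a subset of the rows above; the function names are hypothetical, and it only handles purely numeric dotted versions (for real requirement strings with pre-releases, `packaging.version.Version` is the more robust choice).

```python
from __future__ import annotations

# (Open edX version, Pyodide version) from the table; None = no Pyodide version listed.
VERSIONS = {
    "cryptography": ("38.0.4", "39.0.2"),
    "matplotlib": ("3.3.4", "3.5.2"),
    "networkx": ("3.1", "3.0"),
    "numpy": ("1.22.4", "1.24.2"),
    "pillow": ("9.5.0", "9.1.1"),
    "regex": ("2023.5.5", "2023.3.23"),
    "scipy": ("1.7.3", "1.9.3"),
    "sympy": ("1.12", "1.11.1"),
    "random2": ("1.0.1", None),
}


def parse(version: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version; '1.0' and '1.0.0' compare equal."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def compare(openedx: str, pyodide: str | None) -> str:
    """Describe the Pyodide version relative to the Open edX one."""
    if pyodide is None:
        return "no Pyodide version listed"
    ours, theirs = parse(openedx), parse(pyodide)
    if ours == theirs:
        return "same"
    return "newer in Pyodide" if theirs > ours else "older in Pyodide"


if __name__ == "__main__":
    for name, (ours, theirs) in VERSIONS.items():
        print(f"{name:15} {compare(ours, theirs)}")
```

Comparing integer tuples rather than strings matters here: as strings, "1.12" would sort before "1.11.1" incorrectly only by luck of the characters, and "2023.5.5" vs "2023.3.23" shows why component-wise comparison is needed.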

I think it's very doable and could be a nice solution, except in two cases:

  1. If the question is for marks, you don't want to send it to the browser, because it can be decompiled and reverse engineered to figure out the logic or the answer. Or, as you said, the binary could be tampered with and substituted for something else. (Though if someone can go to that much trouble, maybe they deserve some marks anyway :stuck_out_tongue: )
  2. If learners are using a very low-resource environment, it may be preferable to run it on the server instead of in the browser. Though from a quick test of the online REPL, it seems to use only about 50 MB of memory to run Python and import scipy and numpy, which isn't too bad.

So I think that if the question is just a practice question, or code that illustrates something interactively, it totally makes sense to run it in the browser. Otherwise, it's probably not worth the effort to develop the code signing and/or obfuscation you'd need to overcome the trust issues.

Okay, I don't know what I was looking at before, but that is fantastic news.

I don't know how much work is left to do the conversion, though, since their builds assume the browser bindings (Emscripten rather than WASI). Would those libraries work out of the box, since they don't interact with the UI?

Right. I think it makes sense to think of the two in a very decoupled manner to start with. You can put pyodide in your XBlock today and be using nice notebooks to work through your calculations, and then put the answer into a boring ProblemBlock response. Or you can have a pretty JS-powered frontend diagram that submits an answer to a python-wasm powered script for grading.

Yeah, after looking a bit more, it seems that Pyodide is too tightly bound to emscripten and assumes a JS runtime, so it could not be used with wasmtime (which can run the wasm parts but not the JS parts). A solution could be to use STPyV8 to run the emscripten bundle, since it can run JS+WASM (I think), or otherwise it would be necessary to go with your original approach and use Python 3.11 compiled to wasm once the various binary packages can also be compiled the same way.

Random thought I had last night, for whenever we build a standalone service version of this: could we speed up the code by pre-initializing the sandboxed interpreter up front, and then forking it off into a subprocess when running the actual sandboxed code? I have no idea if this is really feasible given the shared file handles and the relatively limited options for communicating with the executing wasm code, but it'd be something like:

safe-exec-wasm main process

  • reads config
  • dispatches requests
  • initializes python-wasm config and executes imports and prolog code/patching
  • forks new worker process whenever a new request comes in and sends it the unsafe code to execute.

safe-exec-wasm worker process

  • handles a single request in terms of code execution

Maybe this could work if the shared file handles (necessary to read the libraries) are read-only? But I don't know what sort of communication the child process could use to pass the unsafe code in for execution. We can get that code from the parent process to the child worker process easily enough, but I don't think we can alter the WASI config environment mid-execution. It'd be convenient if we could just slip in an in-memory filesystem and send that to wasmtime (since the memory would be copied), but my understanding is that wasmtime works at a lower level of abstraction and writes directly to the filesystem.
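The fork-after-initialization shape can at least be sketched at the process level. In this hypothetical, POSIX-only sketch (`expensive_init` and `handle_request` are made-up names), the child runs the code with plain host `exec` as a placeholder for invoking the pre-initialized wasm interpreter, so it is NOT a sandbox and does not answer the WASI-config question above. What it does show: the child inherits the already-initialized state without redoing it, the code travels over a pipe created before the fork, and the parent enforces a deadline and kills overrunning children, roughly what the main process in the outline would do.

```python
from __future__ import annotations

import json
import os
import select
import signal
import time


def expensive_init() -> dict:
    """Stand-in for booting python-wasm and running imports + prolog code."""
    return {"PRELOADED": sum(range(1000))}


def json_safe(scope: dict) -> dict:
    out = {}
    for key, value in scope.items():
        if key.startswith("__"):
            continue
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            continue
        out[key] = value
    return out


def handle_request(base_globals: dict, code: str, timeout: float = 5.0) -> dict:
    """Fork from the pre-initialized parent, run `code` in the child, return its results."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: inherits base_globals without re-running expensive_init()
        try:
            os.close(read_fd)
            try:
                scope = dict(base_globals)
                exec(code, scope)  # placeholder for invoking the wasm interpreter
                payload = {"ok": json_safe(scope)}
            except BaseException as exc:
                payload = {"error": repr(exc)}
            data = json.dumps(payload).encode()
            while data:
                data = data[os.write(write_fd, data):]
        finally:
            os._exit(0)

    # Parent: collect output until EOF, killing the child if it overruns.
    os.close(write_fd)
    chunks = []
    deadline = time.monotonic() + timeout
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                os.kill(pid, signal.SIGKILL)
                raise TimeoutError("sandboxed code exceeded its time limit")
            ready, _, _ = select.select([read_fd], [], [], remaining)
            if not ready:
                continue
            chunk = os.read(read_fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        os.close(read_fd)
        os.waitpid(pid, 0)
    return json.loads(b"".join(chunks))
```

Reading the pipe while waiting (instead of waiting for exit first) avoids a deadlock when a child's output is larger than the pipe buffer.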

We could also have a variation of this where the workers pre-fork and listen on the same port for request handling, so each one handles a single request/response and dies off, and the main process's primary job is to monitor and fork workers and kill them if they time out, much like gunicorn does.

This is definitely not the immediate priority, but it's tantalizing to me, because if we can remove the startup overhead, a whole lot of possibilities open up.

Wait, actually, I wonder if running a simple WSGI app under gunicorn with max_requests=1 and the appropriate preload-related options would do what we'd want?
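For reference, the relevant knobs are real gunicorn settings: `preload_app` imports the WSGI app once in the master before forking workers (so expensive setup done at import time is inherited), `max_requests` recycles a worker after that many requests, and `timeout` has the master kill workers that go silent. A `gunicorn.conf.py` along those lines might look like this; the address and numbers are placeholders, and whether it gives the desired effect once wasmtime is involved is untested. The command-line equivalent would be `gunicorn --preload --max-requests 1 --workers 4 --timeout 10 <module>:<app>`.

```python
# gunicorn.conf.py: a sketch with placeholder values, not tuned numbers.
bind = "127.0.0.1:8000"  # hypothetical address for the sandbox service
workers = 4              # pre-forked workers sharing the listening socket
preload_app = True       # import the app (and its expensive init) once, in the master
max_requests = 1         # each worker exits after one request; the master forks a fresh one
timeout = 10             # the master kills a worker that stays silent for more than 10 s
```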

That still leaves the last-mile problem of switching input/output... I wonder if wasmtime either supports, or could be modified to do, late binding of stdin/stdout to files (it already supports binding them to files up front). If that were the case, the forked processes could slip in their own temp files for stdin/stdout and use those...
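The host-OS half of that idea is easy to demonstrate: a forked child can rebind file descriptors 0 and 1 to fresh temp files after all setup is done, and anything that only talks to fds 0/1 follows along. The sketch below (POSIX-only; `run_with_late_bound_stdio` and `shout` are hypothetical names) shows just that. Whether a wasmtime instance created before the fork, with its WASI stdio inherited from the host, would honor the rebound descriptors is an unverified assumption, which is exactly the open question.

```python
import os
import tempfile


def run_with_late_bound_stdio(stdin_data: bytes, work) -> bytes:
    """Fork, point the child's fds 0/1 at temp files, run `work`, return what it wrote."""
    with tempfile.TemporaryFile() as fin, tempfile.TemporaryFile() as fout:
        fin.write(stdin_data)
        fin.seek(0)  # flushes; the child shares this file offset after fork
        pid = os.fork()
        if pid == 0:
            try:
                os.dup2(fin.fileno(), 0)   # rebind stdin after setup
                os.dup2(fout.fileno(), 1)  # rebind stdout after setup
                work()
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        fout.seek(0)
        return fout.read()


def shout() -> None:
    """Toy workload that only knows about fds 0 and 1, like a WASI program's stdio."""
    data = os.read(0, 65536)
    os.write(1, data.upper())
```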