There have been many healthy conversations around the search functionality for Open edX and what providers are/should/could be used on the platform. A few options that have been discussed are: Meilisearch, Elasticsearch, OpenSearch, Typesense, Algolia (any others?). As I researched which search providers might be best for different types of institution, I began to wonder why we need to decide at all.
So my question is: what are the limiting factors around providing clients on the platform the ability to “choose their own search” rather than the Open edX Platform prototyping and implementing specific search functionality out of the box?
Currently, we do provide support for both Elasticsearch and Meilisearch across many parts of the platform, but this is done by sticking to very basic usage of each search engine and using custom abstraction layers for each use case. Things like content libraries, which use a lot of advanced search features, only support Meilisearch, as you’ll see below.
As far as I know, there is no up-to-date Python library for abstracting search functionality like indexing across the various search engines: something like SEAL (PHP), but in Python. There was anysearch, but it only supports Elasticsearch and OpenSearch and is no longer maintained. I am not even aware of a maintained abstraction layer for Elasticsearch + OpenSearch, and they’re the two that are most similar by far.
There is a fairly nice abstraction layer for the frontend, InstantSearch.js, but it only supports Algolia + Typesense + Meilisearch, and not the older generation engines (Elasticsearch/OpenSearch). There is a separate Searchkit frontend which supports only Elasticsearch/OpenSearch and is partially API compatible with InstantSearch, so you can re-use some InstantSearch components, but as I understand it, it’s still a separate library.
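To make the idea of a backend abstraction concrete, here is a minimal sketch of what the smallest common interface might look like. Everything in it is hypothetical: `SearchBackend`, `InMemoryBackend` and `reindex_courses` are names made up for illustration, not part of Open edX or any existing library, and the in-memory class only stands in for a real engine adapter so the example runs without a server.

```python
from typing import Any, Protocol


class SearchBackend(Protocol):
    """Hypothetical minimal interface; not an existing Open edX or library API."""

    def index_documents(self, index: str, docs: list[dict[str, Any]]) -> None: ...

    def search(self, index: str, query: str, limit: int = 20) -> list[dict[str, Any]]: ...


class InMemoryBackend:
    """Toy stand-in for a real engine adapter (assumption: docs have an "id" key)."""

    def __init__(self) -> None:
        self._indexes: dict[str, dict[str, dict[str, Any]]] = {}

    def index_documents(self, index: str, docs: list[dict[str, Any]]) -> None:
        store = self._indexes.setdefault(index, {})
        for doc in docs:
            store[str(doc["id"])] = doc  # upsert by primary key

    def search(self, index: str, query: str, limit: int = 20) -> list[dict[str, Any]]:
        # Naive case-insensitive substring match over every field value.
        needle = query.lower()
        hits = [
            doc
            for doc in self._indexes.get(index, {}).values()
            if any(needle in str(value).lower() for value in doc.values())
        ]
        return hits[:limit]


def reindex_courses(backend: SearchBackend, courses: list[dict[str, Any]]) -> None:
    """Platform code depends only on the interface, never on a specific engine."""
    backend.index_documents("courses", courses)
```

An interface this small is easy to implement for every engine, but that is exactly the "very basic usage" limitation: filtering, faceting and ranking options differ per engine, and none of them fit behind these two methods without further design.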
We started using InstantSearch.js for the content libraries project but ran into problems when we needed to support filtering by multiple hierarchical tags. InstantSearch doesn’t support this, and its maintainers have said they’ll probably never support it. When I tried to implement it myself within InstantSearch, I found that the codebase was incredibly complex internally and difficult to work with. So I replaced InstantSearch with direct use of the Meilisearch API via React Query. That not only let us implement all the features we wanted and gave us more control, but also decreased the overall size of the frontend bundle.
(If you also want keyword search within the tags filter (e.g. “math”), as we do, it’s difficult to find a search engine that can support this at all, not to mention an abstraction layer for it. We managed to get Meilisearch to support it, but it requires pretty complex use of their advanced APIs.)
We are investigating supporting Typesense + Meilisearch as two alternative search engines going forward, though based on the outcome of that investigation we may also decide to just support one option (either Meilisearch or Typesense). We’re especially interested in hearing from operators of large Open edX installations who have experience with either one at scale or who are willing to help us compare them at scale.
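For a rough idea of what filtering by multiple hierarchical tags can look like when talking to Meilisearch directly, here is a small sketch that builds a filter expression. It rests on assumptions that are not confirmed above: that each document stores its tags in per-depth fields named `tags.level0`, `tags.level1`, and so on, that each value is the full path joined with " > ", and that selected tags are combined with AND. The function name `tag_filter` is also made up for illustration. It only produces the filter string and does not call Meilisearch.

```python
def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def tag_filter(selected_paths: list[list[str]]) -> str:
    """Build a filter string requiring every selected tag to be present.

    Each path runs from the taxonomy root down to the selected tag,
    e.g. ["Subject", "Math"]. Field names and path format are assumptions.
    """
    clauses = []
    for path in selected_paths:
        if not path:
            raise ValueError("tag path must not be empty")
        level = len(path) - 1  # depth of the selected tag, 0 = root
        clauses.append(f"tags.level{level} = {quote(' > '.join(path))}")
    return " AND ".join(clauses)


print(tag_filter([["Subject", "Math"], ["Difficulty"]]))
```

The resulting string would be passed as the `filter` parameter of a search request, provided the `tags.level*` fields are configured as filterable attributes on the index. Keyword search within the tags themselves is a separate problem that this sketch does not attempt.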