Evaluating Meilisearch

Review

In the thread where we first discussed trying out Meilisearch (Is Meilisearch a viable upgrade alternative to OpenSearch?), I wrote the following:

With another followup message:

Now that Redwood is set to release on Monday, and others seem to be interested in using Meilisearch for more things, now is as good a time as any to have that discussion.

As far as I know, the big drawback that’s been identified so far (by @blarghmatey) is lack of high availability (HA) support outside of their cloud hosted service:

@braden covers some of the nuances of Meilisearch and HA support here:

As far as I’ve seen, the HA situation remains unchanged since it was last discussed in that thread.


Next Steps

@braden (and anyone else who has used Meilisearch, on platform or off): Could you please summarize your development experience with Meilisearch?

Site operators: Is anyone planning to turn this feature on when Redwood releases? Have you been testing with it already? Please comment here with your operational experiences, or let us know if you plan to run it soon and are willing to provide feedback then. If you have thoughts around high availability, please add those as well.

Thanks folks.

For more on the rationale to experiment with Meilisearch, please see the architecture decision record for it.

Hey!

I used Meilisearch a while back when developing a product catalog for a website I was working on. My implementation used Strapi as the integrating CMS and DigitalOcean to host the Meilisearch instance. Everything seemed quite straightforward: their documentation is pretty good, and their user interface for testing is quite helpful.

I really liked their typo tolerance, searchable fields and field priority options, though I was not a fan of the complex nature of their filtering calls. Overall, great choice to add to the platform and I look forward to the implementation.

I’ve been following the progress from Braden on the Meilisearch front and we have been having a few discussions in the large instances WG.

As an experiment, I decided to run the indexing of the courses in one of our larger installations. We have around 40k courses, so I started testing. That instance is running on an old version of Open edX (Nutmeg), so I had to hack together a backport of Braden’s original PR.

Initially I noticed that I couldn’t start the indexing in our instance. We identified the problem, and Braden landed the fix recently.

After that I just let the job run. The whole thing takes several days to go through all 40k courses (it still hasn’t finished). So far it has indexed around 800k blocks for ~8k courses.

I just performed a few queries in the Meilisearch dashboard. My first impression is that it is really snappy: queries take about 5 ms. Resource usage is around 1,600 MB of memory with close to no CPU usage (this is an environment without traffic).

My take is that it seems better overall in comparison with ES. The footprint is minimal, and even when filling up it doesn’t seem too crazy. I deployed it in a k8s cluster with a single pod, mostly copying what Braden has in the tutor-contrib-meilisearch plugin.

I’m not that concerned about the HA problem. We don’t actually deploy that many ES clusters, even for large instances, mostly because search isn’t really on the critical path of the overall experience, so downtime there isn’t as terrible as it would be for Redis, for example (no broker for celery).

I don’t know if we are going to enable it in Redwood, but not so much because of Meilisearch as because we first want to get familiar with the actual new search feature.

As a developer, Meilisearch has been extremely easy to work with. It has great documentation, and I really like how the usage examples with the official Python and JavaScript clients are integrated into the documentation, so you don’t have to learn them separately.

Generally it has been painless to work with, and most things I tried “just worked” out of the box. The API was relatively straightforward, and both indexing and searching were easy to implement.

What’s more, they seem to be developing it quite actively. Between the time when we first started integrating Meilisearch and the Redwood release, they released two new versions including a very nice feature that we wanted (negative keyword search).

On the frontend, we first integrated it using Instantsearch, which worked really well. But things got a little more complicated when we needed to implement filtering by [multiple] hierarchical tags, including a keyword search field to refine the list of hierarchical tags. It turns out that neither Instantsearch nor Meilisearch support this (<HierarchicalMenu> does not allow multiple selections, and facet search doesn’t support a hierarchy nor keywords that occur in the middle or end of a tag value).

So we had to replace the Instantsearch widgets on the frontend with a custom UI and a custom “search manager” built using React Query. This was actually not too difficult, and I’m happy with the result, which doesn’t have much more code but allowed us to remove Instantsearch as a dependency (it turns out that React Query provides a lot of the functionality we were getting from Instantsearch).

Then I had to figure out how to use the Meilisearch APIs to achieve the functionality we need, even though it doesn’t technically support that use case. The approach we ended up with is a bit of a hack, though it should work for most cases. (Now, I found out this week that Meilisearch is getting a new feature, distinct attributes at search time; once that is implemented, I believe the “hacky” solution will actually work correctly in all cases. I’ve been corresponding with the Meilisearch team about this on GitHub, in the meilisearch discussion “Hierarchical Facet Search” (#735).)
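To make the facet-search limitation described above concrete, here is a small Python sketch. It is only a simplified model, not Meilisearch's actual matching algorithm and not the workaround used in the platform: `prefix_match` stands in for a matcher that only matches at the start of a tag value, while `refine_hierarchical` shows the kind of keyword refinement the UI needed. The function names, the sample tags, and the `" > "` separator are all assumptions made for illustration.

```python
# Hypothetical sample data; the " > " hierarchy separator is an assumption.
TAGS = [
    "Science > Biology > Genetics",
    "Science > Physics",
    "Arts > Music > Genres",
]


def prefix_match(tags, keyword):
    """Simplified model: the keyword must match the start of the whole tag value."""
    kw = keyword.lower()
    return [t for t in tags if t.lower().startswith(kw)]


def refine_hierarchical(tags, keyword, sep=" > "):
    """Desired behavior: the keyword may occur anywhere in any level of the path."""
    kw = keyword.lower()
    return [
        t for t in tags
        if any(kw in level.lower() for level in t.split(sep))
    ]


# A keyword from the end of a tag value finds nothing with prefix-only matching...
print(prefix_match(TAGS, "gen"))
# ...but the hierarchical refinement finds both tags with a matching level.
print(refine_hierarchical(TAGS, "gen"))
```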

The other downsides I’m aware of:

  • Users: No boolean operators for keywords (x AND y), but it does support "exact phrase" and -negative keyword search. Since we have those operators and pretty advanced filtering, I don’t think the lack of boolean keywords is a big deal.
  • Operators: The upgrade process for major versions is a bit annoying. Well, it’s easy if you don’t mind deleting and rebuilding your whole index, such as on devstack, but on production where you probably need a faster solution, it requires you to create a dump, upgrade, then import the dump manually. This is the sort of thing that could be automated by Tutor’s upgrade workflow, though, like we already do for MongoDB etc.
  • Operators: Still no true HA support, though it may come in the future.
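As a rough illustration of the dump-based upgrade path mentioned in the list above, the sketch below builds (but does not send) the HTTP request that asks a running Meilisearch instance to create a dump, and the command line that starts the upgraded binary from that dump. The helper names, the URL, the key, and the dump path are hypothetical; how Tutor would actually automate this is not decided in this thread.

```python
import urllib.request


def build_dump_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build the request that triggers dump creation (POST /dumps).

    The request is only constructed here, not sent; pass it to
    urllib.request.urlopen() against a real instance to run it.
    """
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/dumps",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def import_dump_command(dump_path: str) -> list[str]:
    """Command line for starting the new Meilisearch version from a dump file."""
    return ["meilisearch", "--import-dump", dump_path]


# Hypothetical values, for illustration only.
request = build_dump_request("http://localhost:7700", "MASTER_KEY")
print(request.get_method(), request.full_url)
print(" ".join(import_dump_command("dumps/example.dump")))
```

The manual steps are then: trigger the dump on the old version, wait for the dump task to finish, install the new version, and start it once with the import command.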

Out of curiosity, was it Meilisearch that was the limiting factor here, or was it the modulestore querying to collect the data to feed into Meilisearch? Several days isn’t the end of the world, but we could probably get a lot of speedup with more parallelism on celery workers if modulestore is the bottleneck–and that info might be relevant to others with large migrations to consider.

It’s mostly spent processing the data; the actual indexing isn’t sweating.

I think parallelizing would be a great way to improve times.

Yup, our current code for the initial index is a very basic single-threaded single-worker all-courses approach, and it waits until the job has completed before it makes the index available. I’m sure there’s a ton of room for improvement.

In particular, I’m thinking we should add the ability to just create (and immediately start using) a blank index, and then queue a bunch of tasks (one for each course?) to index each course. These tasks would then get executed in parallel by however many celery workers are available.
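A minimal sketch of that idea, under loud assumptions: a thread pool stands in for celery workers, a plain list stands in for the Meilisearch index, and `index_course` fakes the (slow) modulestore read. None of these names come from the actual codebase.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def index_course(course_id, index, lock):
    """Stand-in for one per-course task: collect documents, then add them."""
    # Hypothetical placeholder for reading blocks from the modulestore.
    docs = [{"id": f"{course_id}-block-{n}", "course": course_id} for n in range(3)]
    with lock:  # guard the shared stand-in index
        index.extend(docs)
    return len(docs)


def rebuild_index(course_ids, max_workers=4):
    """Create a blank index right away, then fill it one course at a time."""
    index = []  # usable (if incomplete) from the moment it exists
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = list(pool.map(lambda cid: index_course(cid, index, lock), course_ids))
    return index, sum(counts)


index, total = rebuild_index(["course-a", "course-b", "course-c"])
print(f"indexed {total} blocks across {len(index) // 3} courses")
```

With real celery tasks, the number of courses processed at once would be bounded by the available workers rather than by `max_workers`.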

It’s also worth pointing out that as far as I know, almost nobody has tried this out yet; Redwood hasn’t even technically released. So I’m hoping we’ll get more detailed feedback once more operators have upgraded to Redwood and elected to try it out.

Just wanted to mention here that there is a separate conversation to run Meilisearch for course search in the LMS: Auto-suggest course content on search (Meilisearch-compatible)

Yeah, good point. I think that means we need to stretch out this transition by another release. Right now it’s off by default in Redwood, and not required. I think we had hoped to have it just be on in Sumac with no toggle, but it probably makes sense to at least have the option to toggle it off in Sumac.

@braden (or anyone who’s familiar): How hard would it be to make our search implementation pluggable to use either Meilisearch or Algolia via configuration?

I ask because it seems like Meilisearch is built to the same basic outward features and architecture, and Algolia is established enough that it should alleviate concerns around HA and probably provide a better upgrade experience over time. It’s also popular enough that many organizations might already be using it as part of their stack (e.g. 2U runs it for their catalog browsing experience).

As you said, Meilisearch and Algolia intentionally have pretty similar APIs, design, and overall architecture, so it would definitely be a lot easier than, say, supporting Meilisearch and Elasticsearch.

Nevertheless, we’d have to write our own small abstraction layers on both the backend and the frontend. This is required on the backend because I’m not aware of any existing python abstraction layer that supports these two. And it would be required on the frontend for the same reason I mentioned earlier in this thread - because although there is a very nice frontend abstraction framework (Instantsearch), it doesn’t support key features we need (multi-select hierarchical filters, keyword refinement of hierarchical filters). (And for the record, I first tried extending Instantsearch to support our needs, but found the code was too abstruse; it was literally faster to write a replacement for everything we used from Instantsearch than to extend it to support this one feature.)
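For a sense of scale, the backend half of such an abstraction layer could look roughly like the sketch below. Everything here is hypothetical: the `SearchBackend` interface, the method names, and the registry are illustrative assumptions, and the in-memory class only exists so the example runs. Real adapters would wrap the official Meilisearch or Algolia Python clients behind the same two methods.

```python
from typing import Protocol


class SearchBackend(Protocol):
    """Hypothetical minimal interface that each engine adapter would implement."""

    def add_documents(self, index: str, docs: list[dict]) -> None: ...

    def search(self, index: str, query: str) -> list[dict]: ...


class InMemoryBackend:
    """Toy backend used here in place of a real Meilisearch or Algolia adapter."""

    def __init__(self) -> None:
        self._indexes: dict[str, list[dict]] = {}

    def add_documents(self, index: str, docs: list[dict]) -> None:
        self._indexes.setdefault(index, []).extend(docs)

    def search(self, index: str, query: str) -> list[dict]:
        q = query.lower()
        return [
            doc for doc in self._indexes.get(index, [])
            if any(q in str(value).lower() for value in doc.values())
        ]


# Registry keyed by a configuration value; real adapters would be added here.
BACKENDS = {"memory": InMemoryBackend}


def get_backend(name: str) -> SearchBackend:
    """Pick the backend named in configuration."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"Unknown search backend: {name!r}") from None


backend = get_backend("memory")
backend.add_documents("courses", [{"id": 1, "title": "Intro to Biology"}])
print(backend.search("courses", "bio"))
```

The interface itself is small; the harder part is that features like filtering and facets would each need an engine-neutral representation as well.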

And then of course, supporting two search engines means more complexity in testing, more work in resolving bugs that only affect one or the other, etc.

So: I think that would be a reasonable path forward if adopting Meilisearch is a no-go for a stakeholder like 2U. But I’d definitely prefer just supporting Meilisearch for simplicity if that’s feasible.

That’s really disappointing. I had hoped that it wouldn’t be such a big lift to support both simultaneously.

To be clear: my understanding is that 2U currently has no stance on whether they’d want to use Algolia, and does not have interest in maintaining any such plugin. I’m curious if there are any other Algolia users out there who definitely would want to use it with Open edX.

I do think that there’s a lot of value in leaving the door open for other search engines of this style (Algolia, Typesense), both as a hedge against needing to move off someday and as a way to offer the choice of a more robust (albeit commercial) alternative for folks who are more risk averse. But yeah, I can’t justify doing extra work to make sure that extension point works well if it’s costly to do so and there’s literally nobody interested in running alternatives.

So I guess I await other comments on this thread. :stuck_out_tongue:

Do you think there are things we should be doing at the moment to better contain Meilisearch on the front and back end, to make an extension point easier to build later on?