Auto-suggest course content on search (Meilisearch-compatible)

@braden Your implementation caught my attention. I am considering extending it to include Meilisearch support for LMS content. If you foresee any potential issues, please let me know. Otherwise, I plan to proceed with developing a minimal viable product based on my current ideas.

@braden I have recorded a quick video to show how I am thinking of implementing this. Please let me know what you think of the feature.

@braden I have created draft PRs and am sharing them here on the thread to discuss the approach I have taken to implement this feature. Please take a look and let me know if you think it should be implemented in any other way.

Backend: edx-search
Frontend: frontend-app-learning

@qasimgulzar Could you please post your question in a new thread, and ping me there? I’m happy to give you feedback, but I want to keep this thread focused on feedback about using Meilisearch in production and evaluation of the new Studio search feature that we’re launching with Redwood.

I’ve taken the liberty to move this sub-conversation to a new topic :slight_smile:

Perfect, thank you @regis!

@qasimgulzar Thanks for looking into this! Here is some early feedback:

  1. I would not recommend putting the new code in edx-search (which has a rather messy API) but rather in the new content/search Django app. Basically it’s a lot simpler that way, and we can avoid an unnecessary abstraction layer.

  2. Your demo is using the same Studio index as the Studio Search feature, but the Studio search index has draft content. So you will need to create and maintain a separate index of published content for use in the LMS.

  3. You will have to think carefully about permissions, because the rules for what content learners can see (and search) are much more complex than the rules for what instructors can search for in Studio. I think this will actually be the most challenging piece; everything else is fairly straightforward.

  4. The “auto-complete” style dropdown you’re showing is not very useful - one cannot tell the difference between the three results in the video. You should include a preview of the content of each result too.
    Screenshot showing several rows of results that are identical

  5. We have purposefully held off on implementing LMS search until we get more feedback from people using Meilisearch in production. So even if you implement this now, people may ask you to wait before merging anything. But I still encourage you to develop this :slight_smile:

I can’t agree with this strongly enough. The root problem is that the inputs for whether someone is allowed to see a piece of courseware content in the LMS are both complex and opaque. You basically call has_access on every little piece of content, and trust the byzantine logic in there to figure things out. It would include things like:

  • Any special user roles (staff, beta tester, etc.)
  • Enrollment and enrollment track.
  • Dates on content, including inherited dates.
  • Overrides on dates/deadlines.
  • Cohort membership.
  • A/B test group, randomized problem group.
  • Subsection pre-reqs
  • Whether something has been hidden from students.
  • Whether a deadline has passed + configuration.
  • Exam settings.

And that’s literally just stuff off the top of my head. We had a courseware content search system in the past, and it spent the vast, vast majority of its processing time running has_access checks on the returned results to see whether this user was allowed to see it. edX only ever turned the feature on for course staff (where those checks are simple)–the performance of the system was deemed too risky to turn on for students.

There are underlying systems here that could be leveraged. A/B test groups, enrollment tracks, and randomized problems from libraries all share an underlying representation in the user partitioning system. But I think the design for this would require grabbing all the user information that affects the permutations of course content that they see, translating that to a list of parameters that are captured on the content (e.g. the content groups), and passing that through as search parameters for each user for a specific time frame. And that’s a lot of careful work.
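To illustrate that last step, here is a minimal sketch of turning a learner's partition groups into search parameters. It assumes, hypothetically, that each indexed document carries a `course_id`, a numeric `start_ts` release timestamp, and one `partition_<id>_groups` list for every user partition that restricts it. None of these field names or function names come from the platform; only the filter operators (`=`, `<=`, `AND`, `OR`, `NOT EXISTS`) are Meilisearch's own, and the fields would have to be declared filterable on the index.

```python
from typing import Mapping


def quote(value: str) -> str:
    """Quote a string value for use inside a Meilisearch filter expression."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_learner_filter(course_id: str, user_groups: Mapping[int, int], now_ts: int) -> str:
    """Build a filter string for one learner at one point in time.

    user_groups maps a user partition id to the id of the group the
    learner belongs to in that partition (hypothetical input shape).
    """
    clauses = [
        f"course_id = {quote(course_id)}",
        # Only content whose release timestamp has already passed.
        f"start_ts <= {now_ts}",
    ]
    for partition_id, group_id in sorted(user_groups.items()):
        field = f"partition_{partition_id}_groups"
        # Content is allowed if it is unrestricted for this partition,
        # or if the learner's group is among the allowed groups.
        clauses.append(f"({field} NOT EXISTS OR {field} = {group_id})")
    return " AND ".join(clauses)


print(build_learner_filter("course-v1:Org+Course+Run", {50: 1}, 1700000000))
```

This only covers the partition and release-date inputs. Date overrides, prerequisites, and exam state would each need their own representation on the indexed content, which is where most of the careful work would go.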

Thanks for the excellent summary @dave.

I think it might make sense to develop the next version of course search in such a way that it covers most of the “regular” types of course content and simply excludes altogether anything with complex access rules.

That is, the search index would contain:

  • Any content that’s been published, and is past its release date, and is not: randomized, A/B tested, part of an exam, hidden from students, cohorted, hidden based on prereqs, staff-only, etc.
  • Any content that is visible only to staff or specific cohorts (these cases involving broad groups of users are easy to account for)

And anything else (e.g. exam content, randomized content, A/B tested content) would simply be “unsearchable” in the LMS. At least for the MVP version.
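As a rough sketch of that rule, the indexing step could decide per block whether to index it and for which audience. The `BlockInfo` fields and the audience tags below are hypothetical stand-ins for whatever the real block metadata exposes, not actual platform attributes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class BlockInfo:
    """Hypothetical summary of the access-related settings of one block."""
    published: bool
    start: datetime
    randomized: bool = False
    ab_tested: bool = False
    in_exam: bool = False
    hidden_from_students: bool = False
    gated_by_prereq: bool = False
    staff_only: bool = False
    cohort_ids: List[int] = field(default_factory=list)


def audience_for_index(block: BlockInfo, now: datetime) -> Optional[List[str]]:
    """Return the audience tags to index the block under, or None to skip it."""
    if not block.published or block.start > now:
        return None
    # Anything with per-user access rules is left out of the MVP index.
    if (block.randomized or block.ab_tested or block.in_exam
            or block.hidden_from_students or block.gated_by_prereq):
        return None
    # Broad groups are cheap to express as a filterable tag.
    if block.staff_only:
        return ["staff"]
    if block.cohort_ids:
        return [f"cohort:{cohort_id}" for cohort_id in sorted(block.cohort_ids)]
    return ["everyone"]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
released = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(audience_for_index(BlockInfo(published=True, start=released), now))
print(audience_for_index(BlockInfo(published=True, start=released, in_exam=True), now))
```

At query time the search would then filter on the tags that apply to the current user, so no per-result access check is needed for the indexed content.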

(Later on, one could build a per-user “supplementary index” for such content, and join it into the search, perhaps only for a few users who enable a “complete search mode” for the course… but I’m not sure it’s worth the resources that would take.)

As you add further search functionality, it might be useful to have some sort of UX in the CMS, in the outline and edit views for components, that indicates whether content will be searchable. I think it will be hard for everyone to reason about why some things are searchable and others aren’t if they’re not already familiar with the innards of the system.

Great idea. If we went with that sort of approach, it would probably make sense to display it to learners too - e.g. some icon indicating “personalized content - this is excluded from course search”. Otherwise they would also be confused about why some content is found and others not.

I recommend integrating Meilisearch using searcher backends. I understand the goal of removing edx-search, but when adding a new search engine to the platform, we must ensure the integration doesn’t hard-code any specific search engine into the codebase. Since we’re moving away from Elasticsearch, and given the rapidly evolving nature of search engines, we might adopt another search engine in the future. Additionally, as the edX platform is used by various self-hosted clients, different users may have different preferences.
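As a minimal sketch of what I mean by not hard-coding the engine: callers depend on a small interface, and the engine is chosen by name from configuration. The class names, the registry, and the in-memory engine below are hypothetical and only for illustration; they are not the edx-search API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class SearchBackend(ABC):
    """Hypothetical engine-agnostic interface that callers would depend on."""

    @abstractmethod
    def index(self, documents: List[dict]) -> None:
        """Add documents to the index."""

    @abstractmethod
    def search(self, query: str) -> List[dict]:
        """Return the documents matching the query."""


class InMemoryBackend(SearchBackend):
    """Toy engine used here in place of Meilisearch or Elasticsearch."""

    def __init__(self) -> None:
        self._docs: List[dict] = []

    def index(self, documents: List[dict]) -> None:
        self._docs.extend(documents)

    def search(self, query: str) -> List[dict]:
        # Naive case-insensitive substring match on the "content" field.
        needle = query.lower()
        return [d for d in self._docs if needle in d.get("content", "").lower()]


# A real deployment would register one class per supported engine here.
BACKENDS: Dict[str, Type[SearchBackend]] = {"memory": InMemoryBackend}


def get_backend(name: str) -> SearchBackend:
    """Instantiate the backend selected by a (hypothetical) setting."""
    return BACKENDS[name]()


backend = get_backend("memory")
backend.index([{"id": "1", "content": "Intro to Python"},
               {"id": "2", "content": "Linear algebra"}])
print([doc["id"] for doc in backend.search("python")])
```

Swapping engines would then mean registering another class, without touching the code that calls `index` and `search`.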

Considering these factors, I have created a PR to implement a search backend. I will also work on building a common API interface, as suggested by @braden, to avoid abstraction issues and handle content permissions. However, this will take some time.

Here is the PR