We are considering moving the Open edX services that use Elasticsearch to use OpenSearch instead due to better support from AWS. We understand that there may be concerns regarding this move, and we wanted to start a discussion with the community first. Because of this, there is no acceptance date attached to the ticket as of yet. We will make another announcement if that changes.
Thank you for bringing this discussion into a public venue so that we can establish consensus as a community.
My personal preference in this situation is to side with OpenSearch for a number of reasons:
- The contributors have committed to contributing the entire project to a foundation to ensure there is no single corporate entity controlling the roadmap
- It has adopted the Apache 2 license which is OSI approved and compatible with the AGPL
- There is an AWS managed option for running OpenSearch which simplifies deployment options for a number of organizations that run their infrastructure on AWS cloud resources
- There are a number of organizations which have aligned themselves with the OpenSearch project beyond just AWS, including Logz.io, Instaclustr, etc. (Partners · OpenSearch)
While I understand the argument from Elastic, Inc., their overall approach to the situation and their willingness to be actively hostile to end users makes me hesitant to commit to locking the Open edX community into their ecosystem. In addition, their adopted license is not compatible with the AGPL that governs the Open edX project.
On the note of not having an acceptance date, I can appreciate the desire to ensure there is a thorough exploration of the situation. On the other hand, the longer this question goes unanswered, the greater the level of uncertainty for current and new users of Open edX. Speaking for myself, I am working through an update to my hosting infrastructure and would appreciate clarity on this question as it can greatly influence the decisions that I make in my systems architecture. With the upcoming release of Maple, this question will likely become even more pertinent as more users look to upgrade their respective systems and look to the project for guidance on what will be required for the future of their search services.
These are all good points! I will say that the lack of acceptance date is mostly because we need to work out internally what our timeline is for this decision as well, and we are partially at the mercy of the OpenSearch community to understand how that ecosystem is going to look as we move forward.
I was erring on the side of putting this ticket and discussion up earlier rather than later, and I can push for us to get an acceptance date on this deprecation ticket ASAP.
OpenSearch is the result of an aggressive fork made by Amazon to prevent Elastic (company) from selling its products – namely to AWS and its customers. This is the first reason why I am against the switch to OpenSearch, but there are others.
First and foremost, we should remember that pretty much all arguments presented by Amazon/OpenSearch are propaganda arguments aimed at spreading FUD (fear, uncertainty and doubt). The OpenSearch vs Elasticsearch situation is a public relations war, before anything else. No, edX is not going to get sued by Elastic for using Elasticsearch in a paid product. Yes, we can still use Elasticsearch as we are used to – as long as we don’t sell Elasticsearch itself.
Here is an example of a blatantly incorrect argument: Amazon claims that OpenSearch is a widely supported effort and that many companies contribute to its growth. Well, I’m not so convinced by this argument when I compare the number of commits pushed to each repo since the fork:
OpenSearch commits (source):
Elasticsearch commits (source):
Honestly, I’m worried that there are not enough people outside of Elastic who are well versed in Elasticsearch to keep maintaining the project.
So, I think we should not make the switch to OpenSearch. It’s not the right moral choice, and it’s also not the right technical choice.
I’ll make a bold proposal: we should take this opportunity to get rid of Elasticsearch entirely. Because as much as I despise Amazon’s tactics in this PR war, I think that Elasticsearch is the big wart in the architecture of Open edX. On an empty platform, in idle mode, Elasticsearch uses 1.5 GB of memory, which is 50% more than its heap size (and we can’t bring it lower than 650 MB even when decreasing the heap size). This is the biggest memory hog in the Open edX stack. On the other hand, there are few components (edx-search, forums, discovery) that actually need Elasticsearch, which makes it easy to refactor them and abstract away their implementation. And search can be performed by other technical components, including ones which are already present in the stack, such as MySQL. Getting rid of Elasticsearch will make it easier and cheaper (not to mention: greener) for everyone, small or large platforms, to run Open edX.
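To illustrate what "abstracting away the implementation" could look like, here is a minimal sketch in Python. Everything in it is hypothetical: the `SearchBackend` interface and the `InMemoryBackend` class are names invented for this example and are not taken from edx-search, the forums, or discovery. The point is only that calling code would depend on a small interface, so that an Elasticsearch-backed, OpenSearch-backed, or MySQL-backed implementation could be swapped in behind it.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set


class SearchBackend(ABC):
    """Hypothetical interface that a component could depend on
    instead of calling a specific search engine directly."""

    @abstractmethod
    def index(self, doc_id: str, text: str) -> None:
        """Store or replace the searchable text for a document."""

    @abstractmethod
    def search(self, query: str) -> List[str]:
        """Return the ids of documents matching the query."""


class InMemoryBackend(SearchBackend):
    """Toy backend for illustration: a document matches when it
    contains every word of the query (case-insensitive)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Set[str]] = {}

    def index(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = set(text.lower().split())

    def search(self, query: str) -> List[str]:
        words = set(query.lower().split())
        if not words:
            return []
        # Sorted only to make the result order deterministic.
        return sorted(
            doc_id for doc_id, tokens in self._docs.items() if words <= tokens
        )
```

A real backend would also need relevance ranking, stemming, and pagination, which this sketch deliberately leaves out.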
I appreciate the detailed response, and I can certainly get behind an effort to remove Elasticsearch. This, in combination with the efforts to remove Mongo as a requirement, will hugely simplify the work of getting an edX deployment up and running. I also agree that Elastic is largely redundant for search functionality, particularly in community deployments that may not even make use of the search features of the platform.
In order to take this from a proposal to a potential reality, what are the concrete steps necessary to identify the work needed to make it happen? If we can remove Elastic and Mongo as hard dependencies of the platform by the next release that would be a huge win from my perspective.
@regis - that is simply not true. ElasticSearch made it impossible to use an open source distribution of their software. So Amazon stepped in with a 100% open source solution that’s openly governed - and I applaud them for it. ElasticSearch and other startups want us to believe that they have to go the non-OSS route in order to sell product. I reject that argument. Look no further than MongoDB - AWS implemented a MongoDB API-compatible service, DocumentDB. And yet, Mongo has had great success selling their SaaS service. I don’t like Mongo’s own license changes, but that doesn’t change the fact that AWS is free to re-implement the Mongo API, and Mongo still has a compelling, winning product, based on focusing on the developer as customer and operational excellence.
I support this change - it puts Open edX in the position of not depending on a single vendor for a core part of the tech stack.
Hey John, I’m not sure I understand your point. As far as I understand, MongoDB and Elasticsearch both distribute their software under the exact same license: the source-available SSPL. To bypass the problem caused by the MongoDB license change, AWS created an API-compatible, closed source product (DocumentDB, stuck at v4.0). I’m pretty sure that MongoDB does not look kindly on this obsolete, closed source implementation.
(To be honest, I find it a little ironic that edX is posing as a champion of open source software when edx.org is actually using the closed source DocumentDB implementation of MongoDB…) EDIT: this statement is factually wrong, see Dave’s answer below.
My point is that the Elasticsearch and MongoDB situations are extremely similar, and that arguments made for one should also be valid for the other.
I’m not going to wade into the ES/MongoDB moral arguments here, but I just wanted to briefly clarify that edX doesn’t run DocumentDB. We have our own EC2 servers running MongoDB 4.2 on Ubuntu (I logged into our AWS console just now to double check).
I would love to drop ES from the stack entirely for all the reasons you mentioned, but I was under the impression that MySQL full text search is substantially worse in terms of fetching useful results. This may not matter when the search space is relatively small (e.g. Notes), but I would be concerned about negatively impacting something like the forums search experience. Is that no longer the case?
Sorry for my incorrect assumption; I edited my post above to avoid spreading uncertainty. I was under the impression that edX uses AWS services for all of its data storage backends.
I must say that this situation makes me even more confused:
- Why is the Elasticsearch license change an issue, but not MongoDB’s?
- If AWS support of Elasticsearch is problematic, why not self-host an ES cluster?
I must say that I have too little experience with full text search in MySQL to offer a definitive solution. However, my opinion is that this deprecation discussion should include an evaluation of technical alternatives to Elasticsearch – including a MySQL-backed search engine.
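As a starting point for such an evaluation, here is a rough sketch of how a MySQL full-text query could be built from Python. It uses MySQL's `MATCH ... AGAINST` syntax in natural language mode, which requires a `FULLTEXT` index on the searched columns. The function name, the `id` column, and the table and column names in the usage note are assumptions made up for this example, not existing Open edX schema. It says nothing about result quality, which is exactly what would need to be measured.

```python
from typing import Tuple


def build_fulltext_query(
    table: str, columns: Tuple[str, ...], terms: str, limit: int = 20
) -> Tuple[str, tuple]:
    """Return (sql, params) for a MySQL natural-language full-text search.

    Assumes the table has an `id` column and a FULLTEXT index covering
    exactly `columns`. Placeholders use the `%s` style accepted by
    common MySQL drivers for Python.
    """
    # Identifiers cannot be passed as query parameters, so validate them.
    for name in (table, *columns):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")

    cols = ", ".join(columns)
    sql = (
        f"SELECT id, MATCH({cols}) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score "
        f"FROM {table} "
        f"WHERE MATCH({cols}) AGAINST (%s IN NATURAL LANGUAGE MODE) "
        "ORDER BY score DESC LIMIT %s"
    )
    # The search terms appear twice: once for scoring, once for filtering.
    return sql, (terms, terms, limit)
```

For instance, `build_fulltext_query("forum_posts", ("title", "body"), "grading policy", 5)` returns a statement and parameters that could be passed to a driver's `cursor.execute`, given a hypothetical `forum_posts` table.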
Hello, I’m reviving this discussion because edX/2U has been doing discovery on this question in advance of the Nutmeg release.
I’ve also posted this info on the DEPR ticket: Move from Elasticsearch to OpenSearch · Issue #16 · openedx/public-engineering · GitHub
Our current decision is to switch to using OpenSearch in several use cases where we need the performance offered by ES. These are:
- edx-search (courseware search)
For other use cases, we will be removing usage of ES:
- edx-analytics-api (endpoints are deprecated entirely. See [DEPR]: Remove Learner View in Insights, Data API, and Pipeline #36)
I am proposing an updated acceptance date of April 18, 2022 for this deprecation.