Thursday, January 16, 2025

[Series] Getting Started with ElasticSearch or OpenSearch

by Trent

Deciding between OpenSearch and ElasticSearch can be daunting when starting a new project. Today’s article covers when and why AWS forked OpenSearch from ElasticSearch, and what the two search engines are best used for. Finally, you’ll find some links that show you how to spin up a cluster of each type, which you can use when reading the plentiful supply of search-related content on this very blog!

OpenSearch vs ElasticSearch: the Origin Story

Let’s be clear: ElasticSearch is the original technology. It was created by elastic.co as an open source project and gained massive momentum in the search and observability space. However, all good things must come to an end.

In January 2021, elastic.co announced a change to its licensing model. In doing so it removed the joys of open source from new versions of ElasticSearch (7.11 onwards) and other products in the suite, which would no longer be released under the Apache 2.0 license.

During this time Amazon Web Services (AWS) was offering a hosted ElasticSearch service on its platform. Faced with a licensing wall, while also seeing tremendous potential in the technology, AWS decided to make a move: it forked ElasticSearch into a new open source project called OpenSearch, released under the Apache 2.0 license.

To hear the AWS side of the story, check out this AWS article.

ElasticSearch vs OpenSearch Functionality: What Do They Do?

ElasticSearch and OpenSearch are distributed search engines that are built using Apache Lucene.

That sentence was content dense, so let’s break it down:

  1. Distributed means that your searchable content can be stored and searched across more than one server. This allows you to perform blazingly fast searches, because the job of performing a search can be spread across multiple machines, each of which takes a small piece of the work and does it on a subset of the total data.
  2. Search engine means that ElasticSearch and OpenSearch are great for finding things, especially using full text search. The power of this comes from using an inverted index for storing and structuring the data to be searched.
  3. Apache Lucene is an open-source, Java-based search library. When we run a search query, it is Lucene that is doing the grunt work. ElasticSearch and OpenSearch sit on top of Lucene and make it a distributed solution by exposing Lucene functionality over a JSON REST API and managing Lucene instances across multiple nodes.
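
To make the inverted index idea from point 2 a little more concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not how Lucene is implemented: real analysis, scoring and on-disk storage are far more involved. The sample documents and the function names `build_inverted_index` and `search` are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive "analysis": lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start with the documents for the first term, then intersect the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data.
docs = {
    1: "red running shoes",
    2: "blue running shorts",
    3: "red rain jacket",
}
index = build_inverted_index(docs)
print(search(index, "red running"))
```

Because the index maps terms to documents (rather than documents to terms), answering a query is a handful of set lookups instead of a scan over every document.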

What can I Use ElasticSearch / OpenSearch For?

ElasticSearch and OpenSearch can be used for a huge range of things, from searching, to logs, to anomaly detection and more. This CodeSloth.blog focuses on using the technology as a search service.

The types of searching that can be performed are:

Complex Metadata Searching

Think of a user searching for products on an online store. They have typed some keywords, and then start refining their search by selecting things like item categories, brand, price range, size etc.

This type of searching is a breeze with both ElasticSearch and OpenSearch and requires very little tokenization, because most of the refinements are exact matches on structured fields. This makes it relatively easy for software engineers to understand.
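
As a rough sketch of what such a request might look like, the Python below builds a `bool` query body: the typed keywords go into a `match` clause, while the refinements become `term` and `range` filters. The field names (`name`, `brand`, `category`, `price`) are hypothetical, and the sketch assumes `brand` and `category` are mapped as keyword fields.

```python
import json


def build_product_query(keywords, brand=None, category=None,
                        min_price=None, max_price=None):
    """Build a bool query body: full text keywords plus exact-match filters."""
    filters = []
    if brand is not None:
        filters.append({"term": {"brand": brand}})
    if category is not None:
        filters.append({"term": {"category": category}})

    # Only add a range filter if at least one price bound was supplied.
    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        filters.append({"range": {"price": price_range}})

    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": keywords}}],
                "filter": filters,
            }
        }
    }


body = build_product_query("running shoes", brand="acme",
                           min_price=50, max_price=150)
print(json.dumps(body, indent=2))
```

The resulting JSON is what would be sent in the body of a search request. Only the `match` clause involves analyzed text, which is why this style of search needs so little tokenization.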

Auto Complete or Suggestions

Think of how Google search works. A user starts typing an arbitrary sequence of alphanumeric characters and meaningful suggestions seemingly appear out of nowhere. 

Suggestion based search is more complex than metadata based searching. This is because suggestions aren’t always just a prefix of what the user has typed. Complex business rules coupled with the multitude of ways that suggestions can be configured means that things can spiral in complexity quickly.

Software engineers will likely need to collaborate tightly with product owners to understand business requirements and shape less-exact definitions of expected outcomes. This is because it would be very difficult to plan the exact position of every suggested term for every possible input that a user could type.
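
One common building block for suggestions is the edge n-gram: every word is indexed under each of its leading prefixes. The sketch below imitates that idea in memory with plain Python. It is only one of the many ways suggestions can be configured, not a recommendation, and the phrases and function names are invented for the example.

```python
def edge_ngrams(word, min_len=1, max_len=10):
    """Return the leading prefixes of a word, e.g. 'run' -> ['r', 'ru', 'run']."""
    word = word.lower()
    upper = min(len(word), max_len)
    return [word[:n] for n in range(min_len, upper + 1)]


def build_suggestion_index(phrases):
    """Index each phrase under every edge n-gram of every word it contains."""
    index = {}
    for phrase in phrases:
        for word in phrase.split():
            for gram in edge_ngrams(word):
                index.setdefault(gram, set()).add(phrase)
    return index


def suggest(index, typed):
    """Return phrases with at least one word starting with the typed text."""
    return sorted(index.get(typed.lower(), set()))


# Hypothetical suggestion phrases.
phrases = ["running shoes", "trail running shoes", "rain jacket"]
index = build_suggestion_index(phrases)
print(suggest(index, "ru"))  # ['running shoes', 'trail running shoes']
```

Notice that "trail running shoes" is suggested for "ru" even though the phrase does not start with those letters. The ordering here is simply alphabetical; deciding what should rank first is exactly where the business rules come in.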

Notifications and Alerts

Think of clicking the like button on a product in an online store. When the company puts on a sale, ElasticSearch and OpenSearch can be used to find similar products that match features of the original product and distribute notifications to the user about “flash markdowns” or “what’s new and trending”.

This type of searching is called percolation and is the reverse of complex metadata searching: instead of running a query against stored documents, you store the queries and match each incoming document against them.
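
The reversal is easier to see in code. The Python sketch below is a toy model of the concept only: it does not use the percolator API of either engine, and the saved queries, user ids and product fields are hypothetical. Each saved "query" is just a set of field values that a product must match exactly.

```python
def matches(query, document):
    """A saved query matches when every field it names has the required value."""
    return all(document.get(field) == value for field, value in query.items())


def percolate(saved_queries, document):
    """Return the ids of every saved query that the incoming document satisfies."""
    return sorted(
        query_id
        for query_id, query in saved_queries.items()
        if matches(query, document)
    )


# Queries derived from what each (hypothetical) user has liked.
saved_queries = {
    "user-1": {"brand": "acme", "category": "shoes"},
    "user-2": {"category": "jackets"},
    "user-3": {"brand": "acme"},
}

# A product that has just gone on sale.
on_sale = {"name": "Trail Runner", "brand": "acme", "category": "shoes", "price": 89}

print(percolate(saved_queries, on_sale))  # ['user-1', 'user-3']
```

Here the product is the input and the output is the list of interested users, which is the information a notification system needs.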

What are the Differences Between ElasticSearch vs OpenSearch?

ElasticSearch has evolved quite significantly over the years. At time of writing the current version is 8.8, but you can find the latest release on their docs page here.

While AWS OpenSearch is comparatively younger than ElasticSearch, it has trailblazed in the space of hosting a cluster. Let’s compare the two:

OpenSearch vs ElasticSearch Hosting

ElasticSearch can be installed on any cloud provider’s virtual machines. However, this is often undesirable from a commercial software engineering perspective for a number of reasons. These machines often need updates to ensure that they have the latest security patches. Also, nodes within the cluster need to be configured to know about each other, and there can be issues with the underlying hardware that runs the cluster, or the nodes within the cluster itself.

This is where managed services are useful to a business.

ElasticSearch Cloud is a managed solution for ElasticSearch. This means that Elastic.Co handle the responsibility of making sure the hardware that runs your cluster is working well, and that each of the nodes in your cluster knows about the others. This lets you focus on higher level concerns such as configuring the cluster to support the required throughput and writing the search logic for your application.

This is where we see a difference between ElasticSearch vs OpenSearch.

Elastic.Co is not a cloud provider. Therefore, hosted solutions of ElasticSearch are ultimately launched on established cloud providers, such as AWS EC2 or Google Cloud. This means that they are constrained by the options offered by those cloud providers, which impacts the granularity with which you can configure your hosted cluster.

Amazon Managed OpenSearch, on the other hand, is offered by a cloud provider: AWS itself. They use their own hardware, which is tuned specifically for running OpenSearch clusters. Pricing is based on EC2 instances with .search suffixes, a unique family that specifically targets hosting OpenSearch.

Furthermore, at the beginning of 2023 Amazon Web Services launched the next iteration of OpenSearch: OpenSearch Serverless. Currently neither ElasticSearch Cloud nor AWS Managed OpenSearch is elastic in its configuration. This means that when you provision a cluster, the capability of the cluster (processing power, or throughput) will remain unchanged, regardless of the load that you throw at it.

OpenSearch Serverless changes the game by introducing elasticity into the underlying hardware that powers search. This means that you only pay for what you need and it can scale out automatically to support an increasing workload over time. At time of writing this scaling is quite limited, and is not ideal for synchronous consumption. You are therefore best off using OpenSearch Serverless in background tasks, which do not have the strict throughput requirements of workloads such as user-facing Web APIs.

ElasticSearch vs OpenSearch SDKs

I have worked professionally with OpenSearch across several product offerings for a number of years now and can confirm that the development experience from a search perspective is similar to that of managed ElasticSearch in AWS. In fact, you can use the NEST (ElasticSearch) and OpenSearch client NuGet packages interchangeably up to version 7.10.2 of ElasticSearch and version 1 of OpenSearch. This is because OpenSearch v1 was forked from ElasticSearch 7.10.2.

It is important to note that there are slight differences in some function names or terms in newer versions of OpenSearch, but the syntax is largely the same for search and mapping related functions. AWS Managed OpenSearch does not expose all of the administrative HTTP endpoints that Elastic.Co do, given that changing some of these values could understandably be detrimental to how the cluster functions within their managed ecosystem.

OpenSearch Serverless is an entirely different beast which contains even less configurability than the managed service. For example, you are unable to define the number of shards or replicas of an index within a serverless collection (the equivalent of a managed cluster).
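
For contrast, on a self-hosted or managed cluster the shard and replica counts can be supplied in the body of the create index request (`PUT /<index-name>`). The Python sketch below builds such a body using the `number_of_shards` and `number_of_replicas` index settings. The helper name `create_index_body` and the values 3 and 1 are arbitrary choices for illustration, not sizing advice.

```python
import json


def create_index_body(shards, replicas):
    """Build a create index request body with explicit shard and replica counts."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


body = create_index_body(shards=3, replicas=1)
print(json.dumps(body, indent=2))
```

It is these settings that a serverless collection does not let you define.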

Difference Between OpenSearch and ElasticSearch Local Development

Elastic.Co publish pre-compiled binaries of their product (such as .exe files that can be double clicked and run on Windows). It can also be run in a Docker container.

OpenSearch can be run in Docker.

Where to From Here: a Fork in the Road

There are two ways that you can use the posts in this series.

OpenSearch vs ElasticSearch

If you choose to go down the path of OpenSearch over ElasticSearch, then follow this link to learn how to run a cluster in Docker.

ElasticSearch vs OpenSearch

If you choose to go down the path of ElasticSearch over OpenSearch, then follow this link to learn how to run a cluster on Windows.

Sloth out a Solution

Still undecided? Check out the articles in the OpenSearch and ElasticSearch categories on CodeSloth.blog to help you decide!

Happy searching!
