This article is a continuation of Querying the Keyword Field Data Type. While the first article focused on fetching documents that have an indexed keyword field, this one focuses on sorting and sort-scripting keywords. We’ll continue to use the .NET OpenSearch.Client NuGet package as we did in part 1. Please read the Keyword Field Data Type Indexing Deep Dive article before completing this tutorial, as it contains useful prerequisite information.
At the time of writing, the Elastic.co documentation is far richer than the OpenSearch documentation, so links to both vendors may be provided to reference the concepts discussed. The two offerings are currently functionally equivalent.
Enriching Search with Sort
We covered searching for documents in the last article. However, a good search experience is not complete if we cannot order the results in a meaningful way for the consumer.
It is important to understand that there may be memory implications to performing an OpenSearch sort operation, discussed here. However, keywords are exempt from the additional considerations that we must give to text, numeric or geo-based sorting, which makes them very simple to work with.
Let’s take a look at how easy it is to sort keywords using the test case below, which:
- Indexes two documents
- Issues a match all query to fetch them both
- Sorts the documents by their keyword name in descending order
/// <summary>
/// Keyword fields do not require anything special to support sorting
/// </summary>
[Fact]
public async Task KeywordMapping_CanBeUsedAsASortedField_WithoutAnySpecialConsiderations()
{
    var indexName = "keyword-index";
    await _fixture.PerformActionInTestIndex(
        indexName,
        mappingDescriptor,
        async (uniqueIndexName, opensearchClient) =>
        {
            var productDocuments = new[]
            {
                new ProductDocument(1, "mouse"),
                new ProductDocument(2, "mouse pad"),
            };
            await _fixture.IndexDocuments(uniqueIndexName, productDocuments);

            var result = await opensearchClient.SearchAsync<ProductDocument>(selector => selector
                .Index(uniqueIndexName)
                .Query(query => query.MatchAll())
                // Sends "explain": true, which is why each hit carries an _explanation
                .Explain()
                .Sort(sort => sort
                    .Descending(fieldName => fieldName.Name)
                )
            );

            // Our documents can be sorted alphabetically
            result.IsValid.Should().BeTrue();
            var formattedResults = string.Join(", ", result.Documents.Select(doc => doc.Name));
            formattedResults.Should().BeEquivalentTo("mouse pad, mouse");
        }
    );
}
This query produces the following DebugInformation in the response object:
Valid OpenSearch.Client response built from a successful (200) low level call on POST: /keyword-index2ccd7782-75e8-449a-94ac-7c20d10696f3/_search?pretty=true&error_trace=true&typed_keys=true
# Audit trail of this API call:
 - [1] HealthyResponse: Node: http://localhost:9200/ Took: 00:00:00.2225050
# Request:
{"explain":true,"query":{"match_all":{}},"sort":[{"name":{"order":"desc"}}]}
# Response:
{
  "took": 57,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "max_score": null,
    "hits": [
      {
        "_shard": "[keyword-index2ccd7782-75e8-449a-94ac-7c20d10696f3][0]",
        "_node": "l7YV4K5YSFuy_CFfGwt8ig",
        "_index": "keyword-index2ccd7782-75e8-449a-94ac-7c20d10696f3",
        "_id": "2",
        "_score": null,
        "_source": { "id": 2, "name": "mouse pad" },
        "sort": [ "mouse pad" ],
        "_explanation": { "value": 1.0, "description": "*:*", "details": [] }
      },
      {
        "_shard": "[keyword-index2ccd7782-75e8-449a-94ac-7c20d10696f3][0]",
        "_node": "l7YV4K5YSFuy_CFfGwt8ig",
        "_index": "keyword-index2ccd7782-75e8-449a-94ac-7c20d10696f3",
        "_id": "1",
        "_score": null,
        "_source": { "id": 1, "name": "mouse" },
        "sort": [ "mouse" ],
        "_explanation": { "value": 1.0, "description": "*:*", "details": [] }
      }
    ]
  }
}
# TCP states:
  Established: 50
  TimeWait: 18
  CloseWait: 14
# ThreadPool statistics:
  Worker: Busy: 1 Free: 32766 Min: 12 Max: 32767
  IOCP: Busy: 0 Free: 1000 Min: 12 Max: 1000
Here we can see that the resulting documents are returned to us in descending name order. This has taken into consideration the fact that one of the keywords was a multi-word value delimited with a space. It’s that simple!
Numeric Sorting with Keyword Fields
Careful consideration must be given when mapping numeric fields in OpenSearch. This is because they are optimised for range queries, as discussed here. If you are performing term queries, it is recommended to use keyword fields for numbers instead of their numeric mapping type.
But what does this mean for sorting? Are numbers treated differently in keyword fields?
Let’s take a look at the example below, which:
- Indexes two documents whose name is an integer value
- Issues a match all query to fetch them both
- Sorts the documents by the name in descending order
[Fact]
public async Task KeywordMapping_ShouldNotBeUsedToSortNumericData()
{
    var indexName = "keyword-index";
    await _fixture.PerformActionInTestIndex(
        indexName,
        mappingDescriptor,
        async (uniqueIndexName, opensearchClient) =>
        {
            var productDocuments = new[]
            {
                new ProductDocument(1, "5"),
                new ProductDocument(2, "2000"),
            };
            await _fixture.IndexDocuments(uniqueIndexName, productDocuments);

            var result = await opensearchClient.SearchAsync<ProductDocument>(selector => selector
                .Index(uniqueIndexName)
                .Query(query => query.MatchAll())
                .Explain()
                .Sort(sort => sort
                    .Descending(fieldName => fieldName.Name)
                )
            );

            // Keyword values are compared as strings, so "5" sorts ahead of "2000"
            // in descending order, matching the response shown below
            result.IsValid.Should().BeTrue();
            var formattedResults = string.Join(", ", result.Documents.Select(doc => doc.Name));
            formattedResults.Should().BeEquivalentTo("5, 2000");
        }
    );
}
This produces the following DebugInformation:
Valid OpenSearch.Client response built from a successful (200) low level call on POST: /keyword-index87cbe8dd-14ac-49b4-bd7b-f6fa2e0f3417/_search?pretty=true&error_trace=true&typed_keys=true
# Audit trail of this API call:
 - [1] HealthyResponse: Node: http://localhost:9200/ Took: 00:00:00.1348488
# Request:
{"explain":true,"query":{"match_all":{}},"sort":[{"name":{"order":"desc"}}]}
# Response:
{
  "took" : 12,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 2, "relation" : "eq" },
    "max_score" : null,
    "hits" : [
      {
        "_shard" : "[keyword-index87cbe8dd-14ac-49b4-bd7b-f6fa2e0f3417][0]",
        "_node" : "l7YV4K5YSFuy_CFfGwt8ig",
        "_index" : "keyword-index87cbe8dd-14ac-49b4-bd7b-f6fa2e0f3417",
        "_id" : "1",
        "_score" : null,
        "_source" : { "id" : 1, "name" : "5" },
        "sort" : [ "5" ],
        "_explanation" : { "value" : 1.0, "description" : "*:*", "details" : [ ] }
      },
      {
        "_shard" : "[keyword-index87cbe8dd-14ac-49b4-bd7b-f6fa2e0f3417][0]",
        "_node" : "l7YV4K5YSFuy_CFfGwt8ig",
        "_index" : "keyword-index87cbe8dd-14ac-49b4-bd7b-f6fa2e0f3417",
        "_id" : "2",
        "_score" : null,
        "_source" : { "id" : 2, "name" : "2000" },
        "sort" : [ "2000" ],
        "_explanation" : { "value" : 1.0, "description" : "*:*", "details" : [ ] }
      }
    ]
  }
}
# TCP states:
  Established: 173
  TimeWait: 25
  CloseWait: 12
# ThreadPool statistics:
  Worker: Busy: 1 Free: 32766 Min: 12 Max: 32767
  IOCP: Busy: 0 Free: 1000 Min: 12 Max: 1000
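The name of the test method likely gave it away, but still, gasp! Our documents have (unsurprisingly) been sorted lexicographically, character by character, instead of numerically. In descending order, "5" is returned ahead of "2000" because the character 5 sorts after the character 2, regardless of how many digits follow.
Something to keep in mind when navigating the complexities of numeric fields in your documents!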
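The difference between the two orderings can be reproduced outside of OpenSearch. The following is a minimal C# sketch, not part of the article's test suite, that uses an ordinal string comparison as a stand-in for the way keyword values are compared. That stand-in is an assumption that holds for plain ASCII digits like the ones used here.

```csharp
using System;
using System.Linq;

var names = new[] { "5", "2000" };

// Ordinal comparison walks the strings character by character,
// which approximates how keyword values are ordered for ASCII input.
var asKeywords = names.OrderByDescending(n => n, StringComparer.Ordinal).ToArray();

// Parsing to integers first compares the values by magnitude instead.
var asNumbers = names.OrderByDescending(n => int.Parse(n)).ToArray();

Console.WriteLine(string.Join(", ", asKeywords)); // 5, 2000
Console.WriteLine(string.Join(", ", asNumbers));  // 2000, 5
```

If numeric ordering matters for a field, this suggests mapping it (or a copy of it) with a numeric type rather than relying on the keyword sort.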
Next Level Sorting with Painless Scripts
If sorting on the (static) indexed value of a keyword field is insufficient for your search use case, you can make your sort dynamic by writing a Painless script. The Elastic Painless guide can also be found here, if you’d like to read a broader discussion of the scripting language and go through some use cases.
We’ll explore an example in the test below, which:
- Indexes our two documents
- Issues a match all query to fetch them both
- Sorts the documents with a Painless script that performs a ternary comparison on the keyword value to produce an integer value on which the sort will be performed
[Fact]
public async Task KeywordMapping_CanBeUsedToScriptASortedField()
{
    var indexName = "keyword-index";
    await _fixture.PerformActionInTestIndex(
        indexName,
        mappingDescriptor,
        async (uniqueIndexName, opensearchClient) =>
        {
            var productDocuments = new[]
            {
                new ProductDocument(1, "mouse"),
                new ProductDocument(2, "mouse pad"),
            };
            await _fixture.IndexDocuments(uniqueIndexName, productDocuments);

            var result = await opensearchClient.SearchAsync<ProductDocument>(selector => selector
                .Index(uniqueIndexName)
                .Query(query => query.MatchAll())
                .Explain()
                .Sort(sort => sort
                    .Script(sortScript => sortScript
                        .Ascending()
                        .Type("number")
                        .Script(s => s.Source($"doc['{nameof(ProductDocument.Name).ToLowerInvariant()}'].value == 'mouse pad' ? 0 : 1"))
                    )
                )
            );

            // Our scripted sort will return the mouse pad at the top of the results
            result.IsValid.Should().BeTrue();
            var formattedResults = string.Join(", ", result.Documents.Select(doc => doc.Name));
            formattedResults.Should().BeEquivalentTo("mouse pad, mouse");
        }
    );
}
This query produces the following DebugInformation in the response object:
Valid OpenSearch.Client response built from a successful (200) low level call on POST: /keyword-index93e2a7a2-ad01-4d72-ba43-cf2ccea61308/_search?pretty=true&error_trace=true&typed_keys=true
# Audit trail of this API call:
 - [1] HealthyResponse: Node: http://localhost:9200/ Took: 00:00:00.4245839
# Request:
{"explain":true,"query":{"match_all":{}},"sort":[{"_script":{"script":{"source":"doc['name'].value == 'mouse pad' ? 0 : 1"},"type":"number","order":"asc"}}]}
# Response:
{
  "took": 269,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "max_score": null,
    "hits": [
      {
        "_shard": "[keyword-index93e2a7a2-ad01-4d72-ba43-cf2ccea61308][0]",
        "_node": "l7YV4K5YSFuy_CFfGwt8ig",
        "_index": "keyword-index93e2a7a2-ad01-4d72-ba43-cf2ccea61308",
        "_id": "2",
        "_score": null,
        "_source": { "id": 2, "name": "mouse pad" },
        "sort": [ 0.0 ],
        "_explanation": { "value": 1.0, "description": "*:*", "details": [] }
      },
      {
        "_shard": "[keyword-index93e2a7a2-ad01-4d72-ba43-cf2ccea61308][0]",
        "_node": "l7YV4K5YSFuy_CFfGwt8ig",
        "_index": "keyword-index93e2a7a2-ad01-4d72-ba43-cf2ccea61308",
        "_id": "1",
        "_score": null,
        "_source": { "id": 1, "name": "mouse" },
        "sort": [ 1.0 ],
        "_explanation": { "value": 1.0, "description": "*:*", "details": [] }
      }
    ]
  }
}
# TCP states:
  Established: 64
  TimeWait: 1
  CloseWait: 14
# ThreadPool statistics:
  Worker: Busy: 1 Free: 32766 Min: 12 Max: 32767
  IOCP: Busy: 0 Free: 1000 Min: 12 Max: 1000
The sort field in the response above shows the numeric values (returned as 0.0 and 1.0) that were calculated by performing the ternary comparison on our keyword field: mouse pad produces 0 and mouse produces 1. These values are then used to order the results in ascending order.
It is worth mentioning that this sorting example is contrived and doesn’t reflect a good practical example of scripted sorting. While painless scripts may be relatively fast to execute at search time, we should always aim to keep runtime complexity to a minimum. This will not only make your queries easier to debug, but will also allow them to run as fast as possible.
The example above is not dynamic and should not be implemented with a script. This is because we can evaluate whether the name of the product is mouse pad and index the result into a separate field of the document at indexing time. It is only possible for this value to change when we re-index this document, at which point we can re-evaluate the comparison and store the new value accordingly.
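As a rough illustration of that idea, the sketch below evaluates the same comparison once, while the documents are being prepared, and then sorts on the stored result. It is plain C# with no OpenSearch calls, and the tuple shape and the SortRank name are hypothetical, chosen only for this example.

```csharp
using System;
using System.Linq;

// Stand-ins for the two products indexed in the tests above.
var products = new[]
{
    (Id: 1, Name: "mouse"),
    (Id: 2, Name: "mouse pad"),
};

// Evaluate the comparison once, when the document is prepared for indexing.
// SortRank is a hypothetical extra field that would be indexed alongside the name.
var documents = products
    .Select(p => (p.Id, p.Name, SortRank: p.Name == "mouse pad" ? 0 : 1))
    .ToArray();

// A plain ascending sort on the stored rank reproduces the scripted order.
var ordered = documents.OrderBy(d => d.SortRank).Select(d => d.Name).ToArray();

Console.WriteLine(string.Join(", ", ordered)); // mouse pad, mouse
```

With the rank stored in the document, the search request would only need an ordinary field sort, and no script would run at query time.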
An example of a dynamic query could be seen if the value mouse pad were instead an interpolated string variable that was given to us in an HTTP request. This value could be different for every user, which would make it impossible for us to index a pre-calculated value when indexing our document.
Let’s take a look at how a dynamic query could be constructed:
var scriptedVariableValue = "mouse";
var result = await opensearchClient.SearchAsync<ScriptedProductDocument>(selector => selector
    .Index(uniqueIndexName)
    .ScriptFields(scriptFields => scriptFields
        .ScriptField(
            categoryFieldName,
            // Named "script" because a nested lambda parameter cannot reuse the outer "selector" name
            script => script.Source($"doc['{nameof(ProductDocument.Name).ToLowerInvariant()}'].value == '{scriptedVariableValue}' ? 'computer accessory' : 'mouse accessory'"))
    )
    .Source(true)
);
In the example above, the script (used here to compute a script field rather than a sort value) is populated with an interpolated local variable, which is hard-coded for the purposes of illustration. In reality, depending on the nature of the application, this value could be anything at any point in time, making the use case dynamic and a script justified.
Note that when interpolating the variable, we must remember to wrap it in single quotes: '{scriptedVariableValue}'!
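String interpolation is not the only way to get a per-request value into a script. Painless scripts also accept a params map, which removes the quoting step and keeps the script source identical from one request to the next. The sketch below only builds and prints a raw search request body with System.Text.Json. It does not use OpenSearch.Client, and the requestedName variable and the target parameter name are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical value that would normally arrive with the HTTP request.
var requestedName = "mouse pad";

var searchBody = new Dictionary<string, object>
{
    ["query"] = new Dictionary<string, object>
    {
        ["match_all"] = new Dictionary<string, object>()
    },
    ["sort"] = new object[]
    {
        new Dictionary<string, object>
        {
            ["_script"] = new Dictionary<string, object>
            {
                ["type"] = "number",
                ["order"] = "asc",
                ["script"] = new Dictionary<string, object>
                {
                    ["lang"] = "painless",
                    // The source never changes; only the parameter value does.
                    ["source"] = "doc['name'].value == params.target ? 0 : 1",
                    ["params"] = new Dictionary<string, object>
                    {
                        ["target"] = requestedName
                    }
                }
            }
        }
    }
};

// Serialize to the JSON that would be sent as the body of a _search request.
var json = JsonSerializer.Serialize(searchBody);
Console.WriteLine(json);
```

Because the user-supplied value travels as data rather than as part of the script text, a quote character inside it cannot alter the script itself.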
Sloth Summary
Today we’ve covered how to sort keyword fields:
- They’re one of the simplest things to sort, as they don’t require any special considerations, mappings, or typecasting
- Numeric data stored in a keyword field is sorted lexicographically (character by character), not numerically
- Keyword fields can also be used in scripted sorting
- Scripted sort should only be used when we have a dynamic use case that drives the sort order. Ask yourself: “can I index this value at index-time instead?” to keep your search requests performant!