The Terms Aggregation is an excellent tool for counting all the terms appearing across our documents. It can be run on both single-value keywords and collections of strings. At times, though, we may not want to count all terms that match a query, and instead focus on particular values that we might want to include, or exclude. This article will dive deep into how OpenSearch Terms Aggregation include exclude parameters can limit what will be counted.
The latest version of code snippets used in this article can be found via the Code Sloth Code Samples page, under Java Search samples.
Including Filtered Matches in Term Aggregations on Single Keyword Field Mappings
The code below contains an integration test that explores running a terms aggregation with an include term filter. This is our simplest example to start with.
Including Filtered Terms
/** * This test verifies that keyword fields can be used for filtered terms aggregation. * It demonstrates how to use the 'includes' parameter with explicit term values to filter terms. * <p> * Unlike the regex version, this approach allows exact matching of specific terms without * the complexity of regular expressions. * * @param includeTerms Array of terms to include in the aggregation * @param expectedResults The expected aggregation results in "term:count" format * @param description A description of what the test case is evaluating * @throws Exception If an I/O error occurs */ @ParameterizedTest @CsvSource({ "mouse, mouse:3, 'Include only mouse - filters out mouse pad'", "mouse pad, mouse pad:2, 'Include only mouse pad - filters out mouse'", "'mouse, mouse pad', 'mouse:3, mouse pad:2', 'Include both terms - shows all terms'", "keyboard, '', 'Include non-existent term - no results'" }) public void keywordMapping_CanBeUsedForFilteredTermsAggregation_OnSingleKeywordWithIncludeTerms( @ConvertWith(StringArrayConverter.class) String[] includeTerms, String expectedResults, String description) throws Exception { // Create a test index with keyword mapping for the Name field try (OpenSearchTestIndex testIndex = fixture.createTestIndex(mapping -> mapping.properties("name", Property.of(p -> p.keyword(k -> k))))) { // Create and index product documents ProductDocument[] productDocuments = new ProductDocument[]{ new ProductDocument(1, "mouse", 1), new ProductDocument(2, "mouse pad", 2), new ProductDocument(3, "mouse", 3), new ProductDocument(4, "mouse", 4), new ProductDocument(5, "mouse pad", 5) }; testIndex.indexDocuments(productDocuments); // Create a search request with terms aggregation and includes filter using explicit terms SearchRequest searchRequest = new SearchRequest.Builder() .index(testIndex.getName()) .size(0) // We do not want any documents returned; just the aggregations .aggregations("product_counts", a -> a .terms(t -> t .field("name") .size(10) .include(i -> i.terms(Arrays.asList(includeTerms))) ) ) .build(); // Execute the search request SearchResponse<ProductDocument> response = openSearchClient.search(searchRequest, ProductDocument.class); // Verify the results assertThat(response.aggregations()).isNotNull(); StringTermsAggregate termsAgg = response.aggregations().get("product_counts").sterms(); // Extract each term and its associated number of hits Map<String, Long> bucketCounts = termsAgg.buckets().array().stream() .collect(Collectors.toMap( StringTermsBucket::key, StringTermsBucket::docCount )); // Format the results for verification String formattedResults = bucketCounts.entrySet().stream() .map(entry -> entry.getKey() + ":" + entry.getValue()) .collect(Collectors.joining(", ")); // Verify the expected results assertThat(formattedResults) .as(description) .isEqualTo(expectedResults); } }
Let’s break down what the test is doing:
- Creates a test index with a keyword mapping for the
name
keyword
field - Indexes five product documents with the single names
mouse
andmouse pad
- Builds a
terms aggregation
search request that filters the terms using aninclude
parameter- This is supplied via the parameterised test
- The
terms
method is used to supply the parameter(s) for exact match filtering
The test explores each combination of single and combined inputs, all results and no results.
Including Filtered Regular Expressions
Similar to this approach, we can apply regular expressions to reduce the counted terms in the aggregation.
/** * This test verifies that keyword fields can be used for filtered terms aggregation. * It demonstrates how to use the 'includes' parameter with regex patterns to filter terms. * * @param includesPattern The regex pattern to include terms * @param expectedResults The expected aggregation results in "term:count" format * @param description A description of what the test case is evaluating * @throws Exception If an I/O error occurs */ @ParameterizedTest @CsvSource({ "mouse, mouse:3, 'Exact match - matches only the exact term'", "mouse.*, 'mouse:3, mouse pad:2', 'Prefix match - matches terms starting with mouse'", ".*pad, mouse pad:2, 'Suffix match - matches terms ending with pad'", "keyboard, '', 'No matches - pattern matches no terms'" }) public void keywordMapping_CanBeUsedForFilteredTermsAggregation_OnSingleKeywordWithIncludeRegularExpression(String includesPattern, String expectedResults, String description) throws Exception { // Create a test index with keyword mapping for the Name field try (OpenSearchTestIndex testIndex = fixture.createTestIndex(mapping -> mapping.properties("name", Property.of(p -> p.keyword(k -> k))))) { // Create and index product documents ProductDocument[] productDocuments = new ProductDocument[]{ new ProductDocument(1, "mouse", 1), new ProductDocument(2, "mouse pad", 2), new ProductDocument(3, "mouse", 3), new ProductDocument(4, "mouse", 4), new ProductDocument(5, "mouse pad", 5) }; testIndex.indexDocuments(productDocuments); // Create a search request with terms aggregation and includes filter using regexp SearchRequest searchRequest = new SearchRequest.Builder() .index(testIndex.getName()) .size(0) // We do not want any documents returned; just the aggregations .aggregations("product_counts", a -> a .terms(t -> t .field("name") .size(10) .include(i -> i.regexp(includesPattern)) ) ) .build(); // Execute the search request SearchResponse<ProductDocument> response = openSearchClient.search(searchRequest, ProductDocument.class); // Verify the results assertThat(response.aggregations()).isNotNull(); StringTermsAggregate termsAgg = response.aggregations().get("product_counts").sterms(); // Extract each term and its associated number of hits Map<String, Long> bucketCounts = termsAgg.buckets().array().stream() .collect(Collectors.toMap( StringTermsBucket::key, StringTermsBucket::docCount )); // Format the results for verification String formattedResults = bucketCounts.entrySet().stream() .map(entry -> entry.getKey() + ":" + entry.getValue()) .collect(Collectors.joining(", ")); // Verify the expected results assertThat(formattedResults) .as(description) .isEqualTo(expectedResults); } }
The structure of this test is the same as the terms test. Note that the include
method call is supplied the result of a regexp
call instead of terms
.
A Closer Look With The Debugger; Regular Expressions
To debug the test, locate the small green play button in the gutter to the left of the test method name, then select Debug. This will automatically use docker-compose to create a running OpenSearch cluster.

Set a breakpoint on the assertion that result.aggregations
is not null.
To prepare for observing our variables, open the Threads & Variables
debug window.

Type the expressions below in the text box at the top of the panel and click the button pointed at in 3.
to keep the watch for future debugging, or hit enter to produce a result
that can be seen in the top row.
The request object can be seen by clicking the blue view
text against a saved watch of searchRequest.toJsonString()
.
The below compares the regular expression (left) against the terms filter from above (right). Notice that the only difference in syntax on the JSON object is the presence of an array.
{ "aggregations" : { "product_counts" : { "terms" : { "field" : "name", "include" : "mouse.*", "size" : 10 } } }, "size" : 0 }
{ "aggregations" : { "product_counts" : { "terms" : { "field" : "name", "include" : [ "mouse" ], "size" : 10 } } }, "size" : 0 }
Enter response.toJsonString()
to view the raw response from OpenSearch:
{ "took" : 43, "timed_out" : false, "_shards" : { "failed" : 0.0, "skipped" : 0.0, "successful" : 1.0, "total" : 1.0 }, "hits" : { "total" : { "relation" : "eq", "value" : 5 }, "hits" : [ ] }, "aggregations" : { "sterms#product_counts" : { "buckets" : [ { "doc_count" : 3, "key" : "mouse" }, { "doc_count" : 2, "key" : "mouse pad" } ], "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0 } } }
Setting the page size to 0 returns 0 hits, as expected. This is a useful strategy to avoid wasted compute and bandwidth if we are only observing aggregations.
The aggregations
section of the JSON response has a terms section that we named product_counts
, which contains 2 buckets of terms that matched the regular expression:
mouse
which was matched 3 timesmouse pad
which was matched 2 times
The other tests demonstrate that mouse
can be filtered out with .*pad
and no results will be counted with a pattern such as keyboard
which does not match.
Including Filtered Matches in Term Aggregations on Collection Keyword Field Mappings
Another practical use case for a filtered terms aggregation is counting specific terms in an array field. While we could use a query to reduce our overall corpus (be it a particular term or analyzed text), when dealing with an array on a document, a plain terms aggregation will count all values within the array; for each applicable document.
Query Filters Do Not Reduce Calculated Terms in Collections
This is demonstrated in the below test, which filters the corpus down to a single document. Despite this, all values in the names array are counted in the terms aggregation.
/** * This test verifies that terms aggregations on keyword arrays count all terms in matching documents, * even when the documents are filtered by a query. * <p> * When a document matches a query, all terms in its arrays are counted in the aggregation, * not just the terms that matched the query. * * @throws IOException If an I/O error occurs */ @Test public void keywordMapping_TermsAggregationOnKeywordArrayCountsAllTermsWhenFiltered() throws Exception { // Create a test index with keyword mapping for the names array field try (OpenSearchTestIndex testIndex = fixture.createTestIndex(mapping -> mapping.properties("names", Property.of(p -> p.keyword(k -> k))))) { // Create and index product documents with array of names ProductDocumentWithMultipleNames[] productDocuments = new ProductDocumentWithMultipleNames[]{ new ProductDocumentWithMultipleNames(1, new String[]{"mouse", "computer"}, 1), new ProductDocumentWithMultipleNames(2, new String[]{"mouse pad", "power cable"}, 2), new ProductDocumentWithMultipleNames(3, new String[]{"mouse", "mouse pad"}, 3), new ProductDocumentWithMultipleNames(4, new String[]{"mouse", "arm rest pad"}, 4), new ProductDocumentWithMultipleNames(5, new String[]{"mouse pad"}, 5) }; testIndex.indexDocuments(productDocuments); // Create a search request with a term query on "computer" and a terms aggregation SearchRequest searchRequest = new SearchRequest.Builder() .index(testIndex.getName()) // This test adds a query to reduce the overall applicable documents. Only 1 document will match and have its terms aggregated .query(q -> q .term(t -> t .field("names") .value(FieldValue.of("computer")) ) ) .size(0) // We do not want any documents returned; just the aggregations .aggregations("product_counts", a -> a .terms(t -> t .field("names") .size(10) ) ) .build(); // Execute the search request SearchResponse<ProductDocumentWithMultipleNames> response = openSearchClient.search(searchRequest, ProductDocumentWithMultipleNames.class); // Verify the results assertThat(response.aggregations()).isNotNull(); // Verify that the query matched only one document assertThat(response.hits().total().value()).isEqualTo(1); StringTermsAggregate termsAgg = response.aggregations().get("product_counts").sterms(); // Extract each term and its associated number of hits Map<String, Long> bucketCounts = termsAgg.buckets().array().stream() .collect(Collectors.toMap( StringTermsBucket::key, StringTermsBucket::docCount )); // Format the results for verification String formattedResults = bucketCounts.entrySet().stream() .map(entry -> entry.getKey() + ":" + entry.getValue()) .collect(Collectors.joining(", ")); // Verify the expected results - both "mouse" and "computer" are counted in the one matching document. // Despite a single document being produced from the match term query, all of its values in the field are aggregated over assertThat(formattedResults).isEqualTo("mouse:1, computer:1"); } }
This test builds the following query:
{ "aggregations" : { "product_counts" : { "terms" : { "field" : "names", "size" : 10 } } }, "query" : { "term" : { "names" : { "value" : "computer" } } }, "size" : 0 }
This produces the following response:
{ "took" : 64, "timed_out" : false, "_shards" : { "failed" : 0.0, "skipped" : 0.0, "successful" : 1.0, "total" : 1.0 }, "hits" : { "total" : { "relation" : "eq", "value" : 1 }, "hits" : [ ] }, "aggregations" : { "sterms#product_counts" : { "buckets" : [ { "doc_count" : 1, "key" : "computer" }, { "doc_count" : 1, "key" : "mouse" } ], "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0 } } }
In the response, we can see that there was 1 hit, and both of its terms (computer
and mouse
) were counted once each. This confirms that the aggregation will evaluate all terms on the returned documents, no matter what filters we apply in the query.
Filtering Counted Collection Terms With Terms Filters
While the include
filter in the single keyword
tests above completely removed documents from being counted in the terms aggregation, we can use the same filter to eliminate documents from aggregation, or refine counted terms to relevant values on a document.
/** * This test verifies that arrays of keyword fields can be used for filtered terms aggregation * using explicit term lists rather than regular expressions. * <p> * Unlike the regex version, this approach allows exact matching of specific terms without * the complexity of regular expressions. * <p> * See {@link KeywordAggregationTests#keywordMapping_CanBeUsedForTermsAggregationOnKeywordArray} for the base case * that counts all elements across all document arrays without filtering. * * @param includeTerms Array of terms to include in the aggregation * @param expectedResults The expected aggregation results in "term:count" format * @param description A description of what the test case is evaluating * @throws Exception If an I/O error occurs */ @ParameterizedTest @CsvSource({ "mouse, mouse:3, 'Include only mouse - matches exact term'", "mouse pad, mouse pad:3, 'Include only mouse pad - matches exact term'", "'mouse, mouse pad', 'mouse:3, mouse pad:3', 'Include mouse terms - matches both terms'", "'mouse, computer', 'mouse:3, computer:1', 'Include mixed terms - matches one common and one rare term'", "keyboard, '', 'Include non-existent term - no results'" }) public void keywordMapping_CanBeUsedForFilteredTermsAggregation_OnKeywordArrayWithIncludeTerms( @ConvertWith(StringArrayConverter.class) String[] includeTerms, String expectedResults, String description) throws Exception { // Create a test index with keyword mapping for the names array field try (OpenSearchTestIndex testIndex = fixture.createTestIndex(mapping -> mapping.properties("names", Property.of(p -> p.keyword(k -> k))))) { // Create and index product documents with array of names ProductDocumentWithMultipleNames[] productDocuments = new ProductDocumentWithMultipleNames[]{ new ProductDocumentWithMultipleNames(1, new String[]{"mouse", "computer"}, 1), new ProductDocumentWithMultipleNames(2, new String[]{"mouse pad", "power cable"}, 2), new ProductDocumentWithMultipleNames(3, new String[]{"mouse", "mouse pad"}, 3), new ProductDocumentWithMultipleNames(4, new String[]{"mouse", "arm rest pad"}, 4), new ProductDocumentWithMultipleNames(5, new String[]{"mouse pad"}, 5) }; testIndex.indexDocuments(productDocuments); // The includeTerms is now directly a String array, no need for parsing // Create a search request with terms aggregation and includes filter using explicit terms SearchRequest searchRequest = new SearchRequest.Builder() .index(testIndex.getName()) .size(0) // We do not want any documents returned; just the aggregations .aggregations("product_counts", a -> a .terms(t -> t .field("names") .size(10) .include(i -> i.terms(Arrays.asList(includeTerms))) ) ) .build(); // Execute the search request SearchResponse<ProductDocumentWithMultipleNames> response = openSearchClient.search(searchRequest, ProductDocumentWithMultipleNames.class); // Verify the results assertThat(response.aggregations()).isNotNull(); StringTermsAggregate termsAgg = response.aggregations().get("product_counts").sterms(); // Extract each term and its associated number of hits Map<String, Long> bucketCounts = termsAgg.buckets().array().stream() .collect(Collectors.toMap( StringTermsBucket::key, StringTermsBucket::docCount )); // Format the results for verification String formattedResults = bucketCounts.entrySet().stream() .map(entry -> entry.getKey() + ":" + entry.getValue()) .collect(Collectors.joining(", ")); // Verify the expected results assertThat(formattedResults) .as(description) .isEqualTo(expectedResults); } }
This test indexes documents with multiple names. Test cases provide inputs to assert single terms and combinations of terms.
When we supply multiple term filters, they apply a logical OR
to the target array field. This can be seen in the case mouse, computer
, which would only produce mouse:1, computer:1
if a logical AND
was performed. Instead, we see there are 3 mouse
terms counted.
This produces the following request:
{ "aggregations" : { "product_counts" : { "terms" : { "field" : "names", "include" : [ "mouse", "computer" ], "size" : 10 } } }, "size" : 0 }
The response can be seen as follows:
{ "took" : 2, "timed_out" : false, "_shards" : { "failed" : 0.0, "skipped" : 0.0, "successful" : 1.0, "total" : 1.0 }, "hits" : { "total" : { "relation" : "eq", "value" : 5 }, "hits" : [ ] }, "aggregations" : { "sterms#product_counts" : { "buckets" : [ { "doc_count" : 3, "key" : "mouse" }, { "doc_count" : 1, "key" : "computer" } ], "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0 } } }
Filtering Counted Collection Terms With Regular Expressions
Similar to the single keyword
scenario, we can run regular expressions over collections of keywords
.
/** * This test verifies that arrays of keyword fields can be used for filtered terms aggregation. * It demonstrates how to use the 'includes' parameter with regex patterns to filter terms. * <p> * See {@link KeywordAggregationTests#keywordMapping_CanBeUsedForTermsAggregationOnKeywordArray} for the base case * that counts all elements across all document arrays without filtering. * <p> * See {@link KeywordAggregationTests#keywordMapping_TermsAggregationOnKeywordArrayCountsAllTermsWhenFiltered} for an example * of how all terms in an array are counted in the aggregation even when documents are filtered by a query on one of those terms. * * @param includesPattern The regex pattern to include terms * @param expectedResults The expected aggregation results in "term:count" format * @param description A description of what the test case is evaluating * @throws Exception If an I/O error occurs */ @ParameterizedTest @CsvSource({ "mouse, mouse:3, 'Exact match - matches only the exact term'", "mouse.*, 'mouse:3, mouse pad:3', 'Prefix match - matches terms starting with mouse'", ".*pad, 'mouse pad:3, arm rest pad:1', 'Suffix match - matches terms ending with pad'", "keyboard, '', 'No matches - pattern matches no terms'", "'.*', 'mouse:3, computer:1, mouse pad:3, power cable:1, arm rest pad:1', 'Match all - pattern matches all terms'" }) public void keywordMapping_CanBeUsedForFilteredTermsAggregation_OnKeywordArrayWithIncludeRegularExpression(String includesPattern, String expectedResults, String description) throws Exception { // Create a test index with keyword mapping for the names array field try (OpenSearchTestIndex testIndex = fixture.createTestIndex(mapping -> mapping.properties("names", Property.of(p -> p.keyword(k -> k))))) { // Create and index product documents with array of names ProductDocumentWithMultipleNames[] productDocuments = new ProductDocumentWithMultipleNames[]{ new ProductDocumentWithMultipleNames(1, new String[]{"mouse", "computer"}, 1), new ProductDocumentWithMultipleNames(2, new String[]{"mouse pad", "power cable"}, 2), new ProductDocumentWithMultipleNames(3, new String[]{"mouse", "mouse pad"}, 3), new ProductDocumentWithMultipleNames(4, new String[]{"mouse", "arm rest pad"}, 4), new ProductDocumentWithMultipleNames(5, new String[]{"mouse pad"}, 5) }; testIndex.indexDocuments(productDocuments); // Create a search request with terms aggregation and includes filter using regexp SearchRequest searchRequest = new SearchRequest.Builder() .index(testIndex.getName()) .size(0) // We do not want any documents returned; just the aggregations .aggregations("product_counts", a -> a .terms(t -> t .field("names") .size(10) .include(i -> i.regexp(includesPattern)) ) ) .build(); // Execute the search request SearchResponse<ProductDocumentWithMultipleNames> response = openSearchClient.search(searchRequest, ProductDocumentWithMultipleNames.class); // Verify the results assertThat(response.aggregations()).isNotNull(); StringTermsAggregate termsAgg = response.aggregations().get("product_counts").sterms(); // Extract each term and its associated number of hits Map<String, Long> bucketCounts = termsAgg.buckets().array().stream() .collect(Collectors.toMap( StringTermsBucket::key, StringTermsBucket::docCount )); // Format the results for verification String formattedResults = bucketCounts.entrySet().stream() .map(entry -> entry.getKey() + ":" + entry.getValue()) .collect(Collectors.joining(", ")); // Verify the expected results assertThat(formattedResults) .as(description) .isEqualTo(expectedResults); } }
This produces the following request:
{ "aggregations" : { "product_counts" : { "terms" : { "field" : "names", "include" : "mouse.*", "size" : 10 } } }, "size" : 0 }
The response is as follows:
{ "took" : 48, "timed_out" : false, "_shards" : { "failed" : 0.0, "skipped" : 0.0, "successful" : 1.0, "total" : 1.0 }, "hits" : { "total" : { "relation" : "eq", "value" : 5 }, "hits" : [ ] }, "aggregations" : { "sterms#product_counts" : { "buckets" : [ { "doc_count" : 3, "key" : "mouse" }, { "doc_count" : 3, "key" : "mouse pad" } ], "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0 } } }
In this example, we can see that a mouse
prefix does not allow computer
to be counted as a term, but mouse pad
and mouse
are.
Excluding terms from aggregation
The same approach can be applied using the exclude
parameter. As the name indicates, this will exclude any term matching the provided pattern.
Example requests can be seen from a Code Sloth Code Sample test below. The left captures a regular expression, and the right, a terms exclude filter.
{ "aggregations" : { "product_counts" : { "terms" : { "exclude" : "mouse.*", "field" : "names", "size" : 10 } } }, "size" : 0 }
{ "aggregations" : { "product_counts" : { "terms" : { "exclude" : [ "mouse", "mouse pad" ], "field" : "names", "size" : 10 } } }, "size" : 0 }
Query Time v.s. Indexing Time Tradeoffs in Terms Aggregations
Suppose you are working with a single field (not a collection). In that case, you might best index a second representation of the data to form a query filter, rather than relying on the terms aggregation to do it on your behalf.
- Field 1: used for filtering applicable documents. In the case of a regular expression prefix match, an edge n-grams field (either analyzed as a tokenizer or token filter) could be used in the
query
part of the request to reduce the overall number of applicable documents - Field 2: A
keyword
field. This could then be used to aggregate the terms over the applicable corpus. If you want exact match filtering, a second field would not be required, and the keyword field could be leveraged for both.
Query time and indexing time strategies have different pros and cons:
- Using a single field can simplify the solution and reduce the amount of data we need to store. However, this may come with a performance cost at query time, resulting from analyzing more documents than necessary.
- We can use multiple fields, which take more space, but we can potentially increase query performance by avoiding global ordinals with an execution hint if we can sufficiently reduce the overall number of matching documents, so that they can be counted in memory.
Sloth Summary
In this post, we explored how to perform filtered terms aggregations using the include and exclude parameters on keyword
fields.
We learned that filtering documents with a query will have no impact on the counted terms when the aggregated field is a collection.
We also observed that a logical OR
is performed when multiple terms are provided in the terms filter against a collection of keywords.
Happy terms aggregation filtering 🦥