Elasticsearch Distinct search in Spring Boot & Kotlin

by coding flower

One of the most popular queries in the SQL world retrieves the distinct values from a given table. Let's leave the SQL world and dive into Elasticsearch.

We have a simple document:

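import org.springframework.data.annotation.Id
import org.springframework.data.elasticsearch.annotations.Document

@Document(type = "article", indexName = "data")
data class Product(
    @Id
    val id: String? = null,
    val category: String,
    val name: String
)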

and we want to get distinct categories from Elasticsearch.

How to do that?

Using the Elasticsearch Dev Tools, we can prototype a query that returns the categories:

GET data/_search
{
  "size": 0,
  "aggs": {
    "distinctCategory": {
      "terms": {
        "field": "category.keyword",
        "size": 100
      }
    }
  }
}

We define a terms aggregation named distinctCategory on the category field of this index. The size value inside the aggregation has to be raised above the default (10), because we expect the index to contain more than 10 different categories.

We set the top-level size to 0 because we don't need any documents from the data index; we only care about the aggregation.

The response for this query looks like this:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
      ...
  },
  "hits" : {
      ...
  },
  "aggregations" : {
    "distinctCategory" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "road bikes",
          "doc_count" : 2130
        },
        {
          "key" : "gravel bikes",
          "doc_count" : 1916
        },
        {
          "key" : "bmx",
          "doc_count" : 1763
        }
      ]
    }
  }
}

The most important part of the response for us is aggregations. The buckets field contains the information we need: each bucket's key is one distinct category.
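To make the shape of that response concrete, here is a minimal, self-contained Kotlin sketch. The Bucket and TermsResult classes and the two helper functions are hypothetical simplifications for illustration, not Elasticsearch client types. The sketch shows that the distinct values are just the bucket keys, and that a sum_other_doc_count greater than 0 means some documents fell into buckets that were not returned, so the aggregation size was too small to list every category.

```kotlin
// Hypothetical, simplified mirror of the "distinctCategory" part of the response.
data class Bucket(val key: String, val docCount: Long)

data class TermsResult(
    val sumOtherDocCount: Long, // documents in buckets that were not returned
    val buckets: List<Bucket>
)

// The distinct values are simply the bucket keys.
fun distinctKeys(result: TermsResult): List<String> = result.buckets.map { it.key }

// A value above 0 means some terms did not fit into the requested "size".
fun isTruncated(result: TermsResult): Boolean = result.sumOtherDocCount > 0

fun main() {
    val result = TermsResult(
        sumOtherDocCount = 0,
        buckets = listOf(
            Bucket("road bikes", 2130),
            Bucket("gravel bikes", 1916),
            Bucket("bmx", 1763)
        )
    )
    println(distinctKeys(result)) // [road bikes, gravel bikes, bmx]
    println(isTruncated(result))  // false
}
```

If isTruncated returned true for a real response, raising the aggregation size would be the first thing to try.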

How to make an Elasticsearch distinct query in Kotlin?

First we need to create the aggregation. We are lucky, because Elasticsearch provides a bunch of useful builders.

val aggregation = AggregationBuilders.terms("distinctCategory")
    .field("category.keyword")
    .size(100)

After that we only need to attach our aggregation to the search query:

val searchSourceBuilder = SearchSourceBuilder().size(0).aggregation(aggregation)
val searchRequest = SearchRequest("data")
    .apply { source(searchSourceBuilder) }

To execute the query:

val searchResponse = esConfiguration.client().search(searchRequest, RequestOptions.DEFAULT)

Here esConfiguration.client() returns a RestHighLevelClient.

Our query works, but how do we extract the distinct values?

It is very simple: as you can see in the JSON response, the distinct category names are placed in the buckets list as keys.

// map already returns a List, so no extra toList() call is needed
val distinctCategories = searchResponse.aggregations
    .get<Terms>("distinctCategory")
    .buckets.map { it.keyAsString }
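Putting the pieces together, the steps above could be wrapped in a single function. This is only a sketch under a few assumptions: the function name findDistinctCategories and the maxCategories parameter are hypothetical, the Elasticsearch high-level REST client is assumed to be on the classpath, and the RestHighLevelClient is passed in as an argument instead of being taken from esConfiguration.

```kotlin
import org.elasticsearch.action.search.SearchRequest
import org.elasticsearch.client.RequestOptions
import org.elasticsearch.client.RestHighLevelClient
import org.elasticsearch.search.aggregations.AggregationBuilders
import org.elasticsearch.search.aggregations.bucket.terms.Terms
import org.elasticsearch.search.builder.SearchSourceBuilder

// Hypothetical helper: the name and the maxCategories parameter are illustrative.
fun findDistinctCategories(client: RestHighLevelClient, maxCategories: Int = 100): List<String> {
    // Terms aggregation on the keyword sub-field, as in the prototyped query.
    val aggregation = AggregationBuilders.terms("distinctCategory")
        .field("category.keyword")
        .size(maxCategories)

    // size(0): skip the documents themselves, return only the aggregation.
    val searchSourceBuilder = SearchSourceBuilder()
        .size(0)
        .aggregation(aggregation)

    val searchRequest = SearchRequest("data").source(searchSourceBuilder)
    val searchResponse = client.search(searchRequest, RequestOptions.DEFAULT)

    // Each bucket key is one distinct category.
    return searchResponse.aggregations
        .get<Terms>("distinctCategory")
        .buckets
        .map { it.keyAsString }
}
```

With the configuration from this article, it could be called as findDistinctCategories(esConfiguration.client()).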
