CodingFlower
https://www.codingflower.com/
Check news in the world of Senior Software Engineers!
Last updated: Thu, 25 Jan 2024 16:30:47 +0000

Senior Software Engineer Glossary: Kafka
https://www.codingflower.com/2024/01/25/senior-software-engineer-glossary-kafka
Published: Thu, 25 Jan 2024 16:29:53 +0000

As a Senior Software Engineer, understanding the nuances of Apache Kafka is essential for designing and implementing robust data processing systems. Here's an in-depth look at Kafka and its role in modern software architecture.

Introduction to Apache Kafka

Apache Kafka is a distributed event streaming platform used for building real-time data pipelines and streaming applications. Originally developed at LinkedIn, it was later open-sourced and became a project of the Apache Software Foundation. Kafka is written in Scala and Java.

Key Features

  • High Throughput: Handles high volumes of data, making it suitable for big data scenarios.
  • Scalability: Scales horizontally to manage increased loads efficiently.
  • Fault Tolerance: Built-in replication and partitioning for reliable data storage and processing.
  • Low Latency: Facilitates real-time data processing.

Understanding Kafka's core components is crucial for effective use:

  • Producers: Applications that send records to Kafka topics.
  • Consumers: Applications that read records from topics.
  • Topics: Named feeds to which records are published.
  • Brokers: Servers that store and distribute data.
  • Kafka Clusters: Clusters consist of multiple brokers to maintain load balance and ensure fault tolerance. Data is replicated across brokers for high availability.
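To make the roles above more concrete, here is a minimal in-memory sketch of a topic split into partitions. It is not the Kafka client API: the names Record, Partition and Topic are made up for illustration, routing by hashCode is an assumption that stands in for Kafka's real partitioner, and brokers and replication are left out entirely.

```kotlin
// Simplified in-memory model of a topic; this is NOT the Kafka client API.
data class Record(val key: String, val value: String)

class Partition {
    private val log = mutableListOf<Record>()

    // Appends a record and returns its offset (its position in the log).
    fun append(record: Record): Long {
        log.add(record)
        return (log.size - 1).toLong()
    }

    // Reads every record from the given offset onwards.
    fun readFrom(offset: Long): List<Record> = log.drop(offset.toInt())
}

class Topic(val name: String, partitionCount: Int) {
    private val partitions = List(partitionCount) { Partition() }

    // Assumption: plain hashCode routing stands in for Kafka's real partitioner.
    fun partitionFor(key: String): Int = Math.floorMod(key.hashCode(), partitions.size)

    // Producer side: route by key, append, report where the record landed.
    fun send(record: Record): Pair<Int, Long> {
        val index = partitionFor(record.key)
        return index to partitions[index].append(record)
    }

    // Consumer side: read one partition starting at an offset the consumer tracks.
    fun poll(partition: Int, offset: Long): List<Record> = partitions[partition].readFrom(offset)
}

fun main() {
    val orders = Topic("orders", partitionCount = 3)
    val (firstPartition, firstOffset) = orders.send(Record("customer-42", "created"))
    val (secondPartition, secondOffset) = orders.send(Record("customer-42", "paid"))
    println("partition=$firstPartition offset=$firstOffset")
    println("partition=$secondPartition offset=$secondOffset")
    println(orders.poll(firstPartition, 0))
}
```

Records with the same key land in the same partition, which is why ordering is preserved per key, and a consumer only needs to remember one offset per partition to resume reading.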

Advanced Features

For senior engineers, mastering advanced features is key:

  • Kafka Streams: A library for building stream processing applications using Kafka.
  • Kafka Connect: A tool for streaming data between Kafka and other systems.
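  • Exactly-Once Semantics: Built on idempotent producers and transactions, it ensures that the effects of processing each record are applied exactly once within Kafka, even when producers retry or consumers restart. This is a critical feature for transactional systems.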

Use Cases

Kafka's versatility makes it ideal for:

  • Event-Driven Architecture: As the backbone of a microservices architecture.
  • Real-Time Data Processing: For analytics and monitoring systems.
  • Data Integration: As a pipeline for data movement between systems.

Kafka in Practice

To implement Kafka effectively:

  • Understand Topic Design: Properly structure topics based on the use case.
  • Optimize Producer and Consumer Configuration: For efficiency and reliability.
  • Monitor Performance: Regularly check system health and throughput.

Challenges and Solutions

  • Data Consistency: Exactly-once semantics has to be configured deliberately: use idempotent, transactional producers and consumers that read only committed records.
  • System Complexity: Requires a deep understanding of its internal workings for optimal use.
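As a rough sketch of what "proper configuration" for exactly-once can look like, the snippet below builds plain java.util.Properties with the relevant Kafka client settings. The function names and the example ids are hypothetical; bootstrap servers, serializers and the actual producer and consumer objects are intentionally left out, so treat it as a starting point rather than a complete setup.

```kotlin
import java.util.Properties

// Producer side: idempotence plus a transactional id enable transactional writes.
fun exactlyOnceProducerConfig(transactionalId: String): Properties = Properties().apply {
    put("enable.idempotence", "true")        // the broker de-duplicates producer retries
    put("acks", "all")                       // wait for all in-sync replicas
    put("transactional.id", transactionalId) // hypothetical id, must be stable per producer
}

// Consumer side: read only records that belong to committed transactions.
fun readCommittedConsumerConfig(groupId: String): Properties = Properties().apply {
    put("group.id", groupId)
    put("isolation.level", "read_committed")
    put("enable.auto.commit", "false")       // offsets are committed by the application instead
}

fun main() {
    println(exactlyOnceProducerConfig("orders-processor-1"))
    println(readCommittedConsumerConfig("orders-group"))
}
```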

Conclusion

Apache Kafka is a powerful tool in a Senior Software Engineer's toolkit. Its ability to handle real-time data streams and integrate seamlessly into distributed systems makes it indispensable in modern software development.

Check out the full article on Medium!

Senior Software Engineer Glossary: Caching
https://www.codingflower.com/2024/01/23/senior-software-engineer-glossary-caching
Published: Tue, 23 Jan 2024 18:34:07 +0000

Caching for Senior Software Engineers

Overview

Caching is a technique used to store data temporarily in a rapidly accessible storage layer, improving the performance and scalability of applications.

Key Concepts

  • Cache Invalidation: Crucial for maintaining data accuracy. It means updating or removing cached data when it changes in the source.
  • Consistency: Ensuring that data in the cache reflects the latest data in the database.
  • Distributed Caching: Spreads the cache across multiple servers, which is useful in high-scale systems.

Strategies

  • Memory vs. Disk Caching: Memory caching is faster but limited by RAM, while disk caching offers more storage at the cost of speed.
  • Cache-Aside: Application code handles the cache, loading data into it as needed.
  • Read-Through/Write-Through: Cache automatically loads data on a cache miss and writes data to the source.
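The cache-aside strategy can be sketched in a few lines. This is a minimal, single-threaded illustration and not a production cache: the CacheAside class, the TTL handling and the loadFromSource function are assumptions made up for the example, and the loader stands in for something like a database query.

```kotlin
// Cache-aside: the application checks the cache first and loads from the source on a miss.
class CacheAside<K : Any, V : Any>(
    private val ttlMillis: Long,
    private val clock: () -> Long = System::currentTimeMillis,
    private val loadFromSource: (K) -> V, // hypothetical loader, e.g. a database query
) {
    private data class Entry<T>(val value: T, val expiresAt: Long)

    private val entries = HashMap<K, Entry<V>>()

    fun get(key: K): V {
        val now = clock()
        val cached = entries[key]
        if (cached != null && cached.expiresAt > now) return cached.value // cache hit
        val loaded = loadFromSource(key)                                   // miss: go to the source
        entries[key] = Entry(loaded, now + ttlMillis)
        return loaded
    }

    // Invalidation: call this when the source changes so stale data is not served.
    fun invalidate(key: K) {
        entries.remove(key)
    }
}

fun main() {
    var sourceReads = 0
    val users = CacheAside<Int, String>(ttlMillis = 60_000) { id ->
        sourceReads++
        "user-$id"
    }
    users.get(1)
    users.get(1)
    println("source reads: $sourceReads") // 1, the second call was served from the cache
}
```

With read-through the same loading logic would live inside the cache layer itself instead of in application code; the trade-off is less control for the caller in exchange for less boilerplate.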

Technologies

  • Common caching solutions include Redis and Memcached, which offer in-memory data storage and distributed caching.
    - You can build an in-memory cache inside the application itself, but that approach has many disadvantages. In a production scenario, prefer a distributed cache to get the full benefit of caching.
    - Caches also exist in databases (like Aurora), CDNs (like CloudFront), and object-relational mapping tools like Hibernate. Each of them uses caching to speed up data access. Apart from those, there are built-in caches such as CPU caches (L1, L2, L3) and disk caches (e.g. on SSDs).

Best Practices

  • Choose the right caching strategy based on application needs.
  • Monitor cache performance and hit/miss ratios.
  • Plan for cache failure and data synchronization challenges.
  • Consider newer techniques, such as machine-learning-driven caching policies and edge caching, where they fit the application.
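Monitoring the hit/miss ratio is easier to picture with a small example. The sketch below is an assumption-laden illustration, not a replacement for Redis or Memcached: LruCache is a made-up class that uses LinkedHashMap in access-order mode for least-recently-used eviction (one common answer to memory caches being limited by RAM) and counts hits and misses.

```kotlin
// LRU cache built on LinkedHashMap's access-order mode, with hit/miss counters.
class LruCache<K : Any, V : Any>(private val capacity: Int) {
    var hits = 0L
        private set
    var misses = 0L
        private set

    // accessOrder = true moves an entry to the end of the map every time it is read.
    private val map = object : LinkedHashMap<K, V>(16, 0.75f, true) {
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, V>?): Boolean =
            size > capacity
    }

    fun get(key: K): V? {
        val value = map[key]
        if (value != null) hits++ else misses++
        return value
    }

    fun put(key: K, value: V) {
        map[key] = value
    }

    fun hitRatio(): Double {
        val total = hits + misses
        return if (total == 0L) 0.0 else hits.toDouble() / total
    }
}

fun main() {
    val cache = LruCache<String, Int>(capacity = 2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")    // hit, "a" becomes the most recently used entry
    cache.put("c", 3) // evicts "b", the least recently used entry
    cache.get("b")    // miss
    println("hits=${cache.hits} misses=${cache.misses} ratio=${cache.hitRatio()}")
}
```

A persistently low hit ratio usually means the cache is too small, the TTL is too short, or the access pattern is simply not cache-friendly.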

Summary

Caching, when correctly implemented, can significantly enhance the performance and user experience of software applications. It is a concept that every Senior Software Engineer should know!

Read the full article on Medium.

See you in the next posts of our Senior Software Engineer Glossary series!

Inline Functions in Kotlin: Pros and Cons
https://www.codingflower.com/2023/09/07/inline-functions-in-kotlin-pros-and-cons
Published: Thu, 07 Sep 2023 21:44:10 +0000

Inline functions are a powerful tool that can be used to improve the performance of Kotlin code. However, they also have some drawbacks that should be considered before using them.

What is an inline function?

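An inline function is a function whose body is copied into the calling code at each call site instead of being invoked as a separate function. In Kotlin this matters most for higher-order functions: lambdas passed to an inline function are inlined too, so no function object is allocated and no extra call is made for them.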

When to use inline functions

Inline functions should be used when the performance benefits outweigh the drawbacks. This is typically the case for small higher-order functions that take lambdas and are called frequently, especially within loops or in performance-critical code.

Pros of inline functions

  • Performance optimization: Inline functions can improve performance by reducing the number of function calls that need to be made. This is because the function's code is copied into the code of the calling function, so there is no need to make a separate function call.
  • Control over function inlining: The inline modifier makes inlining explicit, and the noinline and crossinline modifiers control how individual lambda parameters are treated. Inline functions can also use reified type parameters and allow non-local returns from their lambdas.
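A short sketch of these features in action. The function names (firstMatching, countOfType, runAndRemember, firstNegative) are made up for this example; only the inline, reified and noinline modifiers come from the language itself.

```kotlin
// The lambda body is copied into each call site, so no function object is allocated.
inline fun <T> List<T>.firstMatching(predicate: (T) -> Boolean): T? {
    for (item in this) if (predicate(item)) return item
    return null
}

// reified is only available on inline functions: T is known at the call site.
inline fun <reified T> List<*>.countOfType(): Int = count { it is T }

// noinline keeps a lambda as a regular object, so it can be stored or returned.
inline fun runAndRemember(action: () -> Unit, noinline onDone: () -> Unit): () -> Unit {
    action()
    return onDone
}

// forEach is inline, so `return` here exits firstNegative itself (a non-local return).
fun firstNegative(numbers: List<Int>): Int? {
    numbers.forEach { if (it < 0) return it }
    return null
}

fun main() {
    println(listOf(1, 2, 3).firstMatching { it > 1 })
    println(listOf(1, "a", 2.0, "b").countOfType<String>())
    println(firstNegative(listOf(3, -1, -2)))
    runAndRemember({ println("working") }, { println("done") }).invoke()
}
```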

Cons of inline functions

  • Code size increase: Inline functions can increase the size of the compiled code, as the function's code is copied at every call site. This can be a concern for applications with limited memory resources.
  • Compile time increase: Inline functions can increase compile time, because the function's body is compiled again at every call site, and callers have to be recompiled when the inline function changes. This can be a concern for large projects that need to build quickly.
  • Complexity increase: Inline functions can make code harder to reason about. A public inline function cannot use non-public declarations, and debugging can be less straightforward because the code runs inside the caller.

When not to use inline functions

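Inline functions should not be used when the performance benefits are not significant or when the drawbacks outweigh the benefits. This is typically the case for functions that are called infrequently or that are not performance-critical. The Kotlin compiler even warns when inlining is unlikely to help, for example when a function takes no lambda parameters.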

Conclusion

Inline functions are a powerful tool that can be used to improve the performance of Kotlin code. However, they should be used carefully, as they can also have some drawbacks. When deciding whether or not to use an inline function, it is important to consider the specific needs of the application.

How to create Mono?
https://www.codingflower.com/2021/12/18/how-to-create-mono
Published: Sat, 18 Dec 2021 17:41:33 +0000

In this article, I would like to show you several different ways of creating a reactive Mono. I'll show how to:

  • create mono without any value
  • create mono with value
  • create mono which contains an exception
  • create eager mono
  • create lazy mono
  • create mono per new subscription

but first, let's start with a short intro about Mono<T> itself.

Mono<T>

In simple terms, Mono<T> is a specialized Publisher<T> that emits at most one item and then completes, or terminates with an error.

Create Mono without any value

The most basic version of Mono. This Mono will complete without emitting any item.
Let's test that.

    @Test
    fun `should create empty mono`() {
        val emptyMono = Mono.empty<String>()

        StepVerifier
            .create(emptyMono)
            .expectNextCount(0)
            .verifyComplete()
    }

Create Mono with value

It would be nice to have a Mono with a value inside it. To achieve that, we can use the just method, which accepts a value of any type and returns a Mono<T>. We can say that we are wrapping the given data in a Mono.

    @Test
    fun `should create mono with value`() {
        val createdMono = Mono.just("Mono")

        StepVerifier
            .create(createdMono)
            .expectNext("Mono")
            .verifyComplete()
    }

Create Mono which contains exception

Exceptions exist and we have to deal with them, so it is also possible to create a Mono that terminates with an exception.

    @Test
    fun `should create error mono`() {
        // The type parameter is the element type of the Mono, not the exception type.
        val monoError = Mono.error<String> { RuntimeException("Some fishy exception") }

        StepVerifier
            .create(monoError)
            .expectError(RuntimeException::class.java)
            .verify()
    }

Create eager Mono

What does an eager Mono mean? Basically, the value is produced instantly, when the Mono is created, rather than when it is subscribed to. The method from the second paragraph does that for us: the argument of Mono.just() is evaluated at instantiation time, and the Mono simply hands that value to its subscribers.

    @Test
    fun `mono just is eager`() {
        val atomicInteger = AtomicInteger(0)
        val eagerMono = Mono.just(atomicInteger.incrementAndGet())

        Assertions.assertEquals(1, atomicInteger.get())
    }

The argument passed to Mono.just() increments atomicInteger. We never subscribe
to eagerMono, but the increment still takes place.

Create lazy Mono

A lazy Mono is the opposite of an eager Mono: nothing is computed straight away,
and the work waits for a subscription. To achieve that, we can use the fromCallable
method, which takes a Callable as a parameter.

    @Test
    fun `mono from callable is lazy`() {
        val atomicInteger = AtomicInteger(0)
        val lazyMono = Mono.fromCallable { atomicInteger.incrementAndGet() }

        Assertions.assertEquals(0, atomicInteger.get())
    }

Inside the Mono we are incrementing atomicInteger, but we never subscribe
to lazyMono, so the increment does not take place. To execute
atomicInteger.incrementAndGet() we have to subscribe to lazyMono.

Create new Mono per each subscription

The Mono.defer() method takes a supplier and uses it to create a new Mono for each
subscriber. Let's analyze the example below.

    @Test
    fun `mono defer` () {
        val atomicInteger = AtomicInteger(0)

        val deferedMono = Mono.defer { Mono.just(atomicInteger.incrementAndGet()) }

        StepVerifier
            .create(deferedMono)
            .expectNext(1)
            .verifyComplete()

        StepVerifier
            .create(deferedMono)
            .expectNext(2)
            .verifyComplete()
    }

We subscribe twice to the same deferedMono. The result is quite simple to predict:
each subscription invokes the supplier passed to Mono.defer(), which creates a fresh
Mono.just() and therefore performs the increment again.
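The difference between eager and lazy evaluation does not depend on Reactor, so it can also be sketched with plain Kotlin lambdas. This is only an analogy under that assumption: the function eagerVersusLazy is made up for the example, and none of the reactive machinery (signals, subscribers, backpressure) is modelled.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Records the counter value at each step to show when the increment actually runs.
fun eagerVersusLazy(): List<Int> {
    val counter = AtomicInteger(0)
    val observed = mutableListOf<Int>()

    // Eager: the expression is evaluated right here, like the argument of Mono.just(...).
    val eager = counter.incrementAndGet()
    observed += counter.get() // 1

    // Lazy: nothing runs until the lambda is invoked, like Mono.fromCallable { ... }.
    val supplier = { counter.incrementAndGet() }
    observed += counter.get() // still 1

    // Every invocation runs the increment again, as every subscription does with Mono.defer { ... }.
    observed += supplier()    // 2
    observed += supplier()    // 3

    // The eagerly computed value never changes.
    observed += eager         // 1
    return observed
}

fun main() {
    println(eagerVersusLazy()) // [1, 1, 2, 3, 1]
}
```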

Kotlin coroutines run blocking
https://www.codingflower.com/2021/02/25/kotlin-coroutines-run-blocking
Published: Thu, 25 Feb 2021 00:23:23 +0000

In this short article, I will explain how to use runBlocking with multiple threads. Let's analyze the code below. This is a quite popular use case for coroutines: we make calls to two different services asynchronously and return the combined response.

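import kotlinx.coroutines.async
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlin.random.Random

fun main() {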
    val response =
        runBlocking {
            val responseA = async { callToServiceA() }
            val responseB = async { callToServiceB() }
            return@runBlocking "${responseA.await()} + ${responseB.await()}"
        }

    println(response)
}

suspend fun callToServiceA(): Int {
    println("Start -> Service A thread: ${Thread.currentThread().id}")
    delay(1000)
    println("End -> Service A thread: ${Thread.currentThread().id}")
    return Random(1).nextInt()
}

suspend fun callToServiceB(): Int {
    println("Start -> Service B thread: ${Thread.currentThread().id}")
    delay(1000)
    println("End -> Service B thread: ${Thread.currentThread().id}")
    return Random(1).nextInt()
}

This is what the output looks like:

Start -> Service A thread: 1
Start -> Service B thread: 1
End -> Service A thread: 1
End -> Service B thread: 1
600123930 + 600123930

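So we can easily see that both asynchronous tasks use the same thread. The async builder inherits its context from the outer scope, in this case from runBlocking, which runs its coroutines on an event loop bound to the calling thread. Dispatchers.Default is used only when the inherited context contains no dispatcher. So if we don't specify any CoroutineDispatcher, all coroutines started inside runBlocking run on the current thread. They still run concurrently, because delay suspends instead of blocking, but never in parallel.

How can we execute async on a different thread?

The solution is very easy. We can pass a dispatcher as the context of runBlocking, that is runBlocking(context = Dispatchers.Default) { ... }. Then the output of the above code snippet will look similar to this (the thread ids may differ):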

Start -> Service A thread: 14
Start -> Service B thread: 15
End -> Service A thread: 14
End -> Service B thread: 15
600123930 + 600123930

Finally, our async blocks run on different threads and can execute in parallel.

Killing mutations with Kotlin
https://www.codingflower.com/2021/01/24/killing-mutations-with-kotlin
Published: Sun, 24 Jan 2021 23:48:35 +0000

What are these mutations? Why do we need them? Do we really need to kill them?

Mutation: a version of the application code modified by PIT.

We write tests to check that our implementation works as intended. We keep test coverage high, but how can we check that our tests really work? Traditional test coverage only measures which code was executed by your tests. It does not mean that the tests are able to detect errors; it only means that the code was run during the tests.

How can we deal with this problem? We can run our tests against automatically modified versions of our application code. When the application code changes, it should produce different results and cause a test to fail. If no test fails after a modification, the test suite may be too weak.

Mutation testing

Running modified versions of the application looks reasonable, but we aren't going to change our code by hand. We are going to use PIT.

PIT is a mutation testing system for the JVM.

Let's consider an example. We are testing a method that returns true if the value is positive and false if it is not.

class Mutations {
    fun isPositive(value: Int): Boolean {
        return value > 0
    }
}

Here is a simple Kotest test:

import io.kotest.core.spec.style.ShouldSpec
import io.kotest.matchers.shouldBe

class MutationsTest : ShouldSpec({
    should("return true if value is bigger than zero") {
        val mutation = Mutations()

        mutation.isPositive(20) shouldBe true
        mutation.isPositive(-20) shouldBe false
    }
})

If we run this test in IntelliJ IDEA with code coverage, we will see 100% code coverage, and the test of course passes.

The mutation report will show us that our test is not perfect.

PIT generated three mutations. Two of them were killed, but one is still alive. Why? We already had 100% code coverage. The problem is the edge case, the number zero. The surviving mutation changed the condition from > to >= . The fix is very simple: we add another test case for zero, for example mutation.isPositive(0) shouldBe false, and then all mutations are killed. By default only a few mutators are active, but you can enable more.
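To see why the mutant survives, we can reproduce the idea by hand without PIT. The sketch below is only an illustration of the principle: the mutant is written manually, and a test suite is modelled as a list of input and expected-result pairs, which is an assumption made for the example.

```kotlin
// Original condition and the surviving mutant (conditional boundary changed by hand).
val original: (Int) -> Boolean = { it > 0 }
val boundaryMutant: (Int) -> Boolean = { it >= 0 }

// A test suite is modelled as a list of (input, expected result) pairs.
fun passes(isPositive: (Int) -> Boolean, cases: List<Pair<Int, Boolean>>): Boolean =
    cases.all { (input, expected) -> isPositive(input) == expected }

fun main() {
    val weakSuite = listOf(20 to true, -20 to false)
    val strongSuite = weakSuite + (0 to false)

    println(passes(original, weakSuite))         // true
    println(passes(boundaryMutant, weakSuite))   // true: the mutant survives
    println(passes(original, strongSuite))       // true
    println(passes(boundaryMutant, strongSuite)) // false: the mutant is killed
}
```

The weak suite cannot tell the two implementations apart, although it executes every line. Only the boundary value 0 separates them.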

Summary

Mutation testing is a good way to verify that your tests really detect faults in code instead of just increasing code coverage. It is really easy to integrate with the Kotest framework. The sad thing is that the project doesn't have a supportive community; there are almost 300 GitHub issues already. With a large number of tests, mutation testing takes a long time to run, so I don't recommend attaching it to the pipeline. I would say this tool is most useful when your application is failing although your tests are green and code coverage is high.

Kotlin Contracts – cooperation with the compiler
https://www.codingflower.com/2021/01/11/kotlin-contracts-cooperation-with-the-compiler
Published: Mon, 11 Jan 2021 22:59:45 +0000

Sometimes the compiler needs help and more information. Kotlin contracts are a way to pass more information to the compiler.

Custom Kotlin contracts are experimental, but the stdlib already uses contracts.

This is a new language mechanism. It allows developers to pass more detailed information to the compiler, so that the compiler can analyze the code with more data.

The new mechanism opens up new possibilities:

  • improving smart cast analysis
  • improving variable initialization analysis

Improve smart cast analysis

A smart cast disappears whenever we extract a check into a new function. To keep the same behavior, we can apply Kotlin contracts.

Let's imagine a situation in which we want to validate an object before processing it. The validation is not so simple, so we decide to extract it into a new function.

private fun validateUserWithoutContract(user: User?) {
    if (user == null) {
        throw Exception("We have big problem!")
    }
        ...
}

When we try to access properties of the given user object without a safe call, we get a compilation error.

fun processWithoutContract(user: User?) {
    validateUserWithoutContract(user)
    println(user.name)        // Compilation error
}

We know that this object cannot be null because we have already made a null check, but the compiler doesn't know that. Kotlin contracts allow us to pass this information to the compiler.

import kotlin.contracts.ExperimentalContracts
import kotlin.contracts.contract

@ExperimentalContracts
private fun validateUserWithContract(user: User?) {
    contract {
        returns() implies (user != null)
    }
    if (user == null) {
        error("We have big problem!")
    }
}

With this contract in place, a function that calls validateUserWithContract(user) first can use user.name to reach the name property without a safe call, and it compiles without any errors (the caller has to opt in to ExperimentalContracts as well). The example above is a good use of this feature; another good use is casting.
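Putting the pieces together, a complete version could look like the sketch below. The User class and the names validateUser and process are hypothetical stand-ins for the types used above, and @OptIn is used here so that callers don't need their own opt-in.

```kotlin
import kotlin.contracts.ExperimentalContracts
import kotlin.contracts.contract

data class User(val name: String) // hypothetical stand-in for the article's User type

@OptIn(ExperimentalContracts::class)
fun validateUser(user: User?) {
    contract {
        // If this function returns normally, the compiler may assume user is not null.
        returns() implies (user != null)
    }
    if (user == null) {
        error("We have a big problem!")
    }
}

fun process(user: User?): String {
    validateUser(user)
    return user.name // smart cast to User thanks to the contract, no safe call needed
}

fun main() {
    println(process(User("Ann")))
}
```

Keep in mind that the compiler trusts the contract: if the function body did not actually perform the null check, the code would still compile and fail at runtime.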

Improve variable initialization analysis for higher-order functions

Let's check the second part of contracts: we can tell the compiler how many times a function parameter will be called by using callsInPlace and InvocationKind.

Kotlin contracts have four invocation kinds:

  • UNKNOWN - the lambda may be called any number of times; a val or var assigned only inside it is not considered initialized afterwards.
  • AT_MOST_ONCE - the lambda is called zero times or once; for initialization it behaves the same as UNKNOWN.
  • AT_LEAST_ONCE - the lambda is called one or more times; it can initialize a var, but it can't be used to initialize a val.
  • EXACTLY_ONCE - the lambda is called exactly once; it can initialize both val and var.

Let's have a look at the run function:

@kotlin.internal.InlineOnly
public inline fun <R> run(block: () -> R): R {
    contract {
        callsInPlace(block, InvocationKind.EXACTLY_ONCE)
    }
    return block()
}

This function uses EXACTLY_ONCE, which means that you can use it to initialize both var and val variables. That's why the code below compiles. Without this contract, the compiler would complain that Captured values initialization is forbidden due to possible reassignment and suggest changing val to var.

This example illustrates one of the most common uses of Kotlin contracts: the initialization of objects in lambdas.

fun main() {
    val website: Website
    run {
        website = Website("www.codingflower.com")
    }
    print(website)
}
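The same contract can be declared on our own higher-order function. This is a minimal sketch: the names once and buildGreeting are made up for the example, and the function is inline because callsInPlace contracts are typically used on inline functions.

```kotlin
import kotlin.contracts.ExperimentalContracts
import kotlin.contracts.InvocationKind
import kotlin.contracts.contract

@OptIn(ExperimentalContracts::class)
inline fun <R> once(block: () -> R): R {
    contract {
        // Promise to the compiler: block runs exactly once, during this call.
        callsInPlace(block, InvocationKind.EXACTLY_ONCE)
    }
    return block()
}

fun buildGreeting(name: String): String {
    val greeting: String
    once {
        greeting = "Hello, $name" // allowed: the val is assigned exactly once
    }
    return greeting               // allowed: the val is definitely initialized here
}

fun main() {
    println(buildGreeting("Kotlin")) // Hello, Kotlin
}
```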

Limitations

  • The contract has to be the first statement in the function body.
  • It can be used only on top-level functions or member functions.
  • The compiler does not verify contracts; the developer is responsible for their correctness.

Summary

This mechanism works at compile time: the developer may provide additional guarantees or restrictions, which the compiler can use in its analysis. It solves the problem of knowing that an object is not null but still having to use a safe call because the smart cast was lost. Some of the functions in the stdlib are already annotated with contracts. The concept of contracts itself is nothing new; similar mechanisms have been proposed for C++.

How to visualize JMH Benchmarks?
https://www.codingflower.com/2021/01/03/how-to-visualize-jmh-benchmarks
Published: Sun, 03 Jan 2021 21:45:10 +0000

The Java Microbenchmark Harness (JMH) allows results to be exported in several formats, including:

  • csv
  • json
  • text
  • scsv

Exporting the data to a CSV file, uploading it to Excel or Google Sheets, and drawing some charts is the most obvious way to visualize benchmark data. There is a faster way; you can find it here: https://jmh.morethan.io/.

JMH Visualizer

Instead of exporting data to CSV, this time you have to export it as JSON (for example with the JMH command-line options -rf json -rff <file>). What are the features of this tool?

  • You can simultaneously evaluate multiple runs.
  • You can switch between modes (thrpt, avgt, ss).
  • There are different charts for each benchmark.
  • If you want to see values other than score, like gc.time or gc.count you can do that too.

[Screenshots of sample visualizations: Benchmarks, Benchmark detail, GC Options, MultiRUN, Compare Runs]

Summary

A very fast and simple tool. It would be the right option for you if you don't need to carry out any complex analysis.

Kotlin Books Review from the backend developer perspective
https://www.codingflower.com/2020/12/01/kotlin-books-review-from-the-backend-developer-perspective
Published: Tue, 01 Dec 2020 23:35:44 +0000

I've already read some books related to the Kotlin language so I can make a short review.

Kotlin In Action

You can buy this book here.

  • For whom? For beginners, excellent for the first touch with the language.
  • When is it worth reading? At the beginning of your journey with Kotlin.

Authors: Dmitry Jemerov and Svetlana Isakova, developers on the Kotlin team.

This book was the first book that I ever read about Kotlin. The knowledge included in this book was enough to create backend services.

Effective Kotlin

You can buy this book here.

  • For whom? If you have some experience in Kotlin and you would like to know more advanced stuff.
  • When is it worth reading? From my point of view, you should read this book right after Kotlin In Action.

Author: Marcin Moskała.

If you come from the Java world, you have probably noticed that this title is very similar to the popular book Effective Java by Joshua Bloch. The similarity is no coincidence: Effective Kotlin has the same structure as Effective Java, but all the topics described in the book relate to Kotlin instead of Java. It covers best practices for Kotlin development.

Joy Of Kotlin

You can buy this book here.

  • For whom? Advanced developers, fluent with Kotlin.
  • When is it worth reading? If you want to start functionally writing Kotlin code.

Author: Pierre-Yves Saumont

Long story short: functional programming in Kotlin, with a lot of exercises and examples. I don't recommend this book if you are looking for a book to start with Kotlin. To be honest, it is the most difficult book about Kotlin that I have ever read, but before reading it I had never touched functional programming.

Learning Concurrency In Kotlin

  • For whom? Android developers who want to learn the concurrency features of Kotlin.
  • When is it worth reading? If you want to use concurrency in Kotlin; be aware that all the examples are related to mobile development.

Author: Miguel Angel Castiblanco Torres

You are guided by the author through the process of creating a mobile application using coroutines and other Kotlin specific concurrency features.

Summary

  • If you are a Java developer, I recommend Kotlin In Action. After reading this book, you will be able to use Kotlin on a daily basis.
  • If you have already got your hands dirty with Kotlin, I suggest reading Effective Kotlin. You will learn how to write your code in a more idiomatic Kotlin way.
  • If you are looking for advanced topics and functional programming, choose Joy Of Kotlin, but be aware that this book is really difficult, especially if you have never touched functional programming.

Elasticsearch Distinct search in Spring Boot & Kotlin
https://www.codingflower.com/2020/11/01/elasticsearch-distinct-search-in-spring-boot-kotlin
Published: Sun, 01 Nov 2020 20:18:02 +0000

One of the most popular queries in the SQL world retrieves distinct values from a given table. Let's leave the SQL world and dive into Elasticsearch.

We have a simple document:

@Document(type = "article", indexName = "data")
data class Product(
        @Id
        val id: String? = null,
        val category: String,
        val name: String,
)

and we want to get distinct categories from Elasticsearch.

How to do that?

Using the Elasticsearch Dev Tools, we can prototype a query that returns the categories:

GET data/_search
{
  "size": 0,
  "aggs": {
    "distinctCategory": {
      "terms": {
        "field": "category.keyword",
        "size": 100
      }
    }
  }
}

We have to define the distinctCategory aggregation on this index over the category field. We have to raise the size value inside the aggregation above the default (10) because we expect our index to contain more than 10 different categories.

We set the top-level size to 0 because we don't need any documents from the data index; we only care about the aggregation.
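Conceptually, the terms aggregation groups documents by a field value and counts each group. The in-memory sketch below shows that idea with plain Kotlin collections. It is only an analogy and says nothing about how Elasticsearch computes the result across shards: the simplified Product class, the Bucket class and the distinctCategories function are assumptions made up for the example.

```kotlin
// Simplified stand-in for the indexed document (the id field is left out).
data class Product(val category: String, val name: String)

// Mirrors one entry of the "buckets" array in an aggregation response.
data class Bucket(val key: String, val docCount: Int)

// Groups by category, counts documents per group, and keeps the `size` largest buckets.
fun distinctCategories(products: List<Product>, size: Int): List<Bucket> =
    products.groupingBy { it.category }
        .eachCount()
        .map { (key, count) -> Bucket(key, count) }
        .sortedByDescending { it.docCount }
        .take(size)

fun main() {
    val products = listOf(
        Product("road bikes", "Aero 1"),
        Product("road bikes", "Aero 2"),
        Product("bmx", "Street"),
    )
    println(distinctCategories(products, size = 100))
}
```

The size parameter plays the same role as in the query: with more distinct categories than size, the smallest buckets are cut off.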

The response to this query looks like this:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
      ...
  },
  "hits" : {
      ...
  },
  "aggregations" : {
    "distinctCategory" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "road bikes",
          "doc_count" : 2130
        },
        {
          "key" : "gravel bikes",
          "doc_count" : 1916
        },
        {
          "key" : "bmx",
          "doc_count" : 1763
        }
      ]
    }
  }
}

The most important part of the response for us is aggregations. The buckets field contains our information.

How to make a distinct Elasticsearch query in Kotlin?

First we need to create the aggregation. We are lucky, because the Elasticsearch client provides a bunch of useful builders.

val aggregation = AggregationBuilders.terms("distinctCategory")
    .field("category.keyword")
    .size(100)

After that we only need to attach our aggregation to the search query:

val searchSourceBuilder = SearchSourceBuilder().size(0).aggregation(aggregation)
val searchRequest = SearchRequest("data")
    .apply { source(searchSourceBuilder) }

To execute query:

val searchResponse = esConfiguration.client().search(searchRequest, RequestOptions.DEFAULT)

esConfiguration.client() returns a RestHighLevelClient.

Our query works, but how do we extract the distinct values?

It is very simple. As you can see in the JSON response, the distinct category names are placed in the buckets list as keys.

val distinctCategories = searchResponse.aggregations
    .get<Terms>("distinctCategory")
    .buckets.map { it.keyAsString }
