Elasticsearch is commonly used in the CQRS (Command and Query Responsibility Segregation) pattern, where read and update operations are separated. This approach is used to improve performance, scalability and security.
The task was to enable filtering of users based on the amount of experience they had gathered over their careers. To understand the problem better, we need to look at the structure of the user experience property inside our main index.
When you have nested properties like in the example above and you want to perform complex queries and aggregations, do yourself a favour: alongside the main index where you save the main data, create a new index that holds only the data needed to perform these complex operations.
This way you avoid the problem of how to access the data inside the nested properties.
Here is a document from the experience index that I created alongside the main index. In it I save only the data that is needed to perform the aggregations:
As you can see in the code example above, we declare the aggregation property and then define the terms by which we want to group. In the aggregation named group_by we group our documents by the field userId, which is stored in our index.
We now have the result of our first aggregation. The most important part here is the buckets property, an array of bucket objects. Each bucket contains the key, which is our userId, and the doc_count property, which is the number of documents that share that userId.
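As a minimal sketch of the terms aggregation described above, the request body can be written as a Python dictionary. The sample documents and the simulate_terms helper are hypothetical and only mimic the shape of the buckets that Elasticsearch returns; only the userId field and the group_by name come from the article.

```python
from collections import Counter

# Hypothetical documents from the experience index (only userId is shown).
docs = [
    {"userId": 101},
    {"userId": 101},
    {"userId": 102},
]

# Request body: a terms aggregation named group_by on the userId field.
terms_query = {
    "size": 0,  # skip the hits, only the aggregation result is needed
    "aggs": {
        "group_by": {
            "terms": {"field": "userId"}
        }
    },
}


def simulate_terms(documents, field):
    """Mimic the bucket list of a terms aggregation: one bucket per distinct value."""
    counts = Counter(doc[field] for doc in documents)
    # Terms buckets are ordered by doc_count, highest first, by default.
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


field = terms_query["aggs"]["group_by"]["terms"]["field"]
buckets = simulate_terms(docs, field)
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"])
```

In a real response the same list would be found under aggregations, then group_by, then buckets.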
As we saw in the previous example, we start by declaring the aggregation property. Because we are now doing a sum aggregation, we also declare the name of the property that will be displayed in our bucket when the aggregation is completed.
The sum of our startDate fields will have the property name start, and the sum of our endDate fields will have the property name end.
Note that this aggregation is nested under the group_by aggregation that we made in the first example. You can also follow along with the full query, which is at the end of this article.
Here is the result of our aggregation. Our bucket now contains two new properties, start and end, which we defined in the code example 0.5, and each of them contains properties for our dates. The value property will be particularly useful in the next part.
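The following sketch shows how the two sum aggregations might be nested under group_by, and why summing works when a user has several jobs: the sum of the end dates minus the sum of the start dates equals the sum of the individual job durations. The jobs list and the to_millis helper are hypothetical and exist only for illustration.

```python
from datetime import datetime, timezone


def to_millis(year, month, day):
    """Epoch milliseconds for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp() * 1000)


# Hypothetical jobs of a single user, stored as epoch milliseconds.
jobs = [
    {"startDate": to_millis(2020, 1, 1), "endDate": to_millis(2020, 5, 1)},
    {"startDate": to_millis(2021, 1, 1), "endDate": to_millis(2021, 5, 1)},
]

# The two sum aggregations, named start and end as in the article.
sum_aggs = {
    "start": {"sum": {"field": "startDate"}},
    "end": {"sum": {"field": "endDate"}},
}

# Nest them under the group_by terms aggregation.
query = {
    "size": 0,
    "aggs": {
        "group_by": {
            "terms": {"field": "userId"},
            "aggs": sum_aggs,
        }
    },
}

# What the sums hold for this user, computed locally.
start_sum = sum(job["startDate"] for job in jobs)
end_sum = sum(job["endDate"] for job in jobs)
per_job_total = sum(job["endDate"] - job["startDate"] for job in jobs)

print(end_sum - start_sum == per_job_total)
```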
Now that we have summed up the start and the end dates, we need to subtract the start sum from the end sum to get the time that the user has been employed. The result of that operation is in milliseconds. To convert it into a whole number of months, we divide it by the number of milliseconds in a month and round the result with the Math.round() method from Java.
This process sounds more complicated than it is, but don't worry, we will take it step by step:
- The first step is to create a bucket script and name it accordingly. In our case we are trying to find the total employment duration for a given user, so we will name our bucket duration.
- In the second step we designate our field duration as a bucket_script, which tells Elasticsearch to treat it as a bucket script aggregation, a pipeline aggregation that computes one value per bucket.
- Inside the bucket script we have the buckets_path property, which lets us define the properties that we want to use in our bucket aggregation. Those properties need to be present in the bucket. To make sure that we are using properties that exist in our buckets, we take a look at the code snippet 0.6, where we see the start and end properties. Both have the value property that we will use in the next part of this aggregation. Inside the buckets_path property we define the start and end properties, which point at the sum aggregation results start.value and end.value.
- Then we define the script property. This is the main part, because all the magic happens here. In it we define the params, which can contain anything we want and are the preferred way to define static variables that will be used in the source property to perform operations.
- And at the very end comes the source property. The duration field in the bucket will contain the value that is calculated here. In our case, as described previously, we subtract the start property from the end property and divide the difference by the variable month_in_milliseconds. The result is then rounded using the Java Math.round() method.
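The steps above can be sketched as follows. This is an illustration rather than the article's exact snippet: the value of month_in_milliseconds is an assumption (a 30-day month), and the Painless source string is a plausible reconstruction of the described logic. The duration_in_months function mirrors that logic locally so the arithmetic can be checked without a cluster.

```python
import math

# Assumption: a month is treated as 30 days; the article does not state the value.
MONTH_IN_MILLISECONDS = 30 * 24 * 60 * 60 * 1000

# bucket_script aggregation named duration, placed next to start and end
# inside the group_by aggregation.
duration_agg = {
    "duration": {
        "bucket_script": {
            # Map script variables to the sum results in each bucket.
            "buckets_path": {"start": "start.value", "end": "end.value"},
            "script": {
                "params": {"month_in_milliseconds": MONTH_IN_MILLISECONDS},
                "source": (
                    "Math.round((params.end - params.start)"
                    " / params.month_in_milliseconds)"
                ),
            },
        }
    }
}


def duration_in_months(start_value, end_value, month_ms=MONTH_IN_MILLISECONDS):
    """Local mirror of the script: difference in months, rounded half up."""
    # Java's Math.round rounds ties upwards, which is floor(x + 0.5).
    # Python's built-in round() rounds ties to the nearest even number instead.
    return math.floor((end_value - start_value) / month_ms + 0.5)


print(duration_in_months(0, 8 * MONTH_IN_MILLISECONDS))
```

Keeping month_in_milliseconds in params rather than hard-coding it in the source means the script text stays identical between requests, which matches the article's advice to put static variables in params.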
As we can see, the duration property contains the number 8, which is the difference between the end and start values expressed in months. Our user therefore has 8 months of experience, which is correct.
To make sure that the aggregation result is correct, you can look at the string representation in the value_as_string property for both start and end and calculate the difference between them by hand.
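A manual check of this kind could look like the sketch below. The two date strings are hypothetical stand-ins for value_as_string values, and the 30-day month is the same assumption used for month_in_milliseconds; substitute the values from your own response.

```python
import math
from datetime import datetime

# Assumption: a month is treated as 30 days.
MONTH_IN_MILLISECONDS = 30 * 24 * 60 * 60 * 1000

# Format of the hypothetical value_as_string values used below.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def months_between(start_string, end_string):
    """Parse two date strings and return the rounded difference in months."""
    start = datetime.strptime(start_string, DATE_FORMAT)
    end = datetime.strptime(end_string, DATE_FORMAT)
    millis = (end - start).total_seconds() * 1000
    # Round half up, like Java's Math.round.
    return math.floor(millis / MONTH_IN_MILLISECONDS + 0.5)


# Hypothetical values: 244 days apart, a little over 8 thirty-day months.
months = months_between("2020-01-01T00:00:00.000Z", "2020-09-01T00:00:00.000Z")
print(months)
```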
Looking at the code snippet 0.9 above, you can see a pattern that emerges when it comes to bucket aggregations. We first define the name, in this case duration_bucket_filter, and then we define the type of the bucket aggregation. In the example 0.7 we had a bucket_script, which is used for aggregating data. Now, instead of a bucket script, we define a bucket_selector, which tells Elasticsearch that we want to select buckets.
To select a property inside a bucket, as in code snippet 0.7, we add the buckets_path property, where we define durationBucket as an alias for the duration property that was aggregated in the previous step (code snippet 0.8).
Now that the property is defined, we want to filter our buckets. Inside the script property we define the params that we want to use in our source. Here we defined the min and max number of months of experience that a user needs to have for their bucket to be returned after filtering.
And at the end of this code snippet is the source which contains the logic for filtering the buckets:
Based on these conditions the buckets are filtered and returned:
As you can see, we set the min and max values to match the user with the key 101, so that only this user is returned as the result of our bucket filtering.
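A sketch of the bucket selector described above follows. The names duration_bucket_filter, durationBucket, min and max come from the article; the concrete limits of 6 and 12 months, the inclusive comparison in the source string, and the sample buckets are assumptions made for illustration. The select_buckets function reproduces the filter locally.

```python
# bucket_selector aggregation, placed next to duration inside group_by.
selector_agg = {
    "duration_bucket_filter": {
        "bucket_selector": {
            # durationBucket is an alias for the result of the duration script.
            "buckets_path": {"durationBucket": "duration"},
            "script": {
                # Assumed limits: between 6 and 12 months of experience.
                "params": {"min": 6, "max": 12},
                "source": (
                    "params.durationBucket >= params.min"
                    " && params.durationBucket <= params.max"
                ),
            },
        }
    }
}


def select_buckets(buckets, min_months, max_months):
    """Keep only buckets whose duration lies within the inclusive range."""
    return [
        bucket
        for bucket in buckets
        if min_months <= bucket["duration"]["value"] <= max_months
    ]


# Hypothetical buckets as they might look after the duration script has run.
sample_buckets = [
    {"key": 101, "doc_count": 2, "duration": {"value": 8.0}},
    {"key": 102, "doc_count": 3, "duration": {"value": 30.0}},
]

limits = selector_agg["duration_bucket_filter"]["bucket_selector"]["script"]["params"]
selected = select_buckets(sample_buckets, limits["min"], limits["max"])
print([bucket["key"] for bucket in selected])
```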
We are almost done, but there is one more thing: what if the user is still employed? How do we calculate the difference between the start and end date if the end date is null? The answer is simple: runtime fields, which are supported from version 7.11 onwards, so make sure you are running a suitable Elasticsearch version.
Runtime fields enable us to populate fields at runtime. That means the value is generated when it is needed. In our case, when someone is filtering for users that have some years of experience and are still employed, the endDate field will not be null but will be populated with the date at the moment of querying.
To implement the runtime field in our query, we add the runtime_mappings property at the top of the query. Then we give the field a type, which is date in our case, and define the script property followed by the source, where the runtime field performs its magic. We will go over the source property now so you can understand it and tailor it to your needs:
First we check whether the user is still employed with if (doc['isEmployed'].value.equals(true)), using the field isEmployed, which is contained in every document of our index. If the user is employed, the current date is emitted, as described above.
If the user isn't employed, the endDate field is actually populated, and we need to emit its value in the new field updatedEndDate.
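The runtime field described above might be assembled as in the sketch below, which has not been run against a live cluster. One deliberate difference from the description is an assumption of this sketch: the current time is computed by the client and passed to the script as a hypothetical params.now value, instead of being read inside the script. The updated_end_date function mirrors the branching locally.

```python
import time


def build_runtime_mappings(now_millis):
    """Build the runtime_mappings section with a date field named updatedEndDate."""
    return {
        "updatedEndDate": {
            "type": "date",
            "script": {
                # Assumption: the query time is supplied by the client as params.now.
                "params": {"now": now_millis},
                "source": (
                    "if (doc['isEmployed'].value.equals(true)) {"
                    " emit(params.now);"
                    " } else {"
                    " emit(doc['endDate'].value.toInstant().toEpochMilli());"
                    " }"
                ),
            },
        }
    }


def updated_end_date(document, now_millis):
    """Local mirror of the script: current time if still employed, else endDate."""
    if document["isEmployed"]:
        return now_millis
    return document["endDate"]


now = int(time.time() * 1000)
runtime_query = {
    "size": 0,
    "runtime_mappings": build_runtime_mappings(now),
    "aggs": {
        "group_by": {
            "terms": {"field": "userId"},
            "aggs": {
                "start": {"sum": {"field": "startDate"}},
                # Sum the runtime field so open-ended jobs count up to now.
                "end": {"sum": {"field": "updatedEndDate"}},
            },
        }
    },
}

print(updated_end_date({"isEmployed": True, "endDate": None}, now) == now)
```

With this in place, the end sum would read from updatedEndDate rather than endDate, so a job without an end date contributes the time elapsed up to the moment of the query.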
We have now assembled our query and delivered a solution to the problem described at the beginning of the article. You can check out the full query below in the code snippet 1.2.