Maintain elasticsearch: the indexes


When using elasticsearch to store and query your data, there are a number of tasks that you end up performing in every project. I am going to write a series of blog posts, each discussing one area of these tasks. The first one is about creating, duplicating and removing indexes. I will show how to perform some actions manually, and I will introduce a tool I created to help you with some of these index related tasks.

Creating indexes

Within elasticsearch an index is used to store related data. The easiest way to create an index in a default elasticsearch instance is to insert a document; an index is then created with default settings. There is also a specific api to manage an index. Using the api you can configure a number of things for the index when creating it. Index specific settings are configured by passing a settings object. Some settings have to be provided while creating the index (i.e. the number of shards), others can be changed on a running index (i.e. the number of replicas).

The structure of the documents in the index is specified using the mapping. You can use a dynamic mapping, in which case the mapping is created on the fly when data is added. This is what happens when a document is inserted into a non existing index. You can also specify the mapping yourself; for most projects this is most of the work with elasticsearch. When specifying your own mapping, you can configure the analyzers to use on the content, the type of each field, etc. Within an index you can have multiple types, and each type has its own mapping. You can pass the mapping when creating the index, or add it later on, but it is not possible to change the mapping of existing fields. This is important: you can add a mapping for non existing fields, but you cannot change the mapping of a field once it is there.
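To illustrate that difference: adding a mapping for a new field to an existing type is allowed using the put mapping api. A sketch for the sense panel, assuming an existing index called blog with a type gridshore; the author field is made up for this example:

```
PUT /blog/_mapping/gridshore
{
  "properties": {
    "author": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
```

Sending a similar request that changes the type of a field that already exists (for instance turning a string field into a date) would be rejected with a merge conflict.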

Creating a new index is as easy as providing the name of the index and optionally adding a settings object and one or more mapping objects. When creating indexes there is a difference between time based indexes and more static indexes. Time based indexes have an advantage: they give you the opportunity to change your mapping every time period, since every time you create a new index you can create a new mapping for it. It is common to use index templates with time based indexes, but that is beyond the scope of this article. For now we focus on the static kind of index, by which I mean an index containing, for instance, all our blog posts. You can use the REST interface to create the index, or the sense panel of the marvel plugin. Below you'll find the command to create the blog index, including some settings and a mapping.

PUT /blog-20150112140200
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 3
  },
  "mappings": {
    "gridshore": {
      "properties": {
        "title": {
          "type": "string"
        },
        "tags": {
          "type": "string",
          "index": "not_analyzed"
        },
        "body": {
          "type": "string"
        },
        "publication_date": {
          "type": "date"
        }
      }
    }
  },
  "aliases": {
    "blog": {}
  }
}

Take notice of the final part of the command. Here we create an alias with the name blog. If you look at the top, you can see that we created an index with the name blog and appended a timestamp to it. This is a best practice. Your application should use the blog alias; that way we can change something in the index without problems for the application. We can create a new index, move the alias, and our application keeps working like it should. Do you already have an index without this alias construction? No worries, I will introduce a way to create this structure based on an existing index.
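Moving the alias later on is a single call to the _aliases api. A sketch, where blog-20150201090000 is a made up name for the replacement index:

```
POST /_aliases
{
  "actions": [
    { "remove": { "index": "blog-20150112140200", "alias": "blog" } },
    { "add": { "index": "blog-20150201090000", "alias": "blog" } }
  ]
}
```

Both actions are executed atomically, so the blog alias never points to zero or two indexes while switching.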

Introduction of maintain-elastic tool

If you are like me and keep copy-pasting these create commands between projects, then this tool might be good for you. You still need to create the settings.json and mapping.json files, but you can just drag and drop them into this tool to create an index. You can find the tool on Github; at the time of writing I am on version 0.1.

Maintain elastic release
Maintain elastic main page

To give you an idea of what it looks like, the following image shows you the index screen.

Creating indexes with the tool

To create an index with the gui, push the + sign near the actions in the table.

The result is a popup with the option to enter the name of the index to create. By default the name is used as an alias and an index with a timestamp appended is created. If that is not what you want, tick the "use name as exact name" checkbox. Now upload the settings and mapping files. In my example I upload the settings.json and gridshore-mapping.json files. Their content is also shown below, right after the image. In my case gridshore is the type of the content in the blog index.

{
    "number_of_shards": 3,
    "number_of_replicas": 1
}

{
    "properties": {
        "title": {
          "type": "string"
        },
        "tags": {
          "type": "string",
          "index": "not_analyzed"
        },
        "body": {
          "type": "string"
        },
        "publication_date": {
          "type": "date"
        }
    }
}

The naming of the files is important. The settings file needs to be called settings.json, and for each type in your index you need to provide a type-mapping.json file. After pushing the upload button, push the create button and your index is created. The next screen shows a few lines of the same table as before. Notice the difference in the status of each index. Our freshly created index is YELLOW. Do you have an idea why? Of course you do: that is because we have configured a replica and we have just one node.
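You can verify this with the cluster health api, and, since the number of replicas is one of the settings you can change on a running index, you can turn the index GREEN on a single node by removing the replica. A sketch for the sense panel:

```
GET /_cluster/health/blog-20150112140200

PUT /blog/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}
```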

The first column contains the actions you can perform. The second column the name of the index. The third column contains the name of the alias. If there is no alias, there is a button. Using this button you can create an alias with the same name as the index. Beware, this can be a heavy operation if the index is big. The other columns should explain themselves.

The next section will discuss the other things you can do with your indexes.

Index actions

The table below shows you the actions that you can perform on the indexes.

If you are just interested in what to do with indexes, you can skip the next section. If you are interested in how the tool I have shown is created, you should read the next section.

The technology

For this tool I have decided to build a Java application. Using the Java drivers I create a Transport client to elasticsearch. All the interaction with elasticsearch is done in Java. JAX-RS is used to expose a JSON api specific to this application. This makes it possible to embed all sorts of security measures, and you can just boot the tool when you need it. At the moment there are no security measures, but I intend to create at least user management functionality in the tool. The REST api is used by an AngularJS application. The front-end code is glued together using GruntJS and Bower. To make life easier, I have used Dropwizard as the framework for the backend. I like all the options you get for free, like performance monitoring, healthchecks, etc.

Structure of the project

Dropwizard works with assets from the resource folder. In the end we generate one big jar file containing all the java classes, html files, javascript files, etc. The JavaScript files that need to be copied into one big file and then minified are taken from the folder src/main/web/js and from the bower installation folder. The same is done for the sass files: they are taken from src/main/web/sass and compiled into one big stylesheet. The results of these actions can be found in src/main/resources/assets. This copying, minifying and sass compilation is done using GruntJS and some plugins. If you want to learn more about GruntJS, read my other blog: Improve my AngularJS project with grunt. Other components I have used are: bootstrap-ui, bootstrap-sass, fontawesome and angular-file-upload.

The Dropwizard project is structured like any java maven project. In the next section I take a look at some of the java code.

Creating indexes with java

In a previous post I described the connection from a dropwizard application to elasticsearch. Here I want to have a look at the most important class for this tool: IndexCreator. This class is meant to be used as a builder. You start the creation using the build method; you always need to provide an elasticsearch client and the name of the index to create. Then you can specify options like the settings file or mapping files. Other options like copyOldData and removeOldIndex are also available. The following code block shows an example.

IndexCreator.build(clientManager.obtainClient(), request.getName())
    .copyFrom(request.getCopyFrom())
    .removeOldAlias()
    .removeOldIndices()
    .copyOldData(new ScrollAndBulkIndexContentCopier(clientManager.obtainClient()))
    .execute();

Notice that we provide an instance of ScrollAndBulkIndexContentCopier. This is an implementation of the IndexContentCopier interface that scrolls over the existing index and uses the bulk api to insert the documents into the new index.
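The actual implementation is in the tool on Github; below is a rough sketch of the scroll-and-bulk pattern it is based on, using the elasticsearch Java client. The index names, page size and timeout are made up for this example, and error handling is left out:

```java
// scan/scroll over the old index, 100 documents at a time (made up values)
SearchResponse scroll = client.prepareSearch("blog-old")
        .setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))
        .setSize(100)
        .execute().actionGet();
while (true) {
    scroll = client.prepareSearchScroll(scroll.getScrollId())
            .setScroll(new TimeValue(60000))
            .execute().actionGet();
    if (scroll.getHits().getHits().length == 0) {
        break; // no more documents to copy
    }
    BulkRequestBuilder bulk = client.prepareBulk();
    for (SearchHit hit : scroll.getHits()) {
        // re-index the stored _source under the same type and id
        bulk.add(client.prepareIndex("blog-new", hit.getType(), hit.getId())
                .setSource(hit.getSourceRef()));
    }
    bulk.execute().actionGet();
}
```

This only works because elasticsearch stores the original _source document, which is exactly what makes re-indexing into a new mapping possible.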

Other index actions using java

Within the class IndexResource, the requests for index actions are received and then used to call the elasticsearch api. Below is an example for optimizing the index. The code for most of the other actions resembles this code.

@POST
@Path("/{index}/optimize")
public String optimizeIndex(@PathParam("index") String index, @QueryParam("max") int maxSegments) {
    OptimizeRequestBuilder optimizeRequestBuilder = indicesClient().prepareOptimize(index);
    if (maxSegments != 0) {
        optimizeRequestBuilder.setMaxNumSegments(maxSegments);
    }
    optimizeRequestBuilder.execute().actionGet();
    return "OK";
}

If you do not provide maxSegments, elasticsearch just checks whether an optimize needs to take place.
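The same action is available through the REST api, so you can also trigger it from the sense panel. A sketch against the blog alias:

```
POST /blog/_optimize?max_num_segments=1
```

Forcing everything into one segment like this is mainly useful for indexes that no longer receive writes.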

Final thoughts and Next steps

I hope you got some ideas about what you can do with indexes. You now understand the advantage of using an alias, and hopefully you value the fact that elasticsearch stores the original document: that way you can copy data from an old index to a new index with a new mapping.

In the next blog I am going to have a look at the snapshot/restore functionality.
