How to set up an Elasticsearch Instance

1 Elasticsearch set up#

Most experiments on Pragmalingu focus on search applications. To reproduce these experiments, you need a search engine to play with. We chose to focus on Elasticsearch because it's the leading open-source solution, has an active community and a lot of documentation.

1.1 Elasticsearch with Docker#

Installing and configuring a full search engine on your machine just to run some experiments is not fun. To make it easiert we will use the available Docker containers to install, set up, run and remove Elasticsearch.

1.2 Install Docker#

You can install docker by following the official guide on the docker homepage. If you are using Linux as your operating system you also have to install Docker Compose. Once you are finished check if docker is running by executing the following command in your command line or terminal:

docker -v

The response should be something like this: Docker version 19.03.8, build afacb8b. If you get an error instead, troubleshoot your Docker installation, see troubleshoot for windows, Mac or Linux.

1.3 Run Elasticsearch with docker#

Now that you have docker up and running we can start by installing Elasticsearch with Docker. All you have to do is open your terminal or command line and enter

docker run --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.8.0

This will tell Docker to download the container docker.elastic.co/elasticsearch/elasticsearch:7.8.0 and run it with the name elasticsearch.

What does "-p 9200:9200 -p 9300:9300" mean?

The -p 9200:9200 -p 9300:9300 arguments tell Docker to map the network ports 9200 and 9300 of the container to the same ports on your local machine. A network port is like a post box through which you can communicate with an application running on a computer.

What does -e "discovery.type=single-node" mean?

The argument -e "discovery.type=single-node tells Elasticsearch to run in single-node mode, this means that only one machine will host the Elasticsearch application. Normally, Elasticsearch is run in cluster mode on several machines to enable scaling and make it resilient against single computer failures.

After you ran the above command you will see a lot of logging output on your command line, that is Elasticsearch telling you what it is doing. You should now be able to access and talk to Elasticsearch via http://localhost:9200/. If you access this site, you should get a JSON response similar to this one:

{  name: "c8dab6730e0b",  cluster_name: "docker-cluster",  cluster_uuid: "z1hhnWwmTHaX_VSAqrlXFA",  version: {    number: "7.8.0",    build_flavor: "default",    build_type: "docker",    build_hash: "757314695644ea9a1dc2fecd26d1a43856725e65",    build_date: "2020-06-14T19:35:50.234439Z",    build_snapshot: false,    lucene_version: "8.5.1",    minimum_wire_compatibility_version: "6.8.0",    minimum_index_compatibility_version: "6.0.0-beta1"  },  tagline: "You Know, for Search"}

This means that you successfully installed and started Elasticsearch on your machine. Woohoo! Great job! Now we can go ahead and play with it.

1.4 Install and run Elasticsearch & Kibana with Docker-Compose#

Before we actually index our first documents and run our first search, we will make it easier for us to work with Elasticsearch and Docker. Therefore, we will also use Kibana. Kibana is an application to visualize and manage data in Elasticsearch and is offered by the company behind Elasticsearch, elastic.

To run two applications simultaneously inside docker, we will use docker-compose to make it easier for us. With Docker-compose we can specifiy the containers we want to use and their configuration inside a single file. For running Elasticsearch and Kibana a simple docker-compose.yml file would look like this:

version: "3.8"services:  elasticsearch:    image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0    environment:      - discovery.type=single-node    ports:      - 9200:9200      - 9300:9300  kibana:    image: docker.elastic.co/kibana/kibana:7.8.0    ports:      - 5601:5601

The services section declares and names the applications we want to use. For our use case elasticsearch and kibana. The image sections tell docker which container to use for the declared service. With the environment section we can pass application specific environment variables to the container. The ports section works just like the -p argument from the normal docker run command and maps ports from your host computer to the container.

Copy the above code and save it to a docker-compose.yml file on your machine.

Before you can start these applications using docker-compose we have to stop the currently running elasticsearch container. To do that just run the following in your command line:

docker stop elasticsearch

You can see all currently running containers by executing docker ps.

Now that the old Elasticsearch container is stopped you can go to the folder where you saved the docker-compose.yml file and open a command line there. Finally, with running docker-compose up you just have to wait until docker is finished downloading and setting up all missing containers. You will see both Elasticsearch and Kibana printing their logs to your command line. That`s it, you have them both set up and running!

You should now be able to access the Kibana UI by opening the following link in your browser http://localhost:5601. There is a lot going on there, but you can find the section that is most useful for us is located at http://localhost:5601/app/kibana#/dev_tools/console.

What happens if I don`t stop the old container?

Since we configured both the original container and the one that is started by docker-compose to use the ports 9200 and 9300, these ports will already be in use when the second container wants to start. This will prevent the second container from starting, since only one running process can hold a port at a time.

How does Kibana know how to reach Elasticsearch?

There is some magic happening in the background when you run docker-compose up. Docker-compose is actually setting up an isolated network inside your computer, where each container gets a network address assigned that corresponds to its service name. The Elasticsearch service will be reachable under the address http://elasticsearch:9200 inside this network. This is the default address configured for Kibana in docker to look for Elasticsearch.

2 Use Elasticsearch from Kibana#

Kibana offers a lot of different tools to work with Elasticsearch and to visualize data. We will use the Console. It is an interface that offers autocompletion and makes it easy to send HTTP requests to Elasticsearch.

We start by sending a simple requests that lists all indexes in Elasticsearch.

GET /_cat/indices?v

documentation for GET _cat/indices
The response should look similar to this:

health status index                          uuid                   pri rep docs.count docs.deleted store.size pri.store.sizegreen  open   .kibana-event-log-7.8.0-000001 fHgbEl2-SGSSC5KYc_l_6w   1   0          2            0     10.4kb         10.4kbgreen  open   .apm-custom-link               Kuikvz1aQTadtTKjn-1JMg   1   0          0            0       208b           208bgreen  open   .kibana_task_manager_1         iijJqOO5S-ymQBbqmvXDAw   1   0          5            4     54.4kb         54.4kbgreen  open   .apm-agent-configuration       _ig6Y8JwRby4NIiPzGCXGw   1   0          0            0       208b           208bgreen  open   .kibana_1                      3GkEN2iyQHaHpCaEjO8saQ   1   0         15            2     94.7kb         94.7kb

What is an index?

Indexes are where search engines store the information about words and the documents that these words appear in. Similar to an index in a book where you can lookup words in alphabetical order and find the pages that contain or explain these words. In Elasticsearch an index is the place where you store all the documents that have a similar structure and that you want users to find when they search.

Because we haven't created an index yet you only see internal indexes from Kibana that are prefixed with a dot. To create an index now just type in:

PUT myfirstindex

Elasticsearch should respond with:

{  "acknowledged" : true,  "shards_acknowledged" : true,  "index" : "myfirstindex"}

You can check again with GET /_cat/indices?v to make sure your new index is present. Now that you created your first index you can start adding documents to it.

The following command would add a simple document with only one field called text and the id 1 to the index myfirstindex:

PUT /myfirstindex/_doc/1{  "text": "this is my first document"}

Now that you created your first document, you can start searching on it. To find all documents that contain the word document run the following query:

GET myfirstindex/_search{  "query": {    "match": {      "text": {        "query": "document"      }    }  }}

Elasticsearch should respond with the document that you just added to the index.

That's it, you successfully set up your search server, indexed your first document and were able to search on it. Starting from here you can explore natural language processing techniques and configurations that help your search engine find matches, even if the user searches for the term documents instead or uses completely different but semantically related terms. You can read more about these techniques in our advanced guides.

3. Where to go from here#

You learned how to set up your own Elasticsearch instance using docker and docker-compose. You now know how to talk to Elasticsearch via Kibana, create indexes and documents and search on them. Most of the experiments on Pragmalingu don't require to set up your own Elasticsearch instance, but if you want to dive deeper and maybe host your service powered by Elasticsearch or conduct longer running experiments you now know how to start.

Acknowledgements:
Thanks to Kenny Hall and Irina Temnikova for proofreading this article.

Written by Samy Ateia, July 2020