Big Data is becoming a huge part of the modern world, and the ability to quickly sift through it and discover insights is becoming more and more valuable. ElasticSearch is a NoSQL document store and search engine used to build and analyze scalable, searchable collections of data. It is used by the likes of Wikipedia, Netflix, Uber, and many more of today’s largest tech companies.

Today I’ll just be playing around with it: setting up a Google Cloud server and running some queries. In a future post I may show how to build a visualization dashboard in Kibana.

⚠️ Warning: This is not intended to be an in-depth tutorial for absolute beginners. It is, however, an example of working with the garage door open, and I think novices may still find value in it.


Setup and Start

First of all, you’ll need to get the ELK stack (ElasticSearch, Logstash, Kibana) up and running. Here are some great resources for getting started with the basics:


Making some Queries

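ElasticSearch differs from SQL-based analysis in that its queries are written as JSON (the Query DSL) and sent over a REST API. Each request below is in the Kibana Dev Tools console format: an HTTP method and a path, optionally followed by a JSON body. Personally, I learn best by example, so I’ve listed some questions I wanted answered along with the corresponding ElasticSearch requests.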
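Because each request is just an HTTP method, a path, and a JSON body, the same queries can be sent from code as well as from the Kibana console. Here is a minimal sketch using only Python’s standard library. It assumes an unsecured cluster reachable at http://localhost:9200 (the default port), so change ES_URL, and add authentication if your Google Cloud server needs it. The helper names build_request and send are my own, not part of any Elasticsearch client.

```python
import json
import urllib.request

# Assumption: an unsecured cluster on the default port; point this at your own server.
ES_URL = "http://localhost:9200"


def build_request(method, path, body=None, base_url=ES_URL):
    """Turn a console-style request such as 'GET /nyc311calls/_count' into a urllib Request."""
    data = None
    headers = {}
    if body is not None:
        # The query DSL is plain JSON: serialize the dict and declare the content type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


def send(request):
    """Send the request and decode the JSON reply (requires a running cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, send(build_request("GET", "/nyc311calls/_count")) runs Question 1 below. If something between you and the cluster drops bodies on GET requests, the _count and _search endpoints also accept POST.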


Question 1

“Display count of all the documents in the nyc311calls index”

GET /nyc311calls/_count
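The _count endpoint replies with a small JSON object holding a count field and a _shards summary. If any shard failed, the number only covers part of the index, so it is worth checking before trusting it. A minimal Python sketch of that check (read_count is a hypothetical helper name):

```python
import json


def read_count(response_text):
    """Return the document count from a _count response body.

    The body looks like {"count": N, "_shards": {"total": ..., "failed": ...}}.
    A non-zero "failed" means some shards did not answer, so the count is partial.
    """
    response = json.loads(response_text)
    failed = response.get("_shards", {}).get("failed", 0)
    if failed:
        raise RuntimeError(f"{failed} shard(s) failed; the count is incomplete")
    return response["count"]
```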


Question 2

“Display count of all calls with the word ‘heat’ in the Descriptor field of the nyc311calls index”

GET /nyc311calls/_count
{
  "query": {
    "match": { "Descriptor": "heat" }
  }
}
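Why would a match on "heat" also find descriptors that contain other words? A match query runs the search text through the field’s analyzer and looks for shared tokens. Assuming Descriptor was dynamically mapped as a text field (the .keyword subfields used in the next questions suggest dynamic mapping), it uses the standard analyzer, which lowercases text and splits it on word boundaries. The Python sketch below approximates that with a regex. It is a simplification for illustration, not Elasticsearch’s actual tokenizer, and the function names are hypothetical.

```python
import re


def simple_analyze(text):
    """Approximate the standard analyzer: lowercase, then split into word tokens.

    The real tokenizer follows Unicode word-break rules; this regex only
    handles plain ASCII text.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def match_hits(query_text, field_value):
    """A match query (default "or" operator) hits when any query token is in the field's tokens."""
    return bool(set(simple_analyze(query_text)) & set(simple_analyze(field_value)))


def term_hits(value, keyword_value):
    """A term query against a .keyword field only hits on the exact, unanalyzed string."""
    return value == keyword_value
```

With an illustrative value like "HEAT/HOT WATER", match_hits("heat", ...) is True because the value is split into heat, hot, and water, while term_hits("heat", ...) is False because the keyword field stores the whole string unchanged.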


Question 3

“Write an aggregation to show the top 10 cities with the highest number of calls”

GET /nyc311calls/_search
{
  "size":0,
  "aggs": {
    "group_by_city": {
      "terms": {
        "field": "City.keyword",
        "size": 10
      }
    }
  }
}
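The answer comes back under aggregations.group_by_city.buckets. Each bucket holds a key (the city) and a doc_count, sorted by count in descending order by default, and sum_other_doc_count counts the documents whose city fell outside the top 10. Here is a small Python sketch that builds the body for any keyword field and turns the reply into a ranked list (terms_agg_body and top_buckets are hypothetical helper names):

```python
def terms_agg_body(field, size=10, agg_name="group_by_city"):
    """Build a Question 3 style request body: no hits, just a terms aggregation on one field."""
    return {
        "size": 0,  # skip the matching documents themselves
        "aggs": {agg_name: {"terms": {"field": field, "size": size}}},
    }


def top_buckets(response, agg_name="group_by_city"):
    """Return ([(key, doc_count), ...], sum_other_doc_count) from a _search response.

    Buckets arrive sorted by doc_count, highest first (the terms default).
    On indexes with several shards the counts can be approximate; the
    aggregation's doc_count_error_upper_bound field reports the possible error.
    """
    agg = response["aggregations"][agg_name]
    ranked = [(bucket["key"], bucket["doc_count"]) for bucket in agg["buckets"]]
    return ranked, agg.get("sum_other_doc_count", 0)
```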


Question 4

“Write a query to see the status of all cases in a given borough.”

GET /nyc311calls/_search
{
  "query": {
    "match": {
      "City": "BROOKLYN"
    }
  },
  "size": 0,
  "aggs": {
    "group_by_status": {
      "terms": {
        "field": "Status.keyword"
      }
    }
  }
}
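To ask the same question for another borough, the request body can be generated instead of edited by hand. Below is a minimal Python sketch (status_by_city_body and status_counts are hypothetical names). Keep in mind that a terms aggregation returns at most 10 buckets unless you set its size, so add one if there are more than 10 status values.

```python
def status_by_city_body(city, agg_name="group_by_status"):
    """Question 4's request body with the city passed in instead of hard-coded."""
    return {
        "query": {"match": {"City": city}},
        "size": 0,  # no individual hits, only the aggregation
        # The terms aggregation returns 10 buckets by default; add "size" for more.
        "aggs": {agg_name: {"terms": {"field": "Status.keyword"}}},
    }


def status_counts(response, agg_name="group_by_status"):
    """Map each Status value in the response's buckets to its doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

Calling status_by_city_body("BROOKLYN") produces the same body as Question 4, ready to be serialized with json.dumps and sent to /nyc311calls/_search.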