Introduction

Elasticsearch is designed for high-speed searching and analytics, but improper indexing strategies, inefficient queries, and lack of caching can lead to slow response times and high system resource utilization. Common pitfalls include unoptimized mappings, too many shards, excessive wildcard searches, failing to use filters, and not leveraging aggregations efficiently. These issues become particularly problematic in large-scale applications where search performance directly impacts user experience. This article explores Elasticsearch query performance bottlenecks, debugging techniques, and best practices for optimizing indexing and search execution.
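
Before tuning anything, it helps to confirm where a slow query actually spends its time. A minimal sketch using the search profile API (`my_index` and the query below are purely illustrative):

GET my_index/_search
{
  "profile": true,
  "query": {
    "match": {"status": "active"}
  }
}

The response includes a per-shard breakdown of query and collector timings, which points at the expensive clauses discussed in the sections below.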

Common Causes of Slow Query Performance in Elasticsearch

1. Inefficient Index Mapping Causing Large Storage and Slow Lookups

Using the wrong field types, or letting dynamic mapping guess them, inflates index size and slows filtering, sorting, and aggregations.

Problematic Scenario

PUT my_index
{
  "mappings": {
    "properties": {
      "user_id": {"type": "text"},
      "created_at": {"type": "text"}
    }
  }
}

Using `text` for numeric and date fields prevents efficient filtering and sorting.

Solution: Use Proper Field Types in Mapping

PUT my_index
{
  "mappings": {
    "properties": {
      "user_id": {"type": "keyword"},
      "created_at": {"type": "date"}
    }
  }
}

Using `keyword` for exact matches and `date` for timestamps improves performance.
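
With these types in place, exact lookups on `user_id` and date ranges on `created_at` become cheap filter operations. A hedged example (the field value and date math below are illustrative):

GET my_index/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"user_id": "u-12345"}},
        {"range": {"created_at": {"gte": "now-7d/d"}}}
      ]
    }
  }
}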

2. Excessive Sharding Leading to High Resource Usage

Using too many shards increases overhead and reduces query efficiency.

Problematic Scenario

PUT my_index
{
  "settings": {
    "index": {
      "number_of_shards": 10,
      "number_of_replicas": 1
    }
  }
}

Creating ten primary shards for a small dataset adds unnecessary overhead: each shard is a separate Lucene index with its own memory and file-handle costs, and every search must fan out to all of them.

Solution: Optimize the Number of Shards Based on Data Volume

PUT my_index
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    }
  }
}

Reducing shard count for small datasets minimizes resource usage.
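
To check whether existing shards are oversized or nearly empty before resizing, the `_cat/shards` API lists per-shard document counts and store sizes (a diagnostic sketch; the column selection via `h` is optional):

GET _cat/shards/my_index?v&h=index,shard,prirep,docs,store

For an index that is already over-sharded, the `_shrink` API can reduce the primary shard count, though the index must first be made read-only and have a copy of every shard on a single node.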

3. Using Wildcard Searches Inefficiently

Wildcard queries, especially those with a leading wildcard, force Elasticsearch to walk the field's entire term dictionary, leading to slow searches.

Problematic Scenario

GET my_index/_search
{
  "query": {
    "wildcard": {"username": "*john*"}
  }
}

Using leading wildcards forces Elasticsearch to scan all terms.

Solution: Use `match` Queries or `edge_ngram` Indexing

PUT my_index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 10,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "edge_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "edge_ngram_tokenizer"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "username": {"type": "text", "analyzer": "edge_ngram_analyzer"}
    }
  }
}

Indexing edge n-grams lets partial matches hit precomputed terms instead of scanning the term dictionary, and setting `search_analyzer` to `standard` keeps the query string itself from being split into n-grams. Note that `edge_ngram` only matches from the start of a token; an arbitrary infix search such as `*john*` would need an `ngram` tokenizer instead.
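
With the n-gram analyzer applied at index time, a plain `match` query finds partial matches against the precomputed terms (the search string is illustrative and must be at least `min_gram` characters long):

GET my_index/_search
{
  "query": {
    "match": {"username": "joh"}
  }
}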

4. Not Using Filters for Caching in Repeated Queries

Scoring queries are re-executed on every request; only clauses run in filter context are eligible for the node query cache, so repeated lookups stay slow.

Problematic Scenario

GET my_index/_search
{
  "query": {
    "match": {"status": "active"}
  }
}

`match` runs in query context, so Elasticsearch analyzes the input and computes a relevance score even though `status` only needs an exact yes/no check.

Solution: Run `term` Queries in Filter Context

GET my_index/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"status": "active"}}
      ]
    }
  }
}

Running the `term` query inside `bool.filter` skips scoring and makes the clause eligible for the node query cache, speeding up repeated requests.
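
An equivalent way to express the same non-scoring lookup is `constant_score`, which wraps a filter and assigns a fixed score to every hit:

GET my_index/_search
{
  "query": {
    "constant_score": {
      "filter": {"term": {"status": "active"}}
    }
  }
}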

5. Overloading Queries with Expensive Aggregations

Complex aggregations on large datasets can cause slow query execution.

Problematic Scenario

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "group_by_status": {
      "terms": {"field": "status"}
    },
    "average_age": {
      "avg": {"field": "age"}
    }
  }
}

Running several bucket and metric aggregations over a large index in a single request increases execution time and heap usage.

Solution: Use `composite` Aggregations for Large Data

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "composite_agg": {
      "composite": {
        "sources": [{"status": {"terms": {"field": "status"}}}]
      }
    }
  }
}

`composite` aggregations return buckets in fixed-size pages instead of building every bucket in one pass, which keeps memory bounded on large datasets.
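
A follow-up request passes the previous response's `after_key` as `after` to fetch the next page of buckets, so no single request has to build them all (the `size` and `after` values shown are illustrative):

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "composite_agg": {
      "composite": {
        "size": 100,
        "sources": [{"status": {"terms": {"field": "status"}}}],
        "after": {"status": "active"}
      }
    }
  }
}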

Best Practices for Optimizing Elasticsearch Performance

1. Use Proper Index Mappings

Define field types correctly to avoid unnecessary indexing overhead.

Example:

PUT my_index
{
  "mappings": {
    "properties": {
      "user_id": {"type": "keyword"},
      "created_at": {"type": "date"}
    }
  }
}

2. Optimize Shard Allocation

Use appropriate shard count based on data size.

Example:

PUT my_index
{
  "settings": {"index": {"number_of_shards": 2, "number_of_replicas": 1}}
}

3. Avoid Expensive Wildcard Searches

Use `edge_ngram` indexing for partial matches.

Example:

"analyzer": "edge_ngram_analyzer"

4. Use Term Filters Instead of Match Queries

Run exact matches in `bool.filter` to skip scoring and leverage Elasticsearch's query cache.

Example:

"query": {"term": {"status": "active"}}

5. Use Composite Aggregations for Large Datasets

Improve performance of aggregation queries.

Example:

"composite": {"sources": [{"status": {"terms": {"field": "status"}}}]}

Conclusion

Elasticsearch query performance degradation often results from inefficient index mapping, excessive sharding, wildcard queries, improper filtering, and expensive aggregations. By optimizing index structures, reducing query complexity, leveraging caching, and using efficient aggregation strategies, developers can significantly improve Elasticsearch search speed and reduce resource consumption.