Beginner’s Crash Course to Elastic Stack

Created time: Feb 15, 2023 02:45 PM
Summary: Best search engine
Progress: In progress
Category: Programming, Business
Source: YouTube

Part 1: CRUD

Nothing important, simple CRUD.
Elasticsearch is basically Apache Lucene with distribution and scalability on top. An index is split into shards that are spread across the nodes of a cluster. Each primary shard accepts writes and can have replica shards that serve reads and provide failover (similar in spirit to primary/secondary replication in MongoDB). Searches run in parallel across shards.
The Elastic Stack consists of four components:
  • Elasticsearch: search and analytics engine
  • Logstash: a light-weight, open-source, server-side data processing pipeline that allows you to collect data from a variety of sources, transform it on the fly, and send it to your desired destination.
  • Kibana: Frontend for visualization, management and analytics
  • Beats: lightweight, single-purpose data shippers that send data to Elasticsearch or Logstash.
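
The "simple CRUD" of this part maps onto Elasticsearch's document REST endpoints roughly as follows. This is a minimal sketch that only builds the requests as data and does not send them; the index name `favorite_candy`, the document fields, and the helper `crud_requests` are made up for illustration.

```python
import json

def crud_requests(index, doc_id, doc, changes):
    """Return (method, path, body) tuples for one document's CRUD lifecycle."""
    return [
        ("PUT", f"/{index}/_doc/{doc_id}", doc),                   # create (or overwrite)
        ("GET", f"/{index}/_doc/{doc_id}", None),                  # read
        ("POST", f"/{index}/_update/{doc_id}", {"doc": changes}),  # partial update
        ("DELETE", f"/{index}/_doc/{doc_id}", None),               # delete
    ]

# Hypothetical document and update, used only to show the request shapes.
reqs = crud_requests(
    "favorite_candy",
    1,
    {"first_name": "Sam", "candy": "Starburst"},
    {"candy": "M&M's"},
)

for method, path, body in reqs:
    print(method, path, json.dumps(body) if body is not None else "")
```

Each line printed can be pasted into Kibana's Dev Tools console (method and path on one line, JSON body below it).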

Part 2: Relevance of a search

How do we measure relevance?

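  • Precision - favors quality over quantity: the share of returned documents that are relevant, TP / (TP + FP)
  • Recall - favors quantity over quality: the share of all relevant documents that are returned, TP / (TP + FN)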

What is relevance of a search?

  • True positives - relevant documents that are returned to the user
  • False positives - NOT relevant documents that are returned to the user
  • True negatives - NOT relevant documents that are NOT returned to the user
  • False negatives - relevant documents that are NOT returned to the user
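
The four categories above are enough to compute precision and recall. A minimal sketch, assuming we already know which documents are relevant (in practice this is a human judgment); the document ids and the helper `precision_recall` are hypothetical.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall from returned and relevant document ids."""
    returned, relevant = set(returned), set(relevant)
    tp = len(returned & relevant)   # relevant and returned
    fp = len(returned - relevant)   # returned but not relevant
    fn = len(relevant - returned)   # relevant but not returned
    precision = tp / (tp + fp) if returned else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

returned_docs = ["d1", "d2", "d3", "d4"]   # what the search gave back
relevant_docs = ["d1", "d2", "d5"]         # what the user actually wanted

precision, recall = precision_recall(returned_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four returned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67). Returning more documents tends to raise recall and lower precision.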

What is ranking?

Every matching document gets a score, and hits are ranked by score, highest first.

What is score?

Score - value that represents how relevant a document is to that specific query
Score is computed for each document that is a hit
Several factors go into the score, but only two are covered here:

Term Frequency

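How many times the search term appears in a field of a document. The more often it appears, the higher the score.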

Inverse Document Frequency

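Diminishes the weight of terms that appear in many documents of the index. The rarer a term is across documents, the more it contributes to the score.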
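
To make the two factors concrete, here is a toy calculation on three made-up documents. It uses the plain textbook formulas (raw count for TF, log(N / df) for IDF), which is an assumption for illustration: Elasticsearch's default scoring (BM25) uses refined versions of both.

```python
import math

# Hypothetical mini-corpus, one string per document.
docs = [
    "the quick brown fox",
    "the lazy dog",
    "the fox jumps over the lazy dog",
]

def term_frequency(term, doc):
    """Number of times the term occurs in one document."""
    return doc.split().count(term)

def inverse_document_frequency(term, corpus):
    """log(N / df): 0 for a term in every document, larger for rarer terms."""
    containing = sum(1 for d in corpus if term in d.split())
    return math.log(len(corpus) / containing) if containing else 0.0

for term in ("the", "fox", "quick"):
    tf = term_frequency(term, docs[2])
    idf = inverse_document_frequency(term, docs)
    print(f"{term!r}: tf in doc 3 = {tf}, idf = {idf:.3f}")
```

"the" appears in every document, so its IDF is 0 and it adds nothing to the score, while "quick" appears in only one document and gets the highest IDF.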

Key terms

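  • minimum_should_match - the minimum number of search terms a document must contain to be a hit
  • AND/OR operators - whether all search terms (AND) or any of them (OR, the default) must match
  • Aggregation (very powerful)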
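
The first two key terms are parameters of the match query and are the usual levers for trading recall against precision. A sketch of the request bodies, built as Python dicts; the field name `headline`, the search text, and the helper `match_query` are hypothetical.

```python
import json

def match_query(field, text, operator="or", minimum_should_match=None):
    """Build a match query body; `operator` and `minimum_should_match`
    control how many of the search terms a hit must contain."""
    params = {"query": text, "operator": operator}
    if minimum_should_match is not None:
        params["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: params}}}

text = "elastic stack crash course"

any_term = match_query("headline", text)                           # OR: high recall
all_terms = match_query("headline", text, operator="and")          # AND: high precision
three_of_four = match_query("headline", text, minimum_should_match=3)  # in between

print(json.dumps(three_of_four, indent=2))
```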

Part 3: Full text queries

Match queries

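  • match query - the basic one; the order of the search terms does not matter
  • match_phrase query - matches the exact phrase, with the terms in the given order
  • multi_match - searches several specified fields; a field's score can be boosted (e.g. ^2), and type can be set to phrase
  • bool query - combines several queries with must, must_not, should and filter clauses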
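
The query types above could look like this as request bodies. A sketch only: the field names (`headline`, `short_description`, `category`, `date`) and the search values are assumptions, not taken from a real index.

```python
import json

# Exact phrase, terms in this order.
match_phrase = {"query": {"match_phrase": {"headline": {"query": "party planning"}}}}

# Several fields at once; ^2 doubles the weight of hits in `headline`,
# and "type": "phrase" makes each field match the text as a phrase.
multi_match = {
    "query": {
        "multi_match": {
            "query": "party planning",
            "fields": ["headline^2", "short_description"],
            "type": "phrase",
        }
    }
}

# Combination of clauses: must/must_not decide what is a hit,
# should only boosts the score, filter restricts without scoring.
bool_query = {
    "query": {
        "bool": {
            "must": [{"match_phrase": {"headline": "party planning"}}],
            "must_not": [{"match": {"category": "WEDDINGS"}}],
            "should": [{"match": {"category": "HOME"}}],
            "filter": [{"range": {"date": {"gte": "2014-01-01", "lte": "2016-12-31"}}}],
        }
    }
}

for body in (match_phrase, multi_match, bool_query):
    print(json.dumps(body, indent=2))
```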

Part 4: Aggregations

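  • metric: min (lowest value), max (highest value), avg, sum, or stats (all of these at once); can be combined with a query to filter which documents are aggregated
  • cardinality: number of unique values of a field (an approximate count).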
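
A sketch of a search body that requests these metric aggregations in one go. The field names (`UnitPrice`, `CustomerID`, `Country`), the aggregation names, and the helper `metric_request` are hypothetical; `"size": 0` suppresses the hits so only the aggregation results come back.

```python
import json

def metric_request(numeric_field, id_field, query=None):
    """Build a search body with the common metric aggregations."""
    body = {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "lowest": {"min": {"field": numeric_field}},
            "highest": {"max": {"field": numeric_field}},
            "average": {"avg": {"field": numeric_field}},
            "total": {"sum": {"field": numeric_field}},
            "all_stats": {"stats": {"field": numeric_field}},  # count, min, max, avg, sum
            "unique_values": {"cardinality": {"field": id_field}},
        },
    }
    if query is not None:
        body["query"] = query  # limits the documents being aggregated
    return body

everything = metric_request("UnitPrice", "CustomerID")
germany_only = metric_request(
    "UnitPrice", "CustomerID", query={"match": {"Country": "Germany"}}
)

print(json.dumps(germany_only, indent=2))
```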

Bucket aggregation

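  • Date Histogram: groups documents into time buckets, using either fixed_interval (a fixed duration, e.g. “8h”) or calendar_interval (a calendar unit, e.g. “1M”)
  • Bucket ordering: buckets can be sorted ascending or descending, e.g. by key or by document count
  • Histogram aggregation: fixed-size numeric intervals, e.g. transactions per price interval
  • Range aggregation: like a histogram, but with custom, possibly uneven ranges
  • Terms aggregation: one bucket per unique value of a field, with a document count for each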
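
The bucket aggregations listed above, sketched as one search body. The field names (`InvoiceDate`, `UnitPrice`, `CustomerID`), the aggregation names, and the interval values are assumptions chosen for illustration.

```python
import json

bucket_request = {
    "size": 0,  # aggregation results only
    "aggs": {
        # One bucket per calendar month, newest month first.
        "per_month": {
            "date_histogram": {
                "field": "InvoiceDate",
                "calendar_interval": "1M",
                "order": {"_key": "desc"},
            }
        },
        # One bucket per fixed 8-hour span.
        "per_8_hours": {
            "date_histogram": {"field": "InvoiceDate", "fixed_interval": "8h"}
        },
        # Transactions per price interval of width 10.
        "per_price_interval": {
            "histogram": {"field": "UnitPrice", "interval": 10}
        },
        # Custom, uneven price ranges ("from" inclusive, "to" exclusive).
        "per_price_range": {
            "range": {
                "field": "UnitPrice",
                "ranges": [{"to": 50}, {"from": 50, "to": 200}, {"from": 200}],
            }
        },
        # The 5 customers with the most documents.
        "top_customers": {"terms": {"field": "CustomerID", "size": 5}},
    },
}

print(json.dumps(bucket_request, indent=2))
```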

Part 5: Mapping

What is mapping?

Mapping is like a database schema for Elasticsearch: it defines the names and data types of the fields in an index.

Dynamic mapping

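When the user does not define a mapping manually, Elasticsearch creates one automatically from the documents it indexes.
String field types (by default, dynamic mapping maps a string field as both):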
  • Text - for full-text searches
  • Keyword - exact searches, aggregations, sorting (like categories)
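
A sketch of an explicit mapping that uses the two string types, written as the body of an index-creation request (PUT on the index name). The field names are hypothetical. The last field shows the shape dynamic mapping gives a string by default: text with a keyword sub-field.

```python
import json

create_index_body = {
    "mappings": {
        "properties": {
            # Analyzed into terms; used for full-text search.
            "description": {"type": "text"},
            # Stored as-is; used for exact matches, aggregations, sorting.
            "country": {"type": "keyword"},
            # Both at once: search on `name`, aggregate or sort on `name.keyword`.
            "name": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            },
        }
    }
}

print(json.dumps(create_index_body, indent=2))
```

Defining the mapping by hand avoids indexing every string twice when only one of the two uses is needed.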

Runtime field

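A runtime field is a field whose value is computed by a script at query time instead of being stored in the index. It can be defined in the mapping or just for a single search request, and then queried or aggregated like a regular field.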
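
A sketch of a runtime field defined inside a single search request and then used in an aggregation. The field names `unit_price` and `quantity`, the runtime field name `total`, and the aggregation name are assumptions; the script is Painless, Elasticsearch's scripting language, and `emit` supplies the computed value.

```python
import json

search_body = {
    "runtime_mappings": {
        # Computed per document at query time; nothing is reindexed.
        "total": {
            "type": "double",
            "script": {
                "source": "emit(doc['unit_price'].value * doc['quantity'].value)"
            },
        }
    },
    "size": 0,
    # The runtime field is used like any other field.
    "aggs": {"total_revenue": {"sum": {"field": "total"}}},
}

print(json.dumps(search_body, indent=2))
```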