Beginner’s Crash Course to Elastic Stack

Created time

Feb 15, 2023 02:45 PM

Summary

Best search engine

Progress

In progress

Category

Programming

Business

URL

https://www.youtube.com/watch?v=CCTgroOcyfM

Source

YouTube

Part 1: CRUD

Nothing important, simple CRUD.

Elasticsearch is basically Apache Lucene with distributed scalability added on top. An index is split into shards that are spread across the nodes of a cluster. Each primary shard handles writes and can have replica shards that serve reads and take over if a node fails. Because the data is sharded, a search runs in parallel across shards.

The Elastic Stack consists of 4 components:

Elasticsearch: search engine

Logstash: a light-weight, open-source, server-side data processing pipeline that allows you to collect data from a variety of sources, transform it on the fly, and send it to your desired destination.

Kibana: Frontend for visualization, management and analytics

Beats: lightweight, single-purpose data shippers that send data to Elasticsearch or Logstash.

Part 2: Relevance of a search

How do we measure relevance?

Precision - the share of returned documents that are relevant; favors quality over quantity

Recall - the share of relevant documents that are returned; favors quantity over quality

What is relevance of a search?

True positives - relevant documents that are returned to the user

False positives - NOT relevant documents that are returned to the user

True negatives - NOT relevant documents that are NOT returned to the user

False negatives - relevant documents that are NOT returned to the user
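The four counts above are what precision and recall are built from. A minimal sketch using the standard definitions (the function names and the example numbers are made up for illustration):

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of returned documents that are relevant."""
    returned = true_positives + false_positives
    return true_positives / returned if returned else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of relevant documents that were returned."""
    relevant = true_positives + false_negatives
    return true_positives / relevant if relevant else 0.0


# Hypothetical search: 10 hits returned, 8 of them relevant,
# while 8 other relevant documents were missed.
p = precision(true_positives=8, false_positives=2)
r = recall(true_positives=8, false_negatives=8)
```

Here precision is high (most hits are relevant) while recall is only one half (half of the relevant documents were missed), which shows how the two measures pull in different directions.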

What is ranking?

Hits are ordered by score: every document has a score, and the higher the score, the higher it ranks

What is score?

Score - value that represents how relevant a document is to that specific query

Score is computed for each document that is a hit

Several factors go into the score, but these notes focus on only two:

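Term Frequency (TF) - how many times the search term appears in a field of a document; the more often it appears, the higher the score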

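Inverse Document Frequency (IDF) - how rare the search term is across all documents in the index; a term that appears in many documents carries less weight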
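To make the two factors concrete, here is a small sketch of the textbook TF-IDF calculation. It is an illustration only: Elasticsearch's actual scoring formula is more involved, and the tiny corpus and function names are invented for this example.

```python
import math


def term_frequency(term: str, doc_tokens: list[str]) -> int:
    """Number of times the term occurs in one document."""
    return doc_tokens.count(term)


def inverse_document_frequency(term: str, corpus: list[list[str]]) -> float:
    """log(N / df): rarer terms get a larger weight."""
    docs_with_term = sum(1 for doc in corpus if term in doc)
    if docs_with_term == 0:
        return 0.0
    return math.log(len(corpus) / docs_with_term)


def tf_idf(term: str, doc_tokens: list[str], corpus: list[list[str]]) -> float:
    """Simplified relevance score of one document for one term."""
    return term_frequency(term, doc_tokens) * inverse_document_frequency(term, corpus)


# Hypothetical corpus of three tokenized documents.
corpus = [
    ["how", "to", "form", "good", "habits"],
    ["good", "habits", "good", "life"],
    ["life", "in", "the", "city"],
]

score_doc0 = tf_idf("good", corpus[0], corpus)
score_doc1 = tf_idf("good", corpus[1], corpus)
```

The second document mentions "good" twice, so it scores higher for that term than the first, and a term like "city" that appears in only one document gets a larger IDF than "good".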

Key terms

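minimum_should_match - the minimum number of search terms a document must contain to count as a hit

AND/OR operators - whether a document must contain all search terms (AND) or any of them (OR, the default)

Aggregations (very powerful)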
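As a sketch of how the operator and minimum_should_match settings look in a match query, here are two request bodies written as Python dicts. The field name "headline" and the search text are assumptions for illustration; the bodies would be sent to an index's _search endpoint with whatever client you use.

```python
import json

# All four terms must be present (AND raises precision, lowers recall).
match_all_terms = {
    "query": {
        "match": {
            "headline": {  # hypothetical field name
                "query": "Khloe Kardashian Kendall Jenner",
                "operator": "and",
            }
        }
    }
}

# At least 3 of the 4 terms must be present (a middle ground between OR and AND).
match_three_terms = {
    "query": {
        "match": {
            "headline": {
                "query": "Khloe Kardashian Kendall Jenner",
                "minimum_should_match": 3,
            }
        }
    }
}

# Serialized form, ready to paste into a request.
body = json.dumps(match_three_terms, indent=2)
```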

Part 3: Full text queries

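Match queries

match query - the basic full-text query; the order of the search terms does not matter

match_phrase query - matches the search terms as an exact phrase, in the given order

multi_match query - runs the search across several specified fields; a field's score can be boosted (e.g. ^2), and the type can be set to phrase for phrase matching

bool query - combines several queries using must, must_not, should, and filter clauses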
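The query types above could look roughly like this as request bodies, again written as Python dicts. All field names, search texts, and dates are hypothetical placeholders, not values taken from these notes.

```python
# multi_match: search two fields, boost hits in "headline" (^2), match as a phrase.
multi_match_query = {
    "query": {
        "multi_match": {
            "query": "party planning",
            "fields": ["headline^2", "short_description"],
            "type": "phrase",
        }
    }
}

# bool: combine clauses.
bool_query = {
    "query": {
        "bool": {
            # must: the hit has to match this clause.
            "must": [{"match_phrase": {"headline": "Michelle Obama"}}],
            # must_not: hits matching this clause are excluded.
            "must_not": [{"match": {"category": "WEDDINGS"}}],
            # should: optional, but matching it raises the score.
            "should": [{"match": {"category": "BLACK VOICES"}}],
            # filter: yes/no condition that does not affect the score.
            "filter": [
                {"range": {"date": {"gte": "2014-03-25", "lte": "2016-03-25"}}}
            ],
        }
    }
}

bool_clauses = sorted(bool_query["query"]["bool"])
```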

Part 4: Aggregations

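Metric aggregations: min (lowest value), max (highest value), avg, sum, or stats, which returns count, min, max, avg, and sum at the same time. A query can be added to filter which documents are aggregated.

cardinality: number of unique values of a field (an approximate count).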
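A sketch of a request that combines a filter query with metric aggregations. The field names (UnitPrice, CustomerID, Country) and the aggregation names are assumptions modelled on a typical e-commerce dataset.

```python
metric_request = {
    "size": 0,  # return only aggregation results, no hits
    # Optional query: restrict the aggregation to matching documents.
    "query": {"match": {"Country": "Germany"}},
    "aggs": {
        # stats returns count, min, max, avg, and sum in one go.
        "price_stats": {"stats": {"field": "UnitPrice"}},
        # A single metric can also be requested on its own.
        "highest_price": {"max": {"field": "UnitPrice"}},
        # cardinality counts the unique values of a field.
        "unique_customers": {"cardinality": {"field": "CustomerID"}},
    },
}

requested_aggs = sorted(metric_request["aggs"])
```

Setting "size" to 0 keeps the response small when only the aggregated numbers are of interest.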

Bucket aggregation

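Date histogram: groups documents into time buckets, set with either fixed_interval (a fixed duration, e.g. “8h”) or calendar_interval (a calendar unit, e.g. “1M”)

Bucket order: buckets can be sorted in ascending or descending order, e.g. by key or by document count

Histogram aggregation: groups documents into fixed-size numeric intervals

Example: transactions per price interval

Range aggregation: like a histogram, but with intervals that you define yourself

Terms aggregation: creates one bucket per unique value of a field and counts the documents in each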
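The bucket aggregations above might be requested as follows. This is a sketch with assumed field names (InvoiceDate, UnitPrice, CustomerID) and arbitrary interval values, written as a Python dict.

```python
bucket_request = {
    "size": 0,
    "aggs": {
        # One bucket per calendar month, newest first.
        "transactions_per_month": {
            "date_histogram": {
                "field": "InvoiceDate",
                "calendar_interval": "1M",
                "order": {"_key": "desc"},
            }
        },
        # Fixed-size price intervals of 10.
        "transactions_per_price_interval": {
            "histogram": {"field": "UnitPrice", "interval": 10}
        },
        # Custom, uneven price intervals.
        "transactions_per_custom_price_range": {
            "range": {
                "field": "UnitPrice",
                "ranges": [{"to": 50}, {"from": 50, "to": 200}, {"from": 200}],
            }
        },
        # One bucket per unique customer, top 5 by document count.
        "top_5_customers": {"terms": {"field": "CustomerID", "size": 5}},
    },
}

bucket_types = [next(iter(agg)) for agg in bucket_request["aggs"].values()]
```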

Part 5: Mapping

What is mapping?

It’s like a database schema for Elasticsearch: it defines the names and data types of the fields in an index.

Dynamic mapping

When a user does not define the mapping manually, Elasticsearch creates it automatically by inferring the field types from the indexed documents.

Types:

Text - for full-text searches

Keyword - exact searches, aggregations, sorting (like categories)
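A sketch of an explicit mapping that uses both string types, written as the body one would pass when creating an index. The field names are hypothetical; the "description" field shows the text-plus-keyword multi-field layout that dynamic mapping typically produces for strings.

```python
index_body = {
    "mappings": {
        "properties": {
            # Full-text search only.
            "headline": {"type": "text"},
            # Exact matching, aggregations, and sorting only.
            "category": {"type": "keyword"},
            # Both: analyzed text plus an exact-value sub-field
            # reachable as "description.keyword".
            "description": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            },
        }
    }
}

field_types = {
    name: spec["type"] for name, spec in index_body["mappings"]["properties"].items()
}
```

Defining only the type a field actually needs avoids indexing every string twice, which is the main reason to write the mapping by hand.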

Runtime field

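A runtime field is a temporary field that is computed by a script at query time instead of being stored in the index. It can be added without reindexing and then used in searches and aggregations like a regular field.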
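As a sketch, a runtime field could be added to an existing mapping and then aggregated on like this. The field names unit_price and quantity, and the runtime field name total, are assumptions for illustration.

```python
# Body for a mapping update: define "total" as unit_price * quantity,
# computed by a script whenever a request asks for it.
runtime_mapping = {
    "runtime": {
        "total": {
            "type": "double",
            "script": {
                "source": "emit(doc['unit_price'].value * doc['quantity'].value)"
            },
        }
    }
}

# Body for a search: the runtime field is used like any stored field.
total_sum_request = {
    "size": 0,
    "aggs": {"total_sales": {"sum": {"field": "total"}}},
}

runtime_field_names = list(runtime_mapping["runtime"])
```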