Monday, September 4, 2023

Elasticsearch installation and indexing

 


Elasticsearch


Installation 


The steps are pretty much the same as those available on various websites, but a mistake was found in the elasticsearch.service file that prevented the elasticsearch service from starting on the Ubuntu server. Both versions of the file are below.


New one (working)

elasticsearch.service


[Unit]

Description=Elasticsearch

Documentation=https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

Wants=network-online.target

After=network-online.target


[Service]

Type=simple

User=elasticsearch

Group=elasticsearch

ExecStart=/usr/share/elasticsearch/bin/elasticsearch

Restart=on-failure

RestartSec=5s

StartLimitInterval=0


[Install]

WantedBy=multi-user.target



Old one (broken)


[Unit]

Description=Elasticsearch

Documentation=https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

Wants=network-online.target

After=network-online.target


[Service]

Type=simple

User=your_username

Group=your_username

ExecStart=/home/your_username/elasticsearch-7.15.0/bin/elasticsearch

Restart=on-failure

RestartSec=5s

StartLimitInterval=0


[Install]

WantedBy=multi-user.target



Step 1: Install Java

  • Ensure that Java Development Kit (JDK) is installed on your server. ELK stack requires Java to run.

  • You can check if Java is already installed by running the command: java -version.

  • If Java is not installed or an older version is present, install JDK by following the appropriate instructions for your server's operating system.

Step 2: Install Elasticsearch

Add the Elasticsearch repository key:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -


Add the Elasticsearch repository:


echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list


Update the package lists and install Elasticsearch:

sudo apt update

sudo apt install elasticsearch


Start Elasticsearch service and enable it to start on boot:


sudo systemctl start elasticsearch

sudo systemctl enable elasticsearch



Confirm Elasticsearch is running by accessing its status:

sudo systemctl status elasticsearch
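
You can also confirm the node responds from Python (a minimal sketch, assuming the elasticsearch Python client is installed):

from elasticsearch import Elasticsearch

# Connect to the local node over plain HTTP
es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])

# Prints cluster name and version info if the node is reachable
print(es.info())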



Step 3: Install Logstash

Add the Logstash repository key and repository:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list


Update the package lists and install Logstash:

sudo apt update

sudo apt install logstash


Step 4: Install Kibana

Add the Kibana repository key and repository:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list


Update the package lists and install Kibana:

sudo apt update

sudo apt install kibana


Step 5: Install MySQL Connector for Logstash

Install the MySQL Connector for Logstash:



sudo /usr/share/logstash/bin/logstash-plugin install logstash-integration-jdbc


Step 6: Configure Logstash

Create a Logstash configuration file, for example, car-details.conf

sudo nano /etc/logstash/conf.d/car-details.conf

Add the following configuration to the file:

input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/your_database_name"
    jdbc_user => "your_mysql_username"
    jdbc_password => "your_mysql_password"
    statement => "SELECT * FROM car_details"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "car_details_index"
  }
}


  • Replace /path/to/mysql-connector-java.jar with the actual path to the MySQL Connector JAR file.

  • Modify the jdbc_connection_string, jdbc_user, and jdbc_password to match your MySQL database configuration.

  • Adjust the statement to your specific SQL query for retrieving car details from the car_details table.

  • Save the file and exit the text editor.

Step 7: Configure Elasticsearch and Kibana

Open the Elasticsearch configuration file using a text editor:

sudo nano /etc/elasticsearch/elasticsearch.yml


Uncomment the line containing #network.host and replace it with:

network.host: localhost


  • Save the file and exit the text editor.

  • Open the Kibana configuration file using a text editor:

sudo nano /etc/kibana/kibana.yml


Uncomment the line containing #server.host and replace it with:

server.host: "localhost"


  • Save the file and exit the text editor.

Step 8: Start and Enable Services

Start Logstash service and enable it to start on boot:

sudo systemctl start logstash

sudo systemctl enable logstash


Start Kibana service and enable it to start on boot:

sudo systemctl start kibana

sudo systemctl enable kibana


Import data from MySQL to Elasticsearch

—-----------------------------------------------------------------------------------

Create your conf file at the location below:

vi /etc/logstash/conf.d/car-details.conf

input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/test_au2.database"
    jdbc_user => "test_au2.database"
    jdbc_password => "t1_0Ng27i"
    jdbc_driver_library => "/usr/share/java/mysql-connector-java-8.0.27/mysql-connector-java-8.0.27.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    statement => "SELECT * FROM tbl_country"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "tbl_country_car_details"
    document_id => "%{id}"
  }
}

—--------------------------------------------------------------------------------------

In the above file we take one table, tbl_country; after connecting to MySQL, whatever data we have in this table gets imported into Elasticsearch.


How to run the above config file and import the data - below is the cmd:

sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/car-details.conf


Issue during python script execution

Issue: elasticsearch was installed and running, yet we were still getting an error: cannot import elasticsearch.

[root@master-node ~]# python3 --version

Python 3.6.8

[root@master-node ~]# pip3 --version

pip 21.3.1 from /usr/local/lib/python3.6/site-packages/pip (python 3.6)

[root@master-node ~]#


[root@master-node ~]# pip3 show elasticsearch

Name: elasticsearch

Version: 8.8.2

Summary: Python client for Elasticsearch

Home-page: https://github.com/elastic/elasticsearch-py

Author: Honza Král, Nick Lang

Author-email: honza.kral@gmail.com, nick@nicklang.com

License: Apache-2.0

Location: /usr/local/lib/python3.6/site-packages

Requires: elastic-transport

Required-by:

Note: make sure the location reported by pip3 --version matches the location shown when you check the elasticsearch package with the pip3 show cmd.

Also check whether a cache or another elasticsearch.py is creating a conflict; if so, delete them:

__pycache__  and elasticsearch.py
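
A quick way to confirm which module Python is actually importing (a small diagnostic sketch):

import elasticsearch

# If this prints a path inside your working directory instead of
# site-packages, a local elasticsearch.py is shadowing the real package
print(elasticsearch.__file__)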


How to do full-text search in Elasticsearch ?

To achieve full-text search in Elasticsearch, you need to follow these steps (a small Python sketch follows the list):

  1. Install Elasticsearch: Start by installing Elasticsearch on your system or setting it up in a distributed cluster if you have a larger deployment.

  2. Create an Index: Define an index in Elasticsearch to store your documents. An index is a logical namespace that holds a collection of documents. You can specify settings for the index, such as the number of shards and replicas, and define mappings to specify how the fields in your documents should be indexed and analyzed.

  3. Ingest Documents: Index your documents into Elasticsearch by sending JSON payloads to the appropriate API endpoint. Elasticsearch will automatically parse and index the documents based on the specified mappings.

  4. Perform Full-Text Searches: Utilize the search API provided by Elasticsearch to execute full-text searches on your indexed documents. You can construct queries using Elasticsearch's Query DSL, which supports a wide range of search options and features. You can search for specific terms, use fuzzy matching, perform phrase searches, apply filters, and more.
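
As a compact illustration of steps 2-4, here is a minimal Python sketch using the elasticsearch client; demo_index and its title field are made-up names for the example, not part of the car data used later:

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])

# Step 2: create an index with a minimal text mapping
es.indices.create(index="demo_index", body={
    "mappings": {"properties": {"title": {"type": "text"}}}
})

# Step 3: ingest a document as a JSON payload
es.index(index="demo_index", body={"title": "Dacia Jogger for sale"}, refresh=True)

# Step 4: run a full-text match query against the indexed field
res = es.search(index="demo_index", body={"query": {"match": {"title": "jogger"}}})
print(res["hits"]["hits"])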


So step 1 is done.

Step 2: Create an Index:

Open kibana console - Open Dev tool


Fields in tmp_tbl_cars in MySQL that we would like to index:

cartitlename

sellerttype

carmake

carmodel

kilometers

price

registrationmonth

registrationyears

power

emissionclass

carcondition

type

engine_type

color

latitude

longitude

city

postcode

transmission

makeyear

carfeatures

co2

Consumption


To create an index in Elasticsearch for the fields listed above from the tmp_tbl_cars table, open Kibana in your web browser and log in.

  1. Go to the Kibana home page.

  2. Click on the "Dev Tools" tab in the top navigation bar.

  3. In the Kibana Dev Tools console, you can use the following command to create an index with the desired fields; in our case we use the fields above:

Query for creating index:

—------------------------------------------------------------------------------------

PUT tmp_car_detail

{

  "mappings": {

"properties": {

   "cartitlename": {

     "type": "text"

   },

   "sellerttype": {

     "type": "integer"

   },

   "carmake": {

     "type": "keyword"

   },

   "carmodel": {

     "type": "keyword"

   },

   "kilometers": {

     "type": "integer"

   },

   "price": {

     "type": "integer"

   },

   "registrationmonth": {

     "type": "integer"

   },

   "registrationyears": {

     "type": "integer"

   },

   "power": {

     "type": "integer"

   },

   "emissionclass": {

     "type": "keyword"

   },

   "carcondition": {

     "type": "keyword"

   },

   "type": {

     "type": "keyword"

   },

   "engine_type": {

     "type": "keyword"

   },

   "color": {

     "type": "keyword"

   },

   "latitude": {

     "type": "double"

   },

   "longitude": {

     "type": "double"

   },

   "city": {

     "type": "keyword"

   },

   "postcode": {

     "type": "keyword"

   },

   "transmission": {

     "type": "keyword"

   },

   "makeyear": {

     "type": "keyword"

   },

   "carfeatures": {

     "type": "text"

   },

   "co2": {

     "type": "text"

   },

   "Consumption": {

     "type": "text"

   }

}

  }

}

—---------------------------------------------------------

After creating the index in Elasticsearch, you can view the index and its data in Kibana. Here's how you can do it:

  1. Go to the Kibana home page.

  2. Click on the "Discover" tab in the top navigation bar.

  3. If you haven't created an index pattern yet, you will be prompted to create one. Click on "Create index pattern".

  4. In the "Index pattern" field, enter the name of your index (tmp_car_detail in this case).

  5. Follow the prompts and configure any advanced settings if needed.

  6. Once the index pattern is created, you will be redirected to the Discover page.

  7. In the "Available Fields" section on the left, you will see all the fields present in your index.

  8. You can click on the field names to view the data in each field and perform searches and filters on the data.

  9. Use the search bar and filter options to query and explore the data in your index.

Note: in our case we just created the index and did not upload documents into it; for that we can use a Python script that uploads data from the MySQL fields to the Elasticsearch index.

—---------------------------------------------------------------------------------------------------------------------------------------

Script path:

/root/elasticsearch_indexing_development/index_to_elastic/indexing_to_elastic_with_limit.py

from elasticsearch import Elasticsearch
import pymysql

# Connect to MySQL
mysql_conn = pymysql.connect(
    host="localhost",
    user="test_au2.database",
    password="t1_0Ng27i",
    database="test_au2.database"
)

# Connect to Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])

# Retrieve data from MySQL
cursor = mysql_conn.cursor()
cursor.execute("SELECT cartitlename, sellerttype, carmake, carmodel, kilometers, price, registrationmonth, registrationyears, power, emissionclass, carcondition, type, engine_type, color, latitude, longitude, city, postcode, transmission, makeyear, carfeatures, co2, Consumption FROM tmp_tbl_cars")
rows = cursor.fetchall()

# Limit the number of records to index
limit = 5000
counter = 0

# Iterate and index data in Elasticsearch
for row in rows:
    data = {
        "cartitlename": row[0],
        "sellerttype": row[1],
        "carmake": row[2],
        "carmodel": row[3],
        "kilometers": row[4],
        "price": row[5],
        "registrationmonth": row[6],
        "registrationyears": row[7],
        "power": row[8],
        "emissionclass": row[9],
        "carcondition": row[10],
        "type": row[11],
        "engine_type": row[12],
        "color": row[13],
        "latitude": row[14],
        "longitude": row[15],
        "city": row[16],
        "postcode": row[17],
        "transmission": row[18],
        "makeyear": row[19],
        "carfeatures": row[20],
        "co2": row[21],
        "Consumption": row[22]
    }

    # Index the data in Elasticsearch
    es.index(index="tmp_car_detail", body=data)

    # Increment the counter
    counter += 1

    # Check if the limit has been reached
    if counter >= limit:
        break

# Close connections
cursor.close()
mysql_conn.close()

—----------------------------------------------------------------------


Run the above script; it will upload 5000 documents into your index tmp_car_detail.

Once it has uploaded the 5000 documents you can see them on your Kibana Discover page.
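
You can also double-check the upload from Python with a count query (a small sketch):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])

# Should report 5000 documents once the script above finishes
print(es.count(index="tmp_car_detail")["count"])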


Now you are ready for full-text search

Open Dev Tools again.


And run your full-text search query; the query can be anything, and its response appears in the pane beside the query editor.

Here are 10 full-text search queries you can try with the given data:

  1. Search for cars with the make "Dacia":

GET tmp_car_detail/_search

{

  "query": {

    "match": {

      "carmake": "Dacia"

    }

  }

}

  2. Search for cars with the model "dacia jogger":

GET tmp_car_detail/_search

{

  "query": {

    "match": {

      "carmodel": "dacia jogger"

    }

  }

}

  3. Search for cars in Gummersbach city:

GET tmp_car_detail/_search

{

  "query": {

    "match": {

      "city": "Gummersbach"

    }

  }

}

  4. Search for cars with a price greater than 20,000:

GET tmp_car_detail/_search

{

  "query": {

    "range": {

      "price": {

        "gt": 20000

      }

    }

  }

}

  5. Search for cars with a transmission type of "Manual gearbox":

GET tmp_car_detail/_search

{

  "query": {

    "match": {

      "transmission": "Manual gearbox"

    }

  }

}

  6. Search for cars with a specific color, e.g., "grün":

GET tmp_car_detail/_search

{

  "query": {

    "match": {

      "color": "grün"

    }

  }

}

  7. Search for cars with a specific engine type, e.g., "Benzin":

GET tmp_car_detail/_search

{

  "query": {

    "match": {

      "engine_type": "Benzin"

    }

  }

}


  8. Search for cars with a specific CO2 emission value, e.g., 130:

GET tmp_car_detail/_search

{

  "query": {

    "match": {

      "co2": "130"

    }

  }

}

  9. Search for cars with a specific power value, e.g., 110:

GET tmp_car_detail/_search

{

  "query": {

    "match": {

      "power": "110"

    }

  }

}

  10. Search for cars with a specific kilometer value, e.g., 10:

GET tmp_car_detail/_search

{

  "query": {

    "match": {

      "kilometers": "10"

    }

  }

}

Remember to adjust the index name (tmp_car_detail) according to your Elasticsearch setup. Also, note that the queries assume the fields are of type text or keyword. If the fields have a different type, you may need to modify the queries accordingly.



How do we enable multilanguage support?

To enable multilanguage support in your "tmp_car_detail" index, you can modify the existing mapping to include language-specific fields and analyzers. Here's an example of how you can enable multilanguage support for the "cartitlename" field:

—---------------------------------------------------------------------------------------------

PUT tmp_car_detail/_mapping

{

  "properties": {

"cartitlename": {

   "type": "text",

   "fields": {

     "english": {

       "type": "text",

       "analyzer": "english"

     },

     "french": {

       "type": "text",

       "analyzer": "french"

     },

     "german": {

       "type": "text",

       "analyzer": "german"

     }

     // Add more language-specific analyzers as needed

   }

}

  }

}

—---------------------------------------------------------------------------------


How to verify the multilanguage support enabled above?

GET tmp_car_detail/_search

{

  "query": {

"match": {

   "cartitlename.english": "car"

}

  }

}

Check in German.

GET tmp_car_detail/_search

{

  "query": {

"match": {

   "cartitlename.german": "Auto"

}

  }

}
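
To see what the built-in language analyzers actually do to a string, the _analyze API helps (a hedged sketch; english should stem "cars" to "car" and german should normalize "Autos" to "auto"):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])

# Tokenize sample words with two different language analyzers
resp_en = es.indices.analyze(analyzer="english", text="cars")
resp_de = es.indices.analyze(analyzer="german", text="Autos")
print([t["token"] for t in resp_en["tokens"]])
print([t["token"] for t in resp_de["tokens"]])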



Note: everything above (the index and the documents in it) was created with multilanguage support applied to only one field, cartitlename, and without it elsewhere. But to add a geo_point for latitude/longitude under one field, we need to drop the index and its data (documents) and recreate it as below:


Mapping:

Start by defining the mapping for your index, specifying a field as a geo_point type to store the geolocation information. Here's an example of how you can create a mapping:

—----------------------------------------------------------------------------------------------------------------------------

PUT tmp_car_detail

{

  "mappings": {

"properties": {

   "cartitlename": {

     "type": "text",

     "fields": {

       "english": {

         "type": "text",

         "analyzer": "english"

       },

       "french": {

         "type": "text",

         "analyzer": "french"

       },

       "german": {

         "type": "text",

         "analyzer": "german"

       }

       // Add more language-specific analyzers as needed

     }

   },

   "sellerttype": {

     "type": "integer"

   },

   "carmake": {

     "type": "keyword"

   },

   "carmodel": {

     "type": "keyword"

   },

   "kilometers": {

     "type": "integer"

   },

   "price": {

     "type": "integer"

   },

   "registrationmonth": {

     "type": "integer"

   },

   "registrationyears": {

     "type": "integer"

   },

   "power": {

     "type": "integer"

   },

   "emissionclass": {

     "type": "keyword"

   },

   "carcondition": {

     "type": "keyword"

   },

   "type": {

     "type": "keyword"

   },

   "engine_type": {

     "type": "keyword"

   },

   "color": {

     "type": "keyword"

   },

   "location": {

     "type": "geo_point"

   },

   "city": {

     "type": "keyword"

   },

   "postcode": {

     "type": "keyword"

   },

   "transmission": {

     "type": "keyword"

   },

   "makeyear": {

     "type": "keyword"

   },

   "carfeatures": {

     "type": "text"

   },

   "co2": {

     "type": "text"

   },

   "Consumption": {

     "type": "text"

   }

}

  }

}

—-----------------------------------------------------------------------------------------------------------------------------

Indexing Documents: 

When you index your documents, make sure to include the latitude and longitude values in the location field. Here is how you upload documents via a Python script.

—--------------------------------------------------------------------------------------------------------------------------

from elasticsearch import Elasticsearch
import pymysql

# Connect to MySQL
mysql_conn = pymysql.connect(
    host="localhost",
    user="test_au2.database",
    password="t1_0Ng27i",
    database="test_au2.database"
)

# Connect to Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])

# Retrieve data from MySQL
cursor = mysql_conn.cursor()
cursor.execute("SELECT cartitlename, sellerttype, carmake, carmodel, kilometers, price, registrationmonth, registrationyears, power, emissionclass, carcondition, type, engine_type, color, latitude, longitude, city, postcode, transmission, makeyear, carfeatures, co2, Consumption FROM tmp_tbl_cars")
rows = cursor.fetchall()

# Limit the number of records to index
limit = 5000
counter = 0

# Iterate and index data in Elasticsearch
for row in rows:
    data = {
        "cartitlename": row[0],
        "sellerttype": row[1],
        "carmake": row[2],
        "carmodel": row[3],
        "kilometers": row[4],
        "price": row[5],
        "registrationmonth": row[6],
        "registrationyears": row[7],
        "power": row[8],
        "emissionclass": row[9],
        "carcondition": row[10],
        "type": row[11],
        "engine_type": row[12],
        "color": row[13],
        "location": {
            "lat": row[14],
            "lon": row[15]
        },
        "city": row[16],
        "postcode": row[17],
        "transmission": row[18],
        "makeyear": row[19],
        "carfeatures": row[20],
        "co2": row[21],
        "Consumption": row[22]
    }

    # Index the data in Elasticsearch
    es.index(index="tmp_car_detail", body=data)

    # Increment the counter
    counter += 1

    # Check if the limit has been reached
    if counter >= limit:
        break

# Close connections
cursor.close()
mysql_conn.close()

—---------------------------------------------------------------------------------------------------------------------------

Performing Geolocation Queries:

To find cars with the name "DACIA Jogger" and their locations in the city of "Gummersbach", you can use the following query:

—----------------------------------------------------------------------------------------------------------------------------

GET tmp_car_detail/_search

{

  "query": {

"bool": {

   "must": [

     {

       "match": {

         "city": "Gummersbach"

       }

     },

     {

       "match": {

         "cartitlename": "DACIA Jogger"

       }

     }

   ]

}

  },

  "_source": ["cartitlename", "location"]

}

—---------------------------------------------------------------------------------------------------------------------------


Tokenization in Elasticsearch. 

Tokenization is the process of breaking a text string into individual words or tokens, often removing punctuation and applying lowercasing. This process is essential for indexing and searching textual data efficiently. Elasticsearch uses tokenization during the indexing phase to split text into tokens and create an inverted index, which allows for efficient full-text search capabilities.

By default, Elasticsearch applies a standard tokenizer to text fields, which splits the text into tokens based on whitespace, punctuation, and other language-specific rules. However, you can customize tokenization by using different analyzers, tokenizers, and token filters. Analyzers define the entire process of text analysis, including tokenization, while tokenizers and token filters are components within an analyzer.

By understanding the tokenization process, you can configure Elasticsearch to handle various search scenarios, such as case-insensitive searches, stemming, synonyms, and more.

Mapping

Drop the previous index and create it again with the following mapping:
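
Dropping the old index can be done with DELETE tmp_car_detail in Dev Tools, or from Python (a one-line sketch):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])

# Delete the index (and all its documents) so it can be recreated
es.indices.delete(index="tmp_car_detail", ignore_unavailable=True)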

—----------------------------------------------------------------------------------------------------------------

PUT tmp_car_detail

{

  "settings": {

"analysis": {

   "analyzer": {

     "custom_analyzer": {

       "tokenizer": "standard",

       "filter": [

         "lowercase",

         "my_stemmer",

         "my_synonyms"

       ]

     },

     "english_analyzer": {

       "tokenizer": "standard",

       "filter": [

         "lowercase",

         "english_stemmer"

       ]

     },

     "german_analyzer": {

       "tokenizer": "standard",

       "filter": [

         "lowercase",

         "german_stemmer"

       ]

     },

     "french_analyzer": {

       "tokenizer": "standard",

       "filter": [

         "lowercase",

         "french_stemmer"

       ]

     }

   },

   "filter": {

     "my_stemmer": {

       "type": "stemmer",

       "name": "english"

     },

     "my_synonyms": {

       "type": "synonym",

       "synonyms": [

         "automatic, auto",

         "condition, state",

         "make1, brand1",

         "make2, brand2"

       ]

     },

     "english_stemmer": {

       "type": "stemmer",

       "name": "english"

     },

     "german_stemmer": {

       "type": "stemmer",

       "name": "light_german"

     },

     "french_stemmer": {

       "type": "stemmer",

       "name": "light_french"

     }

   }

}

  },

  "mappings": {

"properties": {

   "cartitlename": {

     "type": "text",

     "analyzer": "custom_analyzer"

   },

   "sellerttype": {

     "type": "keyword"

   },

   "carmake": {

     "type": "text",

     "analyzer": "custom_analyzer"

   },

   "carmodel": {

     "type": "text",

     "analyzer": "custom_analyzer"

   },

   "kilometers": {

     "type": "integer"

   },

   "price": {

     "type": "integer"

   },

   "registrationmonth": {

     "type": "integer"

   },

   "registrationyears": {

     "type": "integer"

   },

   "power": {

     "type": "integer"

   },

   "emissionclass": {

     "type": "text",

     "analyzer": "custom_analyzer"

   },

   "carcondition": {

     "type": "text",

     "analyzer": "custom_analyzer"

   },

   "type": {

     "type": "text",

     "analyzer": "custom_analyzer"

   },

   "engine_type": {

     "type": "text",

     "analyzer": "custom_analyzer"

   },

   "color": {

     "type": "text",

     "analyzer": "custom_analyzer"

   },

   "latitude": {

     "type": "float"

   },

   "longitude": {

     "type": "float"

   },

   "city": {

     "type": "text",

     "analyzer": "custom_analyzer"

   },

   "postcode": {

     "type": "keyword"

   },

   "transmission": {

     "type": "text",

     "analyzer": "custom_analyzer"

   },

   "makeyear": {

     "type": "keyword"

   },

   "carfeatures": {

     "type": "text",

     "analyzer": "custom_analyzer"

   },

   "co2": {

     "type": "text",

     "analyzer": "custom_analyzer"

   },

   "Consumption": {

     "type": "text",

     "analyzer": "custom_analyzer"

   },

   "location": {

     "type": "geo_point"

   }

}

  }

}

—----------------------------------------------------------------------------------------------------------------
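
Once the index exists, you can check what custom_analyzer does to a value with the _analyze API (a hedged sketch; given the filter order, "Automatic" should come out lowercased, stemmed, and expanded by the my_synonyms rule):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])

# Run the index's custom_analyzer over a sample string
resp = es.indices.analyze(index="tmp_car_detail",
                          analyzer="custom_analyzer",
                          text="Automatic Gearbox")
print([t["token"] for t in resp["tokens"]])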


Indexing Documents: 

Use the same Python indexing script as in the previous section (the version that builds the location field from lat and lon); it works unchanged against this mapping.

Query String Query.

GET tmp_car_detail/_search

{

  "query": {

"query_string": {

   "query": "i want all cars in city Gummersbach with manual gearbox",

   "default_field": "*"

}

  }

}

Response:

We got 3 hits for this query.

The query type used in the provided example is the Query String Query.

In the query, the "query_string" key indicates the use of the Query String Query. This query allows you to construct queries using a query string syntax, where you can define complex search expressions with operators and modifiers.

The "query" field specifies the actual query string: "i want all cars in city Gummersbach with manual gearbox". This query string expresses the desired search criteria.

The "default_field" field specifies the default field to search within when no specific field is specified. In this case, "*" is used as the wildcard to search across all fields in the index.

By using the Query String Query, you can perform flexible searches by combining terms, phrases, and operators to express your search criteria.

Here are 4 sample results:

  1. Document 1:

    • _index: tmp_car_detail

    • _id: CJuEaIkBTFMFHfJZjNPR

    • _score: 10.458993

  2. Document 2:

    • _index: tmp_car_detail

    • _id: CZuEaIkBTFMFHfJZjNPW

    • _score: 10.458993

  3. Document 3:

    • _index: tmp_car_detail

    • _id: <ID>

    • _score: <Score>

  4. Document 4:

    • _index: tmp_car_detail

    • _id: <ID>

    • _score: <Score>

As the _score is the same for Documents 1 and 2, both are considered equally relevant. Elasticsearch uses internal algorithms to calculate the relevance score based on the matching terms, their frequency, and other factors. The specific details of how the relevance score is calculated are complex and depend on the scoring algorithm and configuration.
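
If you want to see exactly how one document's _score was computed, the Explain API breaks the calculation down term by term (a sketch; substitute one of your own _id values):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])

# Explain the score of a single document for the same query_string query
resp = es.explain(index="tmp_car_detail", id="CJuEaIkBTFMFHfJZjNPR", body={
    "query": {
        "query_string": {
            "query": "i want all cars in city Gummersbach with manual gearbox",
            "default_field": "*"
        }
    }
})
print(resp["explanation"])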

If you want to fine-tune the relevance of your search results, you can experiment with the following:

  1. Boosting: You can assign higher weights (boost) to specific fields or terms to make them more important in the relevance calculation. For example, you can boost the "city" or "transmission" field to give them more relevance (see the sketch after this list).

  2. Similarity Models: Elasticsearch provides different similarity models that can influence the scoring. You can experiment with different similarity models to see which one produces more relevant results for your use case.

  3. Relevance Feedback: You can use techniques like relevance feedback to refine search results based on user feedback. This involves analyzing user interactions with search results and using that feedback to improve future search results.
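
For example, boosting can be expressed directly in the query_string query with per-field weights; the ^3 and ^2 factors below are illustrative values, not from the original notes:

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])

# Matches in "city" count 3x and "transmission" 2x relative to other fields
resp = es.search(index="tmp_car_detail", body={
    "query": {
        "query_string": {
            "query": "Gummersbach manual gearbox",
            "fields": ["city^3", "transmission^2", "*"]
        }
    }
})
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("cartitlename"))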


Geo Point:

In the MySQL DB we have a table tbl_zipcodes that we can use for geolocation queries in Elasticsearch.

Table format in the MySQL DB:

[test_au2.database]> describe tbl_zipcodes;
+---------+--------------------------+------+-----+---------+----------------+
| Field   | Type                     | Null | Key | Default | Extra          |
+---------+--------------------------+------+-----+---------+----------------+
| id      | int(11)                  | NO   | PRI | NULL    | auto_increment |
| zipcode | int(5) unsigned zerofill | YES  |     | NULL    |                |
| name    | varchar(500)             | NO   |     | NULL    |                |
| lat     | double                   | NO   |     | NULL    |                |
| lon     | double                   | NO   |     | NULL    |                |
+---------+--------------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)


To index the tbl_zipcodes table from your MySQL database into Elasticsearch with geolocation (latitude and longitude), you can use the following Python script:



Mapping and indexing

Script Name: index_mysql_data_with_geo.py

—--------------------------------------------------------------------------------------------------------------

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
import pymysql

# Connect to your MySQL database
db = pymysql.connect(
    host="localhost",
    user="test_au2.database",
    password="t1_0Ng27i",
    database="test_au2.database"
)

cursor = db.cursor()

# Fetch data from tbl_zipcodes
def fetch_data():
    query = "SELECT id, zipcode, name, lat, lon FROM tbl_zipcodes"
    cursor.execute(query)
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

# Create Elasticsearch index
def create_index(es, index_name):
    settings = {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        },
        "mappings": {
            "properties": {
                "zipcode": {"type": "integer"},
                "name": {"type": "text"},
                "location": {
                    "type": "geo_point"
                }
            }
        }
    }

    es.indices.create(index=index_name, body=settings)

# Index the data into Elasticsearch with geolocation
def index_data(es, index_name, data):
    actions = [
        {
            "_index": index_name,
            "_id": doc["id"],
            "_source": {
                "zipcode": doc["zipcode"],
                "name": doc["name"],
                "location": {
                    "lat": doc["lat"],
                    "lon": doc["lon"]
                }
            }
        }
        for doc in data
    ]

    bulk(es, actions)

if __name__ == "__main__":
    # Connect to Elasticsearch
    es = Elasticsearch(['http://localhost:9200'])
    index_name = "zipcodes_index"

    # Fetch data from MySQL
    data = fetch_data()

    # Check if the index already exists
    if not es.indices.exists(index=index_name):
        create_index(es, index_name)

    # Index the data
    index_data(es, index_name, data)
—---------------------------------------------------------------------------------------------------------------


This script will create an Elasticsearch index named zipcodes_index and index the data from the tbl_zipcodes table with the appropriate mapping for geolocation. The lat and lon fields from the table will be indexed as a geo_point type in Elasticsearch, allowing you to perform geospatial queries based on latitude and longitude.
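
As a quick sanity check from Python, here is a sketch of a geo_distance query against zipcodes_index that also sorts hits by distance from the query point (the sort clause is an addition for illustration):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])

# Zipcodes within 20 km of Berlin, nearest first
resp = es.search(index="zipcodes_index", body={
    "query": {
        "geo_distance": {
            "distance": "20km",
            "location": {"lat": 52.5200, "lon": 13.4050}
        }
    },
    "sort": [{
        "_geo_distance": {
            "location": {"lat": 52.5200, "lon": 13.4050},
            "order": "asc",
            "unit": "km"
        }
    }]
})
for hit in resp["hits"]["hits"][:5]:
    print(hit["sort"][0], hit["_source"]["name"])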

Queries:

1. Geo Distance Query

2. Geo Bounding Box Query

3. Geo Polygon Query

4. Geo Shape Query


1. Geo Distance Query for Berlin

Search for locations within a specific distance (e.g., 20 km) from Berlin (latitude 52.5200, longitude 13.4050):

GET zipcodes_index/_search

{

  "query": {

"geo_distance": {

   "distance": "20km",

   "location": {

     "lat": 52.5200,

     "lon": 13.4050

   }

}

  }

}

202 records found.


2. Geo Bounding Box Query for Berlin

In this example, the query is searching for documents within the geographic bounding box with top-left coordinates (52.55, 13.35) and bottom-right coordinates (52.48, 13.46).

GET zipcodes_index/_search

{

  "query": {

"geo_bounding_box": {

   "location": {

     "top_left": {

       "lat": 52.55,

       "lon": 13.35

     },

     "bottom_right": {

       "lat": 52.48,

       "lon": 13.46

     }

   }

}

  }

}

78 records found.

3. Geo Polygon Query for Berlin

Search for locations within a specific polygon around Berlin:

GET zipcodes_index/_search

{

  "query": {

"geo_polygon": {

   "location": {

     "points": [

       { "lat": 52.53, "lon": 13.39 },

       { "lat": 52.50, "lon": 13.41 },

       { "lat": 52.49, "lon": 13.35 },

       { "lat": 52.53, "lon": 13.33 }

     ]

   }

}

  }

}

The given Elasticsearch query searches the "zipcodes_index" index for documents that fall within the specified geographic polygon. The polygon is defined by a set of four coordinate points, each consisting of latitude ("lat") and longitude ("lon") values:

  1. { "lat": 52.53, "lon": 13.39 }

  2. { "lat": 52.50, "lon": 13.41 }

  3. { "lat": 52.49, "lon": 13.35 }

  4. { "lat": 52.53, "lon": 13.33 }


4. Geo Shape Query for Berlin

Search for locations within a predefined geo shape (circle) around Berlin:

GET zipcodes_index/_search

{

  "query": {

"geo_shape": {

   "location": {

     "shape": {

       "type": "circle",

       "radius": "15km",

       "coordinates": [13.4050, 52.5200]

     },

     "relation": "within"

   }

}

  }

}

192 records found.

These queries will help you find locations within specified distances or shapes around Berlin in the Elasticsearch index; replace the latitude and longitude coordinates with those of whichever city you are targeting.


Securing Elasticsearch

Step 1: Configure Elasticsearch

vi /etc/elasticsearch/elasticsearch.yml

xpack.security.enabled: true

elastic is the main user (the admin).
pass - PXXktEs9!

restart elasticsearch

vi /etc/kibana/kibana.yml

elasticsearch.hosts: ["http://localhost:9200"]

elasticsearch.username: "kibana_system"
elasticsearch.password: "test@123"

restart kibana


cmd to add a user in ES
curl -u elastic -X POST "http://localhost:9200/_security/user/abhikumar" -H 'Content-Type: application/json' -d '{
  "password": "PXXktEs9!",
  "roles": ["superuser"]
}'



cmd to reset password on ES
curl -u abhikumar -X PUT "http://localhost:9200/_security/user/elastic/_password" -H 'Content-Type: application/json' -d '{
  "password": "PXXktEs9!"
}'

cmd to delete a user on ES
curl -u elastic -X DELETE "http://localhost:9200/_security/user/manjeet" 


How to enable API key auth in Elasticsearch


vi /etc/elasticsearch/elasticsearch.yml
xpack.security.authc.api_key.enabled: true

restart the elasticsearch service
create an API key with a role_descriptor
in the below example we use the superuser role

GET /_security/api_key

GET /_security/role/superuser


POST /_security/api_key
{
  "name": "all-api-call"
  , "expiration": "1d"
  , "role_descriptors": {
    "superuser" : {
    "cluster" : [
      "all"
    ],
    "indices" : [
      {
        "names" : [
          "*"
        ],
        "privileges" : [
          "all"
        ],
        "allow_restricted_indices" : false
      }
    ],
    "applications" : [
      {
        "application" : "*",
        "privileges" : [
          "*"
        ],
        "resources" : [
          "*"
        ]
      }
    ],
    "run_as" : [
      "*"
    ],
    "metadata" : {
      "_reserved" : true
    },
    "transient_metadata" : { }
  }
  }
}


{
  "id" : "vD9mnIkBQ-PDyNAeSTnU",
  "name" : "all-api-call",
  "expiration" : 1690632496578,
  "api_key" : "rSL8n89lQzKw6ryVyjrqOg",
  "encoded" : "dkQ5bW5Ja0JRLVBEeU5BZVNUblU6clNMOG44OWxRekt3NnJ5VnlqcnFPZw=="
}


The id and api_key joined as id:api_key (the encoded value above is the base64 of this):

vD9mnIkBQ-PDyNAeSTnU:rSL8n89lQzKw6ryVyjrqOg

curl cmd with the below Authorization ApiKey:

dkQ5bW5Ja0JRLVBEeU5BZVNUblU6clNMOG44OWxRekt3NnJ5VnlqcnFPZw==


curl -X GET "http://elk.shiiky.com:9200/_cluster/health" -H "Authorization: ApiKey dkQ5bW5Ja0JRLVBEeU5BZVNUblU6clNMOG44OWxRekt3NnJ5VnlqcnFPZw=="
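
The same API key also works from the Python client; pass the id and api_key values from the create response as a tuple (a sketch, assuming the host from the curl above):

from elasticsearch import Elasticsearch

# Authenticate with (id, api_key) instead of a username and password
es = Elasticsearch("http://elk.shiiky.com:9200",
                   api_key=("vD9mnIkBQ-PDyNAeSTnU", "rSL8n89lQzKw6ryVyjrqOg"))
print(es.cluster.health())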


----------

Install a logstash service.
logstash binary location:
/usr/share/logstash/bin/logstash


Create a conf file in logstash:
/usr/share/logstash/config/test.conf


The general format of the conf file is as below:

input {
  jdbc {
    jdbc_driver_library => "/usr/share/java/mysql-connector-java-8.0.27/mysql-connector-java-8.0.27.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/test_au2.database?zeroDateTimeBehavior=convertToNull"
    jdbc_user => "test_au2.database"
    jdbc_password => "Xd_x532x9"
    statement => "SELECT *, UNIX_TIMESTAMP(modification_time) AS unix_ts_in_secs, CASE WHEN deleted_flag = 1 THEN true ELSE false END AS deleted_in_mysql FROM es_table WHERE (UNIX_TIMESTAMP(modification_time) > :sql_last_value AND modification_time < NOW()) ORDER BY modification_time ASC"
    use_column_value => true
    tracking_column => "unix_ts_in_secs"
    tracking_column_type => "numeric"
    schedule => "*/5 * * * * *"
    jdbc_paging_enabled => true
  }
}

filter {
  mutate {
    copy => { "id" => "[@metadata][_id]"}
    remove_field => ["id", "@version", "unix_ts_in_secs"]
  }

  # Add a 'ruby' filter to set the 'action' field conditionally
  ruby {
    code => '
      if event.get("deleted_in_mysql") == true
        event.set("action", "delete")
      else
        event.set("action", "index")
      end
    '
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    user => "elastic"
    password => "PXXktEs9!"
    index => "rdbms_sync_idx"
    document_id => "%{[@metadata][_id]}"
    action => "%{action}"
  }
}


What we did for index, update, and delete of tmp_tbl_cars data in MySQL to sync it to ES:

We made the MySQL table compatible: we added 3 columns to the table - modification_time, insertion_time, and an is_deleted column.

Then we created the conf file
/usr/share/logstash/config/synchronization.conf

and tested it by running the command from the service's ExecStart= line:
ExecStart=    

Once update and insertion were working fine, we created a service for this: eslogstash.service.

And for delete we created one Python file, so documents whose is_deleted status is true get deleted by this Python script every minute:
/root/elasticsearch_indexing_development/index_to_elastic/synchronized_delete_document.py
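
That script isn't reproduced here, but a minimal sketch of the idea, assuming the rdbms_sync_idx index and deleted_in_mysql flag from the general conf example above, could look like this:

from elasticsearch import Elasticsearch

# Security is enabled at this point, so authenticate as the elastic user
es = Elasticsearch("http://localhost:9200", basic_auth=("elastic", "PXXktEs9!"))

# Remove every document whose MySQL row was flagged as deleted;
# schedule this (e.g. via cron) to run every minute
resp = es.delete_by_query(index="rdbms_sync_idx", body={
    "query": {"term": {"deleted_in_mysql": True}}
})
print("deleted:", resp["deleted"])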


Elastic APM

Fleet Server is used to centrally manage Elastic Agents. It is launched as part of an Elastic Agent on a host intended to act as a server. One Fleet Server process can support many Elastic Agent connections, and serves as a control plane for updating agent policies, collecting status information, and coordinating actions across Elastic Agents.

Fleet Server handles communication between Elastic Agents and Elasticsearch.


https://www.elastic.co/guide/en/fleet/7.17/fleet-server.html

APM server
APM Server stands for Application Performance Monitoring Server. It is a component of the Elastic Stack designed to collect and process data from various sources related to application performance monitoring. APM Server sits between your application and Elasticsearch, facilitating the collection, aggregation, and forwarding of performance metrics, traces, and other monitoring data to Elasticsearch.

The APM server should be on the server where the application is running.
To install the APM server, download the APM server package.
In the package we have the apm-server.yml file; in this file we configure some info.

Steps
Install the app.
Go to APM.
Download the APM server package.

Install APM agents
You need to copy the agent snippet and put it into your application code.

Run the APM server from the cmd line and check the UI under APM - Services:
go to the apm-server directory and run
./apm-server -
