How to set up an ElasticSearch index structure with multiple entity bindings

Recently I started implementing ElasticSearch (ES) in a legacy e-commerce app written in PHP and backed by MySQL. I am completely new to all of this, and while reading the docs is fine, I really need advice from somebody with experience.

Following the ES documentation, I was able to set up a new cluster. I also found out that rivers are deprecated and should be replaced, so I replaced them with Logstash and the JDBC MySQL connector.

At this point I have:

ElasticSearch
Logstash
JDBC MySQL driver
MySQL server

The database structure of the application is not really optimal and is very hard to replace, but I'd like to replicate it into the ES index in the best possible way.

DB Structure:

Products

+-------------------------------+-------+--------+
|              Id               | Title | Price  |
+-------------------------------+-------+--------+
| 00c8234d71c4e94f725cd432ebc04 | Alpha | 589,00 |
| 018357657529fef056cf396626812 | Beta  | 355,00 |
| 01a2c32ceeff0fc6b7dd4fc4302ab | Gamma | 0,00   |
+-------------------------------+-------+--------+

Flags

+------------+-------------+
|     Id     |    Title    |
+------------+-------------+
| sellout    | Sellout     |
| discount   | Discount    |
| topproduct | Top Product |
+------------+-------------+

flagsProducts (n:m pivot)

+------+-------------------------------+------------+------------+
|  Id  |           ProductId           |   FlagId   | ExternalId |
+------+-------------------------------+------------+------------+
| 1552 | 00c8234d71c4e94f725cd432ebc04 | sellout    | NULL       |
| 2845 | 00c8234d71c4e94f725cd432ebc04 | topproduct | NULL       |
| 9689 | 018357657529fef056cf396626812 | discount   | NULL       |
| 4841 | 01a2c32ceeff0fc6b7dd4fc4302ab | discount   | NULL       |
+------+-------------------------------+------------+------------+

Those string IDs are a complete disaster (but I have to deal with them for now). At first I thought I should index Products into ES as a flat structure, but how should I handle the multiple entity bindings?

asked Apr 28 '16 12:04

Jan Richter

1 Answer

That's a great start!

I would definitely flatten it all out (i.e. denormalize) and come up with product documents that look like the one below. That way you get rid of the N:M relationship between products and flags by simply creating a flags array for each product. It will thus be easier to query those flags.

{
   "id": "00c8234d71c4e94f725cd432ebc04",
   "title": "Alpha",
   "price": 589.0,
   "flags": ["Sellout", "Top Product"]
}
{
   "id": "018357657529fef056cf396626812",
   "title": "Beta",
   "price": 355.0,
   "flags": ["Discount"]
}
{
   "id": "01a2c32ceeff0fc6b7dd4fc4302ab",
   "title": "Gamma",
   "price": 0.0,
   "flags": ["Discount"]
}

The product mapping type would look like this:

PUT products
{
    "mappings": {
        "product": {
            "properties": {
                "id": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "title": {
                    "type": "string"
                },
                "price": {
                    "type": "double",
                    "null_value": 0.0
                },
                "flags": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}

Since you already have the Logstash jdbc input, all you're missing is the proper SQL query to fetch the products and their associated flags.

  SELECT p.Id as id, p.Title as title, p.Price as price, GROUP_CONCAT(f.Title) as flags
    FROM Products p
    JOIN flagsProducts fp ON fp.ProductId = p.Id
    JOIN Flags f ON fp.FlagId = f.id
GROUP BY p.Id

Which would get you rows like these:

+-------------------------------+-------+-------+---------------------+
| id                            | title | price | flags               |
+-------------------------------+-------+-------+---------------------+
| 00c8234d71c4e94f725cd432ebc04 | Alpha |   589 | Sellout,Top Product |
| 018357657529fef056cf396626812 | Beta  |   355 | Discount            |
| 01a2c32ceeff0fc6b7dd4fc4302ab | Gamma |     0 | Discount            |
+-------------------------------+-------+-------+---------------------+

Using Logstash filters you can then split the flags into an array and you're good to go.
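
A minimal sketch of that split step, assuming the joined column arrives in the event as a field named flags (the alias used in the query above) and that no flag title itself contains a comma, since a comma is the default GROUP_CONCAT separator:

```
filter {
  mutate {
    # Turn "Sellout,Top Product" into ["Sellout", "Top Product"].
    # Assumes the field is named "flags" and values are comma-separated.
    split => { "flags" => "," }
  }
}
```

If a flag title could contain a comma, one option is to pick another delimiter in the query, e.g. GROUP_CONCAT(f.Title SEPARATOR '|'), and split on that character instead.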

answered Sep 21 '22 16:09

Val

Tags: php, data-structures, mysql, elasticsearch, logstash
