Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can someone explain Magentos Indexing feature in detail?

I kind of get how the indexing in Magento works, but I haven't seen any good documentation on this. I would kind of like to know the following.

  • How it works
  • What is its purpose
  • Why is it important
  • What are the details everyone should know about it
  • Anything else that can help someone fully understand what indexing is and how it is used in Magento

I think having this information will be of great use for others in my boat that don't fully get the indexing process.

UPDATE: After the comment on my question and Ankur's answer, I am thinking I am missing something in my knowledge of just normal Database Indexing. So is this just Magento's version of handling indexing and is it better for me to get my answer in terms of Database indexing in general, such as this link here How does database indexing work?

like image 614
dan.codes Avatar asked Feb 09 '11 13:02

dan.codes


2 Answers

Magento's indexing is only similar to database-level indexing in spirit. As Anton states, it is a process of denormalization to allow faster operation of a site. Let me try to explain some of the thoughts behind the Magento database structure and why it makes indexing necessary to operate at speed.

In a more "typical" MySQL database, a table for storing catalog products would be structured something like this:

PRODUCT:
    product_id INT
    sku        VARCHAR
    name       VARCHAR
    size       VARCHAR
    longdesc   VARCHAR
    shortdesc  VARCHAR
    ... etc ...

This is fast for retrieval, but it leaves a fundamental problem for a piece of eCommerce software: what do you do when you want to add more attributes? What if you sell toys, and rather than a size column, you need age_range? Well, you could add another column, but it should be clear that in a large store (think Walmart, for instance), this would result in rows that are 90% empty and attempting to maintenance new attributes is nigh impossible.

To combat this problem, Magento splits tables into smaller units. I don't want to recreate the entire EAV system in this answer, so please accept this simplified model:

PRODUCT:
    product_id INT
    sku        VARCHAR

PRODUCT_ATTRIBUTE_VALUES
    product_id   INT
    attribute_id INT
    value        MISC

PRODUCT_ATTRIBUTES
    attribute_id
    name

Now it's possible to add attributes at will by entering new values into product_attributes and then putting adjoining records into product_attribute_values. This is basically what Magento does (with a little more respect for datatypes than I've displayed here). In fact, now there's no reason for two products to have identical fields at all, so we can create entire product types with different sets of attributes!

However, this flexibility comes at a cost. If I want to find the color of a shirt in my system (a trivial example), I need to find:

  1. The product_id of the item (in the product table)
  2. The attribute_id for color (in the attribute table)
  3. Finally, the actual value (in the attribute_values table)

Magento used to work like this, but it was dead slow. So, to allow better performance, they made a compromise: once the shop owner has defined the attributes they want, go ahead and generate the big table from the beginning. When something changes, nuke it from space and generate it over again. That way, data is stored primarily in our nice flexible format, but queried from a single table.

These resulting lookup tables are the Magento "indexes". When you re-index, you are blowing up the old table and generating it again.

Hope that clarifies things a bit!

Thanks, Joe

like image 90
Joe Mastey Avatar answered Oct 14 '22 12:10

Joe Mastey


Magento indexing is not similar to normal database indexing and is more like database denormalization (http://en.wikipedia.org/wiki/Denormalization) process. In most cases it takes the EAV structure and makes it available for flat table structure which is by no doubt faster to access and search through.

If your normal EAV query would be 200 left joins to get all the products in catalog and data over their attributes and layered navigation values then after "indexing" this data is available through denormalized data structure for faster querying/access

like image 21
Anton S Avatar answered Oct 14 '22 14:10

Anton S