I kind of get how the indexing in Magento works, but I haven't seen any good documentation on this. I would kind of like to know the following.
I think having this information will be of great use for others in my boat that don't fully get the indexing process.
UPDATE: After the comment on my question and Ankur's answer, I am thinking I am missing something in my knowledge of just normal Database Indexing. So is this just Magento's version of handling indexing and is it better for me to get my answer in terms of Database indexing in general, such as this link here How does database indexing work?
Magento's indexing is only similar to database-level indexing in spirit. As Anton states, it is a process of denormalization to allow faster operation of a site. Let me try to explain some of the thoughts behind the Magento database structure and why it makes indexing necessary to operate at speed.
In a more "typical" MySQL database, a table for storing catalog products would be structured something like this:
PRODUCT:
product_id INT
sku VARCHAR
name VARCHAR
size VARCHAR
longdesc VARCHAR
shortdesc VARCHAR
... etc ...
This is fast for retrieval, but it leaves a fundamental problem for a piece of eCommerce software: what do you do when you want to add more attributes? What if you sell toys, and rather than a size column, you need age_range
? Well, you could add another column, but it should be clear that in a large store (think Walmart, for instance), this would result in rows that are 90% empty and attempting to maintenance new attributes is nigh impossible.
To combat this problem, Magento splits tables into smaller units. I don't want to recreate the entire EAV system in this answer, so please accept this simplified model:
PRODUCT:
product_id INT
sku VARCHAR
PRODUCT_ATTRIBUTE_VALUES
product_id INT
attribute_id INT
value MISC
PRODUCT_ATTRIBUTES
attribute_id
name
Now it's possible to add attributes at will by entering new values into product_attributes
and then putting adjoining records into product_attribute_values
. This is basically what Magento does (with a little more respect for datatypes than I've displayed here). In fact, now there's no reason for two products to have identical fields at all, so we can create entire product types with different sets of attributes!
However, this flexibility comes at a cost. If I want to find the color
of a shirt in my system (a trivial example), I need to find:
product_id
of the item (in the product table)attribute_id
for color
(in the attribute table)value
(in the attribute_values table)Magento used to work like this, but it was dead slow. So, to allow better performance, they made a compromise: once the shop owner has defined the attributes they want, go ahead and generate the big table from the beginning. When something changes, nuke it from space and generate it over again. That way, data is stored primarily in our nice flexible format, but queried from a single table.
These resulting lookup tables are the Magento "indexes". When you re-index, you are blowing up the old table and generating it again.
Hope that clarifies things a bit!
Thanks, Joe
Magento indexing is not similar to normal database indexing and is more like database denormalization (http://en.wikipedia.org/wiki/Denormalization) process. In most cases it takes the EAV structure and makes it available for flat table structure which is by no doubt faster to access and search through.
If your normal EAV query would be 200 left joins to get all the products in catalog and data over their attributes and layered navigation values then after "indexing" this data is available through denormalized data structure for faster querying/access
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With