I have a table with ~1.2m rows in it. It has 6 columns indexed, including one varchar(255) field that contains urls.
I need to be able to scan the table to see whether a url exists in the table, hence the index, but I'm wondering whether I would see a performance gain by reducing the index size to around 50?
Of course this would mean that it may have to scan more rows when searching for a url in the database.. but I only have to do this query about once every 30 seconds, so I'm wondering if the smaller index size would be worth it. Thoughts?
Two reasons why lowering maybe better - (Assuming your index is useful)
1) Indexes too get loaded in memory, so there maybe a rare possibility that your index size grows to an extent that it is not completely cacheable in memory. Thats when you will see a performance hit (with all the new hardware specs... hardly a possibility with 1.2M rows, but still worth noting).
2) Manytimes, just the first 'n' characters are good enough to be able to quickly identify each record. You may not need to index the whole 255 characters at all.
Two reason why you may not care -
1) As stated, you may never see your indexes growing to be out of your key buffer, so why worry.
2) You will need to determine the first 'n' characters, and even after that the performance will less than or equal to a full index... never more. Do you really need to spend time on that? Is it worth the possible loss of accuracy?
From my SQL indexing tutorial (covers MySQL as well):
Tip: Always aim to index the original data. That is often the most useful information you can put into an index.
This is a general rule I suggest until there is a very strong reason to do something different.
Space is not the issue, in most cases.
Performance wise, the index tree depth grows logarithmically with the number of index leaf nodes. That means, cutting the index size half is probably not reducing the tree depth at all. Hence, the performance gain might be limited to the improved cache-hit-rate. But you mentioned you execute that query once every 30 seconds. On a moderately loaded machine, that means you index will not be cached at all (except, maybe, you search for the same URL every 30 seconds).
After all: I don't see any reason to act against the general advice mentioned above.
If you really want to save index space, try to find redundant indexes first (e.g., those starting with the same columns). These are typically the low-hanging fruits.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With