I have a map from strings to integers. To store this map in a MySQL database I created the following table:
CREATE TABLE map(
Argument TEXT NOT NULL,
Image INTEGER NOT NULL
)
I chose the TEXT type for the argument because its length is unpredictable, currently the longest record has 2290 chars and the average length is 88 chars.
After I'd met the performance troubles I tried to add index on Argument
column, but found that I must to specify length, so to avoid this limitation I added a new integer column containing hash values (md5 or else) of Argument column values.
ALTER TABLE map ADD COLUMN ArgumentHash INTEGER;
And combined index
CREATE INDEX argument_index USING HASH ON map(ArgumentHash, Argument(80));
Since that time the problems with performance has disappeared. I'd like to ask whether it is a correct way to solve this problem.
MySQL has support for full-text indexing and searching: A full-text index in MySQL is an index of type FULLTEXT . Full-text indexes can be used only with InnoDB or MyISAM tables, and can be created only for CHAR , VARCHAR , or TEXT columns.
To create indexes, use the CREATE INDEX command: CREATE INDEX index_name ON table_name (column_name); You can an index on multiple columns.
In some cases, it also makes sense to create a separate index that contains the ORDER BY clause's columns, as MySQL sometimes chooses to use it. Please note though that for this to happen, the index should contain all columns from the ORDER BY clause and they should all be specified with the same order (ASC / DESC).
Any index in SQL Server can be a maximum of 900 bytes per index entry - if your string column is longer than that, you cannot index it.
I don't think there is a "correct" way, it depends what you are using the column for.
In my experience, it is unusual to have to/want to select on a large text column; the text is usually data retrieved by some other key (unless indexed in some other way - egs. full text, Lucene - but that doesn't appear to be what you are doing)
If you do in fact need an exact match on a large field, then it may be more efficient to use the hash as it will likely let you keep the index smaller. My guess is that if you need to use an index size larger than the size of the hash (depends on how close to the start of the TEXT the values generally differ), use the hash.
Your best bet is to try it and see. Profile both approaches with representative data and find out.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With