I have a table with codes and an other table with prefixes. I need to match the (longest) prefix for each code.
There is also a secondary scope in which I have to restrict prefixes (this involves bringing in other tables). I don't think this would matter in most cases, but here is a simplified (normalized) scheme (I have to set item.prefix_id):
group (id)
subgroup (id, group_id)
prefix (id, subgroup_id, prefix)
item (id, group_id, code, prefix_id)
It is allright to cache the length of the prefix in a new field and index it. It is allright to cache the group_id in prefix table (although groups are fairly small tables, in most cases I don't think any performance increase is gained). item table contains a few hundred thousand records, prefix contains at most 500.
Edit:
Sorry If the question was not defined enough. When using the word "prefix" I actually mean it, so the codes have to start with the actual prefix.
subgroup
id group_id
-------------
1 1
2 1
3 1
4 2
prefix
id subgroup_id prefix
------------------------
1 1 a
2 2 abc
3 2 123
4 4 abcdef
item
id group_id code prefix_id
-----------------------------------
1 1 abc123 NULL
2 1 abcdef NULL
3 1 a123 NULL
4 2 abc123 NULL
The expected result for the prefix column is (item.id, item.prefix_id):
(1, 2) Because: subroups 1, 2, 3 are under group 1, the code abc123
starts with the the prefix a
and the prefix abc
and abc
is the logest of the two, so we take the id of abc
which is 2 and put it into item.prefix_id
.
(2, 2) Because: even though prefix {4} (which is abcdef
) is the logest matching prefix, it's subgroup (which is 4) is under group 2 but the item is under group 1, so we can choose from subgroups 1, 2, 3 and still abc
is the logest match out of the three possible prefixes.
(3, 1) Because: a
is the logest match.
(4, NULL) Because: item 4 is under group 2 and the only prefix under group 2 is abcdef
which is no match to abc123
(because abc123
does not start with abcdef
).
But as I said the whole groping thing is not essential part of the question. My main concern is to match a table with possible prefixes to a table of strings, and how to do it the best way. (Best meaning an optimal tradeoff between readability, maintainability and performance - hence the 'best prectice' in the title).
Currently I'm doing something like:
UPDATE item USE INDEX (code3)
LEFT JOIN prefix ON prefix.length=3 AND LEFT(item.code,3)=prefix.prefix
LEFT JOIN subgroup ON subgroup.id=prefix.subgroup_id
WHERE subgroup.group_id == item.group_id AND
item.segment_id IS NULL
Where code3
is a KEY code3 (segment_id, group_id, code(3))
. - And the same logic is repeate with 1, 2, 3 and 4 as length. It seems pretty efficient, but I don't like the presence of duplication in it (4 queries for a single operation). - of course this is in the case when the maximum legth of prefixes is 4.
Thanks for everyone for sharing your ideas this far.
It is allright to cache the group_id in prefix table.
So let's create column group_id
in table prefix and fill the column with the appropriate values. I assume that you know how to do this, so let's go to the next step.
The biggest performance benefit we will get from this composite index:
ALTER TABLE `prefix` ADD INDEX `c_index` (
`group_id` ASC,
`prefix` ASC
);
And the UPDATE statement:
UPDATE item i
SET
prefix_id = (
SELECT p.id
FROM prefix p USE INDEX (`c_index`)
WHERE
p.group_id = i.group_id AND
p.prefix IN (
LEFT(i.code, 4),
LEFT(i.code, 3),
LEFT(i.code, 2),
LEFT(i.code, 1)
)
ORDER BY LENGTH(p.prefix) DESC
LIMIT 1
)
In this example I assume that prefix is variable length {1,4}. Together I decided to use IN clause instead of LIKE for to get the full benefit of c_index.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With