The goal of the query is also to find possible duplicates among names that were mistyped. For example:
International Group Inc.
must be found as a duplicate of International, Group Inc
To accomplish this I used the following query:
SELECT C.id,
       C.name,
       C.address,
       C.city_id
FROM company C
     INNER JOIN (SELECT name
                 FROM company
                 GROUP BY name
                 HAVING Count(id) > 1) D
             ON Replace(Replace(C.name, '.', ''), ',', '') =
                Replace(Replace(D.name, '.', ''), ',', '')
It works well and returns the result in about 40 seconds, but adding an extra condition such as AND C.city_id='4' takes an extra minute or more. This is still acceptable, but not ideal.
My real problem occurs when I add another condition to find only duplicates of companies that have a specific string in the name, using the condition AND C.name LIKE '%International%'. With it, the query just doesn't return any results.
Could somebody help me figure out what I am doing wrong?
Thanks
Because you are joining on the result of a function, the query cannot use any index. Besides, the cost of executing REPLACE() on all rows is probably not negligible.
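One way to check this on your own data is to prefix the query with EXPLAIN. This is only a sketch using the query from the question; the exact plan depends on your MySQL version and table definition, so I am not showing any expected output. The `key` column of the result shows which index, if any, is chosen for each table.

```sql
-- Ask MySQL for the execution plan instead of running the query.
-- The query body is the one from the question, unchanged.
EXPLAIN
SELECT C.id,
       C.name,
       C.address,
       C.city_id
FROM company C
     INNER JOIN (SELECT name
                 FROM company
                 GROUP BY name
                 HAVING COUNT(id) > 1) D
             ON REPLACE(REPLACE(C.name, '.', ''), ',', '') =
                REPLACE(REPLACE(D.name, '.', ''), ',', '');
```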
I suggest you first add an indexed column that receives the "stripped-down" version of the strings, and then run the query with a join on this column:
ALTER TABLE company ADD COLUMN stripped_name VARCHAR(50);
ALTER TABLE company ADD INDEX(stripped_name);
UPDATE company SET stripped_name = REPLACE(REPLACE(name, '.', ''), ',', '');
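Once the column is populated, the duplicate search could join on it directly. The following is a minimal sketch, not a tested drop-in replacement. Note one assumption on my part: it groups on stripped_name instead of name, which is my reading of the stated goal (treating "International Group Inc." and "International, Group Inc" as duplicates); the original query groups on name.

```sql
-- Find companies whose punctuation-stripped name occurs more than once.
-- Both sides of the join now compare a plain indexed column,
-- so no function has to be evaluated per row.
SELECT C.id,
       C.name,
       C.address,
       C.city_id
FROM company C
     INNER JOIN (SELECT stripped_name
                 FROM company
                 GROUP BY stripped_name
                 HAVING COUNT(id) > 1) D
             ON C.stripped_name = D.stripped_name
-- Optional extra filters from the question go in a WHERE clause:
WHERE C.name LIKE '%International%';
```

A LIKE pattern with a leading wildcard still cannot use an ordinary index on name, but here it only filters the rows that already matched the join.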
Running the UPDATE could take a while the first time, but you could also set up ON INSERT and ON UPDATE triggers on company so that stripped_name gets populated and updated on the fly.
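Such triggers might look like the sketch below, written for MySQL. The trigger names company_stripped_name_bi and company_stripped_name_bu are hypothetical; pick whatever fits your naming scheme. Each trigger has a single-statement body, so no DELIMITER change should be needed.

```sql
-- Hypothetical trigger names; the column and table come from the answer above.

-- Fill stripped_name whenever a new company row is inserted.
CREATE TRIGGER company_stripped_name_bi
BEFORE INSERT ON company
FOR EACH ROW
SET NEW.stripped_name = REPLACE(REPLACE(NEW.name, '.', ''), ',', '');

-- Keep stripped_name in sync whenever an existing row is updated.
CREATE TRIGGER company_stripped_name_bu
BEFORE UPDATE ON company
FOR EACH ROW
SET NEW.stripped_name = REPLACE(REPLACE(NEW.name, '.', ''), ',', '');
```

With both triggers in place, the one-off UPDATE is only needed for rows that existed before the triggers were created.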