I’m a complete newbie in SQL and therefore I’m not very familiar with its functionality.
So here is my problem.
I have the following table with more than 100,000 companies (let's call it 'comp'):
 id | title                | name
----+----------------------+---------------------
  1 | XYZ                  | xyz
  2 | Smarts               | smarts
  3 | XYZ LTD              | xyzltd
  4 | Outsmarts            | outsmarts
  5 | XYZ Entertainment    | xyzentertainment
  6 | Smarts Entertainment | smartsentertainment
Here 'title' is the company name, and 'name' is the same title lowercased and with the spaces removed. Is there a way to find all companies with similar titles (using either 'title' or 'name')? Basically, I want to get:
 id | title                | name
----+----------------------+---------------------
  1 | XYZ                  | xyz
  3 | XYZ LTD              | xyzltd
  5 | XYZ Entertainment    | xyzentertainment
  2 | Smarts               | smarts
  6 | Smarts Entertainment | smartsentertainment
By similar I mean:
1) 'XYZ', 'XYZ LTD' and 'XYZ Entertainment' are similar
2) 'Smarts' and 'Smarts Entertainment' are similar
but 'XYZ Entertainment' is not similar to 'Smarts Entertainment', and 'Smarts' is not similar to 'Outsmarts'.
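Going only by these examples, one possible reading of "similar" is "one normalized name starts with the other": 'xyz' is a prefix of 'xyzltd' and 'xyzentertainment', and 'smarts' is a prefix of 'smartsentertainment' but not of 'outsmarts'. That interpretation is an assumption, not something the question states. A minimal sketch of such a prefix self-join, run on the six sample rows with Python's built-in sqlite3 module, follows; `ROWS`, `PREFIX_PAIRS_SQL` and `prefix_pairs` are hypothetical names. `substr()` and `length()` also exist in PostgreSQL, so the SQL should carry over, but it still compares every pair of rows, so on 100,000 rows it will not be fast on its own.

```python
import sqlite3

# Sample rows copied from the question's 'comp' table.
ROWS = [
    (1, "XYZ", "xyz"),
    (2, "Smarts", "smarts"),
    (3, "XYZ LTD", "xyzltd"),
    (4, "Outsmarts", "outsmarts"),
    (5, "XYZ Entertainment", "xyzentertainment"),
    (6, "Smarts Entertainment", "smartsentertainment"),
]

# Assumption: "similar" means c2.name starts with c1.name.
# substr() is used instead of LIKE so that '%' or '_' inside a name
# cannot act as a wildcard.
PREFIX_PAIRS_SQL = """
SELECT c1.id, c2.id
FROM comp AS c1
JOIN comp AS c2
  ON c1.id <> c2.id
 AND substr(c2.name, 1, length(c1.name)) = c1.name
ORDER BY c1.id, c2.id
"""


def prefix_pairs(rows):
    """Return (shorter_id, longer_id) pairs where one name prefixes the other."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comp (id INTEGER PRIMARY KEY, title TEXT, name TEXT)")
    conn.executemany("INSERT INTO comp VALUES (?, ?, ?)", rows)
    result = conn.execute(PREFIX_PAIRS_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for shorter_id, longer_id in prefix_pairs(ROWS):
        print(shorter_id, "->", longer_id)
```

On the sample data this pairs 1 with 3 and 5, and 2 with 6, which matches the grouping asked for, while leaving 'outsmarts' on its own.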
I tried this and it didn't work:
SELECT set_limit(0.8);
SELECT
similarity(c1.name, c2.name) AS sim,
c1.name,
c2.name
FROM comp AS c1
JOIN comp AS c2
ON c1.name != c2.name
AND c1.name % c2.name
ORDER BY sim DESC;
By 'didn't work' I mean that after 7 minutes it still hadn't returned any results. I assume I totally messed it up.
Is it even possible to retrieve such similarities?
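Two things probably work against the query above. First, without a trigram index the `%` condition has to be evaluated for essentially every pair of rows, roughly 100,000² ≈ 10^10 comparisons. Second, pg_trgm's `similarity()` is the number of shared trigrams divided by the number of distinct trigrams in both strings, where each word is padded with two leading spaces and one trailing space, so a short name scores low against a longer name that merely starts with it. The following is a simplified re-implementation for single lowercase alphanumeric words only (`trigrams`, `similarity` and `PAIRS` are hypothetical names, and the real extension handles multi-word and non-ASCII input in more detail); it is meant to show the scale of the scores, not to replace the extension.

```python
def trigrams(word: str) -> set[str]:
    # Simplified pg_trgm-style trigrams for ONE alphanumeric word:
    # lowercase it, pad with two spaces in front and one behind.
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    # Shared trigrams divided by all distinct trigrams of both words.
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


PAIRS = [
    ("xyz", "xyzltd"),
    ("xyz", "xyzentertainment"),
    ("smarts", "smartsentertainment"),
    ("smarts", "outsmarts"),
]

if __name__ == "__main__":
    for a, b in PAIRS:
        print(f"{a:>8} vs {b:<20} {similarity(a, b):.3f}")
```

With this formula 'xyz' vs 'xyzltd' scores 0.375, 'smarts' vs 'smartsentertainment' scores 0.300, and 'smarts' vs 'outsmarts' scores about 0.417. None of them comes close to 0.8, and trigram similarity even ranks 'outsmarts' closer to 'smarts' than 'smartsentertainment'.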
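You could try the Levenshtein distance function, which returns the number of single-character edits (insertions, deletions, substitutions) needed to turn the first argument into the second. Because it is a distance, smaller values mean more similar names, so sort in ascending order: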
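CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;  -- provides levenshtein()
SELECT levenshtein(c1.name, c2.name) AS dist, c1.name, c2.name
FROM comp AS c1 JOIN comp AS c2 ON c1.id < c2.id  -- each pair listed once
ORDER BY dist ASC;  -- smaller distance = more similar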
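To see what this ranking looks like on the six sample names, here is a sketch that registers a plain dynamic-programming Levenshtein function (unit cost for insert, delete and substitute, the same as fuzzystrmatch's two-argument `levenshtein()`) in an in-memory SQLite database and runs the same ascending self-join. `levenshtein`, `NAMES`, `QUERY` and `ranked_pairs` are hypothetical names; SQLite stands in for PostgreSQL only so the example is self-contained.

```python
import sqlite3


def levenshtein(a: str, b: str) -> int:
    # Row-by-row edit-distance DP; prev[j] is the distance from a[:i-1] to b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


NAMES = [
    (1, "xyz"), (2, "smarts"), (3, "xyzltd"),
    (4, "outsmarts"), (5, "xyzentertainment"), (6, "smartsentertainment"),
]

QUERY = """
SELECT levenshtein(c1.name, c2.name) AS dist, c1.name, c2.name
FROM comp AS c1
JOIN comp AS c2 ON c1.id < c2.id
ORDER BY dist ASC, c1.id, c2.id
"""


def ranked_pairs(rows):
    conn = sqlite3.connect(":memory:")
    conn.create_function("levenshtein", 2, levenshtein)  # expose to SQL
    conn.execute("CREATE TABLE comp (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO comp VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for dist, left, right in ranked_pairs(NAMES):
        print(dist, left, right)
```

On this sample 'smarts'/'outsmarts' ties with 'xyz'/'xyzltd' at distance 3, while 'smarts'/'smartsentertainment' is 13 apart and 'xyzentertainment'/'smartsentertainment' only 6. So raw edit distance does not match the definition of "similar" in the question either, and, like the original attempt, it still compares every pair of rows.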