http://sqlfiddle.com/#!2/e6382
id id_news word
1 6 superman
2 6 movie
3 6 review
4 6 excellent
5 7 review
6 7 guardians of the galaxy
7 7 great
8 8 review
9 8 superman
10 8 movie
11 8 great
I have a small problem, I'm trying to relate different news through words with a threshold setting, in the example provided id_news 6
should be related to 8
but not to 7
since 7
only has 2
words in common and I only want to detect those who have at least 3
words in common.
This will get you close to what you need:
SELECT wa1.id_news id, wa2.id_news related
FROM word_analysis wa1
JOIN word_analysis wa2
ON wa2.id_news != wa1.id_news
AND wa2.word = wa1.word
GROUP BY wa1.id_news, wa2.id_news
HAVING COUNT(*)>2
ORDER BY wa1.id_news, wa2.id_news
If you don't want the reverse relationships:
SELECT wa1.id_news id, wa2.id_news related
FROM word_analysis wa1
JOIN word_analysis wa2
ON wa2.id_news > wa1.id_news
AND wa2.word = wa1.word
GROUP BY wa1.id_news, wa2.id_news
HAVING COUNT(*)>2
ORDER BY wa1.id_news, wa2.id_news
If you want to investigate just one wa1.id_news
(6):
SELECT wa2.id_news related
FROM word_analysis wa1
JOIN word_analysis wa2
ON wa2.id_news != wa1.id_news
AND wa2.word = wa1.word
WHERE wa1.id_news = 6
GROUP BY wa1.id_news, wa2.id_news
HAVING COUNT(*)>2
ORDER BY wa2.id_news
If you want to investigate just one relationship (6->8), where a result means related and no result means unrelated:
SELECT 1
FROM word_analysis wa1
JOIN word_analysis wa2
ON wa2.id_news = 8
AND wa2.word = wa1.word
WHERE wa1.id_news = 6
GROUP BY wa1.id_news, wa2.id_news
HAVING COUNT(*)>2
Try this self-join:
SELECT
wa1.id_news id_news_1,
wa2.id_news id_news_2,
count(wa2.word) cnt_words
FROM word_analysis wa1
INNER JOIN word_analysis wa2
ON wa1.id_news <> wa2.id_news AND wa1.word = wa2.word
GROUP BY wa1.id_news, wa2.id_news
HAVING count(wa2.word) >= 3
ORDER BY wa1.id_news, wa2.id_news;
SQL Fiddle demo
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With