I'm trying to write a query that I'm not sure is possible. I have a table called sentencess, which contains the columns ID, Sentences, and verify, as shown in the picture below.
I have another table called word count, which contains ID, words, and their frequency. Whenever a sentence is inserted, updated, or deleted, I want this second table to be updated accordingly, or else refreshed once a day, since there might be a lot of sentences.
My expected output is something like the picture below.
Is this doable? Any ideas or help would be appreciated.
If you are running MySQL 8.0, I would recommend a recursive common table expression for this. The idea is to iteratively walk each message, splitting it into words along the way. All that is then left to do is to aggregate.
with recursive cte as (
select
substring(concat(sent, ' '), 1, locate(' ', sent)) word,
substring(concat(sent, ' '), locate(' ', sent) + 1) sent
from messages
union all
select
substring(sent, 1, locate(' ', sent)) word,
substring(sent, locate(' ', sent) + 1) sent
from cte
where locate(' ', sent) > 0
)
select row_number() over(order by count(*) desc, word) wid, word, count(*) freq
from cte
group by word
order by wid
In earlier versions, you could emulate the same behavior with a numbers table.
Demo on DB Fiddle
Sample data:
sent                       | verif
:------------------------- | ----:
hello my name is alex      |  null
hey alin and alex I'm tom  |  null
hello alex my name is alin |  null
Results:
wid | word  | freq
--: | :---- | ---:
  1 | alex  |    3
  2 | alin  |    2
  3 | hello |    2
  4 | is    |    2
  5 | my    |    2
  6 | name  |    2
  7 | and   |    1
  8 | hey   |    1
  9 | I'm   |    1
 10 | tom   |    1
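The same counts can be reproduced in application code as a sanity check. Here is a minimal Python sketch over the three sample rows above (sorting on the lowercased word to approximate MySQL's case-insensitive default collation):

```python
from collections import Counter

# The three sample sentences from the table above.
sentences = [
    "hello my name is alex",
    "hey alin and alex I'm tom",
    "hello alex my name is alin",
]

# Split on single spaces, matching what LOCATE(' ', sent) does in the CTE.
freq = Counter(word for sent in sentences for word in sent.split(" "))

# Mirror ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, word); lowercasing
# approximates MySQL's case-insensitive default collation.
rows = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0].lower()))
for wid, (word, ct) in enumerate(rows, start=1):
    print(wid, word, ct)
```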
When it comes to maintaining the results of the query in a separate table, it is probably more complicated than you think: you need to be able to insert, delete or update the target table depending on the changes in the original table, which cannot be done in a single statement in MySQL. Also, keeping a flag up to date in the original table creates a race condition, where changes might occur while you are updating the target table.
A simpler option would be to put the query in a view, so you get an always-up-to-date perspective on your data. For this, you can just wrap the above query in a create view statement, like:
create view words_view as < above query >;
If performance becomes a problem, then you could also truncate and refill the words table periodically.
truncate table words;
insert into words < above query >;
Perl, PHP, and other languages have much more robust regexp engines for splitting text into words. I would do that step in one of them, not in SQL.
I would use batched inserts, something like:
INSERT INTO words (word, ct)
VALUES ('this', 1), ('that', 1), ... -- about 100 words at a time
ON DUPLICATE KEY UPDATE ct = ct + VALUES(ct);
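As a sketch of the application side (Python here for illustration; the function name is my own), the batches of about 100 words could be assembled like this, with counts pre-aggregated so each distinct word appears once per statement:

```python
from collections import Counter

def batched_upsert_statements(words, batch_size=100):
    """Yield one INSERT ... ON DUPLICATE KEY UPDATE statement per batch.

    Counts are pre-aggregated so each distinct word appears once per
    statement. Real code should use the driver's parameter binding
    rather than string building, shown here only to mirror the SQL.
    """
    items = list(Counter(words).items())
    for i in range(0, len(items), batch_size):
        values = ", ".join(
            "('%s', %d)" % (word.replace("'", "''"), ct)
            for word, ct in items[i:i + batch_size]
        )
        yield ("INSERT INTO words (word, ct) VALUES %s "
               "ON DUPLICATE KEY UPDATE ct = ct + VALUES(ct);" % values)
```

Because the update adds the incoming count to the stored one, re-running the loader over new sentences keeps the totals accurate.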
CREATE TABLE words (
word VARCHAR(66) NOT NULL,
ct MEDIUMINT UNSIGNED NOT NULL,
PRIMARY KEY(word)
) ENGINE=InnoDB;
I see no need for having words and counts in separate tables, nor any need for an AUTO_INCREMENT for a "word_id". The word is a perfectly good "natural PK". However, you should decide what to do about case folding and accent stripping.
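For the case-folding and accent-stripping decision, here is one possible policy as a Python sketch (a choice, not the only option):

```python
import unicodedata

def normalize_word(word):
    """Fold case and strip combining accents, so 'Héllo' and 'hello'
    count as the same word. This is one possible policy; the answer
    above deliberately leaves the decision to you."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```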
As for splitting into words... double-quotes and some other characters are clearly word boundaries. But some characters are ambiguous:
' -- part of a contraction or a quote?
. -- abbreviation or the end of a sentence?
Etc.
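A sketch of such splitting with a regexp (Python here for illustration; the pattern encodes one possible policy, treating an apostrophe as part of a word only when it sits between letters):

```python
import re

# Treat "'" as part of a word only between letters (a contraction
# like I'm), not at a word edge (a quote). What counts as a "word"
# is an assumption you will want to tune for your data.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def split_words(text):
    return WORD_RE.findall(text)
```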