So we have this database filled with a bunch of strings, in this case post titles.
What I want to do is find the words that appear most often across all of those titles.
I tried adapting the approach from this SO question to data.se as follows:
select word, count(*) from (
    select (case when instr(substr(p.Title, nums.n+1), ' ') then substr(p.Title, nums.n+1)
                 else substr(p.Title, nums.n+1, instr(substr(p.Title, nums.n+1), ' ') - 1)
            end) as word
    from (select ' '||Title as string
          from Posts p
         ) Posts cross join
         (select 1 as n union all select 2 union all select 10
         ) nums
    where substr(p.Title, nums.n, 1) = ' ' and substr(p.Title, nums.n, 1) <> ' '
) w
group by word
order by count(*) desc
Unfortunately, this gives me a slew of errors:
'substr' is not a recognized built-in function name.
Incorrect syntax near '|'.
Incorrect syntax near 'nums'.
So given a column of strings in SQL with a variable amount of text in each string, how can I get a list of the most frequently used X words?
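Data.SE (the Stack Exchange Data Explorer) runs on Microsoft SQL Server, which is why substr, instr and the || concatenation operator are rejected there; the T-SQL equivalents are SUBSTRING, CHARINDEX and +. The numbers-table idea itself is sound, but the query above also has logic slips: the case branches are swapped, the where clause tests the same character for being and not being a space (always false), and p.Title is referenced outside the subquery where only string is visible. Below is a minimal sketch of the corrected technique, assuming SQLite (where substr, instr and || do exist) and run through Python's built-in sqlite3 module. The sample titles and the top_words helper are hypothetical.

```python
import sqlite3

# Hypothetical sample data standing in for the Posts table (titles only).
SAMPLE_TITLES = [
    "How to split a string in SQL",
    "Split string into words",
    "Count words in SQL",
    None,  # answers on Stack Exchange have no title
]

# Numbers-table split for SQLite: pad each title with a space on both sides
# so every word sits between two spaces, then keep each position n where a
# space is followed by a non-space character and cut out the word after it.
TOP_WORDS_SQL = """
WITH RECURSIVE
  nums(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < ?
  ),
  padded AS (
    SELECT ' ' || Title || ' ' AS string FROM Posts WHERE Title IS NOT NULL
  )
SELECT word, COUNT(*) AS cnt
FROM (
  SELECT substr(p.string, nums.n + 1,
                instr(substr(p.string, nums.n + 1), ' ') - 1) AS word
  FROM padded AS p
  CROSS JOIN nums
  WHERE nums.n < length(p.string)
    AND substr(p.string, nums.n, 1) = ' '
    AND substr(p.string, nums.n + 1, 1) <> ' '
) AS w
GROUP BY word
ORDER BY cnt DESC, word
LIMIT ?
"""


def top_words(conn, limit):
    """Return up to `limit` (word, count) pairs, most frequent first."""
    # The numbers table must reach the length of the longest padded title.
    (max_len,) = conn.execute(
        "SELECT COALESCE(MAX(LENGTH(Title)), 0) + 2 FROM Posts"
    ).fetchone()
    return conn.execute(TOP_WORDS_SQL, (max_len, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (Id INTEGER PRIMARY KEY, Title TEXT)")
conn.executemany(
    "INSERT INTO Posts (Title) VALUES (?)", [(t,) for t in SAMPLE_TITLES]
)
print(top_words(conn, 3))  # [('SQL', 2), ('in', 2), ('string', 2)]
```

This splits on single spaces only, so punctuation stays attached to words and matching is case-sensitive ("Split" and "split" are counted separately); wrapping Title in lower() inside padded would merge them.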
For example, SQL CONTAINS can be used to search for the word 'electronic' in all column values. The first argument of the CONTAINS predicate is the asterisk (*), which specifies a search across all full-text indexed columns, and the second argument is the word 'electronic' to search for.
When the first argument to CONTAINS is *, the search covers all full-text indexed columns; the second argument can use the NEAR operator with two terms: the word to search for and a word that must appear near it.
Given a list of strings, write a Python program to find the word with the most occurrences. In one example, "gfg" occurs 3 times in total across the strings, more than any other word; in another, "geeks" occurs 2 times, the most of any word. The approach: get each word using split(), and track each word's frequency in a defaultdict().
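The split()/defaultdict approach just described can be sketched as follows; the function name most_frequent_word and the sample strings are hypothetical.

```python
from collections import defaultdict


def most_frequent_word(strings):
    """Return (word, count) for the word occurring most often across all strings.

    Raises ValueError if the strings contain no words at all.
    """
    freq = defaultdict(int)  # missing words start at a count of 0
    for s in strings:
        for word in s.split():  # split() breaks on any run of whitespace
            freq[word] += 1
    # On ties, max() keeps the first word (in insertion order) with the top count.
    word = max(freq, key=freq.get)
    return word, freq[word]


print(most_frequent_word(["gfg is best", "gfg love for geeks", "geeks gfg"]))  # ('gfg', 3)
```

For the top X words rather than a single winner, collections.Counter(words).most_common(X) does the counting and ranking in one step.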
The first argument is the name of the table column you want to search; the second argument is the word or phrase you want to find in that column's values. SQL CONTAINS is a predicate that can be used to search for a word, the prefix of a word, a word near another word, a synonym of a word, and so on.
Query solution (No Custom Split Function Required)
PostgreSQL
select word, count(*) as word_count from
(
    -- get 1st words
    select split_part(title, ' ', 1) as word
    from posts
    union all
    -- get 2nd words
    select split_part(title, ' ', 2) as word
    from posts
    union all
    -- get 3rd words
    select split_part(title, ' ', 3) as word
    from posts
    -- repeat for each word position, up to the number of words in the longest title
) words
-- split_part returns '' past the last word; NULL titles give NULL
where word is not null
  and word not in ('', 'and', 'for', 'of', 'on')
group by word
order by word_count desc
limit 50;
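Writing one union all branch per word position by hand gets tedious for long titles. Below is a minimal sketch of a hypothetical Python helper, build_top_words_query, that emits a query of the same shape as the one above for any number of positions, reusing its stop-word list. The number of positions needed could be found in PostgreSQL with something like select max(array_length(string_to_array(title, ' '), 1)) from posts. Since split_part returns an empty string, not NULL, for positions past the last word, '' stays in the exclusion list.

```python
# Stop words copied from the PostgreSQL query; '' drops empty split_part results.
STOP_WORDS = ("", "and", "for", "of", "on")


def build_top_words_query(max_words, limit=50):
    """Build the split_part/union all query for word positions 1..max_words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    # One branch per word position; only integers are interpolated here.
    branches = "\n  union all\n".join(
        f"  select split_part(title, ' ', {i}) as word from posts"
        for i in range(1, max_words + 1)
    )
    stop_list = ", ".join(f"'{w}'" for w in STOP_WORDS)
    return (
        "select word, count(*) as word_count from\n"
        f"(\n{branches}\n) words\n"
        "where word is not null\n"
        f"  and word not in ({stop_list})\n"
        "group by word\n"
        "order by word_count desc\n"
        f"limit {int(limit)};"
    )


print(build_top_words_query(3))
```

The generated text is meant to be pasted into, or sent to, PostgreSQL; the helper itself does not connect to a database.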
For a more concise version, see: https://dba.stackexchange.com/a/82456/95929