Getting most used words from a column of strings in SQL

So we have this database filled with a bunch of strings, in this case post titles.

What I want to do is:

Split each string into words
Count how many times each word appears across all the strings
Give me the top 50 words
Not have this time out in a data.se query

I tried using the info from this SO question adapted to data.se as follows:

select word, count(*) from (
select (case when instr(substr(p.Title, nums.n+1), ' ') then substr(p.Title, nums.n+1)
             else substr(p.Title, nums.n+1, instr(substr(p.Title, nums.n+1), ' ') - 1)
        end) as word
from (select ' '||Title as string
      from Posts p
     )Posts cross join
     (select 1 as n union all select 2 union all select 10
     ) nums
where substr(p.Title, nums.n, 1) = ' ' and substr(p.Title, nums.n, 1) <> ' '
) w
group by word
order by count(*) desc

Unfortunately, this gives me a slew of errors:

'substr' is not a recognized built-in function name.
Incorrect syntax near '|'.
Incorrect syntax near 'nums'.

So given a column of strings in SQL with a variable amount of text in each string, how can I get a list of the most frequently used X words?

asked May 26 '16 01:05

jmac

1 Answer

Query solution (No Split Function Required)

PostgreSQL

select word, count(*) as word_count
from
(
    -- get 1st words
    select split_part(title, ' ', 1) as word
    from posts

    union all

    -- get 2nd words
    select split_part(title, ' ', 2) as word
    from posts

    union all

    -- get 3rd words
    select split_part(title, ' ', 3) as word
    from posts

    -- can do this as many times as the number of words in longest title

) words
-- split_part returns '' when a title has fewer words than the requested position,
-- so the empty string is filtered out along with a few common stop words
where word is not null
and word not in ('', 'and', 'for', 'of', 'on')
group by word
order by word_count desc
limit 50;
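
Repeating a split_part branch for every word position gets unwieldy for long titles. As a rough sketch of one way around that, assuming PostgreSQL 9.3 or later (for LATERAL) and the same posts table and title column as above, each title can be expanded into one row per word with string_to_array and unnest. The aliases p, w and word_count are only illustrative names.

```sql
-- Sketch only: assumes PostgreSQL 9.3+ and a posts(title) table as in the query above.
select w.word, count(*) as word_count
from posts as p
-- string_to_array splits on single spaces; unnest turns the array into one row per word.
-- A NULL title produces no rows, so no extra NULL check is needed.
cross join lateral unnest(string_to_array(p.title, ' ')) as w(word)
where w.word not in ('', 'and', 'for', 'of', 'on')
group by w.word
order by word_count desc
limit 50;
```

This splits on single spaces only, so punctuation stays attached to words and 'SQL' and 'sql' are counted separately; wrapping the title in lower() would merge the latter if that is what you want.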

For a concise version, see: https://dba.stackexchange.com/a/82456/95929
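
The question targets data.se, which runs SQL Server, where split_part and limit are not available. A possible T-SQL equivalent, assuming SQL Server 2016 or later (database compatibility level 130+) so that STRING_SPLIT exists, might look like the sketch below. It reuses the Posts table and Title column from the question; the aliases and the stop-word list are illustrative, and whether it finishes within the data.se time limit is not something this sketch guarantees.

```sql
-- Sketch only: assumes SQL Server 2016+ (STRING_SPLIT) and the Posts.Title column from the question.
SELECT TOP (50)
       s.value  AS word,
       COUNT(*) AS word_count
FROM Posts AS p
-- STRING_SPLIT returns one row per space-separated piece in a column named "value";
-- rows with a NULL Title contribute nothing.
CROSS APPLY STRING_SPLIT(p.Title, ' ') AS s
WHERE s.value NOT IN ('', 'and', 'for', 'of', 'on')
GROUP BY s.value
ORDER BY word_count DESC;
```

TOP (50) with ORDER BY plays the role of limit 50 in the PostgreSQL query. Whether 'And' and 'and' are counted together depends on the database collation.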

answered Sep 30 '22 19:09

Ali Saeed

Tags: sql, sql-server, tsql
