Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can you get the n most common words in a set of many rows returned in a SQL Server query?

I want to return the 10 most common words from a query in SQL server, so running against a set of rows such as this:

quick brown fox
slow yellow fox
slow green fox

would return
fox
slow
quick
brown
yellow
green

like image 299
Nick Haslam Avatar asked Jul 08 '10 16:07

Nick Haslam


1 Answers

I'd try running a split function (separating each word through a space) over each returned row in order to get all separate words into an auxiliar table. Using the following code, you should be able to split a row by its spaces:

CREATE FUNCTION dbo.Split(@String varchar(8000), @Delimiter char(1))     
returns @temptable TABLE (items varchar(8000))     
as     
begin     
    declare @idx int     
    declare @slice varchar(8000)     

    select @idx = 1     
        if len(@String)<1 or @String is null  return     

    while @idx!= 0     
    begin     
        set @idx = charindex(@Delimiter,@String)     
        if @idx!=0     
            set @slice = left(@String,@idx - 1)     
        else     
            set @slice = @String     

        if(len(@slice)>0)
            insert into @temptable(Items) values(@slice)     

        set @String = right(@String,len(@String) - @idx)     
        if len(@String) = 0 break     
    end 
return     
end

You should call this function from within a cursor or something; inside it, simply use something like:

insert into #tmp (word) select * from dbo.split(' ', @row)

Finally, you would only have to use a simple query like:

select top 10 count(*) as number, word from separated_words_table order by number

Source here

like image 119
João Pereira Avatar answered Nov 15 '22 09:11

João Pereira