Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Word frequencies from strings in Postgres?

Is it possible to identify distinct words and a count for each, from fields containing text strings in Postgres?

like image 573
Marty Avatar asked Mar 07 '11 22:03

Marty


2 Answers

Should be split by a space ' ' or other delimit symbol between words; not by an 's', unless intended to do so, e.g., treating 'myWordshere' as 'myWord' and 'here'.

SELECT word, count(*)
FROM ( 
  SELECT regexp_split_to_table(some_column, ' ') as word
  FROM some_table
) t
GROUP BY word
like image 122
ycui Avatar answered Sep 19 '22 23:09

ycui


Something like this?

SELECT some_pk, 
       regexp_split_to_table(some_column, '\s') as word
FROM some_table

Getting the distinct words is easy then:

SELECT DISTINCT word
FROM ( 
  SELECT regexp_split_to_table(some_column, '\s') as word
  FROM some_table
) t

or getting the count for each word:

SELECT word, count(*)
FROM ( 
  SELECT regexp_split_to_table(some_column, '\s') as word
  FROM some_table
) t
GROUP BY word
like image 39
a_horse_with_no_name Avatar answered Sep 18 '22 23:09

a_horse_with_no_name