Word frequencies from strings in Postgres?

Question

Is it possible to identify distinct words and a count for each, from fields containing text strings in Postgres?

ycui · Accepted Answer

Split on a space ' ' or whichever delimiter separates the words, not on the letter 's', unless that is what you intend (e.g., treating 'myWordshere' as 'myWord' and 'here').

-- Split each string on single spaces, then count the occurrences of each word
SELECT word, count(*)
FROM (
  SELECT regexp_split_to_table(some_column, ' ') AS word
  FROM some_table
) t
GROUP BY word;
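
To try the query without an existing table, here is a minimal sketch that inlines a few rows with a CTE. The sample strings and the `frequency` alias are made up for illustration; `some_table` and `some_column` are the same placeholder names as in the query above. The `ORDER BY` is an addition so the most frequent words come first.

```sql
-- Hypothetical sample data standing in for some_table.some_column
WITH some_table(some_column) AS (
  VALUES ('the quick brown fox'),
         ('the lazy dog'),
         ('the fox')
)
SELECT word, count(*) AS frequency
FROM (
  -- One output row per space-separated word
  SELECT regexp_split_to_table(some_column, ' ') AS word
  FROM some_table
) t
GROUP BY word
ORDER BY frequency DESC, word;
```

With these three rows, 'the' is counted 3 times and 'fox' twice; every other word appears once.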

a_horse_with_no_name · Answer

Something like this?

-- One row per word, keeping the primary key of the source row
SELECT some_pk,
       regexp_split_to_table(some_column, '\s') AS word
FROM some_table;

Getting the distinct words is easy then:

SELECT DISTINCT word
FROM (
  SELECT regexp_split_to_table(some_column, '\s') AS word
  FROM some_table
) t;

or getting the count for each word:

SELECT word, count(*)
FROM (
  SELECT regexp_split_to_table(some_column, '\s') AS word
  FROM some_table
) t
GROUP BY word;
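
One caveat worth sketching: a pattern that matches a single whitespace character splits on each one, so a run of two spaces produces an empty-string "word", and 'The' and 'the' are counted separately. The variant below is one possible way to handle that, not part of the original answer. It assumes `standard_conforming_strings` is on (the default since PostgreSQL 9.1), so that '\s+' reaches the regex engine as a whitespace class. The sample rows and the `frequency` alias are hypothetical.

```sql
-- Hypothetical sample data; note the double space in the first row
WITH some_table(some_pk, some_column) AS (
  VALUES (1, 'The  quick fox'),
         (2, ' the lazy dog')
)
SELECT word, count(*) AS frequency
FROM (
  -- lower() folds case; '\s+' treats a run of whitespace as one delimiter
  SELECT regexp_split_to_table(lower(some_column), '\s+') AS word
  FROM some_table
) t
WHERE word <> ''   -- leading or trailing whitespace still yields an empty piece
GROUP BY word
ORDER BY frequency DESC, word;
```

Punctuation is left untouched here; whether 'fox' and 'fox,' should count as the same word depends on the data.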

Tags: text, postgresql, nlp, word-frequency

Marty
