Count word occurrences in R

Tags:

string

r

Is there a function for counting the number of times a particular keyword is contained in a dataset?

For example, if dataset <- c("corn", "cornmeal", "corn on the cob", "meal") the count would be 3.

LNA asked Oct 16 '11 03:10


3 Answers

Let's assume for the moment that you want the number of elements containing "corn":

length(grep("corn", dataset))
# [1] 3

After you get the basics of R down better you may want to look at the "tm" package.

EDIT: I realize that this time around you wanted any-"corn" but in the future you might want to get word-"corn". Over on r-help Bill Dunlap pointed out a more compact grep pattern for gathering whole words:

grep("\\<corn\\>", dataset) 
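To make the difference between the two patterns concrete, here is a minimal base R sketch that counts matching elements both ways. It reuses the `dataset` vector from the question; the names `n_any` and `n_word` are illustrative only and not part of the original answer.

```r
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")

# Elements that contain "corn" anywhere (substring match)
n_any <- sum(grepl("corn", dataset))

# Elements that contain "corn" as a whole word
# (\\< and \\> mark the start and end of a word)
n_word <- sum(grepl("\\<corn\\>", dataset))

print(c(any = n_any, word = n_word))
```

`sum(grepl(...))` gives the same count as `length(grep(...))`; the logical vector from `grepl` can be handy if you also want to subset the data afterwards.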
IRTFM answered Oct 02 '22 17:10


Another quite convenient and intuitive way to do it is to use the str_count function of the stringr package:

library(stringr)
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")

# for mere occurrences of the pattern:
str_count(dataset, "corn")
# [1] 1 1 1 0

# for occurrences of the word alone:
str_count(dataset, "\\bcorn\\b")
# [1] 1 0 1 0

# summing it up
sum(str_count(dataset, "corn"))
# [1] 3
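Note that counting occurrences and counting matching elements can differ when a keyword appears more than once in the same string. The following is a rough base R sketch of that distinction, for cases where stringr is not installed; the vector `x` is made-up example data, and the `gregexpr`/`regmatches` idiom is one possible approach rather than the method used in this answer.

```r
# Hypothetical data: the second element contains "corn" twice
x <- c("corn", "popcorn and corn", "meal")

# Occurrences per element: gregexpr finds every match,
# regmatches extracts them, lengths counts them
per_element <- lengths(regmatches(x, gregexpr("corn", x)))

# Total number of occurrences across the whole vector
total <- sum(per_element)

# Number of elements with at least one occurrence
n_elements <- sum(per_element > 0)

print(per_element)
print(c(total = total, elements = n_elements))
```

Here the total number of occurrences is 3, while only 2 elements contain the keyword, so it is worth deciding which of the two counts you actually need.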
petermeissner answered Oct 02 '22 15:10


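If you only need the elements that are exactly equal to "corn", you can also do something like the following. Note that for the example data this returns 1, not 3, because "cornmeal" and "corn on the cob" are not exact matches: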

length(dataset[which(dataset=="corn")])
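A shorter way to express the same exact-match count, sketched here under the assumption that `dataset` is the character vector from the question, is to sum the logical comparison directly. The names `n_exact` and `n_exact_which` are illustrative only.

```r
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")

# Exact matches only: TRUE counts as 1, FALSE as 0;
# na.rm = TRUE ignores missing values, as which() does
n_exact <- sum(dataset == "corn", na.rm = TRUE)

# The which()-based version above yields the same count
n_exact_which <- length(dataset[which(dataset == "corn")])

print(n_exact)
```

Both forms count only elements that equal "corn" exactly, so they answer a narrower question than the `grep` and `str_count` approaches.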
Junaid answered Oct 02 '22 17:10