
Getting the unique count of strings from a text string

Tags: r, dplyr, stringr, tm

I am wondering how to count how many distinct words from a given list appear in a text string. Let's say I am looking for the words apples, bananas, pineapples, and grapes in this string.

  A <- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')

  df <- data.frame(A)

I want the number of distinct fruits from that list that appear in the text, counting each fruit once no matter how often it is repeated.

  library(stringr)
  df$fruituniquecount <- str_count(df$A, "apples|pineapples|grapes|bananas")

I tried this, but it returns the overall number of matches (6 here) rather than the number of distinct fruits. I would like the answer to be 3. Please suggest your ideas.

Asked by user3570187, Feb 25 '19 14:02


2 Answers

You could use str_extract_all and then calculate the length of the unique elements.

Input:

A <- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')
fruits <- "apples|pineapples|grapes|bananas"

Result:

length(unique(c(stringr::str_extract_all(A, fruits, simplify = TRUE))))
# [1] 3
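
The question stores the result in a data frame column, so a per-row variant may be useful. The following is only a sketch of one way to do that: it relies on str_extract_all returning a list with one character vector per input string, and the second row of df is a hypothetical example added here for illustration, not part of the original question.

```r
library(stringr)

# Two-row data frame; the second row is hypothetical, added for illustration
df <- data.frame(
  A = c("I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes",
        "Only bananas and more bananas here"),
  stringsAsFactors = FALSE
)
fruits <- "apples|pineapples|grapes|bananas"

# One character vector of matches per row
matches <- str_extract_all(df$A, fruits)

# Number of distinct matches in each row
df$fruituniquecount <- lengths(lapply(matches, unique))
df$fruituniquecount
# [1] 3 1
```

Without simplify = TRUE the matches stay grouped by row, which is what makes the per-row count possible.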

Answered by markus, Sep 21 '22 10:09


Not exactly elegant, but you could use str_detect like this.

sum(str_detect(df$A, "apples"), 
    str_detect(df$A, "pineapples"), 
    str_detect(df$A, "grapes"), 
    str_detect(df$A, "bananas"))

Or, based on the comments below, if you put all these terms in their own vector you could then use an apply function:

fruits <- c("apples", "pineapples", "grapes", "bananas")
sum(sapply(fruits, function(x) str_detect(df$A, x)))
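
One caveat with detecting each term separately: "apples" is a substring of "pineapples", so a text that mentions only pineapples would also be counted as containing apples. A possible workaround, sketched here in base R with grepl rather than str_detect, is to wrap each term in \\b word boundaries. The text string below is a hypothetical example chosen to show the difference.

```r
fruits <- c("apples", "pineapples", "grapes", "bananas")

# Hypothetical text that mentions pineapples but not apples
text <- "A basket of pineapples and grapes"

# Plain substring search: "apples" is also found inside "pineapples"
naive <- sum(sapply(fruits, function(x) grepl(x, text, fixed = TRUE)))

# Whole-word search: \\b requires a word boundary on both sides of the term
bounded <- sum(sapply(fruits, function(x) {
  grepl(paste0("\\b", x, "\\b"), text, perl = TRUE)
}))

naive
# [1] 3
bounded
# [1] 2
```

For the string in the question both versions give 3, because apples and pineapples each occur on their own.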

Answered by Ben G, Sep 22 '22 10:09