I am wondering how to count the number of distinct words, out of a given set, that appear in a text string. Let's say I am looking for how many of the words apples, bananas, pineapples, and grapes occur in this string, without counting repetitions.
A <- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')
df <- data.frame(A)
Let's say I want the number of distinct fruits from that list that appear in the text.
library(stringr)
df$fruituniquecount <- str_count(df$A, "apples|pineapples|grapes|bananas")
I tried this, but I get the overall count of matches (6 here), repeats included. I would like the answer to be 3. Please suggest your ideas.
Count Unique Text Values in Excel: Enter the formula =SUM(IF(ISTEXT(range)*COUNTIF(range,range)=1,1,0)) in the destination cell and press Ctrl+Shift+Enter. Here, range is the block of cells, from start cell to end cell, that holds the values. ISTEXT is added to the general formula so that only text values are considered.
To count the unique characters in a string in JavaScript, convert the string to a Set to remove all the duplicate characters and access the size property on the Set, e.g. new Set(str).size. The size property will return the number of unique characters in the string.
Given a string, find all the distinct (that is, non-repeating) characters in it. For example, if the input string is “Geeks for Geeks”, then the output should be 'for', and if the input string is “Geeks Quiz”, then the output should be 'GksQuiz'. The distinct characters should be printed in the same order as they appear in the input string.
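For comparison, the two character-level ideas above can be sketched in base R. The function names chars_of, n_unique_chars, and non_repeating are hypothetical helpers made up for this illustration, and ignoring spaces is an assumption taken from the 'GksQuiz' example.

```r
# Hypothetical helper: split a string into single characters (base R only).
chars_of <- function(s) strsplit(s, "")[[1]]

# Number of unique characters, analogous to new Set(str).size in JavaScript.
n_unique_chars <- function(s) length(unique(chars_of(s)))

# Characters that occur exactly once, in order of appearance.
# Assumption: spaces are ignored, which matches the "Geeks Quiz" example.
non_repeating <- function(s) {
  ch <- chars_of(s)
  ch <- ch[ch != " "]
  # A character is kept only if it has no duplicate before or after it.
  once <- !(duplicated(ch) | duplicated(ch, fromLast = TRUE))
  paste(ch[once], collapse = "")
}

n_unique_chars("banana")          # 3
non_repeating("Geeks for Geeks")  # "for"
non_repeating("Geeks Quiz")       # "GksQuiz"
```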
You could use str_extract_all and then calculate the length of the unique elements.
Input:
A <- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')
fruits <- "apples|pineapples|grapes|bananas"
Result
length(unique(c(stringr::str_extract_all(A, fruits, simplify = TRUE))))
# [1] 3
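If the count should be stored per row in the data frame, as in the question, one option is to keep the list output of str_extract_all and count the unique matches for each element. This is a minimal sketch; the second row of the example data is made up for illustration, and the column name fruitcount is a hypothetical addition.

```r
library(stringr)

# Example data: the first row is the string from the question,
# the second row is a made-up string added for illustration.
df <- data.frame(
  A = c(
    "I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes",
    "bananas and more bananas"
  ),
  stringsAsFactors = FALSE
)
fruits <- "apples|pineapples|grapes|bananas"

# str_count() counts every match, repeats included.
df$fruitcount <- str_count(df$A, fruits)

# str_extract_all() returns one character vector of matches per row;
# counting the unique values in each gives the distinct-fruit count.
df$fruituniquecount <- vapply(
  str_extract_all(df$A, fruits),
  function(m) length(unique(m)),
  integer(1)
)

df[, c("fruitcount", "fruituniquecount")]
```

For the first row this gives 6 total matches and 3 distinct fruits; for the made-up second row, 2 and 1.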
Not exactly elegant, but you could use str_detect like this.
sum(str_detect(df$A, "apples"),
    str_detect(df$A, "pineapples"),
    str_detect(df$A, "grapes"),
    str_detect(df$A, "bananas"))
Or, based on the comments below, if you put all these terms in their own vector you could then use an apply function:
fruits <- c("apples", "pineapples", "grapes", "bananas")
sum(sapply(fruits, function(x) str_detect(df$A, x)))
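One caveat with the str_detect approach: "apples" is a substring of "pineapples", so a text that mentions only pineapples would still be counted as containing apples. A possible fix, sketched here, wraps each term in \\b word boundaries. The test sentence and the variable names txt, naive, and bounded are made up for illustration.

```r
library(stringr)

fruits <- c("apples", "pineapples", "grapes", "bananas")

# Made-up test sentence that mentions only one fruit.
txt <- "I only have pineapples"

# Plain substring detection: "apples" also matches inside "pineapples".
naive <- sum(sapply(fruits, function(x) str_detect(txt, x)))

# Word boundaries (\\b) make each term match whole words only.
bounded <- sum(sapply(fruits, function(x) str_detect(txt, paste0("\\b", x, "\\b"))))

naive    # 2
bounded  # 1
```

On the string from the question both versions return 3, because apples and pineapples each appear as separate words there.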