I have a string:
str1 <- "This is a string, that I've written
to ask about a question, or at least tried to."
How would I:
1) count the number of commas
2) count the occurrences of '-ion'
Any suggestions?
The stringr package provides a str_count() function that counts the occurrences of a pattern passed to it as an argument. The pattern can be a single character or a longer string, and it is interpreted as a regular expression; each match increments the count.
In Excel, select the cell that should hold the result, enter the formula =LEN(A2)-LEN(SUBSTITUTE(A2,",","")) (where A2 is the cell whose commas you want to count), and then drag that cell's AutoFill handle over the range you need.
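The length-difference idea behind that spreadsheet formula carries over to base R: delete every occurrence of the pattern and see how many characters disappeared. This is a minimal sketch; the helper name `count_by_removal` is hypothetical, and dividing by `nchar(pattern)` assumes you want non-overlapping occurrences, which is what `gsub()` removes.

```r
# Hypothetical helper: count non-overlapping occurrences of a literal pattern
# by removing them and comparing string lengths before and after.
count_by_removal <- function(x, pattern) {
  removed <- gsub(pattern, "", x, fixed = TRUE)  # fixed = TRUE: literal text, not a regex
  (nchar(x) - nchar(removed)) %/% nchar(pattern)
}

str1 <- "This is a string, that I've written
to ask about a question, or at least tried to."

count_by_removal(str1, ",")    # 2
count_by_removal(str1, "ion")  # 1
```

Because `gsub()` and `nchar()` are vectorised, the helper also works on a whole character vector at once.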
The stringr package has a function, str_count, that does this nicely:
library(stringr)
str_count(str1, ',')
#> [1] 2
str_count(str1, 'ion')
#> [1] 1
EDIT:
Because I was curious:
vec <- paste(sample(letters, 1e6, replace = TRUE), collapse = ' ')
system.time(str_count(vec, 'a'))
#>  user  system elapsed
#> 0.052   0.000   0.054
system.time(length(gregexpr('a', vec, fixed = TRUE)[[1]]))
#>  user  system elapsed
#> 2.124   0.016   2.146
system.time(length(gregexpr('a', vec, fixed = FALSE)[[1]]))
#>  user  system elapsed
#> 0.052   0.000   0.052
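Before reading too much into timings like these, it is worth checking that the approaches agree on the count. The sketch below does that in base R only, so it runs without stringr; the smaller sample of 1e5 letters and the `set.seed()` call are assumptions added here, not part of the original benchmark.

```r
set.seed(1)  # reproducible sample (added for this check)
vec <- paste(sample(letters, 1e5, replace = TRUE), collapse = " ")

# Count "a" with gregexpr() in literal mode and in regex mode
n_fixed <- length(gregexpr("a", vec, fixed = TRUE)[[1]])
n_regex <- length(gregexpr("a", vec, fixed = FALSE)[[1]])

# Independent cross-check: split into single characters and count the "a"s
n_chars <- sum(strsplit(vec, "")[[1]] == "a")

stopifnot(n_fixed == n_regex, n_fixed == n_chars)
n_fixed
```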
The general problem of matching text requires regular expressions. In this case you only want to match specific characters, but the functions to call are the same. You want gregexpr.
# gregexpr() returns -1 when there is no match, so count only positive positions
matched_commas <- gregexpr(",", str1, fixed = TRUE)
n_commas <- sum(matched_commas[[1]] > 0)
matched_ion <- gregexpr("ion", str1, fixed = TRUE)
n_ion <- sum(matched_ion[[1]] > 0)
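The same pattern generalises to a character vector. Here is a small sketch with a hypothetical helper, `count_matches`, that returns one count per element; it also shows why `fixed = TRUE` matters when the pattern contains a regex metacharacter such as ".".

```r
# Hypothetical helper: count matches of `pattern` in each element of `x`.
# Extra arguments (e.g. fixed = TRUE, perl = TRUE) are passed to gregexpr().
count_matches <- function(pattern, x, ...) {
  vapply(gregexpr(pattern, x, ...),
         function(pos) sum(pos > 0),  # a lone -1 means "no match"
         integer(1))
}

x <- c("a.b.c", "no dots here", "...")
count_matches(".", x, fixed = TRUE)  # literal dot: 2 0 3
count_matches(".", x)                # regex: "." matches any character, 5 12 3
```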
If you only want to match "ion" at the end of words, then you do need regular expressions. \b represents a word boundary, and inside an R string you need to escape the backslash, writing it as "\\b".
gregexpr(
"ion\\b",
"ionisation should only be matched at the end of the word",
perl = TRUE
)
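To see what the boundary changes, compare match positions with and without it on the same sentence. `as.integer()` is only there to drop the attributes `gregexpr()` attaches, so the positions print cleanly. The "ion" at the start of "ionisation" is skipped by "ion\\b", while the one ending the word is kept:

```r
x <- "ionisation should only be matched at the end of the word"

# "ion" anywhere, as literal text
as.integer(gregexpr("ion", x, fixed = TRUE)[[1]])    # 1 8
# "ion" only where it ends a word (boundary after it)
as.integer(gregexpr("ion\\b", x, perl = TRUE)[[1]])  # 8
# "ion" only where it starts a word (boundary before it)
as.integer(gregexpr("\\bion", x, perl = TRUE)[[1]])  # 1
```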