
R count number of commas and string

Tags: r, nlp

I have a string:

    str1 <- "This is a string, that I've written 
        to ask about a question, or at least tried to."

How would I:

1) count the number of commas

2) count the occurrences of '-ion'

Any suggestions?

screechOwl asked Mar 12 '12 16:03




2 Answers

The stringr package has a function str_count that does this for you nicely.

    library(stringr)

    str_count(str1, ',')
    [1] 2
    str_count(str1, 'ion')
    [1] 1

EDIT:

Because I was curious, here are some timings:

    vec <- paste(sample(letters, 1e6, replace = TRUE), collapse = ' ')

    system.time(str_count(vec, 'a'))
       user  system elapsed 
      0.052   0.000   0.054 

    system.time(length(gregexpr('a', vec, fixed = TRUE)[[1]]))
       user  system elapsed 
      2.124   0.016   2.146 

    system.time(length(gregexpr('a', vec, fixed = FALSE)[[1]]))
       user  system elapsed 
      0.052   0.000   0.052 

Justin answered Sep 28 '22 04:09


The general problem of matching text requires regular expressions. In this case you just want to match specific characters, but the functions to call are the same. You want gregexpr.

    matched_commas <- gregexpr(",", str1, fixed = TRUE)
    # gregexpr returns -1 when nothing matches, so count the positive positions
    n_commas <- sum(matched_commas[[1]] > 0)

    matched_ion <- gregexpr("ion", str1, fixed = TRUE)
    n_ion <- sum(matched_ion[[1]] > 0)

If you only want to match "ion" at the end of words, then you do need regular expressions. \b represents a word boundary, and inside an R string you need to escape the backslash, so it is written as "\\b".

    gregexpr(
      "ion\\b", 
      "ionisation should only be matched at the end of the word", 
      perl = TRUE
    )
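
To turn these gregexpr results into counts, the calls can be wrapped in a small helper. This is only a sketch and not part of the original answer: count_matches is a hypothetical name, and it assumes base R's regmatches() and lengths() (R 3.2.0 or later). Going through regmatches() means a string with no matches counts as 0, whereas taking length() of the gregexpr result directly would report 1, because gregexpr returns -1 when nothing matches.

```r
# Hypothetical helper: count pattern matches in each element of x.
# Extra arguments (fixed = TRUE, perl = TRUE, ...) are passed on to gregexpr.
count_matches <- function(pattern, x, ...) {
  m <- gregexpr(pattern, x, ...)
  # regmatches() returns character(0) for elements with no match,
  # so lengths() gives 0 for them
  lengths(regmatches(x, m))
}

str1 <- "This is a string, that I've written 
    to ask about a question, or at least tried to."

count_matches(",", str1, fixed = TRUE)      # 2
count_matches("ion\\b", str1, perl = TRUE)  # 1, from "question"
count_matches("z", str1, fixed = TRUE)      # 0
```

Because gregexpr is vectorised over its text argument, the same helper should also work on a character vector and return one count per element.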

Richie Cotton answered Sep 28 '22 03:09