
Find the number of spaces in a string

Tags:

string

r

How can I count the spaces in a string and split it into separate columns at each space? For example, for "I am going Out":

Ans: 3
Column1 Column2 Column3 Column4
I       am      going   Out
geniusakii asked Sep 13 '12 09:09



2 Answers

If you want the actual column values, as your example seems to indicate, then you can read a table from a text connection:

> read.table(textConnection("I am going Out"))
  V1 V2    V3  V4
1  I am going Out

To answer the title of your question, i.e. how many spaces there are, you can use ncol to count the columns of the above, and subtract one. However, if you are only interested in the number of spaces, the following is more efficient:

length(gregexpr(" ", "I am going Out")[[1]])

This uses a regular expression to search for the spaces.

The [[1]] takes the first element of the resulting list, which corresponds to the first item of the input vector with "I am going Out" as its only element. If you passed a different vector there, your list might have more than one element, or none at all for an empty vector.
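As a rough illustration of the two counting approaches and of the list that gregexpr returns, the sketch below uses the example sentence plus a made-up second string. The variable names x and m are only for illustration, and it assumes an R version in which read.table accepts a text argument.

```r
x <- "I am going Out"

# Column-based count: number of whitespace-separated fields minus one.
# read.table treats a run of whitespace as one separator, so this only
# equals the number of spaces when words are separated by single spaces.
ncol(read.table(text = x)) - 1           # 3

# Regex-based count: one match position per space.
m <- gregexpr(" ", c(x, "a b"))
length(m)                                # 2, one list element per input string
as.integer(m[[1]])                       # 2 5 11, positions of the spaces in x
length(m[[1]])                           # 3

# An empty input vector yields an empty list.
length(gregexpr(" ", character(0)))      # 0
```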

If there is no space, gregexpr will still return a list of length 1, with -1 as the position of the match to indicate that there was no match. This causes the above code to incorrectly report one result in that case. A more elaborate solution, which deals with that and also accepts vectors as input, is the following:

countSpaces <- function(s) { sapply(gregexpr(" ", s), function(p) { sum(p>=0) } ) }

The function works as follows: gregexpr will return a list of results, one for each element of the input vector s. sapply will iterate over that list, and for each element of the list, compute the number of matches. Instead of counting the length of the vector of matched positions, it uses sum to only count the non-negative values, thus dropping any -1 caused by a failed match. There is an implicit conversion from FALSE/TRUE to 0/1 happening in that sum. The result of sapply will again be a vector, and thus nicely match the input vector.
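A short usage sketch may help here. It repeats the function definition so that it runs on its own; the input strings are made up for illustration. countSpacesStrict is a hypothetical variant, not part of the answer above, that uses vapply so the result is an integer vector even for empty input.

```r
countSpaces <- function(s) {
  sapply(gregexpr(" ", s), function(p) sum(p >= 0))
}

x <- c("I am going Out", "nospace", "a b")
countSpaces(x)                           # 3 0 1

# The simpler length()-based count reports 1 for a string without spaces,
# because gregexpr signals "no match" with a single -1.
length(gregexpr(" ", "nospace")[[1]])    # 1

# Hypothetical variant: vapply fixes the result type, and fixed = TRUE
# matches a literal space instead of compiling a regular expression.
countSpacesStrict <- function(s) {
  vapply(gregexpr(" ", s, fixed = TRUE), function(p) sum(p > 0L), integer(1))
}
countSpacesStrict(x)                     # 3 0 1
countSpacesStrict(character(0))          # integer(0)
```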

This function can be used to add the counts to a data frame, as requested in one comment. Assume you have a data frame called foo with strings in column bar, and you want to store these counts in a new column baz. You can write this as

foo <- transform(foo, baz = countSpaces(bar))
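For a self-contained run of that line, here is a minimal sketch with a made-up data frame. The contents of foo are hypothetical, and stringsAsFactors = FALSE is set only to make sure bar is a character column regardless of the R version.

```r
countSpaces <- function(s) {
  sapply(gregexpr(" ", s), function(p) sum(p >= 0))
}

# Hypothetical example data.
foo <- data.frame(bar = c("I am going Out", "nospace"),
                  stringsAsFactors = FALSE)

# Add the space counts as a new column baz.
foo <- transform(foo, baz = countSpaces(bar))
foo$baz                                  # 3 0
```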
MvG answered Sep 19 '22 14:09



Another way is to use the strsplit function:

R> strsplit("I am going Out", " ")[[1]]
[1] "I"     "am"    "going" "Out"  

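Here we split the first argument, "I am going Out", at each occurrence of the second argument, a single space. For a string like this one, the result has one more element than there are spaces, so we can use length to count the pieces: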

R> length(strsplit("I am going Out", " ")[[1]])
[1] 4
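To turn the number of pieces into a number of spaces, one option is to subtract one. The sketch below does this for a whole vector using lengths, which assumes R 3.2.0 or later; the input strings and the names x and pieces are made up for illustration.

```r
x <- c("I am going Out", "a b")

# Split every string at literal spaces, then count the pieces per string.
pieces <- strsplit(x, " ", fixed = TRUE)
lengths(pieces) - 1                      # 3 1

# Caveat: a trailing space produces no extra piece, so it is not counted.
length(strsplit("a ", " ", fixed = TRUE)[[1]]) - 1   # 0, although there is one space
```

For strings that may end in a space, counting matches with gregexpr avoids this undercount.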
csgillespie answered Sep 20 '22 14:09
