I have a vector of character data. Most of the elements in the vector consist of one or more letters followed by one or more numbers. I wish to split each element in the vector into the character portion and the number portion. I found a similar question on Stackoverflow.com here:
split a character from a number with multiple digits
However, the answer given above does not seem to work completely in my case or I am doing something wrong. An example vector is below:
my.data <- c("aaa", "b11", "b21", "b101", "b111", "ccc1", "ddd1", "ccc20", "ddd13")

# I can obtain the number portion using:
gsub("[^[:digit:]]", "", my.data)

# However, I cannot obtain the character portion using:
gsub("[:digit:]", "", my.data)
How can I obtain the character portion? I am using R version 2.14.1 on a Windows 7 64-bit machine.
In Python, the split() method of the string class is fairly straightforward. It splits the string on a given delimiter and returns a list of the resulting pieces. By default the delimiter is whitespace, so if you omit the delimiter argument, the string is split on runs of whitespace.
Method #1: Using re.compile() + re.match() + groups(). Combining these regex functions (compile a pattern, match it against the string, then read the captured groups from the match object) can perform this particular task.
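As a rough illustration of that approach, here is a minimal Python sketch applied to the example vector from the question. The pattern and the helper name `split_letters_digits` are assumptions made for this example, not part of the original answer. The pattern assumes each element is a run of ASCII letters optionally followed by a run of digits.

```python
import re

# Assumed pattern: leading ASCII letters, then trailing digits.
# Both groups use `*`, so the match succeeds even when one part is missing.
PATTERN = re.compile(r"([A-Za-z]*)([0-9]*)")


def split_letters_digits(item):
    """Return a (letters, digits) tuple for a string such as 'ccc20'."""
    match = PATTERN.match(item)
    return match.groups()


my_data = ["aaa", "b11", "b21", "b101", "b111", "ccc1", "ddd1", "ccc20", "ddd13"]
parts = [split_letters_digits(item) for item in my_data]
print(parts)
```

Elements with no digits, such as "aaa", come back with an empty string as the second item, so the caller can decide how to treat them.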
Since none of the previous answers use tidyr::separate, here it goes:
library(tidyr)

df <- data.frame(mycol = c("APPLE348744", "BANANA77845", "OATS2647892", "EGG98586456"))

df %>%
  separate(mycol, into = c("text", "num"), sep = "(?<=[A-Za-z])(?=[0-9])")
For your regex you have to use:
gsub("[[:digit:]]","",my.data)
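The [:digit:] character class only makes sense inside a set of []. On its own, "[:digit:]" is read as an ordinary bracket expression that matches any of the characters ":", "d", "i", "g" and "t", which is why your second gsub() call removes those letters instead of the digits.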