I have a dataframe (df) with a column (Col2) like this:
Col1 Col2 Col3
1 C607989_booboobear_Nation A
2 C607989_booboobear_Nation B
3 C607989_booboobear_Nation C
4 C607989_booboobear_Nation D
5 C607989_booboobear_Nation E
6 C607989_booboobear_Nation F
I want to extract just the number in Col2, like this:
Col1 Col2 Col3
1 607989 A
2 607989 B
3 607989 C
4 607989 D
5 607989 E
6 607989 F
I have tried things like:
gsub("^.*?_","_",df$Col2)
but it's not working.
Remove Specific Character from String
Use the gsub() function to remove a character from a string or text in R. This is a base R function that takes three arguments: first, the character (pattern) to look for; second, the value to replace it with (in our case an empty string); and third, the input string in which to replace.
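As a small illustration of those three arguments, here is a minimal sketch on a single string taken from the question. The variable names are only for illustration; sub() is shown next to gsub() because it replaces only the first match.

```r
# One value from the question's Col2
x <- "C607989_booboobear_Nation"

# gsub(pattern, replacement, input): removes every underscore
no_underscores <- gsub("_", "", x)

# sub() has the same arguments but replaces only the first match
first_only <- sub("_", "", x)

print(no_underscores)
print(first_only)
```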
To remove a character from an R data frame column, we can use the gsub function, which replaces the character with an empty string. For example, if we have a data frame called df with a character column x whose values each contain the characters ID, they can be removed by using the command gsub("ID","",as.
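A minimal sketch of that idea, assuming a made-up data frame (the name example_df and its values are hypothetical and not from the original post):

```r
# Hypothetical data: every value in column x carries an "ID" prefix
example_df <- data.frame(x = c("ID101", "ID102", "ID103"),
                         stringsAsFactors = FALSE)

# Replace "ID" with an empty string and write the result back to the column
example_df$x <- gsub("ID", "", example_df$x)

print(example_df)
```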
If your string is not too fancy/complex, it might be easiest to do something like:
gsub("C([0-9]+)_.*", "\\1", df$Col2)
# [1] "607989" "607989" "607989" "607989" "607989" "607989"
The pattern matches a "C", followed by one or more digits, followed by an underscore and then anything else. The digits are captured with (), and the replacement is set to that capture group (\\1).
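To put this together with the question's data, here is a sketch that rebuilds the data frame, shows why the original attempt does not help, and writes the extracted number back into Col2. The anchors (^ and $) and the as.integer() step are my additions, assuming you want a numeric column rather than character strings.

```r
# Rebuild the question's data frame
df <- data.frame(Col1 = 1:6,
                 Col2 = rep("C607989_booboobear_Nation", 6),
                 Col3 = LETTERS[1:6],
                 stringsAsFactors = FALSE)

# The original attempt replaces everything up to an underscore with "_",
# so it throws away the digits instead of keeping them
attempt <- gsub("^.*?_", "_", df$Col2)

# Keep only the captured digits; sub() is enough since there is one match per value
df$Col2 <- sub("^C([0-9]+)_.*$", "\\1", df$Col2)

# Optional: convert from character to integer
df$Col2 <- as.integer(df$Col2)

print(df)
```

Note that values which do not match the pattern are returned unchanged by sub(), and as.integer() would then turn them into NA with a warning.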
An alternate approach uses qdap::genXtract, which grabs strings between a left and right boundary. Here I use C and _ for the left and right bounds:
## Your data in a better form for sharing
dat <- structure(list(Col1 = c("1", "2", "3", "4", "5", "6"), Col2 = c("C607989_booboobear_Nation",
"C607989_booboobear_Nation", "C607989_booboobear_Nation", "C607989_booboobear_Nation",
"C607989_booboobear_Nation", "C607989_booboobear_Nation"), Col3 = c("A",
"B", "C", "D", "E", "F")), .Names = c("Col1", "Col2", "Col3"), row.names = c(NA,
-6L), class = "data.frame")
library(qdap)
dat[[2]] <- unlist(genXtract(dat[[2]], "C", "_"))
dat
## Col1 Col2 Col3
## 1 1 607989 A
## 2 2 607989 B
## 3 3 607989 C
## 4 4 607989 D
## 5 5 607989 E
## 6 6 607989 F
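If installing qdap is not an option, the same "between two boundaries" idea can be sketched in base R with regmatches() and a Perl-style lookaround pattern. This is an alternative I am swapping in, not what genXtract does internally, and the second input value is made up for illustration.

```r
# First value is from the question; the second is a hypothetical extra case
x <- c("C607989_booboobear_Nation", "C12_other_Label")

# (?<=C) requires a "C" just before the digits, (?=_) an underscore just after;
# neither boundary is included in the match
between <- regmatches(x, regexpr("(?<=C)[0-9]+(?=_)", x, perl = TRUE))

print(between)
```

One caveat: regmatches() silently drops elements that have no match, so the result can be shorter than the input. Check for that before assigning it back to a data frame column.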