I have a dataframe (df) with a column (Col2) like this:
Col1 Col2 Col3
1 C607989_booboobear_Nation A
2 C607989_booboobear_Nation B
3 C607989_booboobear_Nation C
4 C607989_booboobear_Nation D
5 C607989_booboobear_Nation E
6 C607989_booboobear_Nation F
I want to extract just the number in Col2:
Col1 Col2 Col3
1 607989 A
2 607989 B
3 607989 C
4 607989 D
5 607989 E
6 607989 F
I have tried things like:
gsub("^.*?_","_",df$Col2)
but it's not working.
Remove Specific Character from String
Use the gsub() function to remove a character from a string in R. It is a base R function whose first three arguments are the pattern to look for, the replacement value (an empty string when removing), and the input string in which to make the replacement.
To remove a character from an R data frame column, we can use the gsub function, which replaces the character with an empty string. For example, if we have a data frame called df with a character column x that contains the text ID in each value, it can be removed by using the command gsub("ID","",as.
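A minimal sketch of this idea, assuming a hypothetical data frame df whose column x holds values such as ID101 (the sample values are made up for illustration):

```r
# Hypothetical data: every value in column x contains the text "ID"
df <- data.frame(x = c("ID101", "ID102", "ID103"), stringsAsFactors = FALSE)

# gsub(pattern, replacement, x) replaces every match of the pattern.
# fixed = TRUE treats "ID" as literal text rather than a regular expression.
df$x <- gsub("ID", "", df$x, fixed = TRUE)

df$x
# [1] "101" "102" "103"
```

The result is still a character vector; wrap it in as.integer() or as.numeric() if numbers are needed.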
If your string is not too fancy/complex, it might be easiest to do something like:
gsub("C([0-9]+)_.*", "\\1", df$Col2)
# [1] "607989" "607989" "607989" "607989" "607989" "607989"
The pattern matches a "C", followed by one or more digits, followed by an underscore and then anything else. The digits are captured with (), and the replacement is set to that capture group (\\1), so only the digits remain.
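To change the column in place, the result can be assigned back to df$Col2. The following is a sketch rather than the only way to do it: it rebuilds the example data from the question, anchors the pattern with ^ and $ (an optional tightening, not part of the answer above), and converts the result to integer on the assumption that a numeric column is wanted.

```r
# Rebuild the example data from the question
df <- data.frame(
  Col1 = 1:6,
  Col2 = rep("C607989_booboobear_Nation", 6),
  Col3 = LETTERS[1:6],
  stringsAsFactors = FALSE
)

# "C" at the start, capture the digits, then an underscore and the rest.
# sub() is enough here because each string is matched once as a whole.
df$Col2 <- sub("^C([0-9]+)_.*$", "\\1", df$Col2)

# Optional: turn the extracted text into numbers
df$Col2 <- as.integer(df$Col2)

df
```

Strings that do not match the pattern are returned unchanged by sub(), so as.integer() would turn them into NA with a warning.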
An alternate approach uses qdap::genXtract, which grabs strings between a left and right boundary. Here I use C and _ for the left and right bounds:
## Your data in a better form for sharing
dat <- structure(list(Col1 = c("1", "2", "3", "4", "5", "6"), Col2 = c("C607989_booboobear_Nation",
"C607989_booboobear_Nation", "C607989_booboobear_Nation", "C607989_booboobear_Nation",
"C607989_booboobear_Nation", "C607989_booboobear_Nation"), Col3 = c("A",
"B", "C", "D", "E", "F")), .Names = c("Col1", "Col2", "Col3"), row.names = c(NA,
-6L), class = "data.frame")
library(qdap)
dat[[2]] <- unlist(genXtract(dat[[2]], "C", "_"))
dat
## Col1 Col2 Col3
## 1 1 607989 A
## 2 2 607989 B
## 3 3 607989 C
## 4 4 607989 D
## 5 5 607989 E
## 6 6 607989 F
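If installing qdap is not an option, base R can extract the digits directly instead of deleting everything around them. This is a sketch under the assumption that every value contains at least one run of digits and that the first run is the one wanted:

```r
x <- rep("C607989_booboobear_Nation", 6)

# regexpr() locates the first run of digits in each string;
# regmatches() pulls out the matched text
digits <- regmatches(x, regexpr("[0-9]+", x))

digits
# [1] "607989" "607989" "607989" "607989" "607989" "607989"
```

Note that regmatches() silently drops elements with no match, so the result can be shorter than the input if some values contain no digits.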