Remove part of a string in dataframe column (R)

Tags:

r

I have a dataframe (df) with a column (Col2) like this:

Col1                 Col2                   Col3
  1   C607989_booboobear_Nation               A
  2   C607989_booboobear_Nation               B
  3   C607989_booboobear_Nation               C
  4   C607989_booboobear_Nation               D
  5   C607989_booboobear_Nation               E
  6   C607989_booboobear_Nation               F

I want to extract just the number in Col2

Col1              Col2                    Col3
  1              607989                     A
  2              607989                     B
  3              607989                     C
  4              607989                     D
  5              607989                     E
  6              607989                     F

I have tried things like:

gsub("^.*?_","_",df$Col2)

but it's not working.
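For reference, here is a minimal sketch of what that attempt does to a single value. It assumes R's default (TRE) regex engine, which treats *? as a lazy quantifier. The pattern matches everything up to the first underscore and replaces it with "_", so the number is deleted rather than kept. The names x, attempt and fixed are illustrative only.

```r
x <- "C607989_booboobear_Nation"

# "^.*?_" lazily matches from the start of the string up to the FIRST
# underscore ("C607989_") and replaces it with "_", which removes the number.
attempt <- gsub("^.*?_", "_", x)
print(attempt)
# [1] "_booboobear_Nation"

# Removing a leading "C" and everything from the first "_" onward
# leaves only the digits.
fixed <- gsub("^C|_.*", "", x)
print(fixed)
# [1] "607989"
```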


asked Aug 13 '14 02:08

Cybernetic

2 Answers

If your string is not too fancy/complex, it might be easiest to do something like:

gsub("C([0-9]+)_.*", "\\1", df$Col2)
# [1] "607989" "607989" "607989" "607989" "607989" "607989"

The pattern reads: start with a "C", followed by one or more digits, followed by an underscore and then anything else. The digits are captured with (), and the replacement is set to that capture group (\\1), so the rest of the string is dropped.
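A short sketch of writing the result back into the data frame, assuming you want a numeric column rather than character strings. The data frame below is a hypothetical stand-in rebuilt from the values shown in the question. Anchoring the pattern with ^ and $ and using sub() (one match per string) instead of gsub() are my own choices, not part of the answer above.

```r
# Hypothetical stand-in for the question's df, using the values from the post
df <- data.frame(
  Col1 = 1:6,
  Col2 = rep("C607989_booboobear_Nation", 6),
  Col3 = c("A", "B", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

# Keep only the captured digits, then convert the column from character to numeric
df$Col2 <- as.numeric(sub("^C([0-9]+)_.*$", "\\1", df$Col2))

print(df)
```

If a value in Col2 does not fit the pattern, sub() returns it unchanged and as.numeric() turns it into NA with a warning, which makes such rows easy to spot.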


answered Oct 21 '22 06:10

A5C1D2H2I1M1N2O1R2T1

An alternative approach uses qdap::genXtract, which grabs strings between a left and a right boundary. Here I use C and _ as the left and right bounds:

## Your data in a better form for sharing
dat <- structure(list(Col1 = c("1", "2", "3", "4", "5", "6"), Col2 = c("C607989_booboobear_Nation", 
    "C607989_booboobear_Nation", "C607989_booboobear_Nation", "C607989_booboobear_Nation", 
    "C607989_booboobear_Nation", "C607989_booboobear_Nation"), Col3 = c("A", 
    "B", "C", "D", "E", "F")), .Names = c("Col1", "Col2", "Col3"), row.names = c(NA, 
    -6L), class = "data.frame")

library(qdap)
dat[[2]] <- unlist(genXtract(dat[[2]], "C", "_"))
dat

##   Col1   Col2 Col3
## 1    1 607989    A
## 2    2 607989    B
## 3    3 607989    C
## 4    4 607989    D
## 5    5 607989    E
## 6    6 607989    F
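If you would rather not add the qdap dependency, a base-R sketch of the same left/right boundary idea uses a Perl-style lookbehind for "C" and a lookahead for "_". This is a swapped-in alternative technique, not how genXtract is implemented; x is a hypothetical two-element example.

```r
x <- c("C607989_booboobear_Nation", "C607989_booboobear_Nation")

# (?<=C) and (?=_) act as the left and right boundaries without being part
# of the match, so only the digits between them are returned.
# Lookarounds require perl = TRUE.
m <- regexpr("(?<=C)[0-9]+(?=_)", x, perl = TRUE)
out <- regmatches(x, m)
print(out)
# [1] "607989" "607989"
```

regmatches() silently drops elements with no match, so the result can be shorter than the input; check its length before assigning it back into a column.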

answered Oct 21 '22 08:10

Tyler Rinker
