

Subset data based on partial match of column names

Tags: r, subset

I need to subset a df to keep only certain columns, selected by name. Some of these are full column names, and the following works fine:

testData[,c("FullColName1","FullColName2","FullColName3")]

My problem is that I need to expand this to also include columns whose names contain specific strings, i.e. partial matches rather than full column names. These strings include letters and symbols:

"PartString1()","PartString2()"

I tried putting wildcards around these. (I've indicated this below with the prefix "star" because the "*" symbol didn't render correctly.)

testData[ ,c("FullColName1","FullColName2","FullColName3",
             "starPartString1()star","starPartString2()star")]

But I get the error "undefined columns selected". I can't figure out whether or how I need grep to make this work.
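
For context on the error: `[` matches character column indices against the names exactly, and "*" has no wildcard meaning there. A minimal sketch, assuming a small illustrative data frame d and a hypothetical vector requested, that reproduces the message and lists the offending names with setdiff():

```r
d <- data.frame(1:3, 3:1, 11:13, 13:11, rep(1, 3))
names(d) <- c("FullColName1", "FullColName2", "FullColName3",
              "PartString1()", "PartString2()")

# Hypothetical request mixing an exact name and a "wildcard" name
requested <- c("FullColName1", "*PartString1()*")

# `[` compares character indices to names(d) exactly, so "*" is taken
# literally; any name not found exactly is reported as undefined.
missing_cols <- setdiff(requested, names(d))
print(missing_cols)
# [1] "*PartString1()*"

# Catch the error to show the message R reports
msg <- tryCatch(d[, requested], error = function(e) conditionMessage(e))
print(msg)
# [1] "undefined columns selected"
```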


asked Jun 12 '14 04:06

user3614783

2 Answers

You mentioned you may be looking for symbols, so for this particular example we can use [[:punct:]] as the regular expression. It matches every column name that contains at least one punctuation character, here the parentheses in PartString1() and PartString2().

d <- data.frame(1:3, 3:1, 11:13, 13:11, rep(1, 3))
names(d) <- c("FullColName1", "FullColName2", "FullColName3",
              "PartString1()","PartString2()")

d[grepl("[[:punct:]]", names(d))]
#   PartString1() PartString2()
# 1            13             1
# 2            12             1
# 3            11             1
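
A sketch of the full requirement from the question, rebuilding the same d so the snippet runs on its own: keep the exact names with %in% and add literal substring matches with grepl(fixed = TRUE), so the parentheses need no escaping. The vectors full_names and part_strings are illustrative choices. Unlike [[:punct:]], which would also pick up any other name containing punctuation (such as a "."), this keeps only the listed substrings.

```r
d <- data.frame(1:3, 3:1, 11:13, 13:11, rep(1, 3))
names(d) <- c("FullColName1", "FullColName2", "FullColName3",
              "PartString1()", "PartString2()")

full_names   <- c("FullColName1", "FullColName2")   # exact names to keep
part_strings <- c("PartString1()", "PartString2()") # substrings to look for

# fixed = TRUE treats each pattern literally, so "(" and ")" are plain text.
# Reduce(`|`, ...) ORs the per-pattern logical vectors into one.
partial_hit <- Reduce(`|`, lapply(part_strings, function(p) {
  grepl(p, names(d), fixed = TRUE)
}))

keep  <- names(d) %in% full_names | partial_hit
sub_d <- d[, keep]
names(sub_d)
# [1] "FullColName1"  "FullColName2"  "PartString1()" "PartString2()"
```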

The same selection can also be done with str_detect() from the stringr package:

library(stringr)
d[str_detect(names(d), "[[:punct:]]")]
#   PartString1() PartString2()
# 1            13             1
# 2            12             1
# 3            11             1

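Addendum, per the OP's comment: to select only the columns whose names contain the substrings ring1() or ring2(), escape the parentheses, because ( and ) are regular-expression metacharacters:

d[grepl("ring[12]\\(\\)", names(d))]

Here [12] matches a single character, either 1 or 2, and \\( and \\) match literal parentheses.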
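
To see why escaping matters, compare a bracket expression such as [12()] with the escaped pattern on a few names. The vector x is made up for illustration; a bracket expression matches exactly one character from its set, so [12()] also accepts "(" or ")" after "ring".

```r
# Hypothetical column names for the comparison
x <- c("PartString1()", "PartString2()", "PartString(", "PartString3()")

# One character from {1, 2, (, )}: "ring(" also matches
loose <- grepl("ring[12()]", x)
loose
# [1]  TRUE  TRUE  TRUE FALSE

# A digit 1 or 2 followed by a literal "()"
strict <- grepl("ring[12]\\(\\)", x)
strict
# [1]  TRUE  TRUE FALSE FALSE
```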

answered Oct 06 '22 11:10

Rich Scriven

You can use grep to find the indices of column names that partially match a given pattern. For example, with the managers data set from PerformanceAnalytics:

library(PerformanceAnalytics)
data(managers)

colnames(managers)
#[1] "HAM1"        "HAM2"        "HAM3"        "HAM4"        "HAM5"       
#[6] "HAM6"        "EDHEC LS EQ" "SP500 TR"    "US 10Y TR"   "US 3m TR"

Suppose you want the columns matching the pattern "HAM", along with some fixed column names ("SP500 TR", "US 10Y TR", "US 3m TR"):

head(managers[, c("SP500 TR", "US 10Y TR", "US 3m TR",
                  grep("HAM", colnames(managers), value = TRUE))])
#           SP500 TR US 10Y TR US 3m TR    HAM1 HAM2    HAM3    HAM4 HAM5 HAM6
#1996-01-31   0.0340   0.00380  0.00456  0.0074   NA  0.0349  0.0222   NA   NA
#1996-02-29   0.0093  -0.03532  0.00398  0.0193   NA  0.0351  0.0195   NA   NA
#1996-03-31   0.0096  -0.01057  0.00371  0.0155   NA  0.0258 -0.0098   NA   NA
#1996-04-30   0.0147  -0.01739  0.00428 -0.0091   NA  0.0449  0.0236   NA   NA
#1996-05-31   0.0258  -0.00543  0.00443  0.0076   NA  0.0353  0.0028   NA   NA
#1996-06-30   0.0038   0.01507  0.00412 -0.0039   NA -0.0303 -0.0019   NA   NA
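
Because PerformanceAnalytics may not be installed, here is a self-contained sketch on a stand-in data frame mgr that only reuses the column names listed above (the values are placeholders, not the real managers returns). It also shows that grep(..., value = TRUE) returns the matching names directly, which replaces the colnames(x)[grep(...)] step.

```r
# Stand-in with the same column names as `managers`; values are placeholders
cols <- c("HAM1", "HAM2", "HAM3", "HAM4", "HAM5", "HAM6",
          "EDHEC LS EQ", "SP500 TR", "US 10Y TR", "US 3m TR")
mgr <- as.data.frame(matrix(0, nrow = 2, ncol = length(cols)))
names(mgr) <- cols

# grep() returns positions; value = TRUE returns the matching names
idx <- grep("HAM", names(mgr))
ham <- grep("HAM", names(mgr), value = TRUE)

# Fixed names first, then every partial match
fixed_cols <- c("SP500 TR", "US 10Y TR", "US 3m TR")
sub_mgr <- mgr[, c(fixed_cols, ham)]
names(sub_mgr)
```

An anchored pattern such as "^HAM" would avoid picking up names that merely contain "HAM" somewhere in the middle.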

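You can match several patterns at once by separating them with |, e.g. grep("pattern1|pattern2", colnames(data)). Don't put spaces around the |, because they become part of the pattern.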
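
A short sketch of alternation on a hypothetical vector of column names taken from the listing above, showing both the effect of stray spaces and how to build the pattern from a character vector with paste(collapse = "|"):

```r
cols <- c("HAM1", "HAM2", "EDHEC LS EQ", "SP500 TR", "US 10Y TR", "US 3m TR")

# Spaces are literal: this looks for "HAM " or " SP500 ", which never occur
none <- grep("HAM | SP500 ", cols)
none
# integer(0)

# Without the spaces, | separates the alternatives as intended
hits <- grep("HAM|SP500", cols, value = TRUE)
hits
# [1] "HAM1"     "HAM2"     "SP500 TR"

# Collapse a vector of patterns into a single alternation
patterns <- c("HAM", "SP500", "US 3m")
multi <- grep(paste(patterns, collapse = "|"), cols, value = TRUE)
multi
```

If any pattern contains regex metacharacters such as ( or ), escape them first; with fixed = TRUE the | itself would be matched literally.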

answered Oct 06 '22 11:10

Silence Dogood
