How to subset dataframe on lowercase values in multiple columns

Tags: r, subset

I have a dataframe like this:

set.seed(12)
df <- data.frame(
  v1 = sample(LETTERS, 10),
  v2 = sample(LETTERS, 10),
  v3 = sample(LETTERS, 10),
  v4 = c(sample(LETTERS, 8), sample(letters, 2)),
  v5 = c(sample(letters, 1), sample(LETTERS, 7), sample(letters, 2))
)
df
   v1 v2 v3 v4 v5
1   B  K  F  G  p
2   U  U  T  W  N
3   W  J  C  V  Y
4   G  I  Q  S  E
5   D  F  E  N  T
6   A  X  Z  T  C
7   V  Y  K  X  I
8   M  Z  D  Q  A
9   Y  L  H  k  d
10  R  B  L  j  t

I want to subset df to those rows that contain a lowercase value in any of its columns. It can be done like this:

df1 <- df[grepl("[a-z]", df$v1) | grepl("[a-z]", df$v2) | grepl("[a-z]", df$v3) |
          grepl("[a-z]", df$v4) | grepl("[a-z]", df$v5), ]
df1
   v1 v2 v3 v4 v5
1   B  K  F  G  p
9   Y  L  H  k  d
10  R  B  L  j  t

But this is uneconomical and error-prone when there are many more columns. Is there a cleaner, simpler, and more economical way, preferably in base R?

asked Dec 09 '19 16:12

Chris Ruehlemann

2 Answers

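# Option 1: flag cells that are exactly one lowercase letter, keep rows with at least one hit
df[rowSums(sapply(df, function(x) x %in% letters)) > 0, ]
# Option 2: keep rows in which any cell equals its own lowercased form
df[apply(df == sapply(df, tolower), 1, any), ]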
#   v1 v2 v3 v4 v5
#1   B  L  L  M  e
#9   R  N  D  t  t
#10  F  X  M  h  x
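
Both one-liners give the same rows as the grepl version for this data, where every cell is a single letter. They can diverge once cells hold longer strings or non-letters. A minimal sketch of the difference, using a made-up vector `vals` that is not part of the original post:

```r
# Illustrative values only (assumption): one uppercase letter, one lowercase
# letter, a mixed-case string, and a digit.
vals <- c("A", "b", "Ab", "7")

in_letters <- vals %in% letters      # TRUE only for an exact match against "a".."z"
same_lower <- vals == tolower(vals)  # TRUE when lowercasing changes nothing
has_lower  <- grepl("[a-z]", vals)   # TRUE when the string contains a lowercase letter

print(data.frame(vals, in_letters, same_lower, has_lower))
```

Here `"Ab"` is caught only by grepl, while `"7"` is flagged only by the tolower comparison, so the choice depends on what counts as a lowercase value in the data at hand.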

answered Nov 15 '22 02:11

d.b

One option is to apply grepl to each column with lapply, which creates a list of logical vectors, and then combine them element-wise with Reduce and |:

df[Reduce(`|`, lapply(df, grepl, pattern = "[a-z]")),]
#   v1 v2 v3 v4 v5
#1   B  L  L  M  e
#9   R  N  D  t  t
#10  F  X  M  h  x
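
If the same filter is needed in several places, the Reduce idiom could be wrapped in a small function. The sketch below is one possible way to do that; the name `rows_matching_any` and the `toy` data frame are hypothetical and introduced only for illustration.

```r
# Hypothetical helper (not from the original answer): keep the rows in which
# at least one column matches a regular expression.
rows_matching_any <- function(data, pattern = "[a-z]") {
  # One logical vector per column; as.character() makes the coercion of
  # factor columns explicit.
  hits <- lapply(data, function(col) grepl(pattern, as.character(col)))
  # Combine the per-column vectors element-wise with OR.
  data[Reduce(`|`, hits), , drop = FALSE]
}

# Small deterministic example data, made up for illustration.
toy <- data.frame(
  v1 = c("A", "B", "c", "D"),
  v2 = c("E", "f", "G", "H"),
  stringsAsFactors = FALSE
)

result <- rows_matching_any(toy)
print(result)  # rows 2 and 3 contain a lowercase letter
```

Using `drop = FALSE` keeps the result a data frame even when `data` has a single column.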

Or use filter_all from dplyr together with str_detect from stringr:

library(dplyr)
library(stringr)
df %>% 
    filter_all(any_vars(str_detect(., "[a-z]")))
#  v1 v2 v3 v4 v5
#1  B  L  L  M  e
#2  R  N  D  t  t
#3  F  X  M  h  x
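
In dplyr 1.0.0 and later the scoped verbs such as filter_all are superseded, and from dplyr 1.0.4 the same filter can be written with if_any. A minimal sketch, assuming dplyr 1.0.4 or newer is installed and using a made-up `toy` data frame rather than the df from the question:

```r
library(dplyr)

# Small deterministic example data, made up for illustration.
toy <- data.frame(
  v1 = c("A", "B", "c", "D"),
  v2 = c("E", "f", "G", "H"),
  stringsAsFactors = FALSE
)

# Keep rows where any column contains a lowercase letter.
# grepl() is used here so that stringr is not required.
kept <- toy %>%
  filter(if_any(everything(), ~ grepl("[a-z]", .x)))

print(kept)
```

As with filter_all, the row names of the result are renumbered from 1.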

answered Nov 15 '22 02:11

akrun
