 

How to subset dataframe on lowercase values in multiple columns

Tags: r, subset

I have a dataframe like this:

set.seed(12)
df <- data.frame(
  v1 = sample(LETTERS, 10),
  v2 = sample(LETTERS, 10),
  v3 = sample(LETTERS, 10),
  v4 = c(sample(LETTERS, 8), sample(letters, 2)),
  v5 = c(sample(letters, 1), sample(LETTERS, 7), sample(letters, 2))
)
df
   v1 v2 v3 v4 v5
1   B  K  F  G  p
2   U  U  T  W  N
3   W  J  C  V  Y
4   G  I  Q  S  E
5   D  F  E  N  T
6   A  X  Z  T  C
7   V  Y  K  X  I
8   M  Z  D  Q  A
9   Y  L  H  k  d
10  R  B  L  j  t

I want to subset df to those rows that contain a lowercase value in any of df's columns. It can be done like this:

df1 <- df[grepl("[a-z]", df$v1) | grepl("[a-z]", df$v2) | grepl("[a-z]", df$v3) |
          grepl("[a-z]", df$v4) | grepl("[a-z]", df$v5), ]
df1
   v1 v2 v3 v4 v5
1   B  K  F  G  p
9   Y  L  H  k  d
10  R  B  L  j  t

But this is uneconomical and error-prone if you have many more columns. Is there a cleaner, simpler and more economical way, preferably in base R?

Chris Ruehlemann asked Dec 09 '19 16:12




2 Answers

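# Option 1: a cell counts if it is exactly one lowercase letter
df[rowSums(sapply(df, function(x) x %in% letters)) > 0,]
#OR
# Option 2: a cell counts if lowercasing leaves it unchanged
df[apply(df == sapply(df, tolower), 1, any),]
#   v1 v2 v3 v4 v5
#1   B  K  F  G  p
#9   Y  L  H  k  d
#10  R  B  L  j  t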
d.b answered Nov 15 '22 02:11
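
The two options above test different things, which starts to matter once cells hold more than a single letter. Below is a minimal sketch on a hypothetical data frame (dat and the by_* names are illustrative, not from the question) that compares them with a grepl() pattern test. It uses [[:lower:]] rather than [a-z] because R's ?regex documentation notes that character ranges are locale-dependent.

```r
# Hypothetical tricky data (not from the question): multi-character cells,
# mixed case, and one cell with no letters at all
dat <- data.frame(
  a = c("AB", "Cd", "EF", "12"),
  b = c("GH", "IJ", "kL", "MN"),
  stringsAsFactors = FALSE
)

# 1. %in% letters: a cell counts only if it is exactly one lowercase letter
by_membership <- dat[rowSums(sapply(dat, function(x) x %in% letters)) > 0, ]

# 2. tolower() comparison: a cell counts if lowercasing leaves it unchanged,
#    i.e. it is entirely lowercase OR contains no letters at all
by_tolower <- dat[apply(dat == sapply(dat, tolower), 1, any), ]

# 3. grepl(): a cell counts if it contains at least one lowercase letter
by_pattern <- dat[rowSums(sapply(dat, grepl, pattern = "[[:lower:]]")) > 0, ]

rownames(by_membership)  # character(0)
rownames(by_tolower)     # "4"
rownames(by_pattern)     # "2" "3"
```

In the question's df every cell is a single letter, so all three tests select the same rows (1, 9 and 10); the differences only show up with data like dat.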


One option is to apply grepl to each column with lapply, which returns a list of logical vectors, and then combine that list element-wise with Reduce and |:

# one logical vector per column, OR-ed together row by row
df[Reduce(`|`, lapply(df, grepl, pattern = "[a-z]")),]
#   v1 v2 v3 v4 v5
#1   B  K  F  G  p
#9   Y  L  H  k  d
#10  R  B  L  j  t
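
To make the Reduce() step concrete, here is a sketch on a tiny hypothetical data frame dat (not the question's df) that keeps the intermediate objects. It uses the same [a-z] pattern as the answer.

```r
# Tiny hypothetical data frame (not the question's df)
dat <- data.frame(
  x = c("A", "b", "C"),
  y = c("D", "E", "f"),
  stringsAsFactors = FALSE
)

# Step 1: lapply() calls grepl(column, pattern = "[a-z]") on each column,
# giving a named list with one logical vector per column
flags <- lapply(dat, grepl, pattern = "[a-z]")
flags$x  # FALSE  TRUE FALSE
flags$y  # FALSE FALSE  TRUE

# Step 2: Reduce() folds the list with `|`; with two columns this is
# simply flags$x | flags$y
keep <- Reduce(`|`, flags)
keep     # FALSE  TRUE  TRUE

# Step 3: use the combined vector as a row index
dat[keep, ]
```

With more columns, Reduce() just continues the fold, ((col1 | col2) | col3) | ..., so the code stays the same however many columns there are.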

Or, using filter_all from dplyr:

library(dplyr)
library(stringr)
# keep rows where any column contains a lowercase letter
df %>% 
    filter_all(any_vars(str_detect(., "[a-z]")))
#  v1 v2 v3 v4 v5
#1  B  K  F  G  p
#2  Y  L  H  k  d
#3  R  B  L  j  t
akrun answered Nov 15 '22 02:11
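
In recent dplyr releases the scoped verbs such as filter_all() with any_vars() are superseded, and since dplyr 1.0.4 if_any() expresses the same "any column matches" filter. A sketch assuming dplyr 1.0.4 or later, using base grepl() in place of stringr's str_detect() and a small hypothetical data frame dat:

```r
library(dplyr)

# Hypothetical toy data; substitute the question's df
dat <- data.frame(
  x = c("A", "b", "C"),
  y = c("D", "E", "f"),
  stringsAsFactors = FALSE
)

# if_any() applies the lambda to every selected column (here: all of them)
# and keeps a row when at least one column returns TRUE
res <- dat %>%
  filter(if_any(everything(), ~ grepl("[a-z]", .x)))

res
```

Swapping everything() for a tidyselect helper such as starts_with("v") restricts the check to a subset of columns.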