
How to apply regex over entire dataframe without making all columns to character

Tags: regex, r, dplyr

I need to remove "Z" from a data frame:

df <- data.frame(Mineral = c("Zfeldspar", "Zgranite", "ZSilica"),
                Confidence = c("ZLow", "High", "Med"),
                Coverage = c("sub", "sub", "super"),
                Aspect = c("ZPos", "ZUnd", "Neg"),
                Pile1 = c(70, 88, 95),
                Pile2 = c(62,41,81))

I used tidyverse:

library(tidyverse)

df <- mutate_all(df, funs(str_replace_all(., "Z", ""))) %>%
      mutate(PileAvg = mean(Pile1 + Pile2))

But I get this error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: non-numeric argument to binary operator.

I investigated, and this happens because the Pile columns are now character, not numeric. How do I use a regex to remove "Z" without converting every column to character? Thank you for your help.
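A minimal check of this diagnosis, assuming only the stringr package: stringr's replacement functions return a character vector even when given numbers, so any arithmetic on the result fails. The names `pile`, `cleaned` and `result` are illustrative only.

```r
library(stringr)

# Illustrative numeric vector standing in for the Pile1 column
pile <- c(70, 88, 95)

# str_replace_all() coerces its input to character before matching,
# so a numeric column comes back as character
cleaned <- str_replace_all(pile, "Z", "")
class(cleaned)   # "character"

# Arithmetic on the result then fails with the same message as in the question
result <- tryCatch(cleaned + cleaned, error = function(e) conditionMessage(e))
result           # "non-numeric argument to binary operator"
```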

asked Apr 20 '18 16:04 by rockhound


1 Answer

In your df creation, you didn't set stringsAsFactors = FALSE, so in R versions before 4.0.0 (where TRUE was the default) your character columns are automatically converted to factors. If you set it to FALSE, or use tibble() or data_frame(), you'll get character columns.
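A quick way to check which class the text columns end up with. This sketch sets stringsAsFactors explicitly so the result does not depend on the R version's default, and uses tibble() as the tidyverse alternative; the object names are illustrative.

```r
library(tibble)

minerals <- c("Zfeldspar", "Zgranite", "ZSilica")

# Explicitly request factor conversion (the default before R 4.0.0)
df_factor <- data.frame(Mineral = minerals, stringsAsFactors = TRUE)

# Keep text as character
df_char <- data.frame(Mineral = minerals, stringsAsFactors = FALSE)

# tibble() never converts strings to factors
tb <- tibble(Mineral = minerals)

class(df_factor$Mineral)  # "factor"
class(df_char$Mineral)    # "character"
class(tb$Mineral)         # "character"
```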

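This is where you'd use mutate_if() rather than mutate_all(): mutate_if() applies the function only to columns for which a predicate returns TRUE, so the numeric Pile columns are left alone. Here's an approach that works for both factors and characters, using a custom predicate function with mutate_if().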

df <- data.frame(Mineral = c("Zfeldspar", "Zgranite", "ZSilica"),
                 Confidence = c("ZLow", "High", "Med"),
                 Coverage = c("sub", "sub", "super"),
                 Aspect = c("ZPos", "ZUnd", "Neg"),
                 Pile1 = c(70, 88, 95),
                 Pile2 = c(62,41,81))

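library(tidyverse)

# TRUE for character or factor columns, FALSE otherwise (e.g. numeric)
is_character_factor <- function(x) {
  is.character(x) || is.factor(x)
}

# Apply the regex only to columns where the predicate is TRUE,
# so the numeric Pile1 and Pile2 columns keep their type
mutate_if(df, is_character_factor, ~ str_replace_all(.x, "Z", "")) %>%
  mutate(PileAvg = mean(Pile1 + Pile2))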
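In dplyr 1.0.0 and later, mutate_if() is superseded by across() inside mutate(). A minimal sketch of the equivalent call, assuming a current dplyr and stringr install: where() selects columns by predicate, and str_remove_all() is stringr's shorthand for replacing matches with an empty string. The name `df_clean` is illustrative.

```r
library(dplyr)
library(stringr)

df <- data.frame(Mineral = c("Zfeldspar", "Zgranite", "ZSilica"),
                 Confidence = c("ZLow", "High", "Med"),
                 Coverage = c("sub", "sub", "super"),
                 Aspect = c("ZPos", "ZUnd", "Neg"),
                 Pile1 = c(70, 88, 95),
                 Pile2 = c(62, 41, 81))

df_clean <- df %>%
  # Remove "Z" only from character or factor columns
  mutate(across(where(is.character) | where(is.factor),
                ~ str_remove_all(.x, "Z"))) %>%
  # Pile1 and Pile2 are still numeric, so arithmetic works
  mutate(PileAvg = mean(Pile1 + Pile2))

df_clean
```

Note that mean(Pile1 + Pile2) is kept from the original code; it produces one overall value repeated on every row. If a per-row average of the two piles is intended, (Pile1 + Pile2) / 2 is the expression to use instead.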
answered Sep 27 '22 22:09 by Jake Kaupp
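For comparison, the same selective replacement can be done in base R without dplyr. This is a swapped-in alternative, not the answer's method: a minimal sketch using lapply() with gsub(), where `remove_z` is a hypothetical helper name.

```r
df <- data.frame(Mineral = c("Zfeldspar", "Zgranite", "ZSilica"),
                 Confidence = c("ZLow", "High", "Med"),
                 Coverage = c("sub", "sub", "super"),
                 Aspect = c("ZPos", "ZUnd", "Neg"),
                 Pile1 = c(70, 88, 95),
                 Pile2 = c(62, 41, 81))

# Hypothetical helper: strip "Z" from character or factor columns,
# return any other column (e.g. numeric) unchanged
remove_z <- function(x) {
  if (is.character(x) || is.factor(x)) gsub("Z", "", x) else x
}

# Assigning into df[] keeps the data.frame structure and column names
df[] <- lapply(df, remove_z)

str(df)
```

gsub() converts factor input to character, so cleaned text columns come back as character regardless of how data.frame() stored them.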