Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dplyr change many data types

Tags:

dataframe

r

dplyr

I have a data.frame:

dat <- data.frame(fac1 = c(1, 2),
                  fac2 = c(4, 5),
                  fac3 = c(7, 8),
                  dbl1 = c('1', '2'),
                  dbl2 = c('4', '5'),
                  dbl3 = c('6', '7')
                  )

To change data types I can use something like

l1 <- c("fac1", "fac2", "fac3")
l2 <- c("dbl1", "dbl2", "dbl3")
dat[, l1] <- lapply(dat[, l1], factor)
dat[, l2] <- lapply(dat[, l2], as.numeric)

with dplyr

dat <- dat %>% mutate(
    fac1 = factor(fac1), fac2 = factor(fac2), fac3 = factor(fac3),
    dbl1 = as.numeric(dbl1), dbl2 = as.numeric(dbl2), dbl3 = as.numeric(dbl3)
)

is there a more elegant (shorter) way in dplyr?

thx Christof

like image 844
ckluss Avatar asked Dec 27 '14 14:12

ckluss


People also ask

How do I change datatype of all columns in R?

To convert columns of an R data frame from integer to numeric we can use lapply function. For example, if we have a data frame df that contains all integer columns then we can use the code lapply(df,as. numeric) to convert all of the columns data type into numeric data type.

How do you change multiple variables to factor in R?

In R, you can convert multiple numeric variables to factor using lapply function. The lapply function is a part of apply family of functions. They perform multiple iterations (loops) in R. In R, categorical variables need to be set as factor variables.

What does mutate () do in R?

In R programming, the mutate function is used to create a new variable from a data set. In order to use the function, we need to install the dplyr package, which is an add-on to R that includes a host of cool functions for selecting, filtering, grouping, and arranging data.

Why is dplyr useful for large datasets?

It’s particularly useful for large datasets because it only prints the first few rows. You can learn more about tibbles at https://tibble.tidyverse.org; in particular you can convert data frames to tibbles with as_tibble (). dplyr aims to provide a function for each basic verb of data manipulation.

How to convert integer to numeric in dplyr?

# We will use a wrapper function as recommended. df1[2:3] = lapply(df1[2:3], FUN = function(y){as.numeric(y)}) # Check that the columns are converted to numeric. str(df1) We can use dplyr ’s mutate () and across () functions to convert integer columns to numeric.

How do you use %>% in dplyr?

Rather than forcing the user to either save intermediate objects or nest functions, dplyr provides the %>% operator from magrittr. x %>% f (y) turns into f (x, y) so the result from one step is then “piped” into the next step.

How to create a table of summary statistics using dplyr?

wwwwww w Use group_by()to create a "grouped" copy of a table. dplyr functions will manipulate each "group" separately and then combine the results. mtcars %>% group_by(cyl) %>% summarise(avg = mean(mpg)) These apply summary functionsto columns to create a new table of summary statistics.


3 Answers

Edit (as of 2021-03)

As also pointed out in Eric's answer, mutate_[at|if|all] has been superseded by a combination of mutate() and across(). For reference, I will add the respective pendants to the examples in the original answer (see below):

# convert all factor to character
dat %>% mutate(across(where(is.factor), as.character))

# apply function (change encoding) to all character columns 
dat %>% mutate(across(where(is.character), 
               function(x){iconv(x, to = "ASCII//TRANSLIT")}))

# subsitute all NA in numeric columns
dat %>% mutate(across(where(is.numeric), function(x) tidyr::replace_na(x, 0)))

Original answer

Since Nick's answer is deprecated by now and Rafael's comment is really useful, I want to add this as an Answer. If you want to change all factor columns to character use mutate_if:

dat %>% mutate_if(is.factor, as.character)

Also other functions are allowed. I for instance used iconv to change the encoding of all character columns:

dat %>% mutate_if(is.character, function(x){iconv(x, to = "ASCII//TRANSLIT")})

or to substitute all NA by 0 in numeric columns:

dat %>% mutate_if(is.numeric, function(x){ifelse(is.na(x), 0, x)})
like image 143
loki Avatar answered Oct 08 '22 08:10

loki


You can use the standard evaluation version of mutate_each (which is mutate_each_) to change the column classes:

dat %>% mutate_each_(funs(factor), l1) %>% mutate_each_(funs(as.numeric), l2)
like image 59
talat Avatar answered Oct 08 '22 07:10

talat


EDIT - The syntax of this answer has been deprecated, loki's updated answer is more appropriate.

ORIGINAL-

From the bottom of the ?mutate_each (at least in dplyr 0.5) it looks like that function, as in @docendo discimus's answer, will be deprecated and replaced with more flexible alternatives mutate_if, mutate_all, and mutate_at. The one most similar to what @hadley mentions in his comment is probably using mutate_at. Note the order of the arguments is reversed, compared to mutate_each, and vars() uses select() like semantics, which I interpret to mean the ?select_helpers functions.

dat %>% mutate_at(vars(starts_with("fac")),funs(factor)) %>%   
  mutate_at(vars(starts_with("dbl")),funs(as.numeric))

But mutate_at can take column numbers instead of a vars() argument, and after reading through this page, and looking at the alternatives, I ended up using mutate_at but with grep to capture many different kinds of column names at once (unless you always have such obvious column names!)

dat %>% mutate_at(grep("^(fac|fctr|fckr)",colnames(.)),funs(factor)) %>%
  mutate_at(grep("^(dbl|num|qty)",colnames(.)),funs(as.numeric))

I was pretty excited about figuring out mutate_at + grep, because now one line can work on lots of columns.

EDIT - now I see matches() in among the select_helpers, which handles regex, so now I like this.

dat %>% mutate_at(vars(matches("fac|fctr|fckr")),funs(factor)) %>%
  mutate_at(vars(matches("dbl|num|qty")),funs(as.numeric))

Another generally-related comment - if you have all your date columns with matchable names, and consistent formats, this is powerful. In my case, this turns all my YYYYMMDD columns, which were read as numbers, into dates.

  mutate_at(vars(matches("_DT$")),funs(as.Date(as.character(.),format="%Y%m%d")))
like image 50
Rafael Zayas Avatar answered Oct 08 '22 08:10

Rafael Zayas