Changing column types with dplyr

Tags: r, dplyr

I need some help tidying my data. I'm trying to convert some integer columns to factors (but not all of them). I think I can do this by selecting the variables in question, but how do I add them back to the original data set? That is, I want to keep the columns NOT selected from my raw_data_tbl as they are, and use the mutated types from raw_data_tbl_int for the rest.


    library(dplyr)

    raw_data_tbl %>% 
    select_if(is.numeric) %>% 
    select(-c(contains("units"), PRO_ALLOW, RTL_ACTUAL, REAL_PRICE, 
           REAL_PRICE_HHU, REBATE, RETURN_UNITS, UNITS_PER_CASE, Profit, STR_COST, DCC, 
           CREDIT_AMT)) %>% 
    mutate_if(is.numeric, as.factor)
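The result of a pipeline like this can't simply be pasted back, because select() keeps only the columns it names and drops everything else before the conversion happens. A minimal sketch of that behaviour, using a small hypothetical table (toy_tbl and its column names are made up to stand in for raw_data_tbl):

```r
library(dplyr)

# Hypothetical stand-in for raw_data_tbl: two ID-like integers,
# one "units" column and one character column.
toy_tbl <- data.frame(
  LOC_ID      = c(1L, 2L, 3L),
  UPC_CDE     = c(813L, 814L, 815L),
  SALES_UNITS = c(10L, 20L, 30L),
  STR_NAME    = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Same shape as the pipeline above: select first, then convert.
selected_only <- toy_tbl %>%
  select_if(is.numeric) %>%
  select(-contains("units")) %>%
  mutate_if(is.numeric, as.factor)

# Only the converted columns survive; SALES_UNITS and STR_NAME were dropped.
names(selected_only)
```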
asked Feb 22 '19 at 20:02 by willshelley




3 Answers

As of dplyr 1.0.0, released on CRAN on 2020-06-01, the scoped functions mutate_at(), mutate_if() and mutate_all() have been superseded by the more general across(). This means you can stick with plain mutate(). The introductory blog post from April explains why it took so long to discover across().
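As a rough side-by-side, assuming dplyr 1.0.0 or later: a predicate that used to go into mutate_if() is wrapped in where() inside across(). This sketch converts every numeric column of iris both ways and checks that the resulting column types match.

```r
library(dplyr)

# Superseded scoped verb: convert every numeric column to a factor.
old_style <- iris %>% mutate_if(is.numeric, as.factor)

# dplyr >= 1.0.0 equivalent: the predicate is wrapped in where()
# and across() is used inside a plain mutate().
new_style <- iris %>% mutate(across(where(is.numeric), as.factor))

# Both approaches leave every column as a factor.
sapply(new_style, class)
```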

Toy example:

library(dplyr)

iris %>%
  mutate(across(c(Sepal.Width, 
                  Sepal.Length),
                factor))
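The column list inside across() accepts any tidyselect helper, not just bare names. A small variation on the toy example that picks the same two columns by prefix with starts_with():

```r
library(dplyr)

# Same conversion as the toy example, but the columns are chosen
# by a tidyselect helper instead of being listed one by one.
iris_factor <- iris %>%
  mutate(across(starts_with("Sepal"), factor))

# Sepal.* columns are now factors; Petal.* columns stay numeric.
sapply(iris_factor, class)
```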

In your case, you'd do this:

library(dplyr)

raw_data_tbl %>% 
  # where() wraps the predicate; a bare is.numeric in a selection is deprecated
  mutate(across(c(where(is.numeric),
                  -contains("units"),
                  -c(PRO_ALLOW, RTL_ACTUAL, REAL_PRICE, REAL_PRICE_HHU,
                     REBATE, RETURN_UNITS, UNITS_PER_CASE, Profit,
                     STR_COST, DCC, CREDIT_AMT)),
                factor))
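To see that this keeps the untouched columns in place (the original question), here is a sketch on a hypothetical miniature of raw_data_tbl; the column names are illustrative only. Inside c(), the leading positive selection is narrowed by each negative one that follows.

```r
library(dplyr)

# Hypothetical miniature of raw_data_tbl; names are for illustration only.
toy_tbl <- data.frame(
  LOC_ID      = c(1L, 2L, 3L),
  UPC_CDE     = c(813L, 814L, 815L),
  SALES_UNITS = c(10L, 20L, 30L),
  Profit      = c(1.5, 2.5, 3.5),
  STR_NAME    = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

converted <- toy_tbl %>%
  mutate(across(c(where(is.numeric),   # start from every numeric column
                  -contains("units"),  # remove names containing "units"
                  -Profit),            # remove explicitly named columns
                factor))

# All five columns are still there; only LOC_ID and UPC_CDE changed type.
sapply(converted, class)
```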
answered Oct 19 '22 at 12:10 by meedstrom


You can use mutate_at instead. Here's an example using the iris dataframe:

library(dplyr)

iris_factor <- iris %>%
  mutate_at(vars(Sepal.Width, 
                 Sepal.Length), 
            funs(factor))

Edit 08/2020

As of dplyr 0.8.0, funs() is deprecated. Use list() instead, as in

library(dplyr)

iris_factor <- iris %>%
  mutate_at(vars(Sepal.Width, 
                 Sepal.Length), 
            list(factor))

And the proof:

> str(iris_factor)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: Factor w/ 35 levels "4.3","4.4","4.5",..: 9 7 5 4 8 12 4 8 2 7 ...
 $ Sepal.Width : Factor w/ 23 levels "2","2.2","2.3",..: 15 10 12 11 16 19 14 14 9 11 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
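In the str() output above, the numbers after each factor's levels (9 7 5 4 ...) are the internal integer codes, not the original measurements. A small base-R sketch of why that matters if the numbers are ever needed again: as.numeric() on a factor returns those codes, so the usual route back goes through as.character() first.

```r
# Convert one column, as in the answer above.
iris_factor <- iris
iris_factor$Sepal.Length <- factor(iris_factor$Sepal.Length)

# as.numeric() on a factor returns the level codes (here 9, 7, 5) ...
head(as.numeric(iris_factor$Sepal.Length), 3)

# ... while going through as.character() recovers 5.1, 4.9, 4.7.
head(as.numeric(as.character(iris_factor$Sepal.Length)), 3)
```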
answered Oct 19 '22 at 14:10 by Dave Gruenewald


Honestly, I'd do it like this:

df = data.frame("LOC_ID" = c(1,2,3,4),
                "STRS" = c("a","b","c","d"),
                "UPC_CDE" = c(813,814,815,816))

df$LOC_ID = as.factor(df$LOC_ID)
df$UPC_CDE = as.factor(df$UPC_CDE)
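The same base-R idea scales to several columns without writing one df$... line per column. A minimal sketch, assuming the columns to convert are listed by name in a character vector (cols is an illustrative name): lapply() over that subset and assign the list back.

```r
df <- data.frame(LOC_ID  = c(1, 2, 3, 4),
                 STRS    = c("a", "b", "c", "d"),
                 UPC_CDE = c(813, 814, 815, 816),
                 stringsAsFactors = FALSE)

# Columns to convert, named once.
cols <- c("LOC_ID", "UPC_CDE")

# lapply() returns a list of factors; assigning it to df[cols]
# replaces just those columns and leaves STRS untouched.
df[cols] <- lapply(df[cols], as.factor)

sapply(df, class)
```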
answered Oct 19 '22 at 14:10 by LetEpsilonBeLessThanZero