
How to convert column types in R tidyverse

Tags:

r

tidyverse

I'm trying to get comfortable with the tidyverse, but data type conversions are proving to be a barrier. I understand that automatically converting strings to factors is not ideal, but sometimes I do want factors, so an easy way to convert selected character columns in a tibble to factors would be excellent. I prefer to read Excel files with the readxl package, but factor isn't a permitted column type! I can go through column by column after the fact, but that's really not efficient. I would like either of the following two things to work:

  1. Read in a file and simultaneously specify which columns should be read as factors:

     data <- read_excel(path = "myfile.xlsx", 
                        col_types = c(col2 = "factor", col5 = "factor"))
    
  2. Or this function would be excellent for many reasons, but I can't figure out how it's supposed to work. The col_types function is very confusing to me:

     diamonds <- col_types(diamonds, 
                           cols=c(cut="factor", color="factor", clarity="factor"))
    

Thanks in advance!

tef2128 asked Apr 19 '18 17:04




1 Answer

read_excel uses Excel cell types to guess column types for use in R. I agree with read_excel's design choice here: read the data into a limited set of column types, and let the user do any further type conversion afterwards.
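A minimal sketch of that read-first, convert-later workflow. It assumes the example workbook bundled with readxl (`readxl_example("datasets.xlsx")`, sheet `iris`) as a stand-in for the asker's `myfile.xlsx`; the `Species` column comes from that workbook, not from the question.

```r
library(readxl)
library(dplyr)

# Assumption: readxl's bundled example workbook stands in for "myfile.xlsx"
path <- readxl_example("datasets.xlsx")

# Step 1: read the sheet. col_types only accepts "skip", "guess", "logical",
# "numeric", "date", "text" or "list", so text columns arrive as character.
iris_raw <- read_excel(path, sheet = "iris")

# Step 2: convert the wanted column to a factor after reading
iris_tbl <- iris_raw %>%
  mutate(Species = factor(Species))
```

The conversion stays a separate, explicit step, so the choice of which columns become factors lives in the analysis code rather than in the reader.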

There is no function called col_types. That is a parameter name for read_excel. The tidyverse way would be:

library(tidyverse)

# tibble() replaces data_frame(), which is deprecated in current tibble releases
(foo <- tibble(x = letters[1:3], y = LETTERS[4:6], z = 1:3))
#> # A tibble: 3 x 3
#>   x     y         z
#>   <chr> <chr> <int>
#> 1 a     D         1
#> 2 b     E         2
#> 3 c     F         3

# Apply factor() to columns x and y only; z is left as an integer
foo %>% 
  mutate_at(vars(x, y), factor)
#> # A tibble: 3 x 3
#>   x     y         z
#>   <fct> <fct> <int>
#> 1 a     D         1
#> 2 b     E         2
#> 3 c     F         3
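Since dplyr 1.0.0, `mutate_at()` is superseded by `across()`. The same conversion might be sketched as follows, assuming dplyr 1.0.0 or later; the names `foo_named` and `foo_all_chr` are only illustrative.

```r
library(dplyr)
library(tibble)

foo <- tibble(x = letters[1:3], y = LETTERS[4:6], z = 1:3)

# Convert explicitly named columns to factors
foo_named <- foo %>%
  mutate(across(c(x, y), factor))

# Or convert every character column, whatever it is called
foo_all_chr <- foo %>%
  mutate(across(where(is.character), factor))
```

The `where(is.character)` form is handy right after `read_excel()`, when all text columns should become factors and listing them by name would be tedious.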

ngm answered Nov 04 '22 22:11
