Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Name variables during multiple melt with data.table

Tags:

r

data.table

melt

I have the following data.table that I want to perform a multiple column melt upon:

# Load library
library(data.table)

# Dummy data set
dt <- data.table(a = 1:10,
                 t_alpha = runif(10),
                 t_beta = runif(10),
                 t_gamma = runif(10),
                 s_alpha = runif(10),
                 s_beta = runif(10),
                 s_gamma = runif(10))

# Look at data table
dt
#>      a    t_alpha    t_beta    t_gamma    s_alpha     s_beta   s_gamma
#>  1:  1 0.73315322 0.7777962 0.13258401 0.12841353 0.20865995 0.8967482
#>  2:  2 0.10161042 0.9691244 0.23506277 0.35130324 0.36672933 0.9247026
#>  3:  3 0.95234142 0.7002526 0.07104051 0.85558314 0.39467530 0.8982107
#>  4:  4 0.48967202 0.3951288 0.72376459 0.58978334 0.47201394 0.8508710
#>  5:  5 0.22997537 0.3487354 0.58831312 0.63917860 0.12685815 0.5616682
#>  6:  6 0.01138277 0.0449675 0.79384717 0.28983485 0.98793013 0.6574457
#>  7:  7 0.32324941 0.3446745 0.66394161 0.16357679 0.14828451 0.2019661
#>  8:  8 0.78644407 0.4460397 0.76992857 0.78703793 0.46623055 0.3081812
#>  9:  9 0.90938120 0.4526063 0.47948163 0.03182248 0.63600496 0.3598600
#> 10: 10 0.11345349 0.1732653 0.59240218 0.95541944 0.06640378 0.9475107

So, I go ahead and melt, like so:

# Groups of columns to melt
m <- list(names(dt)[grepl("t_", names(dt))],
          names(dt)[grepl("s_", names(dt))])

# Perform melt
dt2 <- melt(dt, measure = m, value.name = c("t", "s"))

which gives,

# Examine results
dt2
#>      a variable          t          s
#>  1:  1        1 0.73315322 0.12841353
#>  2:  2        1 0.10161042 0.35130324
#>  3:  3        1 0.95234142 0.85558314
#>  4:  4        1 0.48967202 0.58978334
#>  5:  5        1 0.22997537 0.63917860
#>  6:  6        1 0.01138277 0.28983485
#>  7:  7        1 0.32324941 0.16357679
#>  8:  8        1 0.78644407 0.78703793
#>  9:  9        1 0.90938120 0.03182248
#> 10: 10        1 0.11345349 0.95541944
#> 11:  1        2 0.77779618 0.20865995
#> 12:  2        2 0.96912444 0.36672933
#> 13:  3        2 0.70025264 0.39467530
#> 14:  4        2 0.39512877 0.47201394
#> 15:  5        2 0.34873538 0.12685815
#> 16:  6        2 0.04496750 0.98793013
#> 17:  7        2 0.34467455 0.14828451
#> 18:  8        2 0.44603969 0.46623055
#> 19:  9        2 0.45260634 0.63600496
#> 20: 10        2 0.17326532 0.06640378
#> 21:  1        3 0.13258401 0.89674825
#> 22:  2        3 0.23506277 0.92470265
#> 23:  3        3 0.07104051 0.89821074
#> 24:  4        3 0.72376459 0.85087097
#> 25:  5        3 0.58831312 0.56166816
#> 26:  6        3 0.79384717 0.65744574
#> 27:  7        3 0.66394161 0.20196612
#> 28:  8        3 0.76992857 0.30818121
#> 29:  9        3 0.47948163 0.35986003
#> 30: 10        3 0.59240218 0.94751075
#>      a variable          t          s

That's great!

Now, notice an extra column, variable. That corresponds to each of my columns for t and s. Specifically, 1 = alpha, 2 = beta, and 3 = gamma. I can fix this variable as follows:

# Fix variable names
v_names <- c("alpha", "beta", "gamma")
dt2$variable <- v_names[dt2$variable]

producing,

# Examine results
dt2
#>      a variable          t          s
#>  1:  1    alpha 0.73315322 0.12841353
#>  2:  2    alpha 0.10161042 0.35130324
#>  3:  3    alpha 0.95234142 0.85558314
#>  4:  4    alpha 0.48967202 0.58978334
#>  5:  5    alpha 0.22997537 0.63917860
#>  6:  6    alpha 0.01138277 0.28983485
#>  7:  7    alpha 0.32324941 0.16357679
#>  8:  8    alpha 0.78644407 0.78703793
#>  9:  9    alpha 0.90938120 0.03182248
#> 10: 10    alpha 0.11345349 0.95541944
#> 11:  1     beta 0.77779618 0.20865995
#> 12:  2     beta 0.96912444 0.36672933
#> 13:  3     beta 0.70025264 0.39467530
#> 14:  4     beta 0.39512877 0.47201394
#> 15:  5     beta 0.34873538 0.12685815
#> 16:  6     beta 0.04496750 0.98793013
#> 17:  7     beta 0.34467455 0.14828451
#> 18:  8     beta 0.44603969 0.46623055
#> 19:  9     beta 0.45260634 0.63600496
#> 20: 10     beta 0.17326532 0.06640378
#> 21:  1    gamma 0.13258401 0.89674825
#> 22:  2    gamma 0.23506277 0.92470265
#> 23:  3    gamma 0.07104051 0.89821074
#> 24:  4    gamma 0.72376459 0.85087097
#> 25:  5    gamma 0.58831312 0.56166816
#> 26:  6    gamma 0.79384717 0.65744574
#> 27:  7    gamma 0.66394161 0.20196612
#> 28:  8    gamma 0.76992857 0.30818121
#> 29:  9    gamma 0.47948163 0.35986003
#> 30: 10    gamma 0.59240218 0.94751075
#>      a variable          t          s

Created on 2019-08-09 by the reprex package (v0.3.0)

This is my desired result. My question is, can I perform this operation (i.e., renaming of the values in the variable column) during the melt or do I just have to patch it up afterwards?

like image 617
Lyngbakr Avatar asked Aug 09 '19 18:08

Lyngbakr


People also ask

What are the variables in a data table?

DataTable variables represent a type of variable that can store big pieces of information, and act as a database or a simple spreadsheet with rows and columns. They can be found in the Browse and Select a . Net Type window, under the System. Data namespace (System.

What is Dcast function in R?

dcast: Convert data between wide and long forms.

How do you melt data in R?

Melting in R It is performed using melt() function which takes dataset and column values that has to be kept constant. Using melt(), dataframe is converted into long format and stretches the data frame.

What is Reshape2 package in R?

Reshape2 is a package that allows us to easily transform our data into whatever structure we may need. Many of us are used to seeing our data structured so that corresponds to a single participant and each column corresponds to a variable. This type of data structure is known as wide format.


1 Answers

Here is an option in pivot_longer from the dev version of tidyr

library(tidyr)# ‘0.8.3.9000’
library(dplyr)
dt %>% 
   pivot_longer(-a, names_to= c(".value", "variable"), names_sep = "_")
# A tibble: 30 x 4
#       a variable      t       s
#   <int> <chr>     <dbl>   <dbl>
# 1     1 alpha    0.592  0.0634 
# 2     1 beta     0.316  0.339  
# 3     1 gamma    0.760  0.751  
# 4     2 alpha    0.936  0.445  
# 5     2 beta     0.232  0.00250
# 6     2 gamma    0.0111 0.406  
# 7     3 alpha    0.0950 0.102  
# 8     3 beta     0.426  0.775  
# 9     3 gamma    0.321  0.811  
#10     4 alpha    0.884  0.0526 
# … with 20 more rows

NOTE: The values are different as there was no set.seed

like image 86
akrun Avatar answered Oct 19 '22 10:10

akrun