Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is it returning a warning when reshaping in R?

Tags:

r

reshape

Here is my table that needs to be in wide format:

V1      V2        V3        V4
1       A0      numeric   string
1       A1         .         .
1       A2         .         .
1       A3         .         .
1       A4         .         .
1       A5         .         .
1       A6         .         .
1       A7         .         .
2       A0         .         .
2       A1         .         .
...     ...        .         .

I've been trying with something like this:

reshape(variable.name, timevar = "V2", idvar = "V1", direction = "wide")

Which has resulted in the following, which seems to be what I want:

V1   V3.A0     V4.A0    V3.A1     ...
1    Numeric   String   Numeric   ...
2    ...       ...      ...       ...

But I get a warning message:

Warning message:
In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying,  :
multiple rows match for V2 = blah: first taken

Why is this warning happening and how can I circumvent it? I don't want to just ignore it as I'll have to do the same for several data files. Thanks! Very grateful for the help.

like image 773
questionmark Avatar asked Mar 22 '23 04:03

questionmark


1 Answers

As a few people have pointed out, you need to decide what you want to do with the extra value. dcast allows you to specify an aggregation function, and is essentially the same as reshape with direction wide but with the ability to specify what to do when you have multiple values. Here is an example where basically every combination has repetitions, and we show the full vector for each as a deparsed string (e.g. 1:2 shows up as c(1, 2)).

library(reshape2)

# Make up data

df <- data.frame(
  V1=rep(1:3, 14), 
  V2=rep(paste0("A", 0:6), 6), 
  V3=sample(1:100, 42), 
  V4=paste0(sample(letters, 42, replace=TRUE), sample(letters, 42, replace=TRUE))  
)    
# Need to melt V3 and V4 together first because
# dcast does not allow multiple value variables,
# unfortunately, this allso coerces V1 to character

df.melt <- melt(df, id.vars=c("V1", "V2")) 

# Function to handle multiple items for one V1 - V2
# pair.  In this case we just deparse the vectors,
# but if you wanted, you could convert the numerics
# back to integers, or do whatever you want (e.g.
# paste if character, median if numeric).

my_func <- function(x) {
  paste0(deparse(x), collapse="")
}
# Now convert to wide format with dcast    

dcast(
  df.melt, 
  V1 ~ V2 + variable,
  value.var="value",
  fun.aggregate=my_func
)

This results in the following:

  V1         A0_V3         A0_V4          A1_V3         A1_V4
1  1 c("86", "93") c("yf", "pr")   c("5", "76") c("py", "aj")
2  2 c("53", "71") c("as", "mi")  c("42", "12") c("ho", "la")
3  3 c("69", "16") c("lm", "un") c("66", "100") c("xk", "px")
          A2_V3         A2_V4         A3_V3         A3_V4         A4_V3
1 c("43", "67") c("xh", "bk") c("79", "94") c("ix", "cx") c("51", "50")
2 c("14", "68") c("nq", "sr") c("25", "19") c("dw", "ay") c("28", "35")
3 c("21", "24") c("wu", "il") c("39", "88") c("vz", "yw") c("74", "65")
          A4_V4         A5_V3         A5_V4         A6_V3         A6_V4
1 c("hv", "uw") c("85", "34") c("cn", "ql") c("73", "87") c("px", "vy")
2 c("qb", "dc")  c("2", "72") c("ci", "du") c("81", "49") c("sd", "rx")
3 c("jk", "fv")  c("6", "90") c("sr", "yr") c("62", "97") c("rg", "dv")    

The perfect solution would be a combination of reshape and dcast. Unfortunately dcast (AFAIK) doesn't allow multiple Z columns, whereas reshape does (hence necessitating the melt step and the coersion to character), whereas reshape does not allow an aggregation function (AFAIK).

You can potentially resolve this by running dcast twice, once with V3, once with V4, and then merge the results, or add more intelligence in the aggregation function.

like image 149
BrodieG Avatar answered Apr 02 '23 11:04

BrodieG