Here is my table that needs to be in wide format:
V1 V2 V3 V4
1 A0 numeric string
1 A1 . .
1 A2 . .
1 A3 . .
1 A4 . .
1 A5 . .
1 A6 . .
1 A7 . .
2 A0 . .
2 A1 . .
... ... . .
I've been trying with something like this:
reshape(variable.name, timevar = "V2", idvar = "V1", direction = "wide")
Which has resulted in the following, which seems to be what I want:
V1 V3.A0 V4.A0 V3.A1 ...
1 Numeric String Numeric ...
2 ... ... ... ...
But I get a warning message:
Warning message:
In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying, :
multiple rows match for V2 = blah: first taken
Why is this warning happening and how can I circumvent it? I don't want to just ignore it as I'll have to do the same for several data files. Thanks! Very grateful for the help.
As a few people have pointed out, you need to decide what you want to do with the extra value. dcast
allows you to specify an aggregation function, and is essentially the same as reshape
with direction wide but with the ability to specify what to do when you have multiple values. Here is an example where basically every combination has repetitions, and we show the full vector for each as a deparsed string (e.g. 1:2 shows up as c(1, 2)).
library(reshape2)
# Make up data
df <- data.frame(
V1=rep(1:3, 14),
V2=rep(paste0("A", 0:6), 6),
V3=sample(1:100, 42),
V4=paste0(sample(letters, 42, replace=TRUE), sample(letters, 42, replace=TRUE))
)
# Need to melt V3 and V4 together first because
# dcast does not allow multiple value variables,
# unfortunately, this allso coerces V1 to character
df.melt <- melt(df, id.vars=c("V1", "V2"))
# Function to handle multiple items for one V1 - V2
# pair. In this case we just deparse the vectors,
# but if you wanted, you could convert the numerics
# back to integers, or do whatever you want (e.g.
# paste if character, median if numeric).
my_func <- function(x) {
paste0(deparse(x), collapse="")
}
# Now convert to wide format with dcast
dcast(
df.melt,
V1 ~ V2 + variable,
value.var="value",
fun.aggregate=my_func
)
This results in the following:
V1 A0_V3 A0_V4 A1_V3 A1_V4
1 1 c("86", "93") c("yf", "pr") c("5", "76") c("py", "aj")
2 2 c("53", "71") c("as", "mi") c("42", "12") c("ho", "la")
3 3 c("69", "16") c("lm", "un") c("66", "100") c("xk", "px")
A2_V3 A2_V4 A3_V3 A3_V4 A4_V3
1 c("43", "67") c("xh", "bk") c("79", "94") c("ix", "cx") c("51", "50")
2 c("14", "68") c("nq", "sr") c("25", "19") c("dw", "ay") c("28", "35")
3 c("21", "24") c("wu", "il") c("39", "88") c("vz", "yw") c("74", "65")
A4_V4 A5_V3 A5_V4 A6_V3 A6_V4
1 c("hv", "uw") c("85", "34") c("cn", "ql") c("73", "87") c("px", "vy")
2 c("qb", "dc") c("2", "72") c("ci", "du") c("81", "49") c("sd", "rx")
3 c("jk", "fv") c("6", "90") c("sr", "yr") c("62", "97") c("rg", "dv")
The perfect solution would be a combination of reshape
and dcast
. Unfortunately dcast
(AFAIK) doesn't allow multiple Z columns, whereas reshape
does (hence necessitating the melt
step and the coersion to character), whereas reshape
does not allow an aggregation function (AFAIK).
You can potentially resolve this by running dcast
twice, once with V3
, once with V4
, and then merge the results, or add more intelligence in the aggregation function.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With