I would like to convert a dataframe from long format to a wide format, but with unequal group sizes.
The eventual use will be in 'qcc', which requires a data frame or a matrix with each row consisting of one group, using NA's in groups which have fewer samples.
The following code will create an example dataset, as well as show manual conversion to the desired format.
# This is an example of the initial data that I have
# * 10 sample measurements, over 3 groups with 3, 2, and 5 elements respectively
x <- rnorm(10)
x_df <- data.frame( time = c( rep('2001 Q1',3), rep('2001 Q2',2), rep('2001 Q3',5) ), measure = x )
x_df
# This is a manual conversion into the desired format
x_pad <- c( x[1:3], NA, NA, x[4:5], NA, NA, NA, x[6:10] )
x_matrix <- matrix( x_pad, nrow = 3, ncol = 5, byrow = TRUE, dimnames = list(c('2001 Q1','2001 Q2','2001 Q3')) )
x_matrix # desired format
# An example of how it will be used
library(qcc)
plot(qcc(x_matrix, type = 'xbar', plot = FALSE))
So, I'd like to convert this:
time measure
1 2001 Q1 0.14680685
2 2001 Q1 0.53593193
3 2001 Q1 0.56097974
4 2001 Q2 -1.48102689
5 2001 Q2 0.18150972
6 2001 Q3 1.72018147
7 2001 Q3 -0.08480855
8 2001 Q3 -2.23208877
9 2001 Q3 -1.15269107
10 2001 Q3 0.57975023
... to this ...
[,1] [,2] [,3] [,4] [,5]
2001 Q1 0.1468068 0.53593193 0.5609797 NA NA
2001 Q2 -1.4810269 0.18150972 NA NA NA
2001 Q3 1.7201815 -0.08480855 -2.2320888 -1.152691 0.5797502
There is probably an easy way (perhaps some usage of reshape or reshape2 casting that I'm not familiar with?), but a bunch of searching hasn't helped me so far.
Thanks for any help!
==========
From one of the solutions below, the following will generate the final qcc xbar plot, including group labels:
library(splitstackshape)
out_df <- dcast( getanID( x_df, 'time' ), time~.id, value.var='measure' )
qcc( out_df[,-1], type = 'xbar', labels = out_df[,1] )
You can create a sequence column ('.id') using getanID
from splitstackshape
and use dcast
from data.table
to convert the long format to wide format. The output of splitstackshape
is a data.table. When we load splitstackshape
, data.table will also be loaded. So, if you already have the devel version of data.table, then the dcast
from data.table
can be used as well.
library(splitstackshape)
dcast(getanID(df1, 'time'), time~.id, value.var='measure')
# time 1 2 3 4 5
#1: 2001 Q1 0.1468068 0.53593193 0.5609797 NA NA
#2: 2001 Q2 -1.4810269 0.18150972 NA NA NA
#3: 2001 Q3 1.7201815 -0.08480855 -2.2320888 -1.152691 0.5797502
As @snoram mentioned in the comments, function rowid
from data.table
makes it easier to use just data.table
alone
library(data.table)
dcast(setDT(df1), time ~ rowid(time), value.var = "measure")
You'll need an intermediate variable that gives a "within-time" id. You can create it and reshape like this
library(tidyr)
library(dplyr)
group_by(X, time) %>%
mutate(seq = 1:n()) %>%
ungroup() %>%
spread(seq, measure)
Another splitstackshape
approach
cSplit(setDT(df)[, toString(measure), by='time'], 'V1', ',')
# time V1_1 V1_2 V1_3 V1_4 V1_5
#1: 2001 Q1 0.1468068 0.53593193 0.5609797 NA NA
#2: 2001 Q2 -1.4810269 0.18150972 NA NA NA
#3: 2001 Q3 1.7201815 -0.08480855 -2.2320888 -1.152691 0.5797502
Or using the devel
version of data.table
a similar approach after pasting together the 'measure' by the grouping column 'time' would be using tstrsplit
to split the 'V1' column generated from toString(measure)
.
setDT(df)[, toString(measure), by ='time'][, c(list(time), tstrsplit(V1, ', '))]
Also, we can add type.convert=TRUE
in tstrsplit
to convert the class
of the split columns. By default it is FALSE
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With