Reshaping data frame in R [duplicate]

Tags:

dataframe

r

reshape

I'm running into difficulties reshaping a large data frame. I've been relatively fortunate in avoiding reshaping problems in the past, which also means I'm terrible at it.

My current dataframe looks something like this:

unique_id    seq   response    detailed.name    treatment 
a            N1     123.23     descr. of N1     T1
a            N2     231.12     descr. of N2     T1
a            N3     231.23     descr. of N3     T1
...
b            N1     343.23     descr. of N1     T2
b            N2     281.13     descr. of N2     T2
b            N3     901.23     descr. of N3     T2
...

And I'd like:

seq    detailed.name   T1           T2
N1     descr. of N1    123.23       343.23
N2     descr. of N2    231.12       281.13
N3     descr. of N3    231.23       901.23

I've looked into the reshape package, but I'm not sure how to turn the levels of the treatment factor into individual column names.

Thanks!

Edit: I tried running this on my local machine (4 GB dual-core 3.06 GHz iMac) and it keeps failing with:

> d.tmp.2 <- cast(d.tmp, `SEQ_ID` + `GENE_INFO` ~ treatments)
Aggregation requires fun.aggregate: length used as default
R(5751) malloc: *** mmap(size=647168) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

I'll try running this on one of our bigger machines when I get a chance.
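The message "Aggregation requires fun.aggregate: length used as default" means that cast found more than one row for at least one cell of the requested table. When that happens, it falls back to counting rows instead of returning the values. The real d.tmp is not shown, so the sketch below uses hypothetical toy data that only borrows the column names from the failing call. It shows one base-R way to find such rows:

```r
# Hypothetical toy data reusing the column names from the failing cast() call;
# row 4 repeats the SEQ_ID / GENE_INFO / treatments combination of row 1.
d.tmp <- data.frame(
  SEQ_ID     = c("N1", "N2", "N1", "N1"),
  GENE_INFO  = c("dN1", "dN2", "dN1", "dN1"),
  treatments = c("T1", "T1", "T2", "T1"),
  response   = c(123.23, 231.12, 343.23, 150.00)
)

# Columns that together should identify exactly one cell of the wide table
key_cols <- c("SEQ_ID", "GENE_INFO", "treatments")

# TRUE for every row whose key combination occurs more than once
dup_rows <- duplicated(d.tmp[key_cols]) |
  duplicated(d.tmp[key_cols], fromLast = TRUE)

print(d.tmp[dup_rows, ])

# One possible resolution: average replicates so each cell has a single value
collapsed <- aggregate(response ~ SEQ_ID + GENE_INFO + treatments,
                       data = d.tmp, FUN = mean)
print(collapsed)
```

Whether averaging is right depends on what the repeated rows mean. The use of mean here is only an assumption. Passing an explicit fun.aggregate to cast is the other way to get one value per cell.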

asked Oct 07 '09 18:10

Vince

3 Answers

Reshaping always seems tricky to me too, but it usually works out with a little trial and error. Here's what I ended up with:

> library(reshape)
> x
  unique_id seq response detailed.name treatment
1         a  N1   123.23           dN1        T1
2         a  N2   231.12           dN2        T1
3         a  N3   231.23           dN3        T1
4         b  N1   343.23           dN1        T2
5         b  N2   281.13           dN2        T2
6         b  N3   901.23           dN3        T2

> x2 <- melt(x, c("seq", "detailed.name", "treatment"), "response")
> x2
  seq detailed.name treatment variable  value
1  N1           dN1        T1 response 123.23
2  N2           dN2        T1 response 231.12
3  N3           dN3        T1 response 231.23
4  N1           dN1        T2 response 343.23
5  N2           dN2        T2 response 281.13
6  N3           dN3        T2 response 901.23

> cast(x2, seq + detailed.name ~ treatment)
  seq detailed.name     T1     T2
1  N1           dN1 123.23 343.23
2  N2           dN2 231.12 281.13
3  N3           dN3 231.23 901.23

Your original data was already in long format, but not in the long format that melt/cast uses, so I re-melted it. The second argument (id.vars) lists the columns that identify each measurement and are not melted. The third argument (measure.vars) lists the columns that hold the measured values.

Then cast takes a formula. The variables left of the tilde stay as row identifiers. The variables right of the tilde supply the new column names, and the cells are filled from the value column.

More or less...!
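For comparison, base R's stats::reshape() can do the same long-to-wide step without any package. In its arguments, idvar plays the role of the left-hand side of the cast formula, timevar the right-hand side, and v.names the value column. Here is a minimal sketch on the question's toy data, rebuilt as plain character and numeric columns:

```r
# The question's toy data
x <- data.frame(
  unique_id     = rep(c("a", "b"), each = 3),
  seq           = rep(c("N1", "N2", "N3"), times = 2),
  response      = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23),
  detailed.name = rep(c("dN1", "dN2", "dN3"), times = 2),
  treatment     = rep(c("T1", "T2"), each = 3)
)

# x[-1] drops unique_id, which differs between the T1 and T2 rows of a seq
wide <- reshape(x[-1],
                idvar     = c("seq", "detailed.name"),  # formula left-hand side
                timevar   = "treatment",                # formula right-hand side
                v.names   = "response",                 # the value column
                direction = "wide")

# reshape() names the new columns response.T1 and response.T2; drop the prefix
names(wide) <- sub("^response\\.", "", names(wide))
print(wide)
```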

answered Sep 22 '22 18:09

Harlan

Building on Harlan's answer: if the data is already in long format, you can skip the re-melting step by naming the column that holds the values in the cast call.

> library(reshape)
> x <- read.table(textConnection("  unique_id seq response detailed.name treatment
+ 1         a  N1   123.23           dN1        T1
+ 2         a  N2   231.12           dN2        T1
+ 3         a  N3   231.23           dN3        T1
+ 4         b  N1   343.23           dN1        T2
+ 5         b  N2   281.13           dN2        T2
+ 6         b  N3   901.23           dN3        T2"))
> 
> cast(x, seq + detailed.name ~ treatment, value = "response")
  seq detailed.name     T1     T2
1  N1           dN1 123.23 343.23
2  N2           dN2 231.12 281.13
3  N3           dN3 231.23 901.23
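To make the role of the value column concrete, here is a minimal base-R sketch that places each long-format row into its cell of the wide table with a two-column index matrix. It is not how reshape implements cast. Cells without a record stay NA. Unlike cast, duplicate records are not aggregated: the last one simply overwrites the others.

```r
x <- data.frame(
  seq           = rep(c("N1", "N2", "N3"), times = 2),
  response      = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23),
  detailed.name = rep(c("dN1", "dN2", "dN3"), times = 2),
  treatment     = rep(c("T1", "T2"), each = 3)
)

# Distinct row identifiers (left of the tilde) and column labels (right of it)
rows <- unique(x[c("seq", "detailed.name")])
cols <- unique(x$treatment)

# Integer row and column position of every long-format record
row_key <- paste(rows$seq, rows$detailed.name, sep = "\t")
i <- match(paste(x$seq, x$detailed.name, sep = "\t"), row_key)
j <- match(x$treatment, cols)

# Fill a numeric matrix from the value column; unmatched cells stay NA
m <- matrix(NA_real_, nrow = nrow(rows), ncol = length(cols),
            dimnames = list(NULL, cols))
m[cbind(i, j)] <- x$response

wide <- cbind(rows, m)
rownames(wide) <- NULL
print(wide)
```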

answered Sep 21 '22 18:09

learnr

Another option is spread() from tidyr:

library(tidyr)
# x[-1] drops the unique_id column before spreading
Wide1 <- spread(x[-1], treatment, response)
Wide1
#  seq detailed.name     T1     T2
#1  N1           dN1 123.23 343.23
#2  N2           dN2 231.12 281.13
#3  N3           dN3 231.23 901.23
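spread() produces one output row per distinct combination of the columns other than the key (treatment) and the value (response). That is why the call above drops unique_id with x[-1]. A quick base-R check of how many rows each choice of identifier columns would give:

```r
x <- data.frame(
  unique_id     = rep(c("a", "b"), each = 3),
  seq           = rep(c("N1", "N2", "N3"), times = 2),
  response      = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23),
  detailed.name = rep(c("dN1", "dN2", "dN3"), times = 2),
  treatment     = rep(c("T1", "T2"), each = 3)
)

# Identifier columns = everything except the key and value columns
id_with    <- setdiff(names(x), c("treatment", "response"))
id_without <- setdiff(names(x[-1]), c("treatment", "response"))

n_with    <- nrow(unique(x[id_with]))     # unique_id separates T1 rows from T2 rows
n_without <- nrow(unique(x[id_without]))  # one row per seq/detailed.name pair
print(c(with_unique_id = n_with, without_unique_id = n_without))
```

If unique_id were kept, spread() would return six rows, each with either T1 or T2 set to NA.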

The opposite action is performed by gather

gather(Wide1, treatment, response, T1:T2)
#  seq detailed.name treatment response
#1  N1           dN1        T1   123.23
#2  N2           dN2        T1   231.12
#3  N3           dN3        T1   231.23
#4  N1           dN1        T2   343.23
#5  N2           dN2        T2   281.13
#6  N3           dN3        T2   901.23
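The same wide-to-long step is also available in base R through stats::reshape(direction = "long"). Here is a minimal sketch that rebuilds Wide1 as a plain data frame. It passes seq alone as idvar because seq already identifies each wide row:

```r
wide <- data.frame(
  seq           = c("N1", "N2", "N3"),
  detailed.name = c("dN1", "dN2", "dN3"),
  T1            = c(123.23, 231.12, 231.23),
  T2            = c(343.23, 281.13, 901.23)
)

long <- reshape(wide,
                direction = "long",
                varying   = c("T1", "T2"),   # columns to stack
                v.names   = "response",      # name of the stacked value column
                timevar   = "treatment",     # name of the new key column
                times     = c("T1", "T2"),   # key value for each varying column
                idvar     = "seq")           # identifies each wide row
rownames(long) <- NULL
print(long)
```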

Also, there is dcast.data.table from data.table

library(data.table)
dcast.data.table(setDT(x), seq + detailed.name ~ treatment,
                 value.var = 'response')
#   seq detailed.name     T1     T2
#1:  N1           dN1 123.23 343.23
#2:  N2           dN2 231.12 281.13
#3:  N3           dN3 231.23 901.23

data

x <- structure(list(unique_id = structure(c(1L, 1L, 1L, 2L, 2L, 2L
), .Label = c("a", "b"), class = "factor"), seq = structure(c(1L, 
2L, 3L, 1L, 2L, 3L), .Label = c("N1", "N2", "N3"), class = "factor"), 
response = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23
), detailed.name = structure(c(1L, 2L, 3L, 1L, 2L, 3L), .Label = c("dN1", 
"dN2", "dN3"), class = "factor"), treatment = structure(c(1L, 
1L, 1L, 2L, 2L, 2L), .Label = c("T1", "T2"), class = "factor")), .Names =
c("unique_id", "seq", "response", "detailed.name", "treatment"), class = 
"data.frame", row.names = c(NA, -6L))

answered Sep 23 '22 18:09

akrun
