How to ddply() without sorting?

Tags: sorting, r, plyr

I use the following code to summarize my data, grouped by Compound, Replicate and Mass.

summaryDataFrame <- ddply(reviewDataFrame, .(Compound, Replicate, Mass), 
  .fun = calculate_T60_Over_T0_Ratio)

An unfortunate side effect is that the resulting data frame is sorted by those fields. I would like to do this and keep Compound, Replicate and Mass in the same order as in the original data frame. Any ideas? I tried adding a "Sorting" column of sequential integers to the original data, but of course I can't include that in the .variables since I don't want to 'group by' that, and so it is not returned in the summaryDataFrame.

Thanks for the help.

asked Aug 29 '11 20:08

James

2 Answers

This came up on the plyr mailing list a while back (raised by @kohske no less) and this is a solution offered by Peter Meilstrup for limited cases:

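library(plyr)  # provides ddply() and mutate() used below

#Peter's version used a function gensym to
# create the col name, but I couldn't track down
# what package it was in.
keeping.order <- function(data, fn, ...) {
  col <- ".sortColumn"
  # seq_len() is safe for zero-row input, unlike 1:nrow(data)
  data[,col] <- seq_len(nrow(data))
  out <- fn(data, ...)
  if (!col %in% colnames(out)) stop("Ordering column not preserved by function")
  # restore the original row order, then drop the helper column
  out <- out[order(out[,col]),]
  out[,col] <- NULL
  out
}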

#Some sample data 
d <- structure(list(g = c(2L, 2L, 1L, 1L, 2L, 2L), v = c(-1.90127112738315, 
-1.20862680183042, -1.13913266070505, 0.14899803094742, -0.69427656843677, 
0.872558638137971)), .Names = c("g", "v"), row.names = c(NA, 
-6L), class = "data.frame") 

#This one resorts
ddply(d, .(g), mutate, v=scale(v)) #does not preserve order of d 

#This one does not
keeping.order(d, ddply, .(g), mutate, v=scale(v)) #preserves order of d

Please do read the thread for Hadley's notes on why this functionality may not be general enough to roll into ddply. That caveat probably applies in your case, since you are likely returning fewer rows for each piece than went in.
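For the summarising case, where each piece returns fewer rows than it received, one possible workaround is to carry the position of each group's first row through the summary and sort on it afterwards. This is only a sketch under assumptions: the data and the helper column name first_row are made up for illustration, and mean() stands in for whatever the real summary function computes.

```r
library(plyr)

# Hypothetical data: group 2 appears before group 1
d <- data.frame(g = c(2L, 2L, 1L, 1L, 2L, 2L),
                v = c(1, 2, 3, 5, 5, 6))

# Record each row's original position (helper column, name is arbitrary)
d$first_row <- seq_len(nrow(d))

# Summarise per group, keeping the smallest original position of each group
out <- ddply(d, .(g), summarise,
             v = mean(v),
             first_row = min(first_row))

# Sort groups by first appearance, then drop the helper column
out <- out[order(out$first_row), ]
out$first_row <- NULL
rownames(out) <- NULL

out  # g = 2 is listed first because it appears first in d
```

Unlike keeping.order, this requires the summary function to pass the helper column through explicitly, which is why it cannot be wrapped generically around any function.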

Edited to include a strategy for more general cases

If ddply is outputting something that is sorted in an order you do not like, you have two options: specify the desired ordering on the splitting variables beforehand using ordered factors, or manually sort the output after the fact.

For instance, consider the following data:

d <- data.frame(x1 = rep(letters[1:3],each = 5), 
                x2 = rep(letters[4:6],5),
                x3 = 1:15,stringsAsFactors = FALSE)

These are plain strings for now. ddply will sort the output, which in this case means the default lexical ordering, regardless of the row order of the input:

> ddply(d,.(x1,x2),summarise, val = sum(x3))
  x1 x2 val
1  a  d   5
2  a  e   7
3  a  f   3
4  b  d  17
5  b  e   8
6  b  f  15
7  c  d  13
8  c  e  25
9  c  f  27


> ddply(d[sample(1:15,15),],.(x1,x2),summarise, val = sum(x3))
  x1 x2 val
1  a  d   5
2  a  e   7
3  a  f   3
4  b  d  17
5  b  e   8
6  b  f  15
7  c  d  13
8  c  e  25
9  c  f  27

If the resulting data frame isn't ending up in the "right" order, it's probably because you really want some of those variables to be ordered factors. Suppose that we really wanted x1 and x2 ordered like so:

d$x1 <- factor(d$x1, levels = c('b','a','c'),ordered = TRUE)
d$x2 <- factor(d$x2, levels = c('d','f','e'), ordered = TRUE)

Now when we use ddply, the resulting sort will be as we intend:

> ddply(d,.(x1,x2),summarise, val = sum(x3))
  x1 x2 val
1  b  d  17
2  b  f  15
3  b  e   8
4  a  d   5
5  a  f   3
6  a  e   7
7  c  d  13
8  c  f  27
9  c  e  25

The moral of the story here is that if ddply is outputting something in an order you didn't intend, it's a good sign that you should be using ordered factors for the variables you're splitting on.
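To keep the groups in the order they first appear in the original data, as the question asks, the levels do not have to be typed out by hand: unique() returns values in order of first appearance, so it can supply them. A minimal sketch, assuming made-up data in which the desired order is simply the order of appearance:

```r
library(plyr)

# Hypothetical data whose natural order is not alphabetical
d <- data.frame(x1 = rep(c("b", "a", "c"), each = 5),
                x2 = rep(c("d", "f", "e"), 5),
                x3 = 1:15, stringsAsFactors = FALSE)

# unique() preserves first-appearance order, so the levels follow the data
d$x1 <- factor(d$x1, levels = unique(d$x1), ordered = TRUE)
d$x2 <- factor(d$x2, levels = unique(d$x2), ordered = TRUE)

# The summary is now sorted by level order: b, a, c and within each d, f, e
res <- ddply(d, .(x1, x2), summarise, val = sum(x3))
res
```

Note that the grouping columns come back as factors, so convert them with as.character() afterwards if plain strings are needed.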

answered Sep 20 '22 02:09

joran

I eventually ended up adding an 'indexing' column to the original data frame, made by pasting two columns together with sep="_". Then I built a second data frame holding only the unique values of the 'indexing' column plus a counter running from 1 to the number of unique values. I ran ddply() on the data, which returned a sorted data frame. To get things back in the original order, I used merge() on the results data frame and the index data frame (naming the shared column identically in both makes this easier). Finally, I ordered by the counter and removed the extraneous columns.

Not an elegant solution, but one that works.
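The steps described above might look roughly like the following. This is a reconstruction rather than the original code: the data frame, the column names (key, position, Value, Mean) and the use of mean() are all assumptions made for illustration.

```r
library(plyr)

# Hypothetical stand-in for the original data
raw <- data.frame(Compound  = c("zeta", "zeta", "alpha", "alpha", "mid", "mid"),
                  Replicate = c(2, 2, 1, 1, 1, 1),
                  Value     = c(10, 20, 1, 3, 5, 7),
                  stringsAsFactors = FALSE)

# 1. Indexing column: two columns pasted with sep = "_"
raw$key <- paste(raw$Compound, raw$Replicate, sep = "_")

# 2. Index data frame: unique keys in order of appearance plus a counter
index <- data.frame(key = unique(raw$key), stringsAsFactors = FALSE)
index$position <- seq_len(nrow(index))

# 3. ddply() returns the summary sorted by the grouping variables
summed <- ddply(raw, .(Compound, Replicate), summarise, Mean = mean(Value))
summed$key <- paste(summed$Compound, summed$Replicate, sep = "_")

# 4. Merge on the shared key, sort by the counter, drop the helper columns
restored <- merge(summed, index, by = "key")
restored <- restored[order(restored$position), ]
restored$key <- NULL
restored$position <- NULL
rownames(restored) <- NULL

restored  # groups appear as zeta, alpha, mid, matching the original data
```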

Thanks for the assist. It got me thinking in the right direction.

answered Sep 21 '22 02:09

James
