How can I mutate in dplyr without losing order?

Using data.table I can do the following:

library(data.table)
dt = data.table(a = 1:2, b = c(1,2,NA,NA))
#   a  b
#1: 1  1
#2: 2  2
#3: 1 NA
#4: 2 NA

dt[, b := b[1], by = a]
#   a b
#1: 1 1
#2: 2 2
#3: 1 1
#4: 2 2

Attempting the same operation in dplyr, however, returns the data sorted by a:

library(dplyr)
dt = data.table(a = 1:2, b = c(1,2,NA,NA))
dt %.% group_by(a) %.% mutate(b = b[1])
#  a b
#1 1 1
#2 1 1
#3 2 2
#4 2 2

(As an aside, the above also sorts the original dt, which is somewhat confusing to me given dplyr's philosophy of not modifying in place. I'm guessing that's a bug in how dplyr interfaces with data.table.)

What's the dplyr way of achieving the above?

654

asked Feb 12 '14 00:02

eddi

1 Answer

In the current development version of dplyr (which will eventually become dplyr 0.2) the behaviour differs between data frames and data tables:

library(dplyr)
library(data.table)

df <- data.frame(a = 1:2, b = c(1,2,NA,NA))
dt <- data.table(df)

df %.% group_by(a) %.% mutate(b = b[1])

## Source: local data frame [4 x 2]
## Groups: a
## 
##   a b
## 1 1 1
## 2 2 2
## 3 1 1
## 4 2 2

dt %.% group_by(a) %.% mutate(b = b[1])

## Source: local data table [4 x 2]
## Groups: a
## 
##   a b
## 1 1 1
## 2 1 1
## 3 2 2
## 4 2 2

This happens because group_by() applied to a data.table automatically calls setkey(), on the assumption that the index will make future operations faster. Because setkey() sorts the table by reference, this is also why the original dt ends up sorted.
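To see the reordering in isolation, here is a minimal sketch that calls setkey() directly on the example table. It only demonstrates data.table's own behaviour; that group_by() triggers the same call internally is taken from the explanation above, not shown by this code.

```r
library(data.table)

# Same example table as in the question; rows start in the order a = 1, 2, 1, 2.
dt <- data.table(a = 1:2, b = c(1, 2, NA, NA))

# setkey() sorts the table by the key column and modifies dt by reference,
# so dt changes even though nothing is assigned back to it.
setkey(dt, a)

sorted_a <- dt$a   # column a after keying
dt_key <- key(dt)  # name of the key column that was set
print(dt)
```

After the call, dt itself is ordered by a, which matches the reordering reported in the question's aside.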

If there's a strong feeling that this is a bad default, I'm happy to change it.
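In the meantime, one possible workaround (a sketch, not an official recommendation) is to do the grouped mutate on a plain data frame copy, since data frames keep their row order as shown above. The names df and res are just illustrative, and nested calls are used instead of a chaining operator so the sketch does not depend on which operator the installed dplyr version provides.

```r
library(dplyr)
library(data.table)

dt <- data.table(a = 1:2, b = c(1, 2, NA, NA))

# as.data.frame() returns a plain data frame copy, so the grouped
# operation below never touches dt itself.
df <- as.data.frame(dt)

# Fill b with the first value of each group; rows stay in their original order.
res <- mutate(group_by(df, a), b = b[1])

print(res)
print(dt)  # dt is left unkeyed and in its original order
```

The trade-off is an extra copy of the data, which may matter for large tables.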

154

answered Nov 15 '22 08:11

hadley

Tags: r, data.table, dplyr
