Using data.table I can do the following:
library(data.table)
dt = data.table(a = 1:2, b = c(1,2,NA,NA))
# a b
#1: 1 1
#2: 2 2
#3: 1 NA
#4: 2 NA
dt[, b := b[1], by = a]
# a b
#1: 1 1
#2: 2 2
#3: 1 1
#4: 2 2
When I attempt the same operation in dplyr, however, the data gets scrambled (sorted by a):
library(dplyr)
dt = data.table(a = 1:2, b = c(1,2,NA,NA))
dt %.% group_by(a) %.% mutate(b = b[1])
# a b
#1 1 1
#2 1 1
#3 2 2
#4 2 2
(As an aside, the above also sorts the original dt, which is somewhat confusing to me given dplyr's philosophy of not modifying in place. I'm guessing that's a bug in how dplyr interfaces with data.table.)
What's the dplyr way of achieving the above?
In the current development version of dplyr (which will eventually become dplyr 0.2) the behaviour differs between data frames and data tables:
library(dplyr)
library(data.table)
df <- data.frame(a = 1:2, b = c(1,2,NA,NA))
dt <- data.table(df)
df %.% group_by(a) %.% mutate(b = b[1])
## Source: local data frame [4 x 2]
## Groups: a
##
## a b
## 1 1 1
## 2 2 2
## 3 1 1
## 4 2 2
dt %.% group_by(a) %.% mutate(b = b[1])
## Source: local data table [4 x 2]
## Groups: a
##
## a b
## 1 1 1
## 2 1 1
## 3 2 2
## 4 2 2
This happens because group_by() applied to a data.table automatically does setkey() on the assumption that the index will make future operations faster.
If there's a strong feeling that this is a bad default, I'm happy to change it.
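To make the effect of setkey() concrete, here is a minimal sketch, not taken from the answer itself. It shows that setkey() reorders a data.table by reference, which would account for both the sorted output and the change to the original dt. It then shows two possible ways to keep the original row order: a helper column (the name row_id is hypothetical) restored with setorder(), and a plain data frame pipeline. The second part assumes a dplyr version that provides the %>% operator, which replaced %.% in later releases.

```r
library(data.table)
library(dplyr)

# --- data.table: setkey() sorts by reference ---
dt <- data.table(a = c(1L, 2L, 1L, 2L), b = c(1, 2, NA, NA))
dt[, row_id := .I]          # hypothetical helper column: remember original order

setkey(dt, a)               # reorders dt itself; no copy is made
print(dt$a)                 # rows are now grouped by a

dt[, b := b[1], by = a]     # fill b with the first value in each group
setorder(dt, row_id)        # restore the original order, again by reference
dt[, row_id := NULL]        # drop the helper column
print(dt)

# --- dplyr on a plain data frame: row order is left alone ---
df <- data.frame(a = c(1L, 2L, 1L, 2L), b = c(1, 2, NA, NA))
res <- df %>%
  group_by(a) %>%
  mutate(b = b[1]) %>%
  ungroup()
print(res)
```

The helper-column approach costs one extra sort but keeps everything in data.table. Converting to a data frame first avoids the keying step entirely, at the price of losing data.table's by-reference semantics.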