Is there an equivalent in a pandas DataFrame to using 'by' in R's data.table?
For example, in R I can do:
DT = data.table(x = c('a', 'a', 'a', 'b', 'b', 'b'), y = rnorm(6))
DT[, z := mean(y[1:2]), by = x]
Is there something similar in pandas?
A data.frame in R is similar to a data.table: both hold tabular data, but data.table provides many more features, so data.table is generally preferred over data.frame.
While the process takes 16.62 seconds in pandas, Datatable needs only 6.55 seconds. Overall, Datatable is roughly 2.5 times faster than pandas.
There are clear points of similarity between both R and Python (pandas Dataframes were inspired by R dataframes, the rvest package was inspired by BeautifulSoup), and both ecosystems continue to grow stronger.
To get output similar to data.table's, where we take the mean of the first two elements of 'y' within each group of 'x' and store it in a new column 'z':
mean1 = lambda x: x.head(2).mean()
df['z'] = df['y'].groupby(df['x']).transform(mean1)
print(df)
#   x         y         z
#0  a  1.329212  0.279589
#1  a -0.770033  0.279589
#2  a -0.316280  0.279589
#3  b -0.990810 -1.030813
#4  b -1.070816 -1.030813
#5  b -1.438713 -1.030813
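For a version that does not depend on the random seed, here is a minimal sketch that uses the literal 'y' values from the R data in this answer. The helper name mean_first_two is only illustrative and not part of pandas.

```python
import pandas as pd

# Literal values copied from the R data in this answer, so the result
# does not depend on a random seed.
df = pd.DataFrame({
    'x': ['a', 'a', 'a', 'b', 'b', 'b'],
    'y': [1.329212, -0.770033, -0.31628, -0.99081, -1.070816, -1.438713],
})

def mean_first_two(s):
    # Mean of the first two values of one group (illustrative helper name).
    return s.head(2).mean()

# transform broadcasts the per-group scalar back to every row of the group,
# which gives the same shape of result as `z := ...` with `by = x`.
df['z'] = df.groupby('x')['y'].transform(mean_first_two)
print(df)
```

Selecting the column after grouping, as in df.groupby('x')['y'], is equivalent here to df['y'].groupby(df['x']) and is the more common spelling.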
Using the OP's code for data.table in R:
library(data.table)
DT[, z := mean(y[1:2]), by = x]
DT
#   x         y          z
#1: a  1.329212  0.2795895
#2: a -0.770033  0.2795895
#3: a -0.316280  0.2795895
#4: b -0.990810 -1.0308130
#5: b -1.070816 -1.0308130
#6: b -1.438713 -1.0308130
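One point worth keeping in mind when translating this to pandas: `:=` with `by` recycles the group result across every row of the group, whereas a plain pandas aggregation returns one row per group. The sketch below contrasts the two on the same literal values; the variable names per_group and per_row are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['a', 'a', 'a', 'b', 'b', 'b'],
    'y': [1.329212, -0.770033, -0.31628, -0.99081, -1.070816, -1.438713],
})

grouped = df.groupby('x')['y']

# R's y[1:2] is 1-based and inclusive, so it selects the first two
# elements; the positional equivalent in pandas is .iloc[:2].
first_two_mean = lambda s: s.iloc[:2].mean()

# agg collapses each group to a single value: one row per group.
per_group = grouped.agg(first_two_mean)

# transform returns a Series aligned with the original rows,
# so it can be assigned directly as a new column.
per_row = grouped.transform(first_two_mean)

print(len(per_group))  # 2
print(len(per_row))    # 6
```

Only the transform result can be assigned to df['z'] directly, because its index matches the original rows.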
# data (pandas)
import pandas as pd
import numpy as np

np.random.seed(seed=24)  # fix the seed so the values of 'y' are reproducible
df = pd.DataFrame({'x': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'y': np.random.randn(6)})
# data (R)
DT <- structure(list(x = c("a", "a", "a", "b", "b", "b"),
                     y = c(1.329212, -0.770033, -0.31628,
                           -0.99081, -1.070816, -1.438713)),
                .Names = c("x", "y"),
                class = c("data.table", "data.frame"),
                row.names = c(NA, -6L))