pandas DataFrame equivalent to R data.table 'by'

Is there an equivalent in a pandas DataFrame to using 'by' in R's data.table?

For example, in R I can do:

DT = data.table(x = c('a', 'a', 'a', 'b', 'b', 'b'), y = rnorm(6))
DT[, z := mean(y[1:2]), by = x]

Is there something similar in pandas?

asked Jan 08 '17 by rnorthcott



1 Answer

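To get the same output as the data.table code, where we take the mean of the first two elements of 'y' within each group of 'x' and store it in a new column 'z', use groupby with transform:

# mean of the first two 'y' values in each 'x' group, broadcast back to every row of that group
mean1 = lambda s: s.head(2).mean()
df['z'] = df['y'].groupby(df['x']).transform(mean1)
print(df)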
#   x         y         z
#0  a  1.329212  0.279589
#1  a -0.770033  0.279589
#2  a -0.316280  0.279589
#3  b -0.990810 -1.030813
#4  b -1.070816 -1.030813
#5  b -1.438713 -1.030813
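An alternative that avoids calling a Python lambda once per group is to select the first two rows of each group with `groupby(...).head(2)`, compute one mean per group, and map those means back onto the original rows. This is a sketch of that alternative, not the approach above; it rebuilds the same seeded sample data so it runs on its own.

```python
import numpy as np
import pandas as pd

# Same sample data as in this answer (seed 24)
np.random.seed(24)
df = pd.DataFrame({'x': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'y': np.random.randn(6)})

# Keep only the first two rows of each 'x' group (like y[1:2] in R)
first_two = df.groupby('x').head(2)

# One mean per group: a Series indexed by the group key ('a', 'b')
group_means = first_two.groupby('x')['y'].mean()

# Broadcast each group's mean back to every row of that group
df['z'] = df['x'].map(group_means)
print(df)
```

The 'z' column should match the transform version; the trade-off is an intermediate frame (`first_two`) in exchange for built-in, vectorized group operations.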

For comparison, the OP's data.table code in R gives the same values:

library(data.table)
DT[, z := mean(y[1:2]), by = x]
DT
#   x         y          z
#1: a  1.329212  0.2795895
#2: a -0.770033  0.2795895
#3: a -0.316280  0.2795895
#4: b -0.990810 -1.0308130
#5: b -1.070816 -1.0308130
#6: b -1.438713 -1.0308130
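More generally, `by` combined with `:=` keeps every row and adds a per-group column, which is why `transform` is the pandas counterpart; an aggregation such as `DT[, .(z = mean(y)), by = x]` collapses to one row per group, which corresponds to `groupby(...).agg(...)`. A minimal sketch of both, using made-up toy values rather than the random data above:

```python
import pandas as pd

# Hypothetical toy data chosen so the group means are easy to check
df = pd.DataFrame({'x': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'y': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

# DT[, z := mean(y), by = x]  -> keep every row, add a per-group column
df['z'] = df.groupby('x')['y'].transform('mean')

# DT[, .(z = mean(y)), by = x]  -> one row per group (named aggregation)
summary = df.groupby('x', as_index=False).agg(z=('y', 'mean'))

print(df)
print(summary)
```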

Data:

import pandas as pd
import numpy as np

# seed the generator so the random 'y' values are reproducible
np.random.seed(24)
df = pd.DataFrame({'x': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'y': np.random.randn(6)})


DT <- structure(list(x = c("a", "a", "a", "b", "b", "b"),
                     y = c(1.329212, -0.770033, -0.31628, -0.99081, -1.070816, -1.438713)),
                .Names = c("x", "y"), class = c("data.table", "data.frame"),
                row.names = c(NA, -6L))
answered Sep 29 '22 by akrun