pandas dataframe equivalent to R data.table by

Tags:

python

pandas

r

data.table

Is there a pandas DataFrame equivalent to using 'by' in R's data.table?

For example, in R I can do:

library(data.table)
DT = data.table(x = c('a', 'a', 'a', 'b', 'b', 'b'), y = rnorm(6))
DT[, z := mean(y[1:2]), by = x]

Is there something similar in pandas?


asked Jan 08 '17 07:01

rnorthcott

1 Answer

To get the same output as in data.table, i.e. take the mean of the first two elements of 'y' within each group of 'x' and store it in a new column 'z', group 'y' by 'x' and use transform:

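# mean of the first two values of 'y' in each group (R's y[1:2] is 1-based)
mean1 = lambda x: x.head(2).mean()
# transform returns a result aligned with df, so every row gets its group's value
df['z'] = df['y'].groupby(df['x']).transform(mean1)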
print(df)
#   x         y         z
#0  a  1.329212  0.279589
#1  a -0.770033  0.279589
#2  a -0.316280  0.279589
#3  b -0.990810 -1.030813
#4  b -1.070816 -1.030813
#5  b -1.438713 -1.030813
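A lambda-free variant, offered as a sketch rather than as part of the original answer: take the first two rows of each group with groupby().head(2), compute one mean per group, then map those means back onto df['x']. The data below is typed in from the printed output above instead of regenerated from the random seed, and groupby('x') by column name is used as the closest analogue of by = x.

```python
import pandas as pd

# Same values as the printed output above, written out explicitly
df = pd.DataFrame({'x': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'y': [1.329212, -0.770033, -0.316280,
                         -0.990810, -1.070816, -1.438713]})

# Keep the first two rows of each group (like y[1:2] in R)
first_two = df.groupby('x').head(2)

# One mean per group, indexed by the group key 'x'
means = first_two.groupby('x')['y'].mean()

# Broadcast each group's mean back onto every row of that group, like :=
df['z'] = df['x'].map(means)
print(df)
```

This avoids calling a Python function once per group, which may matter when there are many groups; for a handful of groups the transform version above is just as good.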

For comparison, running the OP's data.table code in R on the same data gives the same values:

library(data.table)
DT[, z := mean(y[1:2]), by = x]
DT
#   x         y          z
#1: a  1.329212  0.2795895
#2: a -0.770033  0.2795895
#3: a -0.316280  0.2795895
#4: b -0.990810 -1.0308130
#5: b -1.070816 -1.0308130
#6: b -1.438713 -1.0308130
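As a rough side-by-side (a sketch, not taken from the original answer): data.table's := with by keeps every row and adds a column, which corresponds to groupby(...).transform in pandas, while the form without := returns one row per group, which corresponds to groupby(...).agg. The data is again typed in from the output above.

```python
import pandas as pd

df = pd.DataFrame({'x': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'y': [1.329212, -0.770033, -0.316280,
                         -0.990810, -1.070816, -1.438713]})

def first_two_mean(s):
    # mean of the first two values of a group
    return s.iloc[:2].mean()

# data.table: DT[, z := mean(y[1:2]), by = x]    (adds a column, keeps all rows)
df['z'] = df.groupby('x')['y'].transform(first_two_mean)

# data.table: DT[, .(z = mean(y[1:2])), by = x]  (one row per group)
summary = df.groupby('x')['y'].agg(first_two_mean).reset_index(name='z')
print(summary)
```

Grouping by several columns, as in by = .(x, w) in data.table, would use a list of column names such as df.groupby(['x', 'w']).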

data

import numpy as np
import pandas as pd

np.random.seed(seed=24)
df = pd.DataFrame({'x': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'y': np.random.randn(6)})


DT <- structure(list(x = c("a", "a", "a", "b", "b", "b"),
                     y = c(1.329212, -0.770033, -0.31628,
                           -0.99081, -1.070816, -1.438713)),
                .Names = c("x", "y"),
                class = c("data.table", "data.frame"),
                row.names = c(NA, -6L))


answered Sep 29 '22 09:09

akrun
