Python Pandas Dataframe GroupBy Size based on condition

Tags: python, pandas, lambda, size

I have a dataframe 'df' that looks like this:

id  date1   date2
1   11/1/2016   11/1/2016
1   11/1/2016   11/2/2016
1   11/1/2016   11/1/2016
1   11/1/2016   11/2/2016
1   11/2/2016   11/2/2016
2   11/1/2016   11/1/2016
2   11/1/2016   11/2/2016
2   11/1/2016   11/1/2016
2   11/2/2016   11/2/2016
2   11/2/2016   11/2/2016

What I would like to do is group by id and then, for each id, count the rows where date1 == date2, broken down by date. The result should look like:

id  samedate    count
1   11/1/2016    2 
1   11/2/2016    1 
2   11/1/2016    2 
2   11/2/2016    2

I have tried this:

gb = df.groupby('id').apply(lambda x: x[x.date1 == x.date2]['date1'].size())

And I get this error:

TypeError: 'int' object is not callable

I could flag each row where date1 and date2 are equal and then count those flags for each id and date, but I have to believe there is a groupby option for this.
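The flag-and-count idea above could be sketched like this. It is a minimal sketch that assumes the sample data from the question, with the dates left as strings, so the grouping key is the raw text. Summing a boolean column counts its True values; the rename to `samedate` is only there to match the requested output.

```python
import pandas as pd

# Sample data from the question (dates kept as strings here)
df = pd.DataFrame({
    'id':    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'date1': ['11/1/2016', '11/1/2016', '11/1/2016', '11/1/2016', '11/2/2016',
              '11/1/2016', '11/1/2016', '11/1/2016', '11/2/2016', '11/2/2016'],
    'date2': ['11/1/2016', '11/2/2016', '11/1/2016', '11/2/2016', '11/2/2016',
              '11/1/2016', '11/2/2016', '11/1/2016', '11/2/2016', '11/2/2016'],
})

# Flag rows where both dates match (True/False), then sum the flags
# per (id, date1); summing booleans counts the True values.
flags = df.assign(same=df['date1'].eq(df['date2']))
counts = (flags.groupby(['id', 'date1'])['same']
               .sum()
               .reset_index(name='count')
               .rename(columns={'date1': 'samedate'}))
print(counts)
```

Unlike filtering the rows first, this keeps (id, date) groups that have no matching rows and reports them with a count of 0.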


asked Nov 27 '16 19:11

clg4

1 Answer

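The TypeError comes from .size(): Series.size is an attribute that already holds an int, so the parentheses try to call that integer. Instead of apply, you can use boolean indexing first and then aggregate with size: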

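import pandas as pd

# Parse the strings into datetimes so the comparison is done on real dates
df.date1 = pd.to_datetime(df.date1)
df.date2 = pd.to_datetime(df.date2)

# Keep only rows where both dates match, then count rows per (id, date1)
df = df[df.date1 == df.date2]
gb = df.groupby(['id', 'date1']).size().reset_index(name='count')
print(gb)
   id      date1  count
0   1 2016-11-01      2
1   1 2016-11-02      1
2   2 2016-11-01      2
3   2 2016-11-02      2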
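For a copy-and-paste check, here is a self-contained version of the same filter-then-size approach built on the question's sample data. It is a sketch under a few assumptions: the dates are month/day/year strings (hence the explicit `format`), the original `df` is kept intact instead of being overwritten, and `date1` is renamed to `samedate` to match the requested output. The `format` argument and the rename are additions, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'id':    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'date1': ['11/1/2016', '11/1/2016', '11/1/2016', '11/1/2016', '11/2/2016',
              '11/1/2016', '11/1/2016', '11/1/2016', '11/2/2016', '11/2/2016'],
    'date2': ['11/1/2016', '11/2/2016', '11/1/2016', '11/2/2016', '11/2/2016',
              '11/1/2016', '11/2/2016', '11/1/2016', '11/2/2016', '11/2/2016'],
})

# Parse the strings (assumed month/day/year) so comparisons use real dates
df['date1'] = pd.to_datetime(df['date1'], format='%m/%d/%Y')
df['date2'] = pd.to_datetime(df['date2'], format='%m/%d/%Y')

# Boolean indexing keeps only matching rows; size() counts rows per group
same = df[df['date1'] == df['date2']]
gb = (same.groupby(['id', 'date1'])
          .size()
          .reset_index(name='count')
          .rename(columns={'date1': 'samedate'}))
print(gb)
```

Because the rows are filtered before grouping, an (id, date) pair with no matching rows simply does not appear in the result.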

Timings:

In [79]: %timeit (df[df.date1 == df.date2].groupby(['id', 'date1']).size().reset_index(name='count'))
100 loops, best of 3: 3.84 ms per loop

In [80]: %timeit (df.groupby(['id', 'date1']).apply(lambda x: (x['date1'] == x['date2']).sum()).reset_index())
100 loops, best of 3: 7.57 ms per loop

Code for timings:

# len(df) = 10k
df = pd.concat([df]*1000).reset_index(drop=True)

df.date1 = pd.to_datetime(df.date1)
df.date2 = pd.to_datetime(df.date2)
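A self-contained script to rerun a comparison like this outside IPython might look like the following. It is a sketch, not the benchmark used above: it uses the standard `timeit` module instead of the `%timeit` magic, and it times the filter-then-size approach against a flag-then-sum variant (the helper names `filter_then_size` and `flag_then_sum` are hypothetical) rather than the `apply` version. No numbers are claimed here, since results depend on hardware and pandas version.

```python
import timeit

import pandas as pd

# 10-row sample from the question
base = pd.DataFrame({
    'id':    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'date1': ['11/1/2016', '11/1/2016', '11/1/2016', '11/1/2016', '11/2/2016',
              '11/1/2016', '11/1/2016', '11/1/2016', '11/2/2016', '11/2/2016'],
    'date2': ['11/1/2016', '11/2/2016', '11/1/2016', '11/2/2016', '11/2/2016',
              '11/1/2016', '11/2/2016', '11/1/2016', '11/2/2016', '11/2/2016'],
})

# Repeat the sample 1000 times (10k rows), mirroring the setup above
df = pd.concat([base] * 1000).reset_index(drop=True)
df['date1'] = pd.to_datetime(df['date1'], format='%m/%d/%Y')
df['date2'] = pd.to_datetime(df['date2'], format='%m/%d/%Y')


def filter_then_size(frame):
    """Boolean indexing first, then count the remaining rows per (id, date1)."""
    same = frame[frame['date1'] == frame['date2']]
    return same.groupby(['id', 'date1']).size().reset_index(name='count')


def flag_then_sum(frame):
    """Flag matching rows, then sum the boolean flags per (id, date1)."""
    flagged = frame.assign(same=frame['date1'] == frame['date2'])
    return flagged.groupby(['id', 'date1'])['same'].sum().reset_index(name='count')


if __name__ == '__main__':
    for func in (filter_then_size, flag_then_sum):
        # Best of 3 repeats of 20 calls each, reported per call in milliseconds
        best = min(timeit.repeat(lambda: func(df), number=20, repeat=3)) / 20
        print(f'{func.__name__}: {best * 1000:.2f} ms per loop')
```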


answered Sep 18 '22 18:09

jezrael
