In Pandas, after groupby the grouped column is gone

Tags: python, pandas
I have the following dataframe named ttm:

    usersidid   clienthostid    eventSumTotal   LoginDaysSum    score
0       12          1               60              3           1728
1       11          1               240             3           1331
3       5           1               5               3           125
4       6           1               16              2           216
2       10          3               270             3           1000
5       8           3               18              2           512

When I do

ttm.groupby(['clienthostid'], as_index=False, sort=False)['LoginDaysSum'].count()

I get what I expected (though I would've wanted the results to be under a new label named 'ratio'):

       clienthostid  LoginDaysSum
0             1          4
1             3          2

But when I do

ttm.groupby(['clienthostid'], as_index=False, sort=False)['LoginDaysSum'].apply(lambda x: x.iloc[0] / x.iloc[1])

I get:

0    1.0
1    1.5

Why did the labels disappear? I still need the grouped 'clienthostid' column, and I also need the results of the apply to be under a label.
Sometimes when I do a groupby, some of the other columns still appear. Why do columns sometimes disappear and sometimes stay? Is there a flag I'm missing that controls this?
In the example above, when I used count, the results appeared under the label 'LoginDaysSum'. Is there a way to add a new label for the results instead?

Thank you,

515

asked Jan 15 '17 06:01

O. San

1 Answer

There are 2 possible ways to get a DataFrame back after groupby:

the parameter as_index=False, which works nicely with the count, sum and mean functions
reset_index, which creates new columns from the levels of the index and is the more general solution

df = ttm.groupby(['clienthostid'], as_index=False, sort=False)['LoginDaysSum'].count()
print (df)
   clienthostid  LoginDaysSum
0             1             4
1             3             2

df = ttm.groupby(['clienthostid'], sort=False)['LoginDaysSum'].count().reset_index()
print (df)
   clienthostid  LoginDaysSum
0             1             4
1             3             2
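To put the count under a new label such as 'ratio' (the name the question asks for), one option is named aggregation. The following is a minimal sketch, assuming pandas 0.25 or later; the frame is rebuilt from the values shown in the question.

```python
import pandas as pd

# Rebuild the question's frame (values and index copied from the question).
ttm = pd.DataFrame(
    {
        "usersidid": [12, 11, 5, 6, 10, 8],
        "clienthostid": [1, 1, 1, 1, 3, 3],
        "eventSumTotal": [60, 240, 5, 16, 270, 18],
        "LoginDaysSum": [3, 3, 3, 2, 3, 2],
        "score": [1728, 1331, 125, 216, 1000, 512],
    },
    index=[0, 1, 3, 4, 2, 5],
)

# Named aggregation: new_column_name=(source_column, function).
counts = ttm.groupby("clienthostid", as_index=False, sort=False).agg(
    ratio=("LoginDaysSum", "count")
)

# Alternative: aggregate first, then rename the resulting column.
renamed = (
    ttm.groupby("clienthostid", as_index=False, sort=False)["LoginDaysSum"]
    .count()
    .rename(columns={"LoginDaysSum": "ratio"})
)

print(counts)
```

Both variants keep clienthostid as a regular column because of as_index=False.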

For the second case, remove as_index=False and add reset_index instead:

#output is `Series`
a = ttm.groupby(['clienthostid'], sort=False)['LoginDaysSum'] \
         .apply(lambda x: x.iloc[0] / x.iloc[1])
print (a)
clienthostid
1    1.0
3    1.5
Name: LoginDaysSum, dtype: float64

print (type(a))
<class 'pandas.core.series.Series'>

print (a.index)
Int64Index([1, 3], dtype='int64', name='clienthostid')


df1 = ttm.groupby(['clienthostid'], sort=False)['LoginDaysSum'] \
         .apply(lambda x: x.iloc[0] / x.iloc[1]).reset_index(name='ratio')
print (df1)
   clienthostid  ratio
0             1    1.0
1             3    1.5
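Note that x.iloc[1] raises an IndexError for any group with only one row. A possible guard is sketched below; the helper name first_over_second and the extra single-row group (clienthostid 7) are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

# Reduced frame: the question's two relevant columns plus one hypothetical
# single-row group (clienthostid 7) to exercise the guard.
ttm = pd.DataFrame(
    {
        "clienthostid": [1, 1, 1, 1, 3, 3, 7],
        "LoginDaysSum": [3, 3, 3, 2, 3, 2, 4],
    }
)


def first_over_second(s):
    """Return first value / second value, or NaN if the group has < 2 rows."""
    if len(s) < 2:
        return np.nan
    return s.iloc[0] / s.iloc[1]


df1 = (
    ttm.groupby("clienthostid", sort=False)["LoginDaysSum"]
    .apply(first_over_second)
    .reset_index(name="ratio")  # group keys become a column, values are named 'ratio'
)
print(df1)
```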

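Why are some columns gone?

I think the cause can be the automatic exclusion of nuisance columns, that is, columns the aggregation function cannot handle, such as strings in a numeric aggregation. This applies to older pandas versions; from pandas 2.0 on, such columns are no longer dropped silently: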

#convert column to str
ttm.usersidid = ttm.usersidid.astype(str) + 'aa'
print (ttm)
  usersidid  clienthostid  eventSumTotal  LoginDaysSum  score
0      12aa             1             60             3   1728
1      11aa             1            240             3   1331
3       5aa             1              5             3    125
4       6aa             1             16             2    216
2      10aa             3            270             3   1000
5       8aa             3             18             2    512

#the str column usersidid is removed from the output
a = ttm.groupby(['clienthostid'], sort=False).sum()
print (a)
              eventSumTotal  LoginDaysSum  score
clienthostid                                    
1                       321            11   3400
3                       288             5   1512
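To make the result independent of the automatic exclusion, the columns can be chosen explicitly. This is a minimal sketch using the numeric_only parameter of the groupby sum, or a column selection before aggregating; the frame is rebuilt from the values above.

```python
import pandas as pd

# The frame after usersidid has been converted to strings.
ttm = pd.DataFrame(
    {
        "usersidid": ["12aa", "11aa", "5aa", "6aa", "10aa", "8aa"],
        "clienthostid": [1, 1, 1, 1, 3, 3],
        "eventSumTotal": [60, 240, 5, 16, 270, 18],
        "LoginDaysSum": [3, 3, 3, 2, 3, 2],
        "score": [1728, 1331, 125, 216, 1000, 512],
    }
)

# Option 1: ask for numeric columns only, so the str column is left out.
numeric_sums = ttm.groupby("clienthostid", sort=False).sum(numeric_only=True)

# Option 2: select the wanted columns before aggregating.
selected_sums = ttm.groupby("clienthostid", sort=False)[
    ["eventSumTotal", "score"]
].sum()

print(numeric_sums)
print(selected_sums)
```

Selecting columns explicitly states the intent in the code, so the output does not depend on how a given pandas version treats non-numeric columns.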

See also: What is the difference between size and count in pandas?
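In short, size counts all rows in each group, while count counts only non-missing values. A small sketch on a made-up frame (the column names key and val are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 'a' contains one missing value.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1.0, np.nan, 2.0]})

grouped = df.groupby("key")["val"]
sizes = grouped.size()    # rows per group, NaN included
counts = grouped.count()  # non-missing values per group

print(sizes)
print(counts)
```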

172

answered Oct 02 '22 18:10

jezrael
