I've got a pandas dataframe df. I group it by 3 columns and count the results. When I do this I lose some information, specifically the name column. This column is mapped 1:1 with the desk_id column. Is there any way to include both in my final dataframe?
Here is the dataframe:
shift_id shift_start_time shift_end_time name end_time desk_id shift_hour
0 37423064 2014-01-17 08:00:00 2014-01-17 12:00:00 Adam Scott 2014-01-17 10:16:41.040000 15557987 2
1 37423064 2014-01-17 08:00:00 2014-01-17 12:00:00 Adam Scott 2014-01-17 10:16:41.096000 15557987 2
2 37423064 2014-01-17 08:00:00 2014-01-17 12:00:00 Adam Scott 2014-01-17 10:52:17.402000 15557987 2
3 37423064 2014-01-17 08:00:00 2014-01-17 12:00:00 Adam Scott 2014-01-17 11:06:59.083000 15557987 3
4 37423064 2014-01-17 08:00:00 2014-01-17 12:00:00 Adam Scott 2014-01-17 08:27:57.998000 15557987 0
I group it like this:
grouped = df.groupby(['desk_id', 'shift_id', 'shift_hour']).size()
grouped = grouped.reset_index()
And here is the result, missing the name column:
desk_id shift_id shift_hour 0
0 14468690 37729081 0 7
1 14468690 37729081 1 3
2 14468690 37729081 2 6
3 14468690 37729081 3 5
4 14468690 37729082 0 5
Also, is there any way to name the count column 'count' instead of '0'?
Pandas groupby splits the data into groups based on some criteria, applies a function to each group, and combines the results. The entry point is the DataFrame.groupby() method.
By default, groupby excludes rows whose group key is NaN. To keep them as their own group, pass dropna=False (available since pandas 1.1).
groupby() accepts a list of columns to group by several columns at once, and agg() can then apply a single aggregation or several aggregations at the same time.
Groupby preserves the order of rows within each group.
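To make the points above concrete, here is a minimal sketch on made-up data (the column values and the duration column are assumptions for illustration, not taken from the question). It groups by two columns, applies two aggregations at once, and shows how dropna=False changes which groups appear.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one desk_id is missing (NaN).
df = pd.DataFrame({
    "desk_id": [1.0, 1.0, 2.0, np.nan],
    "shift_hour": [0, 0, 1, 1],
    "duration": [10, 20, 30, 40],
})

keys = ["desk_id", "shift_hour"]

# Group by two columns and apply two aggregations at once.
# Rows whose key contains NaN are dropped by default.
default = df.groupby(keys)["duration"].agg(["sum", "mean"])

# dropna=False (pandas >= 1.1) keeps the NaN key as its own group.
with_nan = df.groupby(keys, dropna=False)["duration"].agg(["sum", "mean"])

print(default)
print(with_nan)
```

The first result should have two groups and the second three, since the row with the missing desk_id only survives when dropna=False is passed.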
You need to include 'name' in the groupby keys:
import numpy as np

grouped = df.groupby(['desk_id', 'shift_id', 'shift_hour', 'name']).size()
grouped = grouped.reset_index()
# replace the default column label 0 with 'count'
grouped.columns = np.where(grouped.columns == 0, 'count', grouped.columns)
print(grouped)
desk_id shift_id shift_hour name count
0 15557987 37423064 0 Adam Scott 1
1 15557987 37423064 2 Adam Scott 3
2 15557987 37423064 3 Adam Scott 1
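As a possibly simpler alternative to relabelling the columns afterwards, reset_index accepts a name argument when called on a Series, so the size column can be named in one step. The sketch below rebuilds a small frame shaped like the one in the question (only the columns needed for grouping, with the values copied from the sample rows).

```python
import pandas as pd

# Small stand-in for the question's dataframe (grouping columns only).
df = pd.DataFrame({
    "desk_id": [15557987] * 5,
    "shift_id": [37423064] * 5,
    "shift_hour": [2, 2, 2, 3, 0],
    "name": ["Adam Scott"] * 5,
})

keys = ["desk_id", "shift_id", "shift_hour", "name"]

# size() returns a Series; reset_index(name=...) names its value column.
grouped = df.groupby(keys).size().reset_index(name="count")
print(grouped)
```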
If the name-to-id relationship is many-to-one instead, say there is also a Pete Scott with the same desk_id and the same set of rows, each name gets its own row and the result becomes:
desk_id shift_id shift_hour name count
0 15557987 37423064 0 Adam Scott 1
1 15557987 37423064 0 Pete Scott 1
2 15557987 37423064 2 Adam Scott 3
3 15557987 37423064 2 Pete Scott 3
4 15557987 37423064 3 Adam Scott 1
5 15557987 37423064 3 Pete Scott 1
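If the mapping really is 1:1, as the question states, another option is to keep the original three grouping keys and carry the name along as an aggregated column. This is only a sketch under that assumption: with a many-to-one mapping, "first" would silently pick one name per group and the counts would be merged across names. It uses named aggregation, which needs pandas 0.25 or later, and the sample frame is again a stand-in for the question's data.

```python
import pandas as pd

# Small stand-in for the question's dataframe (grouping columns only).
df = pd.DataFrame({
    "desk_id": [15557987] * 5,
    "shift_id": [37423064] * 5,
    "shift_hour": [2, 2, 2, 3, 0],
    "name": ["Adam Scott"] * 5,
})

keys = ["desk_id", "shift_id", "shift_hour"]

# Named aggregation: take the first name in each group and count the rows.
grouped = (
    df.groupby(keys)
      .agg(name=("name", "first"), count=("name", "size"))
      .reset_index()
)
print(grouped)
```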