I have a DataFrame like this:
dummy = pd.DataFrame([
('01/09/2020', 'TRUE', 'FALSE'),
('01/09/2020', 'TRUE', 'TRUE'),
('02/09/2020', 'FALSE', 'TRUE'),
('02/09/2020', 'TRUE', 'FALSE'),
('03/09/2020', 'FALSE', 'FALSE'),
('03/09/2020', 'TRUE', 'TRUE'),
('03/09/2020', 'TRUE', 'FALSE')], columns=['date', 'Action1', 'Action2'])
Now I want a count of 'TRUE' actions per day for each column, which should look like this:

date Action1 Action2
01/09/2020 2 1
02/09/2020 1 1
03/09/2020 2 1
I applied groupby, sum, count, etc., but nothing is working for me because I have to aggregate multiple columns. I don't want to split the table into multiple DataFrames, resolve each one individually, and merge them back into one. Can someone please suggest a smart way to do this?
'TRUE' and 'FALSE' in your dummy df are strings; you can convert them to int and sum:
dummy.replace({'TRUE':1,'FALSE':0}).groupby('date',as_index = False).sum()
date Action1 Action2
0 01/09/2020 2 1
1 02/09/2020 1 1
2 03/09/2020 2 1
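For convenience, here is the same approach as a self-contained, runnable snippet (assuming only pandas is installed):

```python
import pandas as pd

dummy = pd.DataFrame([
    ('01/09/2020', 'TRUE', 'FALSE'),
    ('01/09/2020', 'TRUE', 'TRUE'),
    ('02/09/2020', 'FALSE', 'TRUE'),
    ('02/09/2020', 'TRUE', 'FALSE'),
    ('03/09/2020', 'FALSE', 'FALSE'),
    ('03/09/2020', 'TRUE', 'TRUE'),
    ('03/09/2020', 'TRUE', 'FALSE')], columns=['date', 'Action1', 'Action2'])

# Map the string flags to integers, then sum each action column per date.
counts = dummy.replace({'TRUE': 1, 'FALSE': 0}).groupby('date', as_index=False).sum()
print(counts)
```

This aggregates all action columns in a single pass, with no need to process each column in its own DataFrame.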
You can also try:
dummy.set_index(['date']).eq('TRUE').sum(level='date')
Output:
Action1 Action2
date
01/09/2020 2 1
02/09/2020 1 1
03/09/2020 2 1
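Note that `sum(level=...)` was deprecated and later removed in newer pandas versions; on current pandas the equivalent is a groupby on the index level. A sketch using the same dummy frame:

```python
import pandas as pd

dummy = pd.DataFrame([
    ('01/09/2020', 'TRUE', 'FALSE'),
    ('01/09/2020', 'TRUE', 'TRUE'),
    ('02/09/2020', 'FALSE', 'TRUE'),
    ('02/09/2020', 'TRUE', 'FALSE'),
    ('03/09/2020', 'FALSE', 'FALSE'),
    ('03/09/2020', 'TRUE', 'TRUE'),
    ('03/09/2020', 'TRUE', 'FALSE')], columns=['date', 'Action1', 'Action2'])

# eq('TRUE') gives booleans; summing per index level counts the True values.
out = dummy.set_index('date').eq('TRUE').groupby(level='date').sum()
print(out)
```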
Anyone seeing this answer should look at the answers by @QuangHoang or @Vaishali
They are much better answers. I can't control what the OP chooses, but you should go upvote those answers.
dummy.iloc[:, 1:].eq('TRUE').groupby(dummy.date).sum()
Action1 Action2
date
01/09/2020 2 1
02/09/2020 1 1
03/09/2020 2 1
Fix your dataframe such that it has actual True/False values
from ast import literal_eval
dummy = dummy.assign(**dummy[['Action1', 'Action2']].applymap(str.title).applymap(literal_eval))
Then use groupby
dummy.groupby('date').sum()
Action1 Action2
date
01/09/2020 2 1
02/09/2020 1 1
03/09/2020 2 1
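`applymap` was later deprecated (pandas 2.1) in favour of `DataFrame.map`, and for plain 'TRUE'/'FALSE' strings a direct comparison avoids `literal_eval` entirely. A minimal sketch, assuming the flags are always exactly 'TRUE' or 'FALSE':

```python
import pandas as pd

dummy = pd.DataFrame([
    ('01/09/2020', 'TRUE', 'FALSE'),
    ('01/09/2020', 'TRUE', 'TRUE'),
    ('02/09/2020', 'FALSE', 'TRUE'),
    ('02/09/2020', 'TRUE', 'FALSE'),
    ('03/09/2020', 'FALSE', 'FALSE'),
    ('03/09/2020', 'TRUE', 'TRUE'),
    ('03/09/2020', 'TRUE', 'FALSE')], columns=['date', 'Action1', 'Action2'])

# A plain equality comparison yields real booleans in one step.
dummy[['Action1', 'Action2']] = dummy[['Action1', 'Action2']].eq('TRUE')
result = dummy.groupby('date').sum()
print(result)
```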