I have a pandas.DataFrame
that looks like this.
COL1 COL2 COL3
C1 None None
C1 C2 None
C1 C1 None
C1 C2 C3
For each row in this dataframe I would like to count the occurrences of each of C1, C2, C3 and append this information as columns to this dataframe. For instance, the first row has 1 C1, 0 C2 and 0 C3. The final data frame should look like this
COL1 COL2 COL3 C1 C2 C3
C1 None None 1 0 0
C1 C2 None 1 1 0
C1 C1 None 2 0 0
C1 C2 C3 1 1 1
So, I have created a Series with C1, C2 and C3 as the values - one way top count this is to loop over the rows and columns of the DataFrame and then over this Series and increment the counter if it matches. But is there an apply
approach that can achieve this in a compact fashion?
You can apply a function to each row of the DataFrame with apply method. In the applied function, you can first transform the row into a boolean array using between method or with standard relational operators, and then count the True values of the boolean array with sum method.
We can count by using the value_counts() method. This function is used to count the values present in the entire dataframe and also count values in a particular column.
How do you Count the Number of Occurrences in a data frame? To count the number of occurrences in e.g. a column in a dataframe you can use Pandas value_counts() method. For example, if you type df['condition']. value_counts() you will get the frequency of each unique value in the column “condition”.
Using the size() or count() method with pandas. DataFrame. groupby() will generate the count of a number of occurrences of data present in a particular column of the dataframe.
You could apply value_counts
:
In [11]: df.apply(pd.Series.value_counts, axis=1)
Out[11]:
C1 C2 C3 None
0 1 NaN NaN 2
1 1 1 NaN 1
2 2 NaN NaN 1
3 1 1 1 NaN
So you can fill the NaN and applend just the base values you want:
In [12]: df.apply(pd.Series.value_counts, axis=1)[['C1', 'C2', 'C3']].fillna(0)
Out[12]:
C1 C2 C3
0 1 0 0
1 1 1 0
2 2 0 0
3 1 1 1
Note: there's an open issue to have a value_counts method directly for a DataFrame (which I think should be introduced by pandas 0.15).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With