
Count occurrences of items in Series in each row of a DataFrame

I have a pandas.DataFrame that looks like this.

COL1    COL2    COL3
C1      None    None
C1      C2      None
C1      C1      None
C1      C2      C3

For each row in this dataframe I would like to count the occurrences of each of C1, C2, C3 and append this information as columns to this dataframe. For instance, the first row has 1 C1, 0 C2 and 0 C3. The final data frame should look like this:

COL1    COL2    COL3    C1  C2  C3
C1      None    None    1   0   0
C1      C2      None    1   1   0
C1      C1      None    2   0   0
C1      C2      C3      1   1   1

So, I have created a Series with C1, C2 and C3 as the values. One way to count this is to loop over the rows and columns of the DataFrame and then over this Series, incrementing a counter on each match. But is there an apply approach that can achieve this in a compact fashion?

sriramn asked Jul 01 '14 17:07


People also ask

How do you count occurrences of values within a specific range by row?

You can apply a function to each row of the DataFrame with apply method. In the applied function, you can first transform the row into a boolean array using between method or with standard relational operators, and then count the True values of the boolean array with sum method.
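As a rough sketch of the row-wise range count described above: the frame `scores` and the bounds `low` and `high` are made up for illustration, and `between` is inclusive on both ends by default.

```python
import pandas as pd

# Hypothetical numeric data, one row per observation.
scores = pd.DataFrame({"a": [1, 5, 9], "b": [4, 6, 2], "c": [7, 8, 3]})

low, high = 3, 7  # illustrative bounds, both inclusive

# For each row: build a boolean Series with between(), then sum the True values.
in_range = scores.apply(lambda row: row.between(low, high).sum(), axis=1)
print(in_range)
```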

How do you count occurrences of specific value in pandas row?

We can count by using the value_counts() method. This function is used to count the values present in the entire dataframe and also count values in a particular column.

How do you count occurrences in pandas series?

To count the number of occurrences in a column of a dataframe, you can use the pandas value_counts() method. For example, df['condition'].value_counts() returns the frequency of each unique value in the column "condition".
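A small illustration of the column case, assuming a toy frame with a "condition" column (the data values are invented):

```python
import pandas as pd

# Toy data; only the column name "condition" comes from the text above.
df = pd.DataFrame({"condition": ["a", "b", "a", "c", "a"]})

# Frequency of each unique value, most frequent first.
freq = df["condition"].value_counts()
print(freq)
```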

How do you count occurrences in a data frame?

Calling size() or count() on the result of pandas.DataFrame.groupby() gives the number of occurrences per group: size() counts all rows in each group, while count() counts only the non-null values in each column.
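A minimal sketch of the groupby variant, using an invented frame with a "group" key and a "value" column that contains one missing entry, so the difference between the two methods is visible:

```python
import pandas as pd

# Hypothetical data: group "x" has two rows, one with a missing value.
df = pd.DataFrame({"group": ["x", "x", "y"], "value": [1.0, None, 3.0]})

# size() counts every row in each group, including rows with missing values.
sizes = df.groupby("group").size()

# count() counts only the non-null entries of the selected column.
non_null = df.groupby("group")["value"].count()

print(sizes)
print(non_null)
```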


1 Answer

You could apply value_counts:

In [11]: df.apply(pd.Series.value_counts, axis=1)
Out[11]: 
   C1  C2  C3  None
0   1 NaN NaN     2
1   1   1 NaN     1
2   2 NaN NaN     1
3   1   1   1   NaN

So you can fill the NaN and append just the base values you want:

In [12]: df.apply(pd.Series.value_counts, axis=1)[['C1', 'C2', 'C3']].fillna(0)
Out[12]: 
   C1  C2  C3
0   1   0   0
1   1   1   0
2   2   0   0
3   1   1   1
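
For reference, here is a self-contained sketch of the same approach that also joins the counts back onto the original frame, as the question asks. It assumes the missing entries are real None values rather than the string 'None' (in which case value_counts drops them by default), and the names `labels`, `counts` and `result` are illustrative:

```python
import pandas as pd

# Rebuild the example frame from the question; None marks a missing entry.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": [None, "C2", "C1", "C2"],
    "COL3": [None, None, None, "C3"],
})

labels = ["C1", "C2", "C3"]  # the values to count in each row

counts = (
    df.apply(pd.Series.value_counts, axis=1)  # one value_counts per row
      .reindex(columns=labels)                # keep only the wanted labels, in order
      .fillna(0)                              # a label absent from a row counts as 0
      .astype(int)
)

# Append the counts as new columns next to the original data.
result = df.join(counts)
print(result)
```

Using reindex rather than plain column selection means a label that never occurs anywhere in the frame still gets a column of zeros instead of raising a KeyError.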

Note: there's an open issue to have a value_counts method directly for a DataFrame (which I think should be introduced by pandas 0.15).

Andy Hayden answered Sep 28 '22 05:09