Count occurrences of items in Series in each row of a DataFrame

Tags: python, pandas, apply

I have a pandas.DataFrame that looks like this:

COL1    COL2    COL3
C1      None    None
C1      C2      None
C1      C1      None
C1      C2      C3

For each row in this DataFrame I would like to count the occurrences of each of C1, C2 and C3, and append these counts as columns to the DataFrame. For instance, the first row has 1 C1, 0 C2 and 0 C3. The final DataFrame should look like this:

COL1    COL2    COL3    C1  C2  C3
C1      None    None    1   0   0
C1      C2      None    1   1   0
C1      C1      None    2   0   0
C1      C2      C3      1   1   1

I have created a Series with C1, C2 and C3 as its values. One way to count this is to loop over the rows and columns of the DataFrame, then over this Series, and increment a counter on each match. But is there an apply approach that can achieve this in a compact fashion?

asked Jul 01 '14 17:07

sriramn

1 Answer

You could apply value_counts:

In [11]: df.apply(pd.Series.value_counts, axis=1)
Out[11]: 
   C1  C2  C3  None
0   1 NaN NaN     2
1   1   1 NaN     1
2   2 NaN NaN     1
3   1   1   1   NaN

So you can fill the NaN with 0 and append just the base values you want:

In [12]: df.apply(pd.Series.value_counts, axis=1)[['C1', 'C2', 'C3']].fillna(0)
Out[12]: 
   C1  C2  C3
0   1   0   0
1   1   1   0
2   2   0   0
3   1   1   1
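
For a self-contained version of the steps above, here is a minimal sketch. The sample frame is rebuilt from the question, and the names `targets`, `counts` and `result` are only illustrative. It assumes the missing cells are real `None` values, which `value_counts` drops by default, rather than the string 'None' shown in the output above. It uses `reindex` instead of plain column selection so that a target value that never occurs still gets a column.

```python
import pandas as pd

# Sample data rebuilt from the question; missing cells are assumed to be real None.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": [None, "C2", "C1", "C2"],
    "COL3": [None, None, None, "C3"],
})

# The values to count (the question keeps these in a Series).
targets = pd.Series(["C1", "C2", "C3"])

# Count values per row, keep only the target columns, and turn NaN into 0.
counts = (
    df.apply(pd.Series.value_counts, axis=1)
      .reindex(columns=list(targets))
      .fillna(0)
      .astype(int)
)

# Append the counts to the original frame (the row index is shared).
result = df.join(counts)
print(result)
```

`df.join(counts)` works here because the new column names (C1, C2, C3) do not clash with the existing ones (COL1, COL2, COL3).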

Note: there's an open issue to have a value_counts method directly for a DataFrame (which I think should be introduced by pandas 0.15).
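
If you would rather not use `apply` at all, a different technique (not the one shown above) is to compare the whole frame against each target value and sum the matches across each row. This is only a sketch under the same assumptions as the question's sample data; `targets`, `counts` and `result` are illustrative names.

```python
import pandas as pd

# Sample data rebuilt from the question; missing cells are assumed to be real None.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": [None, "C2", "C1", "C2"],
    "COL3": [None, None, None, "C3"],
})

targets = ["C1", "C2", "C3"]

# For each target, df.eq(value) marks matching cells as True;
# summing across axis=1 counts the matches in each row.
counts = pd.DataFrame({value: df.eq(value).sum(axis=1) for value in targets})

# Append the count columns to the original frame.
result = pd.concat([df, counts], axis=1)
print(result)
```

Each target value gets a column even if it never appears in the data, so no `fillna` step is needed.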

answered Sep 28 '22 05:09

Andy Hayden
