Melt multiple boolean columns in a single column in pandas

Tags: performance, python-3.x, pandas, dataframe

I have a pandas dataframe like this

  Windows Linux Mac
0 True    False False
1 False   True  False
2 False   False True

I want to combine these three columns in a single column like this

  OS
0 Windows
1 Linux
2 Mac

I know that I can write a simple function like this

def aggregate_os(row):
    if row['Windows']:
        return 'Windows'
    if row['Linux']:
        return 'Linux'
    if row['Mac']:
        return 'Mac'

which I can call like this

df['OS'] = df.apply(aggregate_os, axis=1)

The problem is that my dataset is huge and this solution is too slow. Is there a more efficient way of doing this aggregation?

258

asked Sep 09 '19 21:09

sm1994

2 Answers

`idxmax`

df.idxmax(axis=1).to_frame('OS')

        OS
0  Windows
1    Linux
2      Mac
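
One caveat with `idxmax`: it returns the label of the first maximum in each row, so a row with no True at all is still reported as the first column. A minimal sketch of one way to mask such rows, assuming a made-up fourth row (not in the question's data) that has no OS set:

```python
import pandas as pd

# Example frame from the question, plus a hypothetical row 3 with no True value.
df = pd.DataFrame(
    {
        "Windows": [True, False, False, False],
        "Linux": [False, True, False, False],
        "Mac": [False, False, True, False],
    }
)

# idxmax picks the first maximum per row, so the all-False row
# would come back as "Windows" without the masking step below.
os_col = df.idxmax(axis=1)

# Keep the label only where the row contains at least one True; NaN otherwise.
os_col = os_col.where(df.any(axis=1))

result = os_col.to_frame("OS")
print(result)
```

If every row is guaranteed to have a True, the masking step can be dropped.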

`np.select`

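pd.DataFrame(
    # one boolean condition per column, paired with that column's name;
    # a string default keeps the default the same dtype as the choices
    {'OS': np.select([*map(df.get, df)], [*df], default='')},
    df.index
)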

        OS
0  Windows
1    Linux
2      Mac

`dot`

df.dot(df.columns).to_frame('OS')

        OS
0  Windows
1    Linux
2      Mac
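
The `dot` version works because `True * 'Windows'` is `'Windows'` and `False * 'Windows'` is `''`, and the per-row products are concatenated. That also means a row with several True values gets its labels glued together. A sketch of a common variant that adds a separator, assuming a hypothetical row 1 with two True values and a comma as the (arbitrary) separator:

```python
import pandas as pd

# Row 1 is a made-up case with two True values to show the joining behaviour.
df = pd.DataFrame(
    {
        "Windows": [True, True, False],
        "Linux": [False, True, False],
        "Mac": [False, False, True],
    }
)

# Append the separator to every column label, take the dot product,
# then strip the trailing separator left at the end of each row.
joined = df.dot(df.columns + ",").str.rstrip(",")

result = joined.to_frame("OS")
print(result)
```

With exactly one True per row this gives the same labels as the plain `dot` call.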

`np.where`

Assuming only one True per row

pd.DataFrame(
    {'OS': df.columns[np.where(df)[1]]},
    df.index
)

        OS
0  Windows
1    Linux
2      Mac
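
The one-True-per-row assumption matters here because `np.where` returns one entry per True cell, so any other count makes the result a different length than the index. A minimal sketch that checks the assumption first; the error message and variable names are my own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Windows": [True, False, False],
        "Linux": [False, True, False],
        "Mac": [False, False, True],
    }
)

# Count the True cells in each row and require exactly one everywhere.
exactly_one = df.sum(axis=1).eq(1)
if not exactly_one.all():
    raise ValueError("np.where approach needs exactly one True per row")

# np.where on a 2-D boolean array returns (row_positions, column_positions);
# the column positions index straight into the column labels.
col_positions = np.where(df.to_numpy())[1]
result = pd.DataFrame({"OS": df.columns.to_numpy()[col_positions]}, index=df.index)
print(result)
```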

110

answered Sep 20 '22 14:09

piRSquared

Using boolean indexing with stack and rename

df_new = df.stack()
df_new[df_new].reset_index(level=1).rename(columns={'level_1':'OS'}).drop(columns=0)

Output

        OS
0  Windows
1    Linux
2      Mac

answered Sep 18 '22 14:09

Erfan
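
Since the question is about speed, it may help to time the options on your own data rather than take any ranking on faith. A rough sketch, assuming synthetic data with 10,000 rows (an arbitrary size) and exactly one True per row; actual timings depend on the machine and on the pandas and NumPy versions:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic frame: each row gets exactly one True in a random column.
rng = np.random.default_rng(0)
labels = np.array(["Windows", "Linux", "Mac"])
choice = rng.integers(0, len(labels), size=10_000)
df = pd.DataFrame(choice[:, None] == np.arange(len(labels)), columns=labels)


def aggregate_os(row):
    # Row-wise function from the question.
    if row["Windows"]:
        return "Windows"
    if row["Linux"]:
        return "Linux"
    if row["Mac"]:
        return "Mac"


candidates = {
    "apply": lambda: df.apply(aggregate_os, axis=1),
    "idxmax": lambda: df.idxmax(axis=1),
    "np.where": lambda: pd.Series(
        labels[np.where(df.to_numpy())[1]], index=df.index
    ),
}

expected = candidates["apply"]()
for name, func in candidates.items():
    # Confirm each approach agrees with the original before timing it.
    assert func().tolist() == expected.tolist()
    seconds = timeit.timeit(func, number=3) / 3
    print(f"{name:>8}: {seconds:.4f} s per run")
```

Raise the row count toward your real dataset size to see how each approach scales.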
