Melt multiple boolean columns in a single column in pandas

I have a pandas dataframe like this

  Windows Linux Mac
0 True    False False
1 False   True  False
2 False   False True

I want to combine these three columns in a single column like this

  OS
0 Windows
1 Linux
2 Mac

I know that I can write a simple function like this

def aggregate_os(row):
    if row['Windows']:
        return 'Windows'
    if row['Linux']:
        return 'Linux'
    if row['Mac']:
        return 'Mac'

which I can call like this

df['OS'] = df.apply(aggregate_os, axis=1)

The problem is that my dataset is huge and this solution is too slow. Is there a more efficient way of doing this aggregation?

sm1994 asked Sep 09 '19 21:09


2 Answers

idxmax

df.idxmax(axis=1).to_frame('OS')

        OS
0  Windows
1    Linux
2      Mac
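
One caveat with idxmax: on a row where every flag is False it still returns the first column label, because False is then the row maximum. A minimal sketch, assuming the sample frame from the question plus one hypothetical all-False row, that replaces such rows with NaN:

```python
import pandas as pd

# Sample frame from the question, plus a hypothetical all-False row (index 3).
df = pd.DataFrame({
    'Windows': [True, False, False, False],
    'Linux':   [False, True, False, False],
    'Mac':     [False, False, True, False],
})

# idxmax returns the label of the first True in each row ...
os_col = df.idxmax(axis=1)
# ... but it also returns 'Windows' for the all-False row, so mask those rows.
os_col = os_col.where(df.any(axis=1))

result = os_col.to_frame('OS')
print(result)
```

If every row is guaranteed to have at least one True, the mask is unnecessary and the one-liner above is enough.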

np.select

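pd.DataFrame(
    # default='' keeps the result a string array; np.select's default of 0
    # does not mix cleanly with string choices
    {'OS': np.select([df[c] for c in df], list(df), default='')},
    index=df.index
)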

        OS
0  Windows
1    Linux
2      Mac

dot

df.dot(df.columns).to_frame('OS')

        OS
0  Windows
1    Linux
2      Mac

np.where

This assumes exactly one True per row; otherwise the result no longer lines up with df.index.

pd.DataFrame(
    {'OS': df.columns[np.where(df)[1]]},
    index=df.index
)

        OS
0  Windows
1    Linux
2      Mac
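
The one-True-per-row assumption can be checked before relying on np.where. A minimal sketch, using the sample frame from the question (the check and the variable names are illustrative, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    'Windows': [True, False, False],
    'Linux':   [False, True, False],
    'Mac':     [False, False, True],
})

flags = df.to_numpy()

# np.where yields one (row, column) pair per True cell, so the result only
# has one entry per row when every row contains exactly one True.
if not (flags.sum(axis=1) == 1).all():
    raise ValueError('expected exactly one True per row')

col_positions = np.where(flags)[1]
result = pd.DataFrame({'OS': df.columns[col_positions]}, index=df.index)
print(result)
```

Because np.where scans the array in row-major order, the column positions come back in the same order as the rows.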
piRSquared answered Sep 20 '22 14:09


Using boolean indexing with stack and rename

df_new = df.stack()
df_new[df_new].reset_index(level=1).rename(columns={'level_1':'OS'}).drop(columns=0)

Output

        OS
0  Windows
1    Linux
2      Mac
Erfan answered Sep 18 '22 14:09