I have a pandas DataFrame like this:
   Windows  Linux    Mac
0     True  False  False
1    False   True  False
2    False  False   True
I want to combine these three columns into a single column, like this:
OS
0 Windows
1 Linux
2 Mac
I know that I can write a simple function like this:
def aggregate_os(row):
    if row['Windows'] == True:
        return 'Windows'
    if row['Linux'] == True:
        return 'Linux'
    if row['Mac'] == True:
        return 'Mac'
which I can call like this:
df['OS'] = df.apply(aggregate_os, axis=1)
The problem is that my dataset is huge and this solution is too slow. Is there a more efficient way of doing this aggregation?
idxmax
df.idxmax(1).to_frame('OS')
OS
0 Windows
1 Linux
2 Mac
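One caveat worth sketching: idxmax returns the label of the first maximum in each row, so a row with no True at all would be reported as the first column. The example below is a minimal illustration under that assumption. The extra all-False row and the names naive and os_col are hypothetical and not part of the original data.

```python
import pandas as pd

# Sample frame from the question plus one hypothetical all-False row.
df = pd.DataFrame({
    "Windows": [True, False, False, False],
    "Linux": [False, True, False, False],
    "Mac": [False, False, True, False],
})

# idxmax(axis=1) picks the first column holding the row maximum.
# In an all-False row every value ties, so the first column wins.
naive = df.idxmax(axis=1)

# Keep the label only where the row really contains a True;
# other rows become missing values instead of a misleading label.
os_col = naive.where(df.any(axis=1))

print(os_col)
```

If every row is guaranteed to contain exactly one True, the mask is unnecessary and the plain idxmax call is enough.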
np.select
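pd.DataFrame(
    # default='' keeps the result a string array (np.select's own default is the integer 0)
    {'OS': np.select([*map(df.get, df)], [*df], default='')},
    df.index
)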
OS
0 Windows
1 Linux
2 Mac
dot
df.dot(df.columns).to_frame('OS')
OS
0 Windows
1 Linux
2 Mac
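The dot approach multiplies each boolean by its column name and concatenates the results, so a row with several True values yields the names run together. A possible variation, assuming you want such rows separated by commas, is sketched below. The sample frame with a two-True row and the name labels are hypothetical.

```python
import pandas as pd

# Hypothetical frame where the last row has two True values.
df = pd.DataFrame({
    "Windows": [True, False, True],
    "Linux": [False, True, False],
    "Mac": [False, False, True],
})

# Append a separator to every column name before the dot product,
# then strip the trailing separator from each result.
labels = df.dot(df.columns + ",").str.rstrip(",")

print(labels.to_frame("OS"))
```

Rows with a single True are unaffected by this change, since the only separator added is the trailing one that gets stripped.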
np.where
Assuming only one True per row
pd.DataFrame(
    {'OS': df.columns[np.where(df)[1]]},
    df.index
)
OS
0 Windows
1 Linux
2 Mac
Using boolean indexing with stack and rename
df_new = df.stack()
df_new[df_new].reset_index(level=1).rename(columns={'level_1':'OS'}).drop(columns=0)
Output
OS
0 Windows
1 Linux
2 Mac
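Before swapping the apply version for one of the vectorized alternatives on a large dataset, it may be worth checking that they agree on your data. The sketch below does this on a randomly generated frame with exactly one True per row. The random data, the seed, and all variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Build a hypothetical one-hot frame: exactly one True per row.
rng = np.random.default_rng(0)
cols = ["Windows", "Linux", "Mac"]
codes = rng.integers(0, len(cols), size=1_000)
df = pd.DataFrame(np.eye(len(cols), dtype=bool)[codes], columns=cols)

# Row-wise baseline, equivalent to the apply-based function in the question.
expected = df.apply(lambda row: next(c for c in cols if row[c]), axis=1)

# Vectorized alternatives from the answers above.
via_idxmax = df.idxmax(axis=1)
via_where = pd.Series(df.columns[np.where(df)[1]], index=df.index)
via_dot = df.dot(df.columns)

all_agree = bool(
    (expected.eq(via_idxmax) & expected.eq(via_where) & expected.eq(via_dot)).all()
)
print(all_agree)
```

Wrapping each expression in timeit on a sample of your own data should show which option is fastest in your environment.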