I have a Pandas DataFrame as shown below - I'm using Python Pandas.
+------------+---------+---------+----------+--------+
| Movie Name | English | Chinese | Japanese | Korean |
+------------+---------+---------+----------+--------+
| A | 1 | 0 | 0 | 0 |
| B | 0 | 1 | 1 | 0 |
| C | 0 | 1 | 1 | 1 |
| D | 1 | 0 | 0 | 0 |
| E | 0 | 1 | 0 | 0 |
+------------+---------+---------+----------+--------+
I want to convert it like below, by concatenating Header Names based on values (0 or 1)
Expected Output
+------------+-------------------------+
| Movie Name | Languages |
+------------+-------------------------+
| A | English |
| B | Chinese, Japanese |
| C | Chinese,Japanese,Korean |
| D | English |
| E | Chinese |
+------------+-------------------------+
First create index by DataFrame.set_index
, then use DataFrame.dot
for matrix multiplication of 1
and last remove last ,
by Series.str.rstrip
with Series.reset_index
for 2 column DataFrame
:
df = df.set_index('Movie Name')
df1 = df.dot(df.columns + ',').str.rstrip(',').reset_index(name='Languages')
print (df1)
Movie Name Languages
0 A English
1 B Chinese,Japanese
2 C Chinese,Japanese,Korean
3 D English
4 E Chinese
IIUC, melt
then the problem became a groupby
issue
s=df.melt('MovieName').query('value==1').groupby('MovieName').variable.agg(','.join)
df['New']=df.MovieName.map(s)
df
Out[690]:
MovieName English ... Korean New
0 A 1 ... 0 English
1 B 0 ... 0 Chinese,Japanese
2 C 0 ... 1 Chinese,Japanese,Korean
3 D 1 ... 0 English
4 E 0 ... 0 Chinese
[5 rows x 6 columns]
You can use:
df['langauges'] = (df.eq(1)*df.columns).apply(lambda x : ','.join(x), axis = 1).str.strip(',')
df
Movie Name English Chinese Japanese Korean langauges
0 A 1 0 0 0 English
1 B 0 1 1 0 Chinese,Japanese
2 C 0 1 1 1 Chinese,Japanese,Korean
3 D 1 0 0 0 English
4 E 0 1 0 0 Chinese
It can be done with pandas.Series.str.cat . You can read more about it here https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.cat.html
import pandas as pd
import numpy as np
df=pd.DataFrame({'Movie Name':['A','B','C','D','E'],'English':[1,0,0,1,0],'Chinese':[0,1,1,0,1],'Japanese':[0,1,1,0,0],'Korean':[0,0,1,0,0]})
df=df.replace(1,df.columns.to_series())
df=df.replace(0,np.NaN)
df['Languages']=df[['English','Chinese','Japanese','Korean']].apply(lambda x: x.str.cat(sep=","),axis=1)
df=df.drop(columns=['English','Chinese','Japanese','Korean'])
Result:
Movie Name Languages
0 A English
1 B Chinese,Japanese
2 C Chinese,Japanese,Korean
3 D English
4 E Chinese
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With