I have a string column in a dataframe like this:
ID col1
id1 AA's 2015:45,BB:96
id2 Jigga:91,OO:73,BB:34
I want to create a new dataframe from this with the following shape:
ID var1 var2 var3 var4
id1 45 96 0 0
id2 0 34 91 73
where var1 = AA's 2015, var2 = BB, var3 = Jigga, var4 = OO.
I have stored all the distinct keys (the part of each item before the colon) in a list like this:
["AA's 2015","BB","Jigga","OO"]
I want to iterate through this list and, for each value, create a variable var[i] that takes its value from col1 for that particular ID.
I can use a for loop to iterate through the list, but how do I look up the value and put it in var[i]?
Any ideas will be appreciated.
Use apply to turn each string into a pandas Series. The function passed to apply is called on each string, and the Series it returns are then merged into a single DataFrame, which apply returns.
The DataFrame's column labels come from merging the indices of all the Series. The merging also places each Series' values in the appropriate columns, which yields the desired result:
import pandas as pd

df = pd.DataFrame({'ID': ['id1', 'id2'], 'col1': ["AA: 2015:45,BB:96", 'Jigga:91,OO:73,BB:34']})
result = df['col1'].apply(lambda x: pd.Series(
    dict([
        item for item in [
            part.rsplit(':', 1) for part in x.split(',')]
        if len(item) > 1  # remove items corresponding to empty strings
    ]))).fillna(0)
result = result.rename(columns={name: 'var{}'.format(i) for i, name in
                                enumerate(result.columns, 1)})
result = pd.concat([df[['ID']], result], axis=1)
print(result)
yields
    ID var1 var2 var3 var4
0  id1   45   96    0    0
1  id2    0   34   91   73
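To see the label alignment on its own, here is a minimal sketch that builds a DataFrame from two Series with different indices. The Series names (s1, s2) and their contents are made up for illustration, and building the frame directly with pd.DataFrame is only an analogy for what happens to the Series returned inside apply:

```python
import pandas as pd

# Two Series with different, partly overlapping labels (illustrative data only).
s1 = pd.Series({'AA': 45, 'BB': 96})
s2 = pd.Series({'CC': 91, 'BB': 34})

# Each Series becomes one row; values are placed by label,
# and labels missing from a row are filled with NaN.
aligned = pd.DataFrame([s1, s2])

# Replace the NaN gaps with 0 and go back to integers.
aligned = aligned.fillna(0).astype(int)
print(aligned)
```

Each row only has values for the labels that its own Series carried; everything else starts out as NaN, which is why the answer ends with fillna(0).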
I learned this trick here.
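If the column order should follow the list from the question rather than the order in which keys first appear, one possible variant is sketched below. It assumes every value after the last colon is an integer; the helper name parse and the variables names and wide are my own, not part of the original code:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['id1', 'id2'],
                   'col1': ["AA's 2015:45,BB:96", 'Jigga:91,OO:73,BB:34']})
names = ["AA's 2015", "BB", "Jigga", "OO"]  # the list of distinct keys from the question


def parse(cell):
    """Turn 'key:value,key:value' into a dict (hypothetical helper)."""
    # Split each item on its last colon, so keys may themselves contain colons.
    pairs = (part.rsplit(':', 1) for part in cell.split(','))
    # Skip fragments without a colon (e.g. from empty strings); assumes integer values.
    return {key: int(value) for key, value in (p for p in pairs if len(p) == 2)}


# One dict per row; the DataFrame constructor aligns the keys into columns.
wide = pd.DataFrame([parse(cell) for cell in df['col1']], index=df.index)

# Force the column order given by the list, then fill gaps with 0.
wide = wide.reindex(columns=names, fill_value=0).fillna(0).astype(int)
wide.columns = ['var{}'.format(i) for i in range(1, len(names) + 1)]

result = pd.concat([df[['ID']], wide], axis=1)
print(result)
```

Here reindex fixes the order (and adds any key from the list that never occurs in the data), and astype(int) keeps the counts as integers instead of the strings the one-liner leaves behind.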