I have a simple DataFrame:
import pandas as pd
df = pd.DataFrame({'id':list('abcd')})
df['tuples'] = df.index.map(lambda i:(i,i+1))
# outputs:
# id tuples
# 0 a (0, 1)
# 1 b (1, 2)
# 2 c (2, 3)
# 3 d (3, 4)
I can then split the tuples column into two very simply, e.g.
df[['x','y']] = pd.DataFrame(df.tuples.tolist())
# outputs:
# id tuples x y
# 0 a (0, 1) 0 1
# 1 b (1, 2) 1 2
# 2 c (2, 3) 2 3
# 3 d (3, 4) 3 4
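The split above lines up only because `pd.DataFrame(df.tuples.tolist())` gets a fresh 0..n-1 RangeIndex, which happens to match the default index of `df`. The minimal sketch below shows what changes with a non-default index. The string labels `p`..`s` are made up for illustration, and passing `index=df.index` is this sketch's suggestion rather than part of the question:

```python
import pandas as pd

# Same kind of data as above, but with a hypothetical non-default index
df = pd.DataFrame({'id': list('abcd')}, index=list('pqrs'))
df['tuples'] = pd.Series([(i, i + 1) for i in range(4)], index=df.index)

# Without index=..., the new frame is indexed 0..3, which matches no row label
# of df, so index alignment fills both new columns with NaN
misaligned = df.copy()
misaligned[['x', 'y']] = pd.DataFrame(df.tuples.tolist())

# Passing index=df.index makes the rows line up again
aligned = df.copy()
aligned[['x', 'y']] = pd.DataFrame(df.tuples.tolist(), index=df.index)

print(misaligned)
print(aligned)
```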
This approach also works:
df[['x','y']] = df.apply(lambda x:x.tuples,result_type='expand',axis=1)
However, if my DataFrame is slightly more complex, e.g.
df = pd.DataFrame({'id':list('abcd')})
df['tuples'] = df.index.map(lambda i:(i,i+1) if i%2 else None)
# outputs:
# id tuples
# 0 a None
# 1 b (1, 2)
# 2 c None
# 3 d (3, 4)
then the first approach raises "Columns must be same length as key" (of course): some rows have two values and some have none, while my code expects two.
I can use .loc to create the columns one at a time:
get_rows = df.tuples.notnull() # return rows with tuples
df.loc[get_rows,'x'] = df.tuples.str[0]
df.loc[get_rows,'y'] = df.tuples.str[1]
# outputs:
# id tuples x y
# 0 a None NaN NaN
# 1 b (1, 2) 1.0 2.0
# 2 c None NaN NaN
# 3 d (3, 4) 3.0 4.0
[Aside: it's useful how index alignment assigns only the relevant rows from the right-hand side, without my having to specify them.]
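The alignment described in the aside can be checked in isolation. In this minimal sketch the right-hand side deliberately covers every row and is stored in reverse order; `.loc` still matches it to `df` by index label and writes only the rows selected by the mask. The mask and values here are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'id': list('abcd')})
get_rows = pd.Series([False, True, False, True])  # select rows 1 and 3

# Right-hand side covers all four rows, in reverse label order
rhs = pd.Series([30, 20, 10, 0], index=[3, 2, 1, 0])

# .loc aligns rhs to df by label (not position) and writes only the masked
# rows; the newly created column is NaN everywhere else
df.loc[get_rows, 'x'] = rhs
print(df)
```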
However, I can't use .loc to create two columns at once, e.g.
# This is not a valid use of .loc
df.loc[get_rows,['x','y']] = df.loc[get_rows,'tuples'].map(lambda x:list(x))
as it throws the error "shape mismatch: value array of shape (2,2) could not be broadcast to indexing result of shape (2,)".
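One way to fill both columns in a single `.loc` call, offered as a sketch rather than the definitive fix, is to create the columns first and then hand `.loc` a plain 2-D NumPy array. An array carries no index or column labels, so nothing is aligned; it only has to match the (2, 2) shape of the selection. Pre-creating the columns and the `np.array(...)` conversion are assumptions of this sketch, not something from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': list('abcd')})
df['tuples'] = pd.Series([None, (1, 2), None, (3, 4)])
get_rows = df.tuples.notnull()

# Create the target columns up front so .loc only overwrites existing values
df['x'] = np.nan
df['y'] = np.nan

# 2-D array of shape (number of selected rows, 2): [[1, 2], [3, 4]]
values = np.array(df.loc[get_rows, 'tuples'].tolist())

# The array is written positionally into the selected rows and columns
df.loc[get_rows, ['x', 'y']] = values
print(df)
```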
I also can't use this:
df[get_rows][['x','y']] = df[get_rows].apply(lambda x:x.tuples,result_type='expand',axis=1)
as it triggers the usual SettingWithCopyWarning ("A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc...") and df itself is left unchanged.
I can't help thinking I'm missing something.
Here is another way (comments inline):
c = df.tuples.astype(bool)  # similar to df.tuples.notnull()
# create a DataFrame from the non-None tuples, indexed by the rows where c is True
d = pd.DataFrame(df.tuples.dropna().values.tolist(), columns=list('xy'), index=df[c].index)
final = pd.concat([df, d], axis=1)  # concatenate them side by side
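The same idea can also be written with `DataFrame.join`, which aligns on the index and performs a left join by default, so rows without a tuple come out as NaN. This is a variation on the answer above, not the answerer's own code; here `dropna()` supplies both the data and the matching index in one step:

```python
import pandas as pd

df = pd.DataFrame({'id': list('abcd')})
df['tuples'] = pd.Series([None, (1, 2), None, (3, 4)])

# Keep only rows that actually hold a tuple; their index labels survive
present = df.tuples.dropna()

# Build the x/y frame on those labels, then left-join it back onto df
parts = pd.DataFrame(present.tolist(), index=present.index, columns=['x', 'y'])
final = df.join(parts)  # how='left' by default, so rows 0 and 2 get NaN
print(final)
```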
id tuples x y
0 a None NaN NaN
1 b (1, 2) 1.0 2.0
2 c None NaN NaN
3 d (3, 4) 3.0 4.0
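df[get_rows] is a copy, so setting values on df[get_rows][['x','y']] does not change the underlying data. Just assign to df[['x','y']] directly to create the new columns: the right-hand side only contains rows 1 and 3, and index alignment fills the other rows with NaN.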
df = pd.DataFrame({'id':list('abcd')})
df['tuples'] = df.index.map(lambda i:(i,i+1) if i%2 else None)
get_rows = df.tuples.notnull()
df[['x','y']] = df[get_rows].apply(lambda x:x.tuples,result_type='expand',axis=1)
print(df)
id tuples x y
0 a None NaN NaN
1 b (1, 2) 1.0 2.0
2 c None NaN NaN
3 d (3, 4) 3.0 4.0
Another quick fix:
pd.concat([df, pd.DataFrame(df.tuples.to_dict()).T],
axis=1)
returns:
id tuples 0 1
0 a None None None
1 b (1, 2) 1 2
2 c None None None
3 d (3, 4) 3 4
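This quick fix leaves the new columns named `0` and `1` with object dtype (note the `None` entries in the output). If named numeric columns are wanted, a small follow-up such as the sketch below could tidy the result; the `rename` mapping and the use of `pd.to_numeric` are this sketch's own choices, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({'id': list('abcd')})
df['tuples'] = pd.Series([None, (1, 2), None, (3, 4)])

# The answer's construction: a dict keyed by row label, turned into rows via .T
split = pd.DataFrame(df.tuples.to_dict()).T

# Name the columns and convert each one to a numeric dtype (None becomes NaN)
split = split.rename(columns={0: 'x', 1: 'y'}).apply(pd.to_numeric)

final = pd.concat([df, split], axis=1)
print(final)
```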