I have a pandas DataFrame with two string columns that I would like to split on spaces, like this:
df =
A B
0.1 0.5 0.01 ... 0.3 0.1 0.4 ...
I would like to split both of these columns and create one new column for each value that results from the split.
So, the result:
df =
A1 A2 A3 ... B1 B2 B3
0.1 0.5 0.01 ... 0.3 0.1 0.4
Currently, I am doing:
df = df.join(df['A'].str.split(' ', expand = True))
df = df.join(df['B'].str.split(' ', expand = True))
But, I get the following error:
columns overlap but no suffix specified
I guess this is because the column names produced by the first and second split overlap?
So, my question is: how can I split multiple columns while providing column names or suffixes for each split?
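The guess about overlapping names can be checked directly. The following is a minimal sketch, assuming a one-row frame built from the sample values in the question: each split with expand=True yields integer column labels 0, 1, 2, so the first join succeeds and the second one collides with the labels already present.

```python
import pandas as pd

# Sample frame assumed from the values shown in the question.
df = pd.DataFrame({"A": ["0.1 0.5 0.01"], "B": ["0.3 0.1 0.4"]})

a_parts = df["A"].str.split(" ", expand=True)
b_parts = df["B"].str.split(" ", expand=True)

# Both results are labelled with the same integers.
print(list(a_parts.columns))
print(list(b_parts.columns))

# The first join works because 0, 1, 2 do not clash with 'A' and 'B'.
joined = df.join(a_parts)

# The second join finds 0, 1, 2 on both sides and raises ValueError.
try:
    joined.join(b_parts)
    overlap_error = None
except ValueError as exc:
    overlap_error = str(exc)

print(overlap_error)
```

Giving each split its own distinct column labels before joining avoids the collision.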
Use DataFrame.add_prefix to name the new columns after the column that was split:
df = df.join(df['A'].str.split(expand = True).add_prefix('A'))
df = df.join(df['B'].str.split(expand = True).add_prefix('B'))
print (df)
              A            B   A0   A1    A2   B0   B1   B2
0  0.1 0.5 0.01  0.3 0.1 0.4  0.1  0.5  0.01  0.3  0.1  0.4
Another idea is to use a list comprehension:
cols = ['A','B']
df1 = pd.concat([df[c].str.split(expand=True).add_prefix(c) for c in cols], axis=1)
print (df1)
    A0   A1    A2   B0   B1   B2
0  0.1  0.5  0.01  0.3  0.1  0.4
To keep all the original columns as well, join the result back:
df = df.join(df1)
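Note that add_prefix keeps the zero-based labels (A0, A1, ...), while the question asks for A1, A2, A3. One possible way to get one-based names is sketched below. The helper name split_numbered and the second data row are made up for illustration, and the conversion to float is an optional extra that assumes every piece is numeric.

```python
import pandas as pd

# First row is from the question; the second row is invented sample data.
df = pd.DataFrame({
    "A": ["0.1 0.5 0.01", "0.2 0.6 0.02"],
    "B": ["0.3 0.1 0.4", "0.7 0.8 0.9"],
})

def split_numbered(frame, col):
    """Split one string column on whitespace and name the pieces col1, col2, ..."""
    parts = frame[col].str.split(expand=True)
    # Replace the default 0, 1, 2 labels with one-based, prefixed names.
    parts.columns = [f"{col}{i + 1}" for i in range(parts.shape[1])]
    # Optional: turn the string pieces into numbers (assumes numeric content).
    return parts.astype(float)

wide = pd.concat([split_numbered(df, c) for c in ["A", "B"]], axis=1)
result = df.join(wide)  # original columns plus A1..A3 and B1..B3
print(result)
```

Because every new label carries the source column's name, the final join has no overlapping columns.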