pandas Series.str.split(' ', expand=True) with column names

I have a pandas DataFrame with two string columns that I would like to split on spaces, like this:

 df =
        A                                   B
        0.1  0.5  0.01 ...                    0.3  0.1  0.4 ...

I would like to split both columns and create one new column for each value that results from the split.

So, the result:

df =
       A1      A2     A3  ...               B1        B2        B3
       0.1     0.5   0.01 ...               0.3       0.1       0.4

Currently, I am doing:

 df = df.join(df['A'].str.split(' ', expand=True))
 df = df.join(df['B'].str.split(' ', expand=True))

But, I get the following error:

 columns overlap but no suffix specified

I assume this is because the column names produced by the first and second splits overlap.

So my question is: how can I split multiple columns while providing column names or suffixes for each split?

learner asked Sep 17 '25 03:09


1 Answer

Use DataFrame.add_prefix to prefix each new column name with the name of the column that was split:

df = df.join(df['A'].str.split(expand = True).add_prefix('A'))
df = df.join(df['B'].str.split(expand = True).add_prefix('B'))
print(df)
              A            B   A0   A1    A2   B0   B1   B2
0  0.1 0.5 0.01  0.3 0.1 0.4  0.1  0.5  0.01  0.3  0.1  0.4
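
To make the cause of the error concrete, here is a minimal, self-contained sketch. The one-row sample frame is an assumption based on the values shown in the question. It shows that str.split(expand=True) labels its output columns with the integers 0, 1, 2, so joining two unprefixed results collides, while add_prefix makes the labels unique:

```python
import pandas as pd

# Sample data assumed from the question: two space-separated string columns.
df = pd.DataFrame({"A": ["0.1 0.5 0.01"], "B": ["0.3 0.1 0.4"]})

# expand=True returns a DataFrame whose columns are labelled 0, 1, 2.
parts_a = df["A"].str.split(expand=True)
parts_b = df["B"].str.split(expand=True)
print(list(parts_a.columns))

# Joining both unprefixed results collides on the labels 0, 1, 2.
error_message = None
try:
    df.join(parts_a).join(parts_b)
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# add_prefix turns the integer labels into unique strings such as 'A0'.
result = df.join(parts_a.add_prefix("A")).join(parts_b.add_prefix("B"))
print(list(result.columns))
```

Note that calling str.split() without a separator splits on any run of whitespace, which also copes with the double spaces shown in the question.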

Another option is to use a list comprehension:

cols = ['A','B']
df1 = pd.concat([df[c].str.split(expand=True).add_prefix(c) for c in cols], axis=1)
print(df1)
    A0   A1    A2   B0   B1   B2
0  0.1  0.5  0.01  0.3  0.1  0.4

To add the new columns to the original DataFrame:

df = df.join(df1)
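
The question asks for names starting at 1 (A1, A2, A3), whereas add_prefix keeps the zero-based positions. One possible way to get 1-based names is to assign the labels directly. This is only a sketch: split_columns and its start parameter are hypothetical names introduced here, and the sample frame is assumed from the question.

```python
import pandas as pd

# Sample data assumed from the question.
df = pd.DataFrame({"A": ["0.1 0.5 0.01"], "B": ["0.3 0.1 0.4"]})


def split_columns(frame, cols, start=1):
    """Hypothetical helper: split each column in `cols` on whitespace and
    label the pieces <col><start>, <col><start + 1>, and so on."""
    pieces = []
    for c in cols:
        part = frame[c].str.split(expand=True)
        # Replace the default integer labels 0, 1, 2 with prefixed names.
        part.columns = [f"{c}{i}" for i in range(start, start + part.shape[1])]
        pieces.append(part)
    return pd.concat(pieces, axis=1)


df1 = split_columns(df, ["A", "B"])
print(list(df1.columns))

# The split values are still strings; convert them if numbers are needed.
numeric = df1.astype(float)

# Attach the new columns to the original frame.
out = df.join(df1)
```

The astype(float) step assumes every piece parses as a number; if some rows may contain other text, pd.to_numeric with errors='coerce' applied per column is a more forgiving alternative.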
jezrael answered Sep 19 '25 14:09