
iterating re.split() on a dataframe

I am trying to use re.split() to split a single variable in a pandas dataframe into two other variables.

My data looks like:

   xg              
0.05+0.43
0.93+0.05
0.00
0.11+0.11
0.00
3.94-2.06

I want to create

 e      a
0.05  0.43
0.93  0.05
0.00  
0.11  0.11
0.00
3.94  2.06

I can do this using a for loop and indexing.

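import re

for i in range(len(df)):
    value = df.loc[i, 'xg']
    if len(value) < 5:
        # Short strings hold a single value with no separator
        df.loc[i, 'e'] = value
    else:
        # Split on a literal '+' or '-'
        df.loc[i, 'e'], df.loc[i, 'a'] = re.split(r"[+-]", value)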

However, this is slow and I do not believe it is a good way of doing this. I am trying to improve my code and my understanding of Python.

I have made various attempts to write it using np.where, a list comprehension, or apply with a lambda, but I can't get it to run. I think all the issues I have are because I am trying to apply the functions to the whole series rather than to each positional value.

If anyone has an idea of a better method than my ugly for loop, I would be very interested.

oldlizard asked Nov 20 '18 21:11

1 Answer

Borrowed from this answer, which uses the str.split method with the expand argument: https://stackoverflow.com/a/14745484/3084939

import pandas as pd

df = pd.DataFrame({'col': ['1+2', '3+4', '20', '0.6-1.6']})
# Split on a literal '+' or '-'; expand=True returns one column per piece
df[['left', 'right']] = df['col'].str.split(r'[+-]', expand=True)

df.head()
       col left right
0      1+2    1     2
1      3+4    3     4
2       20   20  None
3  0.6-1.6  0.6   1.6
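
Applied to the xg column from the question, the same approach could look like the sketch below. It assumes the values are never negative (a leading minus sign would also be treated as a separator) and that each cell contains at most one + or -. The column names e and a come from the question; parts is a name introduced here for illustration only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {"xg": ["0.05+0.43", "0.93+0.05", "0.00", "0.11+0.11", "0.00", "3.94-2.06"]}
)

# Split each string on a literal '+' or '-'; n=1 limits the result to two pieces.
# Assumption: values are non-negative, so '-' only ever appears as a separator.
parts = df["xg"].str.split(r"[+-]", n=1, expand=True)

# Guard for the case where no row contains a separator (only column 0 exists)
parts = parts.reindex(columns=[0, 1])

# The pieces are strings; convert them to floats. Missing pieces become NaN.
df["e"] = pd.to_numeric(parts[0])
df["a"] = pd.to_numeric(parts[1])
```

Rows without a separator end up with NaN in a, and the whole column is processed in one vectorized call instead of a Python-level loop.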
wonderstruck80 answered Oct 12 '22 03:10