
Pandas str.split without stripping split pattern

Example code:

In [1]: import pandas as pd

In [2]: serie = pd.Series(['this#is#a#test', 'another#test'])

In [3]: serie.str.split('#', expand=True)
Out[3]:
         0     1     2     3
0     this    is     a  test
1  another  test  None  None

Is it possible to split without stripping the split pattern from the results? The desired output for the example above would be:

Out[3]:
         0      1     2      3
0     this    #is    #a  #test
1  another  #test  None   None

EDIT 1: The real use case is to keep the matching pattern, for instance:

serie.str.split(r'\n\*\*\* [A-Z]+', expand=True)

Here the [A-Z]+ matches are the names of processing steps, which I want to keep for further processing.

roirodriguez asked Jul 31 '19 10:07



1 Answer

You could split using a positive lookahead. A lookahead matches without consuming any characters, so the split point is the position just before the lookahead expression, and the delimiter stays at the start of the next piece.

import pandas as pd

serie = pd.Series(['this#is#a#test', 'another#test'])
# (?=#) matches the empty position before each '#', so the '#' itself is kept
print(serie.str.split(r'(?=#)', expand=True))

OUTPUT

         0      1     2      3
0     this    #is    #a  #test
1  another  #test  None   None
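
The same idea should carry over to the pattern from EDIT 1: wrapping the whole expression in (?=...) keeps the "*** STEP" marker attached to the piece that follows it. Below is a minimal sketch under a few assumptions: the sample log strings and the step names (LOAD, CLEAN, SAVE) are made up for illustration, and it relies on Python 3.7 or later, where re.split (which pandas uses for regex patterns on ordinary string data) can split on zero-width matches.

```python
import pandas as pd

# Hypothetical data: each processing step starts on a new line with "*** STEP".
logs = pd.Series([
    "header\n*** LOAD reading input\n*** CLEAN dropping nulls",
    "intro\n*** SAVE writing output",
])

# The pattern from EDIT 1, wrapped in a positive lookahead so that the
# matched text is not consumed and remains at the start of the next piece.
pattern = r"(?=\n\*\*\* [A-Z]+)"

# Without expand, each row becomes a list of pieces.
parts = logs.str.split(pattern)
print(parts.tolist())

# With expand=True, the pieces are spread over columns; shorter rows are padded.
wide = logs.str.split(pattern, expand=True)
print(wide.shape)
```

Each piece after the first still begins with the newline and the "*** STEP" marker, so the step name can be pulled out of it in a later pass, for example with str.extract.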
Chris Doyle answered Sep 18 '22 16:09