Example code:
In [1]: import pandas as pd
In [2]: serie = pd.Series(['this#is#a#test', 'another#test'])
In [3]: serie.str.split('#', expand=True)
Out[3]:
         0     1     2     3
0     this    is     a  test
1  another  test  None  None
Is it possible to split without stripping the split criteria string? The desired output for the example above would be:
Out[3]:
         0      1     2      3
0     this    #is    #a  #test
1  another  #test  None   None
EDIT 1: The real use case is to keep the matched pattern, for instance:
serie.str.split(r'\n\*\*\* [A-Z]+', expand=True)
Here the [A-Z]+ parts are processing steps, which I want to keep for further processing.
pandas provides a str.split() method that splits each string around a given separator/delimiter. The result can be stored as a list in each element of a Series, or expanded into multiple DataFrame columns from a single delimited string.
When using expand=True, the split elements expand out into separate columns. If NaN is present, it is propagated throughout the columns during the split. For slightly more complex use cases, like splitting the HTML document name from a URL, a combination of parameter settings can be used.
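As a rough sketch of the behaviour described above, assuming a reasonably recent pandas (the sample strings and the names `s`, `wide`, `urls` and `parts` are made up for illustration):

```python
import pandas as pd

# Made-up sample data: two strings with different numbers of parts,
# plus a missing value.
s = pd.Series(["a#b#c", "d#e", None], dtype=object)

# expand=True returns a DataFrame with one column per split element.
wide = s.str.split("#", expand=True)
print(wide)

# Combining parameters: rsplit with n=1 splits only once, from the right,
# which separates the document name from the rest of a URL.
urls = pd.Series(["https://docs.python.org/3/tutorial/index.html"], dtype=object)
parts = urls.str.rsplit("/", n=1, expand=True)
print(parts)
```

The shorter row is padded with missing values, and the row that was missing to begin with stays missing in every column.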
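You could split using a positive lookahead. A lookahead matches a position without consuming any characters, so the split point is the position just before the text matched by the lookahead expression and the delimiter stays in the result.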
import pandas as pd
serie = pd.Series(['this#is#a#test', 'another#test'])
print(serie.str.split(r'(?=#)', expand=True))
OUTPUT
         0      1     2      3
0     this    #is    #a  #test
1  another  #test  None   None
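The same idea should carry over to the pattern from EDIT 1 by wrapping the whole delimiter in `(?=...)`. The sketch below uses Python's standard `re` module directly to show the mechanics. The log text and the step names LOAD and CLEAN are invented for illustration, and splitting on a zero-width match needs Python 3.7 or later.

```python
import re

# Hypothetical log text; the step names LOAD and CLEAN are made up.
log = "header\n*** LOAD details\n*** CLEAN more details"

# Wrapping the delimiter in (?=...) makes the match zero-width, so nothing
# is consumed and each step marker stays at the start of its chunk.
pattern = r"(?=\n\*\*\* [A-Z]+)"
chunks = re.split(pattern, log)

# The step names can then be pulled out of every chunk after the first.
steps = [re.match(r"\n\*\*\* ([A-Z]+)", c).group(1) for c in chunks[1:]]

print(chunks)
print(steps)
```

With a Series, the same `pattern` could presumably be passed as `serie.str.split(pattern, expand=True)`, in the same way as `(?=#)` above.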