Pandas str.split without stripping split pattern

Tags: python, regex, pandas

Example code:

In [1]: import pandas as pd

In [2]: serie = pd.Series(['this#is#a#test', 'another#test'])

In [3]: serie.str.split('#', expand=True)
Out[3]:
         0     1     2     3
0     this    is     a  test
1  another  test  None  None

Is it possible to split without stripping the delimiter string? The desired output would be:

Out[3]:
         0     1     2     3
0     this   #is    #a #test
1  another #test  None  None

EDIT 1: The real use case is to keep the matched pattern, for instance:

serie.str.split(r'\n\*\*\* [A-Z]+', expand=True)

Here each [A-Z]+ match is the name of a processing step, which I want to keep for further processing.


asked Jul 31 '19 10:07

roirodriguez

1 Answer

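You could split on a positive lookahead. A lookahead matches a position without consuming any characters, so the split point is the position just before the lookahead expression and the delimiter stays attached to the piece that follows it.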

import pandas as pd

serie = pd.Series(['this#is#a#test', 'another#test'])
print(serie.str.split('(?=#)', expand=True))

OUTPUT

         0      1     2      3
0     this    #is    #a  #test
1  another  #test  None   None
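
The same idea should carry over to the pattern from EDIT 1 by wrapping the whole delimiter in the lookahead. The following is only a sketch under some assumptions: the sample strings and the variable names logs and parts are made up for illustration, the Series is given object dtype so that the pattern is handled by Python's re module, and Python 3.7 or later is assumed, since earlier versions do not split on zero-width matches.

```python
import pandas as pd

# Hypothetical sample data: each processing step starts on a new line as "*** STEP".
logs = pd.Series(
    [
        "header\n*** LOAD ok\n*** PARSE ok",
        "header\n*** LOAD failed",
    ],
    dtype=object,  # assumption: object dtype, so Python's re module does the splitting
)

# (?=...) matches the position just before the delimiter without consuming it,
# so every "\n*** STEP" marker stays at the start of the piece it introduces.
parts = logs.str.split(r"(?=\n\*\*\* [A-Z]+)", expand=True)

print(parts)
```

Each piece after the first keeps its leading newline together with the step marker. If that newline is not wanted, it can be removed per column afterwards, for example with .str.lstrip("\n").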


answered Sep 18 '22 16:09

Chris Doyle
