Slice/split string Series at various positions

I'm looking to split a string Series at different points depending on the length of certain substrings:

In [47]: df = pd.DataFrame(['group9class1', 'group10class2', 'group11class20'], columns=['group_class'])
In [48]: split_locations = df.group_class.str.rfind('class')
In [49]: split_locations
Out[49]: 
0    6
1    7
2    7
dtype: int64
In [50]: df
Out[50]: 
      group_class
0    group9class1
1   group10class2
2  group11class20

My output should look like:

      group_class    group    class
0    group9class1   group9   class1
1   group10class2  group10   class2
2  group11class20  group11  class20

I half-thought this might work:

In [56]: df.group_class.str[:split_locations]
Out[56]: 
0   NaN
1   NaN
2   NaN

How can I slice my strings by the variable locations in split_locations?

This works: selecting with double brackets [[]] returns a DataFrame, and apply(..., axis=1) then passes each row as a Series whose name is that row's index label, so you can use x.name to index into the split_locations series:

In [119]: df[['group_class']].apply(lambda x: pd.Series([x.str[split_locations[x.name]:].iloc[0], x.str[:split_locations[x.name]].iloc[0]]), axis=1)
Out[119]:
         0        1
0   class1   group9
1   class2  group10
2  class20  group11
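A plain list comprehension can also slice each string at its own position, by pairing the strings with split_locations. This is an alternative sketch, not part of the answer above, and it assumes every string contains 'class': rfind returns -1 when the substring is missing, and the slices would then be wrong.

```python
import pandas as pd

df = pd.DataFrame(['group9class1', 'group10class2', 'group11class20'],
                  columns=['group_class'])
# Index of the last 'class' in each string (same as In [48] above).
# Assumes 'class' is present in every row; rfind returns -1 otherwise.
split_locations = df.group_class.str.rfind('class')

# Pair each string with its own split position and slice in plain Python
df['group'] = [s[:i] for s, i in zip(df.group_class, split_locations)]
df['class'] = [s[i:] for s, i in zip(df.group_class, split_locations)]
print(df)
```

This should produce the group and class columns shown in the question, in that order, without going through apply.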

Or, as @ajcr suggested, you can use str.extract:

In [106]: df['group_class'].str.extract(r'(?P<group>group[0-9]+)(?P<class>class[0-9]+)')
Out[106]:
     group    class
0   group9   class1
1  group10   class2
2  group11  class20

EDIT

Regex explanation:

The regex came from @ajcr (thanks!). It uses str.extract to pull out the regex's capture groups; each group becomes a new column.

The ?P<group> part makes this a named group, and the name is used as the column label. If the name is missing, the column is labelled with an integer instead.
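A quick check of that naming behaviour, using the same pattern with and without ?P<name> (the Series s simply holds the question's data):

```python
import pandas as pd

s = pd.Series(['group9class1', 'group10class2', 'group11class20'])

# Named groups: the group names become the column labels
named = s.str.extract(r'(?P<group>group[0-9]+)(?P<class>class[0-9]+)')
print(named.columns.tolist())    # ['group', 'class']

# Unnamed groups: the columns are labelled 0, 1, ... in group order
unnamed = s.str.extract(r'(group[0-9]+)(class[0-9]+)')
print(unnamed.columns.tolist())  # [0, 1]
```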

The rest should be self-explanatory: group[0-9]+ matches the literal string group followed by one or more (+) characters from the set [0-9], i.e. digits; the [] denote a character class. For ASCII input, [0-9] is equivalent to \d, which means "digit".

So it could be re-written as:

df['group_class'].str.extract(r'(?P<group>group\d+)(?P<class>class\d+)')
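To see why the + matters, and where [0-9] and \d differ, here is a quick check with Python's re module. One caveat: for str patterns in Python 3, \d also matches non-ASCII decimal digits (for example U+0663, ARABIC-INDIC DIGIT THREE) unless the re.ASCII flag is set, so the two are only interchangeable for ASCII data such as the strings in the question.

```python
import re

# Without '+', [0-9] matches exactly one digit, so 'group10' is not a full match
print(re.fullmatch(r'group[0-9]', 'group10'))                # None
# '+' means "one or more", so both of these match 'group10'
print(re.fullmatch(r'group[0-9]+', 'group10') is not None)  # True
print(re.fullmatch(r'group\d+', 'group10') is not None)     # True

# \d also accepts non-ASCII decimal digits; [0-9] does not
print(re.fullmatch(r'\d', '\u0663') is not None)             # True
print(re.fullmatch(r'[0-9]', '\u0663') is not None)          # False
```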

Use a regular expression to split the string:

 import re

 regex = re.compile("(class)")
 s = "group1class23"
 # Insert a space before "class", then split the result on that space.
 split_string = regex.sub(r" \1", s).split(" ")

This returns the list:

 ['group1', 'class23']
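A variation not used in this answer: re.split with a lookahead splits directly at the position just before "class", so no space-insertion step is needed. A minimal sketch, assuming Python 3.7 or later (earlier versions do not support splitting on zero-width matches):

```python
import re

s = "group1class23"
# (?=class) matches the empty position right before 'class' without
# consuming it, so 'class' stays at the start of the second piece.
parts = re.split(r'(?=class)', s)
print(parts)  # ['group1', 'class23']
```

Note that a string containing 'class' more than once would be split into more than two pieces.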

So to append two new columns to your DataFrame you can do:

new_cols = [re.sub(regex, " \\1", x).split(" ") for x in df.group_class]
df['group'], df['class'] = zip(*new_cols)

Which results in:

      group_class    group    class
0    group9class1   group9   class1
1   group10class2  group10   class2
2  group11class20  group11  class20
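The zip(*new_cols) step transposes the per-row [group, class] pairs into one tuple per column, which is what allows both columns to be assigned in a single statement. Using the pairs the list comprehension produces for the question's data:

```python
# One [group, class] pair per row, as built by the list comprehension
new_cols = [['group9', 'class1'], ['group10', 'class2'], ['group11', 'class20']]

# zip(*rows) transposes rows into columns: one tuple per output column
groups, classes = zip(*new_cols)
print(groups)   # ('group9', 'group10', 'group11')
print(classes)  # ('class1', 'class2', 'class20')
```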

Tags: python, pandas

LondonRob

2 Answers

EdChum

Rob
