Python split string on regex

I'm trying to split a string using a regular expression.

Friday 1Friday 11 JAN 11

The output I want to achieve is

['Friday 1', 'Friday 11', ' JAN 11']

My snippet so far is not producing the desired results:

>>> import re
>>> p = re.compile(r'(Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday)\s*\d{1,2}')
>>> filter(None, p.split('Friday 1Friday 11 JAN 11'))
['Friday', 'Friday', ' JAN 11']

What am I doing wrong with my regex?

Jared Knipp asked Feb 14 '11 18:02


2 Answers

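The problem is the capturing parentheses. When the pattern contains a capturing group, re.split() puts the captured text (here only the day name) into the result and discards the rest of each match. The syntax (?:...) makes the alternation non-capturing; wrapping the whole pattern in a single capturing group then keeps the full match. Try (shortened to two day names here):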

p = re.compile(r'((?:Friday|Saturday)\s*\d{1,2})')
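To make the difference concrete, here is a minimal sketch comparing the two patterns on the string from the question. It assumes Python 3, where filter() returns an iterator, so a list comprehension drops the empty strings instead. The names DAYS, capturing, whole_match and split_nonempty are illustrative and not part of the original answer.

```python
import re

# Full list of day names, taken from the pattern in the question.
DAYS = "Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday"
text = "Friday 1Friday 11 JAN 11"

# Capturing group around the day name only: split() keeps just the day name.
capturing = re.compile(r"(" + DAYS + r")\s*\d{1,2}")

# Non-capturing alternation inside one outer capturing group:
# split() keeps the whole separator, digits included.
whole_match = re.compile(r"((?:" + DAYS + r")\s*\d{1,2})")


def split_nonempty(pattern, s):
    """Split s on pattern and drop the empty strings between adjacent matches."""
    return [part for part in pattern.split(s) if part]


print(split_nonempty(capturing, text))    # ['Friday', 'Friday', ' JAN 11']
print(split_nonempty(whole_match, text))  # ['Friday 1', 'Friday 11', ' JAN 11']
```

The empty strings appear because the two matches are adjacent and the first one starts at the beginning of the string, so there is nothing between them to return.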
scoffey answered Oct 23 '22 06:10



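You can also use the re.findall function, which returns the matched substrings directly instead of splitting around them. Note that the last element comes back without its leading space.

>>> val
'Friday 1Friday 11 JAN 11 '
>>> pat = re.compile(r'(\w+\s*\d*)')
>>> m = re.findall(pat, val)
>>> m
['Friday 1', 'Friday 11', 'JAN 11']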
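One caveat with this approach: \w also matches digits, so the pattern relies on a space separating each name from its number. The sketch below, assuming Python 3, compares it with a stricter letters-only variant. The strict pattern and the unspaced input are hypothetical additions for illustration, not part of the original answer.

```python
import re

loose = re.compile(r"\w+\s*\d*")         # pattern from the answer, outer group dropped
strict = re.compile(r"[A-Za-z]+\s*\d*")  # assumption: names are ASCII letters only

spaced = "Friday 1Friday 11 JAN 11 "     # input from the answer
unspaced = "Friday1Friday 11 JAN 11"     # hypothetical input with no space before the 1

# Both patterns agree while every name is followed by whitespace.
print(loose.findall(spaced))    # ['Friday 1', 'Friday 11', 'JAN 11']
print(strict.findall(spaced))   # ['Friday 1', 'Friday 11', 'JAN 11']

# Without the space, \w+ swallows the digit and the following name.
print(loose.findall(unspaced))   # ['Friday1Friday 11', 'JAN 11']
print(strict.findall(unspaced))  # ['Friday1', 'Friday 11', 'JAN 11']
```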
sateesh answered Oct 23 '22 06:10
