Regular expression to return all characters between two special characters

Regex

The expression you need is .*?\[(.*)\].*. The text you want will be in group 1 (\1).
- .*?: . matches any character except a newline. * is a quantifier that means "repeat this 0 or more times". ? makes the * non-greedy, i.e., . will match as few characters as possible before reaching a '['.
- \[: \ escapes special meta-characters, which in this case is [. Unescaped, [ would start a character class instead of matching a literal bracket.
- (.*): Parentheses 'group' whatever is inside them, and you can later retrieve the groups by their numeric IDs or names (if they are given one). This .* is greedy, so it runs up to the last ] in the string.
- \].*: \] matches a literal ], and .* matches whatever is left of the string.
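
To see how these pieces behave together, here is a minimal sketch. The helper name first_bracketed and its greedy flag are hypothetical, added only for illustration; the non-greedy variant (.*?) is an assumption about what you may want when a string has several bracket pairs.

```python
import re

def first_bracketed(s, greedy=True):
    """Return the text captured between '[' and ']', or None if nothing matches."""
    # Greedy group: (.*) runs to the LAST ']'. Non-greedy: (.*?) stops at the first.
    pat = r'.*?\[(.*)\].*' if greedy else r'.*?\[(.*?)\].*'
    match = re.search(pat, s)
    # re.search returns None when the pattern is not found, so guard before .group()
    return match.group(1) if match else None

s = "foo[a]bar[b]"
print(first_bracketed(s))                # a]bar[b
print(first_bracketed(s, greedy=False))  # a
print(first_bracketed("no brackets"))    # None
```

With a single bracket pair, as in the question, both variants return the same thing.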

Implementation

First, import the re module wherever you want to use the expression. It ships with the standard library, but it is not available until you import it.

Then, use re.search(regex_pattern, string_to_be_tested) to search for the pattern in the string to be tested. This returns a match object (or None if the pattern is not found), which you can store in a temporary variable. You should then call its group() method and pass 1 as an argument (to see the 'Group 1' we captured using parentheses earlier). It should now look like:

>>> import re
>>> pat = r'.*?\[(.*)].*'             #See Note at the bottom of the answer
>>> s = "foobar['infoNeededHere']ddd"
>>> match = re.search(pat, s)
>>> match.group(1)
"'infoNeededHere'"

An Alternative

You can also use findall() to find all the non-overlapping matches by modifying the regex to (?<=\[).+?(?=\]).
- (?<=\[): (?<=) is called a look-behind assertion and checks for an expression preceding the actual match.
- .+?: + is just like * except that it matches one or more repetitions. It is made non-greedy by ?.
- (?=\]): (?=) is a look-ahead assertion and checks for an expression following the match without including it in the match.
Your code should now look like:

>>> import re
>>> pat = r'(?<=\[).+?(?=\])'  #See Note at the bottom of the answer
>>> s = "foobar['infoNeededHere']ddd[andHere] [andOverHereToo[]"
>>> re.findall(pat, s)
["'infoNeededHere'", 'andHere', 'andOverHereToo[']
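
One caveat worth checking before relying on .+?: because + needs at least one character, an empty pair of brackets makes the match swallow the ']' and run on to the next one. The sketch below compares the look-around pattern with two capture-group variants; the sample string and variable names are made up for illustration.

```python
import re

s = "x[] y[one] z[two]"

# Look-around version from above: .+? must consume at least one character,
# so after the '[' of "[]" it takes the ']' and keeps going to the next ']'.
lookaround = re.findall(r'(?<=\[).+?(?=\])', s)

# The same thing written with a capture group instead of look-arounds.
group_plus = re.findall(r'\[(.+?)\]', s)

# Using *? instead of +? allows an empty match between the brackets.
group_star = re.findall(r'\[(.*?)\]', s)

print(lookaround)  # ['] y[one', 'two']
print(group_plus)  # ['] y[one', 'two']
print(group_star)  # ['', 'one', 'two']
```

If empty brackets can never occur in your input, the look-around pattern and the capture-group pattern give the same results.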

Note: Always use raw Python strings for regex patterns by adding an 'r' before the string (e.g., r'blah blah blah'), so that backslashes reach the regex engine unchanged.

Thanks for reading! I wrote this answer when there were no accepted ones yet, but by the time I finished it, 2 more came up and one got accepted. :(

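^.*\['(.*)'\].*$ will match a whole line and capture what you want in a group. Note that it expects the text inside the brackets to be wrapped in single quotes, and the quotes are left out of the captured group.

You have to escape the [ and ] with \.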

The documentation at the rubular.com proof link will explain how the expression is formed.
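
In Python, that expression could be used roughly like this. This is only a sketch: the variable names are mine, and the second input string is an invented example of a line that does not match because it has no quotes inside the brackets.

```python
import re

# Anchored pattern: the whole line must match, with quotes inside the brackets.
line_pat = re.compile(r"^.*\['(.*)'\].*$")

hit = line_pat.match("foobar['infoNeededHere']ddd")
miss = line_pat.match("foobar[noQuotesHere]ddd")

print(hit.group(1))  # infoNeededHere
print(miss)          # None
```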

If there's only one of these [.....] tokens per line, then you don't need to use regular expressions at all:

In [7]: mystring = "Bacon, [eggs], and spam"

In [8]: mystring[ mystring.find("[")+1 : mystring.find("]") ]
Out[8]: 'eggs'
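
Be aware that str.find returns -1 when a bracket is missing, so the slice above can silently return the wrong text (for "no brackets" it gives 'no bracket'). A slightly more defensive sketch, with a hypothetical helper name between_brackets:

```python
def between_brackets(s):
    """Return the text between the first '[' and the next ']', or None."""
    start = s.find("[")
    if start == -1:
        return None  # no opening bracket at all
    # Search for ']' only after the '[' so a stray earlier ']' is ignored.
    end = s.find("]", start + 1)
    if end == -1:
        return None  # opening bracket is never closed
    return s[start + 1:end]

print(between_brackets("Bacon, [eggs], and spam"))  # eggs
print(between_brackets("no brackets"))              # None
print(between_brackets("a]b[c]"))                   # c
```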

If there's more than one of these per line, then you'll need to modify Jarrod's regex ^.*\['(.*)'\].*$ to match multiple times per line and to be non-greedy. (Drop the ^ and $ anchors and use the .*? quantifier instead of the .* quantifier.)

In [15]: mystring = "[Bacon], [eggs], and [spam]."

In [16]: re.findall(r"\[(.*?)\]",mystring)
Out[16]: ['Bacon', 'eggs', 'spam']

