
Python Regex to find a string in double quotes within a string

Tags: python, regex

I'm looking for Python code that uses a regex to do something like this:

Input: Regex should return "String 1" or "String 2" or "String3"

Output: String 1,String 2,String3

I tried r'"*"'

nomi asked Mar 01 '12 16:03


People also ask

How do you find the string between double quotes?

Use the re.findall() method to extract strings between quotes, e.g. my_list = re.findall(r'"([^"]*)"', my_str).

How do you check if a string has a double quote in Python?

The str.find(substring) method returns -1 if substring is not found. If the first double quote is found, search again from the next index to find the second quote. If that second quote is also found, print the start and end indices of the two quotes.
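A minimal sketch of that find-based check, assuming only the first pair of quotes matters. The function name find_quoted_span is hypothetical and used only for illustration.

```python
def find_quoted_span(text):
    """Return (start, end) indices of the first pair of double quotes, or None."""
    start = text.find('"')            # -1 when there is no double quote at all
    if start == -1:
        return None
    end = text.find('"', start + 1)   # look for the second quote after the first
    if end == -1:
        return None
    return start, end


print(find_quoted_span('say "hi" now'))   # (4, 7)
print(find_quoted_span('no quotes'))      # None
```

Slicing text[start + 1:end] then yields the quoted content without the quotes.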

How do you match double quotes in regex?

First, the double-quote character is nothing special in regex. It's just another character, so it doesn't need escaping from the perspective of the regex engine. However, because Java uses double quotes to delimit String constants, if you want to create a string in Java with a double quote in it, you must escape it.


2 Answers

Here's all you need to do:

    import re

    def doit(text):
        matches = re.findall(r'"(.+?)"', text)
        # matches is now ['String 1', 'String 2', 'String3']
        return ",".join(matches)

    doit('Regex should return "String 1" or "String 2" or "String3" ')

result:

'String 1,String 2,String3' 

As pointed out by Li-aung Yip:

To elaborate, .+? is the "non-greedy" version of .+. It makes the regular expression match the smallest number of characters it can instead of the most characters it can. The greedy version, .+, gives a single match, String 1" or "String 2" or "String3; the non-greedy version, .+?, gives String 1, String 2 and String3.

In addition, if you want to accept empty strings, change .+? to .*?. Star * means zero or more, while plus + means at least one.
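The difference between these quantifiers can be checked with a short sketch. The second input string, with_empty, is an assumption added for illustration and is not part of the original question.

```python
import re

text = 'Regex should return "String 1" or "String 2" or "String3" '

# Greedy: runs from the first quote to the last one, giving a single match.
greedy = re.findall(r'"(.+)"', text)
print(greedy)   # ['String 1" or "String 2" or "String3']

# Non-greedy: stops at the nearest closing quote.
lazy = re.findall(r'"(.+?)"', text)
print(lazy)     # ['String 1', 'String 2', 'String3']

# With an empty quoted string, .+? must consume at least one character,
# so it swallows the closing quote and pairs up the wrong quotes.
with_empty = 'empty "" and "x"'
lazy_plus_on_empty = re.findall(r'"(.+?)"', with_empty)
print(lazy_plus_on_empty)   # ['" and ']

# .*? allows zero characters, so the empty string is matched correctly.
lazy_star_on_empty = re.findall(r'"(.*?)"', with_empty)
print(lazy_star_on_empty)   # ['', 'x']
```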

Johan Lundberg answered Sep 22 '22 19:09


The highly up-voted answer doesn't account for the possibility that the double-quoted string might itself contain one or more double-quote characters (properly escaped with a backslash, of course). To handle this situation, the regex needs to consume characters one by one, guarding each with a negative lookahead assertion that the current character is not an unescaped double quote, that is, a double quote not preceded by a backslash (which in turn requires a negative lookbehind assertion):

"(?:(?:(?!(?<!\\)").)*)" 


    import re
    import ast

    def doit(text):
        matches = re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', text)
        for match in matches:
            # literal_eval turns the quoted literal, escapes included, into a str
            print(match, '=>', ast.literal_eval(match))

    doit('Regex should return "String 1" or "String 2" or "String3" and "\\"double quoted string\\"" ')

Prints:

    "String 1" => String 1
    "String 2" => String 2
    "String3" => String3
    "\"double quoted string\"" => "double quoted string"
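One caveat with the lookbehind approach: it treats any double quote that follows a backslash as escaped, even when that backslash is itself escaped. A commonly used alternative (not the pattern from this answer) consumes each backslash together with the character after it. The sketch below compares the two; the names LOOKAROUND and ESCAPE_AWARE and the sample input are assumptions for illustration.

```python
import re

# Pattern from the answer above: a quote counts as escaped if a backslash precedes it.
LOOKAROUND = re.compile(r'"(?:(?:(?!(?<!\\)").)*)"')

# Alternative: match either a character that is neither quote nor backslash,
# or a backslash followed by any single character (an escape pair).
ESCAPE_AWARE = re.compile(r'"(?:[^"\\]|\\.)*"')

# The first quoted string ends with an escaped backslash: "a\\"
sample = r'"a\\" and "b"'

lookaround_matches = LOOKAROUND.findall(sample)
escape_aware_matches = ESCAPE_AWARE.findall(sample)

# The lookbehind pattern mistakes the first closing quote for an escaped one
# and runs on to the next quote.
print(lookaround_matches)     # ['"a\\\\" and "']

# The escape-pair pattern finds both quoted strings.
print(escape_aware_matches)   # ['"a\\\\"', '"b"']
```

For inputs where a backslash never appears directly before a closing quote, both patterns return the same matches.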
Booboo answered Sep 19 '22 19:09