Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Regex to catch text until first occurrence of certain character

Tags:

python

regex

Consider I have the following text: Temp:C5E501374D0343090957F7E5929E765C931F7D3EC7A96189FDA88549D54D9E4E5DB3FC1C2, adfsafd1242412,

And I want to catch all the data after Temp: and until first occurrence of , which means: C5E501374D0343090957F7E5929E765C931F7D3EC7A96189FDA88549D54D9E4E5DB3FC1C2

I tried using regex Temp:(.+,) without success
How do I tell the regex that , should be the first found?

like image 797
JavaSa Avatar asked Aug 18 '15 20:08

JavaSa


People also ask

How do I match a specific character in regex?

Match any specific character in a setUse square brackets [] to match any characters in a set. Use \w to match any single alphanumeric character: 0-9 , a-z , A-Z , and _ (underscore). Use \d to match any single digit. Use \s to match any single whitespace character.

What does ?= * Mean in regex?

. means match any character in regular expressions. * means zero or more occurrences of the SINGLE regex preceding it.

What does \b mean in regex?

The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length. There are three different positions that qualify as word boundaries: Before the first character in the string, if the first character is a word character.

How do you use [^] in regex?

The [^;] is a character class, it matches everything but a semicolon. You can specify a character class, by enclosing a list of characters in [] , which will match any character from the list. If the first character after the " [" is "^", the class matches any character not in the list. This should work in most regex dialects.

How to get the first occurrence of a string?

this is not a regex solution, but something simple enough for your problem description. Just split your string and get the first item from your array. This will match up to the first occurrence only in each string and will ignore subsequent occurrences.

How to make regex match as little as possible?

By default regex will match as much as it can (greedy) Simply add a ? and it will be non-greedy and match as little as possible! Good luck, hope that helps. Thanks for contributing an answer to Stack Overflow!

Do you think a regex is going to be used repeatedly?

I tend to assume that a regex is going to be used repeatedly. In theory the objects are cached internally by re and this isn't a requirement, but I favor explicit solutions from implicit ones and I find keeping RegexObjects to be more readable.


2 Answers

To capture the value you need, you could try and use lazy matching dot (.+? matches 1 or more characters - but as few as possible - that are any characters but a newline):

Temp:(.+?),

Since lazy matching might eat up more than you need, a negated character class ([^,]+ matches 1 or more characters other than a comma) looks preferable:

Temp:([^,]+)

The result is captured into Group 1 with the capturing group (parentheses).

IDEONE sample code:

import re
p = re.compile(r'Temp:([^,]+)')
test_str = "Temp:C5E501374D0343090957F7E5929E765C931F7D3EC7A96189FDA88549D54D9E4E5DB3FC1C2, adfsafd1242412,"
print (re.search(p, test_str).group(1))

Output: C5E501374D0343090957F7E5929E765C931F7D3EC7A96189FDA88549D54D9E4E5DB3FC1C2

NOTE that a look-around based solution is more resource-consuming that the capturing group one that you and I are using.

like image 144
Wiktor Stribiżew Avatar answered Nov 14 '22 22:11

Wiktor Stribiżew


You can use this lookbehind based regex:

(?<=Temp:)[^,]+

RegEx Demo

Code:

s='Temp:C5E501374D0343090957F7E5929E765C931F7D3EC7A96189FDA88549D54D9E4E5DB3FC1C2, adfsafd1242412,'
print re.search(r"(?<=Temp:)[^,]+", s).group()

Output:

C5E501374D0343090957F7E5929E765C931F7D3EC7A96189FDA88549D54D9E4E5DB3FC1C2
like image 44
anubhava Avatar answered Nov 14 '22 21:11

anubhava