Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python comparing string against several regular expressions

I'm pretty experienced with Perl and Ruby but new to Python so I'm hoping someone can show me the Pythonic way to accomplish the following task. I want to compare several lines against multiple regular expressions and retrieve the matching group. In Ruby it would be something like this:

# Revised to show variance in regex and related action.
data, foo, bar = [], nil, nil
input_lines.each do |line|
  if line =~ /Foo(\d+)/
    foo = $1.to_i
  elsif line =~ /Bar=(.*)$/
    bar = $1
  elsif bar
    data.push(line.to_f)
  end
end

My attempts in Python are turning out pretty ugly because the matching group is returned from a call to match/search on a regular expression and Python has no assignment in conditionals or switch statements. What's the Pythonic way to do (or think!) about this problem?

like image 449
maerics Avatar asked Apr 13 '10 22:04

maerics


People also ask

Can you use == to compare strings in Python?

Python comparison operators can be used to compare strings in Python. These operators are: equal to ( == ), not equal to ( != ), greater than ( > ), less than ( < ), less than or equal to ( <= ), and greater than or equal to ( >= ).

How do you compare two strings of different lengths in Python?

Python string comparison is performed using the characters in both strings. The characters in both strings are compared one by one. When different characters are found then their Unicode value is compared. The character with lower Unicode value is considered to be smaller.

Does Python allow comparison of 2 string values?

Python String Comparison operators In python language, we can compare two strings such as identify whether the two strings are equivalent to each other or not, or even which string is greater or smaller than each other.


2 Answers

Something like this, but prettier:

regexs = [re.compile('...'), ...]

for regex in regexes:
  m = regex.match(s)
  if m:
    print m.groups()
    break
else:
  print 'No match'
like image 175
Ignacio Vazquez-Abrams Avatar answered Sep 28 '22 13:09

Ignacio Vazquez-Abrams


There are several ways to "bind a name on the fly" in Python, such as my old recipe for "assign and test"; in this case I'd probably choose another such way (assuming Python 2.6, needs some slight changes if you're working with an old version of Python), something like:

import re
pats_marks = (r'^A:(.*)$', 'FOO'), (r'^B:(.*)$', 'BAR')
for line in lines:
    mo, m = next(((mo, m) for p, m in pats_mark for mo in [re.match(p, line)] if mo),
                 (None, None))
    if mo: print '%s: %s' % (m, mo.group(1))
    else: print 'NO MATCH: %s' % line

Many minor details can be adjusted, of course (for example, I just chose (.*) rather than (.*?) as the matching group -- they're equivalent given the immediately-following $ so I chose the shorter form;-) -- you could precompile the REs, factor things out differently than the pats_mark tuple (e.g., with a dict indexed by RE patterns), etc.

But the substantial ideas, I think, are to make the structure data-driven, and to bind the match object to a name on the fly with the subexpression for mo in [re.match(p, line)], a "loop" over a single-item list (genexps bind names only by loop, not by assignment -- some consider using this part of genexps' specs to be "tricky", but I consider it a perfectly acceptable Python idiom, esp. since it was considered back in the time when listcomps, genexps' "ancestors" in a sense, were being designed).

like image 25
Alex Martelli Avatar answered Sep 28 '22 13:09

Alex Martelli