Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ruby Regex vs Python Regex

Tags:

python

regex

ruby

Are there any real differences between Ruby regex and Python regex?

I've been unable to find any differences in the two, but may have missed something.

like image 631
Tim O Avatar asked Apr 15 '11 02:04

Tim O


2 Answers

The last time I checked, they differed substantially in their Unicode support. Ruby in 1.9 at least has some very limited Unicode support. I believe one or two Unicode properties might be supported by now. Probably the general categories and maybe the scripts were the two I'm thinking of.

Python has less and more Unicode support at the same time. Python does seem to make it possible to meet the requirements of RL1.2a "Compatability Properties" from UTS#18 on Unicode Regular Expressions.

That said, there is a really rather nice Python library out there by Matthew Barnett (mrab) that finally adds a couple of Unicode properties to Python regexes. He supports the two most important ones: the general categories, and the script properties. It has some other intriguing features as well. It deserves some good publicity.

I don't think either of Ruby or Python support Unicode all that terribly well, although more and more gets done every day. In particular, however, neither meets even the barebones Level 1 requirement for Unicode Regular Expressions cited above. For example, RL1.2 requires that at least 11 properties be supported: General_Category, Script, Alphabetic, Uppercase, Lowercase, White_Space, Noncharacter_Code_Point, Default_Ignorable_Code_Point, ANY, ASCII, and ASSIGNED.

I think Python only lets you get to some of those, and only in a roundabout way. Of course, there are many, many other properties beyond these 11.

When you’re looking for Unicode support, there's more than just UTS#10 on Regular Expressions of course, although that is the one that matters most to this question and neither Ruby nor Puython are Level 1 compliant. Other very important aspects of Unicode include UAX#15, UAX#14, UTS#18, UAX#11, UAX#29, and of course the crucial UAX#44. Python has libraries for at least a couple of those, I know. I don't know that they're standard.

But when it comes to regular expression support, um, there are richer alternatives than just those two, you know. :)

like image 111
tchrist Avatar answered Oct 02 '22 23:10

tchrist


I like the /pattern/ syntax in Ruby, inspired from Perl, for regular expressions. Python's re.compile("pattern") is not really elegant for me. The syntatic sugar in Ruby and the fact that regular expressions are a separate re module in Python, makes me lean towards Ruby when it comes to Regular Expressions.

Apart from this, I don't see much of a difference from a normal Regular Expression programming perspective. Both the languages have pretty comprehensive and mostly similar RE support. There might be performance differences ( Python traditionally has has better performance ) and also Python has greater unicode regular expressions support.

like image 21
manojlds Avatar answered Oct 02 '22 22:10

manojlds