Difference between \A \z and ^ $ in Ruby regular expressions

Tags:

regex

ruby


If you're depending on the regular expression for validation, you always want to use \A and \z. ^ and $ match at the start and end of any line, not of the whole string, so a single valid line is enough for the pattern to match. That means an attacker could submit an email like [email protected]\n<script>dangerous_stuff();</script> and still have it validate, since the regex is satisfied by the text before the \n.

My recommendation would be to strip newlines from a username or email completely beforehand, since there's pretty much no legitimate reason for one to be there. Then you can safely use either \A \z or ^ $.
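A minimal sketch of the point above. The two patterns and the address user@example.com are made up for illustration and are far too simple for real email validation; only the anchors differ between them.

```ruby
# Hypothetical, deliberately simplistic patterns; only the anchors differ.
LINE_ANCHORED   = /^[\w.+-]+@[\w.-]+$/
STRING_ANCHORED = /\A[\w.+-]+@[\w.-]+\z/

# Hypothetical input: a valid-looking first line followed by a payload.
input = "user@example.com\n<script>dangerous_stuff();</script>"

line_result   = LINE_ANCHORED.match?(input)    # true: the first line alone satisfies ^...$
string_result = STRING_ANCHORED.match?(input)  # false: the whole string must match

# Stripping newlines first removes the difference between the two styles.
cleaned = input.delete("\r\n")
cleaned_line_result   = LINE_ANCHORED.match?(cleaned)    # false
cleaned_string_result = STRING_ANCHORED.match?(cleaned)  # false

puts line_result, string_result, cleaned_line_result, cleaned_string_result
```

Once the newlines are gone there is only one line, so ^ $ and \A \z reject the payload alike.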


According to Pickaxe:

^ Matches the beginning of a line.

$ Matches the end of a line.

\A Matches the beginning of the string.

\z Matches the end of the string.

\Z Matches the end of the string unless the string ends with a "\n", in which case it matches just before the "\n".

So, use \A and lowercase \z. If you use \Z someone could sneak in a trailing newline character. This is not dangerous I think, but it might screw up algorithms that assume there's no whitespace in the string. Depending on your regex and string-length constraints, someone could even use an invisible name consisting of just a newline character.
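A small sketch of the \Z versus \z difference described above. The sample names and the \w-based patterns are assumptions chosen only to make the behavior visible.

```ruby
# Hypothetical name with a trailing newline sneaked in.
name = "andrew\n"

upper_z = name.match?(/\A\w+\Z/)  # true: \Z also matches just before a final "\n"
lower_z = name.match?(/\A\w+\z/)  # false: \z only matches at the very end

# With a pattern that allows an empty name, a lone newline passes \Z.
blank_upper_z = "\n".match?(/\A\w*\Z/)  # true: the "invisible name" case
blank_lower_z = "\n".match?(/\A\w*\z/)  # false

puts upper_z, lower_z, blank_upper_z, blank_lower_z
```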

JavaScript's regex implementation does not support \A as an anchor; without the u flag it treats \A as a literal 'A'. So watch yourself out there and test.


The start and end of a string may not necessarily be the same thing as the start and end of a line. Imagine if you used the following as your test string:

my
name
is
Andrew

Notice that the string has many lines in it - the ^ and $ characters allow you to match the beginning and end of each of those lines (basically treating the \n character as a delimiter) while \A and \Z allow you to match the beginning and end of the entire string.
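The same test string can be tried out in Ruby. This is just a sketch; the \w+ pattern is an assumption picked to grab one word per line.

```ruby
text = "my\nname\nis\nAndrew"

per_line   = text.scan(/^\w+$/)  # ^ and $ match on every line
first_only = text.scan(/\A\w+/)  # \A matches only at the start of the string
last_only  = text.scan(/\w+\z/)  # \z matches only at the end of the string

p per_line    # => ["my", "name", "is", "Andrew"]
p first_only  # => ["my"]
p last_only   # => ["Andrew"]
```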


Difference By Example

  1. /^foo$/ matches any of the following strings, /\Afoo\z/ does not:

whatever1
foo
whatever2

foo
whatever2

whatever1
foo

  2. /^foo$/ and /\Afoo\z/ both match the following:

foo
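The cases above can be checked directly in Ruby. A quick sketch, with the example strings written out as literals (the variable names are made up here):

```ruby
# The three multi-line strings from case 1.
multi_line = [
  "whatever1\nfoo\nwhatever2",
  "foo\nwhatever2",
  "whatever1\nfoo"
]

line_matches_all    = multi_line.all?  { |s| s.match?(/^foo$/) }    # true
string_matches_none = multi_line.none? { |s| s.match?(/\Afoo\z/) }  # true

# The single-line string from case 2 matches both patterns.
both_match_foo = "foo".match?(/^foo$/) && "foo".match?(/\Afoo\z/)   # true

puts line_matches_all, string_matches_none, both_match_foo
```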