
PHP preg_match not working for new line [duplicate]

Tags:

regex

php

I have this nice preg_match regex:

if(preg_match("%^[A-Za-z0-9ążśźęćńółĄŻŚŹĘĆŃÓŁ\.\,\-\?\!\(\)\"\ \/\t\/\n]{2,50}$%", stripslashes(trim($_POST['x'])))){...}

This should allow all characters that could be used in the eventual text content of a post. The problem is that, despite the \n, the function still doesn't work for new lines in my post, so input such as

foo

bar

would not work. Does anybody know why the function would not work properly?

Any help would be gratefully appreciated.

aln447 asked Mar 22 '16 14:03


1 Answer

By default a preg_match() with a pattern using ^ and $ will consider the whole string, even if it contains newlines.

This behaviour can be altered using Pattern Modifiers, of which I will list the ones that fit this topic:

  • s (PCRE_DOTALL): by default, the dot (.) will not match newlines, but by using the modifier s it will. However, character classes (e.g. [a-z] and [^a-z]) never treat the newline as a special character anyway, thus this modifier will not affect their behaviour like it will for the dot (.).

  • m (PCRE_MULTILINE): by default, the start (^) and end ($) anchors match only the start and end of the whole string that is subjected to pattern matching, even if that string contains newlines. When this modifier is used, they also match immediately after and before each newline inside the string, so every line is treated as a string of its own. For example, preg_match_all() with the pattern /^[a-z]+$/m finds three matches in "foo\nbar\nbar" (1: foo, 2: bar, 3: bar). Without the modifier, /^[a-z]+$/ finds none, because the newline is not in the character class; a class that does include it, such as /^[a-z\n]+$/, matches the whole string once (1: foo\nbar\nbar).

  • D (PCRE_DOLLAR_ENDONLY): by default, the end ($) anchor will not only match the very end of a string, but also right before a trailing newline (trailing meaning: at the very end of the string). To undo this behaviour and make it strictly match only the end of the string, use this pattern modifier.
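The effect of the three modifiers can be checked with a short script. This is only a sketch: the sample strings and variable names are made up for illustration, and preg_match_all() is used for the m case so that every per-line match is counted.

```php
<?php
// Sample input: three lines separated by "\n", no trailing newline.
$text = "foo\nbar\nbar";

// No modifier: ^ and $ anchor to the whole string and "\n" is not in [a-z],
// so nothing matches.
$plain = preg_match('/^[a-z]+$/', $text);                   // 0

// m: ^ and $ also match at line boundaries, so each line matches on its own.
$perLine = preg_match_all('/^[a-z]+$/m', $text, $matches);  // 3

// s: the dot matches "\n" too, so .+ can cover the whole string.
$dotDefault = preg_match('/^.+$/', $text);                  // 0
$dotAll     = preg_match('/^.+$/s', $text);                 // 1

// D: $ no longer matches right before a trailing newline.
$lenient = preg_match('/^[a-z]+$/', "foo\n");               // 1
$strict  = preg_match('/^[a-z]+$/D', "foo\n");              // 0
```

After the m call, $matches[0] holds the three lines as separate matches.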

YOUR PROBLEM:

if(preg_match("%^[A-Za-z0-9ążśźęćńółĄŻŚŹĘĆŃÓŁ\.\,\-\?\!\(\)\"\ \/\t\/\n]{2,50}$%m", stripslashes(trim($_POST['x'])))){...}

I don't see much wrong with your pattern. Inside a character class you only need to escape \, - (unless it is the first or last character), ^ (only at the start of the character class) and ] (only when not at the start of the character class), plus the pattern delimiter. The PHP documentation says it is not an error to escape the other characters anyway.
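To illustrate the point about escaping, here is a small sketch comparing a fully escaped character class with an unescaped one. The sample strings and variable names are made up, and the hyphen is moved to the end of the unescaped class so that it cannot be read as a range.

```php
<?php
// Same set of punctuation, once with every character escaped...
$escaped   = '%^[\.\,\-\?\!\(\)]+$%';
// ...and once without escapes; the hyphen is last, so it stays literal.
$unescaped = '%^[.,?!()-]+$%';

$punctuation = '(.,-?!)';
$letters     = 'abc';

// Both patterns accept the punctuation sample and reject the letters.
$escapedOk     = preg_match($escaped, $punctuation);    // 1
$unescapedOk   = preg_match($unescaped, $punctuation);  // 1
$escapedFail   = preg_match($escaped, $letters);        // 0
$unescapedFail = preg_match($unescaped, $letters);      // 0
```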

It might be, though, that your text snippet contains newlines in the form of \r\n and since \r is not included in the character class of your pattern, it will not be matched.
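A quick way to test the \r\n theory is sketched below. It makes two assumptions: the character class is a simplified stand-in for the one in the question (Polish letters and punctuation left out for brevity), and the submitted text uses \r\n line breaks, which is what HTML forms typically send. The variable names are made up for illustration.

```php
<?php
// Simplified stand-in for the question's pattern (assumption, see above).
$pattern       = "%^[A-Za-z0-9 \t\n]{2,50}$%";
// Same pattern with \r added to the character class.
$patternWithCr = "%^[A-Za-z0-9 \t\r\n]{2,50}$%";

$unix    = "foo\nbar";    // line break as a bare \n
$windows = "foo\r\nbar";  // line break as \r\n

$unixOk    = preg_match($pattern, $unix);           // 1
$windowsOk = preg_match($pattern, $windows);        // 0, \r is not in the class
$withCrOk  = preg_match($patternWithCr, $windows);  // 1

// Alternative: normalise line endings first and keep the original class.
$normalized   = str_replace(["\r\n", "\r"], "\n", $windows);
$normalizedOk = preg_match($pattern, $normalized);  // 1
```

Either adding \r to the class or normalising the input before matching would cover this case; which one fits better depends on whether the stored text should keep its original line endings.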

Since my original post mentioned the Pattern Modifier m, and you replied that it worked, I wonder what the real issue might have been.

klaar answered Nov 09 '22 08:11