
How to exclude a symbol within [ ] with RegEx

I am using PHP's preg_match_all, and this is what I have so far:

[A-Za-z+\W]+\s[\d]

The only problem is that I need \W to not match a double quote (").

So I have tried:

[A-Za-z+[^\dA-Za-z"]\s?]+\s[\d]


[A-Za-z+]\s?+[^A-Za-z\d"]?\s[\d]

among other things, but every attempt fails and I really can't figure out why.

EDIT:

Here is the entire regex:

([A-Z][a-z]+\s){1,5}\s?[^a-zA-Z\d\s:,.\'\"]\s?
[A-Za-z+\W]+\s[\d]{1,2}\s[A-Z][a-z]+\s[\d]{4}

I split it into two lines; the second line begins with what I posted above.

Lines I am trying to match (every line except the one marked NOT):

    India – Adulterated Tea Powder Seized 18 April 2011
    India – Importer of Haldiram’s Petha Sweet Cubes Issuing Voluntary Recall 26 April 2011
    India – Undeclared Gluten Found in Sweets by Canadian Authorities 27 April 2011
    India – Adulteration Found in Edible Oils 28 April 2011
    India – Viral Disease Affects Chili Crop in Goa 28 April 2011
NOT ---->   Chili – India: Goa”. 8 April 2011
    Ivory Coast – Potential Cocoa Quality Decline despite Sufficient Surplus 11 April 2011
    Japan – Sanuki Kanzume Co. and Failure to Comply with FDA Standards 27 April 2011
    Madagascar – Toxic Sardines 14 April 2011
    Madagascar – Update: Toxic Sardines 26 April 2011
Ryan Ward asked Oct 11 '22 00:10


1 Answer

The character class you are showing, [A-Za-z+\W], matches all letters and all non-word characters. The only things it leaves out are digits and the underscore, and you also want it to not match the double quote. A negated class says that directly:

[^\d\"_]+\s\d
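
To see the difference, here is a minimal sketch comparing the two classes. The two sample strings are cut down from the question's input (with a straight double quote), and the ^ anchor is my addition: without it, the negated class can still find a match that starts after the quote.

```php
<?php
// Sample strings shortened from the question's input (assumed for illustration).
$good = 'Adulterated Tea Powder Seized 18 April 2011';
$bad  = 'India: Goa". 8 April 2011';

// Original class: letters, "+", and any non-word character, so the quote is allowed.
$original = '/^[A-Za-z+\W]+\s\d/';
// Negated class: anything except digits, the double quote, and the underscore.
$negated  = '/^[^\d"_]+\s\d/';

var_dump(preg_match($original, $bad)); // int(1): the quote is accepted
var_dump(preg_match($negated, $bad));  // int(0): the quote blocks the match
var_dump(preg_match($negated, $good)); // int(1): a clean line still matches
```

Single-quoted PHP strings are used for the patterns so that the double quote needs no escaping.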

Edit:

I could be wrong, but from the sample input it appears you are just trying to match all lines that don't contain a double quote. If so, something like this is much easier, and I've even grouped the date separately from the rest of the string. If you don't need to group the string and the date, just remove all the parentheses.

<?php
error_reporting(E_ALL);
$str = "    India - Adulterated Tea Powder Seized 18 April 2011
    India - Importer of Haldiram’s Petha Sweet Cubes Issuing Voluntary Recall 26 April 2011
    India - Undeclared Gluten Found in Sweets by Canadian Authorities 27 April 2011
    India - Adulteration Found in Edible Oils 28 April 2011
    India - Viral Disease Affects Chili Crop in Goa 28 April 2011
    Chili - India: Goa\". 8 April 2011
    Ivory Coast - Potential Cocoa Quality Decline despite Sufficient Surplus 11 April 2011
    Japan - Sanuki Kanzume Co. and Failure to Comply with FDA Standards 27 April 2011
    Madagascar - Toxic Sardines 14 April 2011
    Madagascar - Update: Toxic Sardines 26 April 2011";
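// Group 1 (lazy): the text before the date, with no double quote allowed in it.
// Group 2: the date, e.g. "18 April 2011". Flags: i = case-insensitive, m = ^ and $ match per line.
preg_match_all("/^([^\"]+?)(\d?\d\s[a-z]+\s\d{4})$/im", $str, $m);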
echo '<pre>'.print_r($m, true).'</pre>';
?>
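
One caveat: the sample in the question uses a typographic closing quote (”) and en dashes, while the code above uses a straight quote and hyphens. If the real input looks like the question's, a variation along these lines might be needed. This is only a sketch, assuming the text is UTF-8; the shortened $str is made up from three of the question's lines.

```php
<?php
// Three lines copied from the question's sample, keeping its en dashes and curly quote.
$str = "India – Viral Disease Affects Chili Crop in Goa 28 April 2011\n"
     . "Chili – India: Goa”. 8 April 2011\n"
     . "Madagascar – Toxic Sardines 14 April 2011";

// The class now excludes both the straight and the curly double quote.
// u: treat pattern and subject as UTF-8, so ” counts as one character
// instead of three separate bytes.
preg_match_all('/^([^"”]+?)(\d?\d\s[a-z]+\s\d{4})$/imu', $str, $m);

print_r($m[2]); // dates of the two lines that contain no quote
```

Without the u modifier the class would exclude the individual bytes of ”, two of which also occur in the en dash, so lines containing an en dash would stop matching as well.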
Jonathan Kuhn answered Oct 18 '22 10:10