I am using PHP's preg_match_all, and this is what I have so far:
[A-Za-z+\W]+\s[\d]
The only problem is that I need the \W to not match a double quote (").
So I have tried:
[A-Za-z+[^\dA-Za-z"]\s?]+\s[\d]
[A-Za-z+]\s?+[^A-Za-z\d"]?\s[\d]
among other things, but every attempt fails and I can't figure out why.
Here is the entire regex:
([A-Z][a-z]+\s){1,5}\s?[^a-zA-Z\d\s:,.\'\"]\s?
[A-Za-z+\W]+\s[\d]{1,2}\s[A-Z][a-z]+\s[\d]{4}
I split it across two lines; the second line begins with the fragment I posted above.
Lines I am trying to match:
India – Adulterated Tea Powder Seized 18 April 2011
India – Importer of Haldiram’s Petha Sweet Cubes Issuing Voluntary Recall 26 April 2011
India – Undeclared Gluten Found in Sweets by Canadian Authorities 27 April 2011
India – Adulteration Found in Edible Oils 28 April 2011
India – Viral Disease Affects Chili Crop in Goa 28 April 2011
NOT ----> Chili – India: Goa”. 8 April 2011
Ivory Coast – Potential Cocoa Quality Decline despite Sufficient Surplus 11 April 2011
Japan – Sanuki Kanzume Co. and Failure to Comply with FDA Standards 27 April 2011
Madagascar – Toxic Sardines 14 April 2011
Madagascar – Update: Toxic Sardines 26 April 2011
The pattern you are showing matches all letters and all non-word characters, so the class [A-Za-z+\W] accepts everything except digits and the underscore. Since you also want it to reject the double quote, a negated class that lists just those exclusions is simpler:
[^\d\"_]+\s\d
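To see the difference, here is a minimal sketch that runs both classes against two sample lines. The `^` anchor and the variable names are my additions for the demonstration, not part of the original pattern; without an anchor the negated class can still match the text after the quote.

```php
<?php
// Sample lines adapted from the question; the second contains a straight double quote.
$lines = [
    'Madagascar - Toxic Sardines 14 April 2011',
    'Chili - India: Goa". 8 April 2011',
];

// Single-quoted PHP strings keep the regex backslashes as written.
$original = '/^[A-Za-z+\W]+\s\d/'; // \W also accepts the double quote
$negated  = '/^[^\d"_]+\s\d/';     // same set of characters minus the double quote

$results = [];
foreach ($lines as $line) {
    // preg_match returns 1 on a match and 0 on no match.
    $results[] = [preg_match($original, $line), preg_match($negated, $line)];
}

print_r($results);
```

Both patterns accept the first line, but only the original class accepts the line containing the double quote.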
Edit:
I could be wrong, but from the sample input it appears you are just trying to match every line that doesn't contain a double quote. If so, something like this is much easier, and I've also captured the date in a separate group from the rest of the string. If you don't need the string and the date as separate groups, just remove the parentheses.
<?php
error_reporting(E_ALL);
$str = " India - Adulterated Tea Powder Seized 18 April 2011
India - Importer of Haldiram’s Petha Sweet Cubes Issuing Voluntary Recall 26 April 2011
India - Undeclared Gluten Found in Sweets by Canadian Authorities 27 April 2011
India - Adulteration Found in Edible Oils 28 April 2011
India - Viral Disease Affects Chili Crop in Goa 28 April 2011
Chili - India: Goa\". 8 April 2011
Ivory Coast - Potential Cocoa Quality Decline despite Sufficient Surplus 11 April 2011
Japan - Sanuki Kanzume Co. and Failure to Comply with FDA Standards 27 April 2011
Madagascar - Toxic Sardines 14 April 2011
Madagascar - Update: Toxic Sardines 26 April 2011";
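// Group 1: the text before the date (no double quote allowed); group 2: the date.
// Modifier i makes the month name case-insensitive; m lets ^ and $ match at every line.
preg_match_all("/^([^\"]+?)(\d?\d\s[a-z]+\s\d{4})$/im", $str, $m);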
echo '<pre>'.print_r($m, true).'</pre>';
?>
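If you want each title paired with its date, a variation like the following may be handier. It is only a sketch under a few assumptions of mine: it passes the PREG_SET_ORDER flag so each match arrives as one array, excludes \r and \n from the class so a match cannot run across lines, and moves the space before the date out of group 1. The `$rows` array and its keys are made up for illustration.

```php
<?php
// Shortened sample input; the middle line contains a double quote and should be skipped.
$str = "India - Adulterated Tea Powder Seized 18 April 2011\n"
     . "Chili - India: Goa\". 8 April 2011\n"
     . "Madagascar - Toxic Sardines 14 April 2011";

// Group 1: title without quotes or line breaks; group 2: day, month name, 4-digit year.
$pattern = '/^([^"\r\n]+?)\s(\d{1,2}\s[a-z]+\s\d{4})$/im';

// PREG_SET_ORDER yields one array per match: [full match, group 1, group 2].
preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $match) {
    $rows[] = ['title' => trim($match[1]), 'date' => $match[2]];
}

print_r($rows);
```

Note that the sample in the question shows a curly quote (”) rather than a straight one. If your real data contains that character, it would need to be added to the character class, together with the u modifier so the pattern is treated as UTF-8.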