I'm trying to find a regex that matches only if a tweet is a true mention. To be a mention, the string can't start with "@", can't contain "RT" (case insensitive), and the "@" must start a word.
In the examples below, the desired output is given in the comments.
Some examples:
function search($strings, $regexp) {
    foreach ($strings as $string) {
        echo "Sentence: \"$string\" <- " .
            (preg_match($regexp, $string) ? "MATCH" : "NO MATCH") . "\n";
    }
}
$strings = array(
    "Hi @peter, I like your car ", // <- MATCH
    "@peter I don't think so!", // <- NO MATCH: the string starts with @, so it's a reply
    "Helo!! :@ how are you!", // <- NO MATCH: the @ is not followed by a word, we need @(word)
    "Yes @peter i'll eat them this evening! RT @peter: hey @you, do you want your pancakes?", // <- NO MATCH: "RT"/"rt" is in the string, so it's a retweet
    "Helo!! [email protected] how are you!", // <- NO MATCH: the @ doesn't start the word
    "@peter is the best friend you could imagine. RT @juliet: @you do you know if @peter it's awesome?" // <- NO MATCH: it starts with @ (a reply) and contains RT
);
echo "Example 1:\n";
search($strings, "/(?:[[:space:]]|^)@/i");
Current output:
Example 1:
Sentence: "Hi @peter, I like your car " <- MATCH
Sentence: "@peter I don't think so!" <- MATCH
Sentence: "Helo!! :@ how are you!" <- NO MATCH
Sentence: "Yes @peter i'll eat them this evening! RT @peter: hey @you, do you want your pancakes?" <- MATCH
Sentence: "Helo!! [email protected] how are you!" <- MATCH
Sentence: "@peter is the best friend you could imagine. RT @juliet: @you do you know if @peter it's awesome?" <- MATCH
EDIT:
I need it as a regex because it may also be used in MySQL and other languages. I am not looking for any username; I only want to know whether the string is a mention or not.
Unfortunately, Twitter doesn't support searching tweets with regular expressions, which means you have to post-process the results. There's no official documentation from Twitter to that effect, but post-processing tweets with a regex is common practice among users of the Twitter search API (including me).
To validate a regex in PHP, just run it through preg_match() against null (there is no need to know the data you want to test beforehand). If it returns an explicit false (=== false), the pattern is broken. Otherwise it is valid, though it need not match anything.
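As a minimal sketch of that check: the helper name isValidRegex is made up for illustration, and it tests against an empty string rather than null, on the assumption that you are on PHP 8.1 or later, where passing null to a string parameter is deprecated.

```php
<?php
// Hypothetical helper: returns true when $pattern compiles as a PCRE pattern.
function isValidRegex(string $pattern): bool
{
    // preg_match() returns 1 or 0 for a usable pattern and false on error.
    // The @ operator silences the warning emitted for a broken pattern.
    return @preg_match($pattern, '') !== false;
}

var_dump(isValidRegex('/^@\w+/'));   // bool(true)
var_dump(isValidRegex('/^@(\w+/'));  // bool(false): unbalanced parenthesis
```

The strict comparison matters here: a valid pattern that simply doesn't match returns 0, which a loose comparison would confuse with false.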
# does not have any special meaning in a regex, unless you use it as the delimiter (or enable the x flag, where an unescaped # starts a comment). So just put it straight in and it should work. Note that \b detects a word boundary, and in #abc the word boundary is after the # and before the abc. Therefore, the \b is superfluous and you just need #\w\w+.
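A quick way to convince yourself of this is to run both variants side by side. This is only a sketch; the sample text and variable names are made up.

```php
<?php
// Sample text is made up for illustration.
$text = 'Loving #php and #regex today, but not # alone or #a';

// '#' is literal here because the delimiter is '/'.
preg_match_all('/#\w\w+/', $text, $plain);

// Adding \b between '#' and the first word character changes nothing,
// because that position is always a word boundary.
preg_match_all('/#\b\w\w+/', $text, $withBoundary);

print_r($plain[0]);                       // #php, #regex
var_dump($plain[0] === $withBoundary[0]); // bool(true)
```

Note that #a is skipped because \w\w+ needs at least two word characters after the #.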
\w (word character) matches any single letter, number or underscore (same as [a-zA-Z0-9_]). The uppercase counterpart \W (non-word character) matches any single character that is not matched by \w (same as [^a-zA-Z0-9_]). For the shorthand classes \d, \s and \w, and for the boundary \b, the uppercase version is the inverse of the lowercase one.
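If you want to see the split for yourself, something like the following should do. It is a small sketch over a handful of made-up ASCII characters, and it assumes the default (non-Unicode) behaviour of \w.

```php
<?php
$wordChars = [];
$nonWordChars = [];

// Classify a few made-up ASCII characters one at a time.
foreach (str_split('aZ5_-@!') as $char) {
    if (preg_match('/^\w$/', $char) === 1) {
        $wordChars[] = $char;
    }
    if (preg_match('/^\W$/', $char) === 1) {
        $nonWordChars[] = $char;
    }
}

echo implode(' ', $wordChars), "\n";    // a Z 5 _
echo implode(' ', $nonWordChars), "\n"; // - @ !
```

Every character lands in exactly one of the two lists, and @ is on the non-word side, which is why \w+ alone can never swallow it.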
This regexp might work a bit better: /\B\@([\w\-]+)/gim
Here's a jsFiddle example of it in action: http://jsfiddle.net/2TQsx/96/
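For the PHP side of the question, a rough port of that pattern could look like this. It is a sketch only: the sample tweet is made up, and preg_match_all() stands in for the g flag, which PHP does not have.

```php
<?php
// Made-up sample; "joh@example" stands for an address-like token.
$tweet = 'Hi @peter and @mary-ann, mail joh@example or just :@ ok';

// \B requires that the @ is NOT at a word boundary, so an @ that directly
// follows a word character (as in joh@example) is skipped.
preg_match_all('/\B@([\w\-]+)/i', $tweet, $matches);

print_r($matches[1]); // peter, mary-ann
```

Keep in mind that this extracts usernames rather than classifying the tweet: it would still match an @ at the very start of the string, and it does nothing about "RT".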
Here's a regex that should work:
/^(?!.*\bRT\b)(?:.+\s)@\w+/i
Explanation:
/^ //Start of the string.
(?!.*\bRT\b) //Verify that the word "RT" is not in the string.
(?:.+\s) //Require at least one character and then whitespace before the @,
//so a string that starts with @ (a reply) is rejected.
//Note: (?: ) makes the group non-capturing.
@\w+ //Find @ followed by one or more word chars.
/i //Make it case insensitive.
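To check the behaviour, the sketch below runs a pattern of this shape, with the leading group mandatory so that a tweet starting with @ is rejected, against the sample tweets from the question. The helper name isMention is made up, and the last sample is a stand-in I wrote for the "@ inside a token" case.

```php
<?php
// Leading group is mandatory, so "@" can never be the first character.
$pattern = '/^(?!.*\bRT\b)(?:.+\s)@\w+/i';

// Hypothetical helper: true when the tweet counts as a mention.
function isMention(string $tweet, string $pattern): bool
{
    return preg_match($pattern, $tweet) === 1;
}

// Tweet => expected result, following the comments in the question.
$tweets = [
    "Hi @peter, I like your car "   => true,
    "@peter I don't think so!"      => false, // reply
    "Helo!! :@ how are you!"        => false, // "@" not followed by a word
    "Yes @peter i'll eat them this evening! RT @peter: hey @you, do you want your pancakes?" => false, // retweet
    "Helo!! joh@peter how are you!" => false, // stand-in: "@" inside a token
];

foreach ($tweets as $tweet => $expected) {
    $result = isMention($tweet, $pattern);
    printf("%-8s %s\n", $result ? 'MATCH' : 'NO MATCH', $tweet);
}
```

One caveat: the dot does not match newlines by default, so for multi-line tweets you would probably want to add the s flag.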