
Regex Valid Twitter Mention

I'm trying to find a regex that matches only if a tweet is a true mention. To be a mention, the string can't start with "@", can't contain "RT" (case insensitive), and the "@" must start a word.

Some examples, with the desired output in the comments:

function search($strings, $regexp) {
    // Report whether each string matches the given pattern.
    foreach ($strings as $string) {
        echo "Sentence: \"$string\" <- " .
            (preg_match($regexp, $string) ? "MATCH" : "NO MATCH") . "\n";
    }
}

$strings = array(
"Hi @peter, I like your car ", // <- MATCH
"@peter I don't think so!", // <- NO MATCH: the string starts with @, so it's a reply
"Helo!! :@ how are you!", // <- NO MATCH: the @ is not followed by a word, we need @(word)
"Yes @peter i'll eat them this evening! RT @peter: hey @you, do you want your pancakes?", // <- NO MATCH: the string contains "RT"/"rt", so it's a retweet
"Helo!! [email protected] how are you!", // <- NO MATCH: the word doesn't start with @
"@peter is the best friend you could imagine. RT @juliet: @you do you know if @peter it's awesome?" // <- NO MATCH: it starts with @ (a reply) and contains "RT"
);
echo "Example 1:\n";
search($strings,  "/(?:[[:space:]]|^)@/i");

Current output:

Example 1:
Sentence: "Hi @peter, I like your car " <- MATCH
Sentence: "@peter I don't think so!" <- MATCH
Sentence: "Helo!! :@ how are you!" <- NO MATCH
Sentence: "Yes @peter i'll eat them this evening! RT @peter: hey @you, do you want your pancakes?" <- MATCH
Sentence: "Helo!! [email protected] how are you!" <- MATCH
Sentence: "@peter is the best friend you could imagine. RT @juliet: @you do you know if @peter it's awesome?" <- MATCH

EDIT:

I need a regex because it has to be usable in MySQL and other languages too. I'm not looking for usernames; I only want to know whether the string is a mention or not.

LDK asked Aug 22 '11 16:08



2 Answers

This regexp might work a bit better: /\B\@([\w\-]+)/gim

Here's a jsFiddle example of it in action: http://jsfiddle.net/2TQsx/96/
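
To see what this pattern does without opening the fiddle, here is a minimal PHP sketch, using PHP to match the question's code. The `/g` and `/m` flags are JavaScript-style, so the sketch drops them and uses `preg_match_all` for the global part. The helper name `mentions` is made up for illustration, and the sample strings come from the question. The pattern extracts usernames; it doesn't apply the "starts with @" or "RT" rules.

```php
<?php
// Sketch: the answer's pattern ported to PCRE (flags g and m dropped).
// \B before @ requires the preceding character to be a non-word character
// (or @ to be at the start of the string), which rejects "foo@bar".
$pattern = '/\B@([\w\-]+)/i';

// Hypothetical helper: return every username captured after an @.
function mentions(string $pattern, string $text): array {
    preg_match_all($pattern, $text, $m);
    return $m[1]; // captured names, without the leading @
}

print_r(mentions($pattern, "Hi @peter, I like your car "));  // one name: peter
print_r(mentions($pattern, "Helo!! :@ how are you!"));       // empty: no word after @
print_r(mentions($pattern, "@peter I don't think so!"));     // peter: replies are not filtered
```

Because the reply in the last call still yields a name, this pattern alone doesn't answer the "is it a mention or not" question; it would need the extra checks from the question's rules.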

csuwldcat answered Oct 22 '22 08:10


Here's a regex that should work:

/^(?!.*\bRT\b)(?:.+\s)?@\w+/i

Explanation:

/^             //Start of the string.
(?!.*\bRT\b)   //Verify that "RT" does not appear as a word anywhere in the string.
(?:.+\s)?      //Optionally match any leading characters followed by whitespace.
                  //Note: (?: ) makes the group non-capturing.
@\w+           //Find @ followed by one or more word chars.
/i             //Make it case insensitive.
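
One thing worth checking against the question's examples: because the leading group is optional, the pattern can still match when the string itself starts with "@". The PHP sketch below runs the pattern as written next to a variant with an extra `(?!@)` lookahead. The variant and the helper name `isMention` are assumptions added for illustration, not part of the answer.

```php
<?php
// The answer's pattern, unchanged.
$answer = '/^(?!.*\bRT\b)(?:.+\s)?@\w+/i';
// Hypothetical variant (not from the answer): the extra (?!@) lookahead
// rejects strings whose first character is @, i.e. replies.
$variant = '/^(?!@)(?!.*\bRT\b)(?:.+\s)?@\w+/i';

// Hypothetical helper: true when the pattern matches the text.
function isMention(string $pattern, string $text): bool {
    return preg_match($pattern, $text) === 1;
}

// Sample strings taken from the question.
$cases = [
    "Hi @peter, I like your car ",
    "@peter I don't think so!",
    "Helo!! :@ how are you!",
    "Yes @peter i'll eat them this evening! RT @peter: hey @you, do you want your pancakes?",
];

foreach ($cases as $text) {
    echo (isMention($answer, $text) ? "MATCH   " : "NO MATCH"), " | ",
         (isMention($variant, $text) ? "MATCH   " : "NO MATCH"), " | ", $text, "\n";
}
```

On these four strings the two patterns differ only on the reply ("@peter I don't think so!"): the original matches it and the variant does not. The variant assumes the string has no leading whitespace; trimming the input first would be a reasonable precaution.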

Jacob Eggers answered Oct 22 '22 08:10