
How to extract usernames out of Tweets?

Tags: regex, twitter

I have the following example tweet:

RT @user1: who are @thing and @user2?

I only want to have user1, thing and user2.

What regular expression can I use to extract those three names?

PS: A username must only contain letters, numbers and underscores.

caw asked Apr 11 '09 18:04




1 Answer

Tested:

/@([a-z0-9_]+)/i

In Ruby (irb):

>> "RT @user1: who are @thing and @user2?".scan(/@([a-z0-9_]+)/i)
=> [["user1"], ["thing"], ["user2"]]

In Python:

>>> import re
>>> re.findall("@([a-z0-9_]+)", "RT @user1: who are @thing and @user2?", re.I)
['user1', 'thing', 'user2']
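
If the same extraction is needed in several places, it could be wrapped in a small helper. The following is only a sketch: the name `extract_usernames` is made up for illustration, and the negative lookbehind is an assumption added here (it is not part of the pattern above) so that the part after the "@" in an e-mail address is not picked up as a username.

```python
import re

# Assumption: an "@" directly preceded by a letter, digit or underscore
# (as in an e-mail address) should not count as a mention.
USERNAME_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]+)")


def extract_usernames(tweet):
    """Return the usernames mentioned in a tweet, in order of appearance."""
    # findall returns only the captured group, i.e. the name without "@".
    return USERNAME_RE.findall(tweet)


print(extract_usernames("RT @user1: who are @thing and @user2?"))
# ['user1', 'thing', 'user2']
print(extract_usernames("mail me at someone@example.com"))
# []
```

The character class spells out both cases instead of using re.I, which should match the same usernames for ASCII input.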

In PHP:

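<?php
// $matches[0] holds the full matches (including "@"),
// $matches[1] holds the captured usernames.
$matches = array();
preg_match_all(
    "/@([a-z0-9_]+)/i",
    "RT @user1: who are @thing and @user2?",
    $matches);

print_r($matches[1]);
?>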

Array
(
    [0] => user1
    [1] => thing
    [2] => user2
)
Stefan Gehrig answered Nov 05 '22 20:11