
Parsing hashtags from the Twitter API in PHP

I want to parse hashtags from the tweets I'm retrieving from Twitter. I didn't find anything for this in the API, so I'm parsing them myself in PHP. I've tried several things.

<?php
$subject = "This is a simple #hashtag";
$pattern = "#\S*\w";
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>

I've also tried

$pattern = "/[#]"."[A-Za-z0-9-_]"."/g";

But then PHP reports that the g modifier isn't recognized. I've been trying to get this working for quite a while without success, so please help.

P.S.: I know very little about regular expressions.

Shivam Mangla asked Jun 02 '26 08:06


1 Answer

You need to consider where a hashtag might appear. There are three cases:

  • at the beginning of a tweet,
  • after whitespace,
  • in the middle of a word - this must not be counted as a hashtag.

So this will match them correctly:

'/(^|\s)\#\w+/'

Explanation:

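  • ^ (start of the string) can be used as one side of an alternation, so (^|\s) matches either the start of the tweet or a whitespace character
  • \s matches spaces, tabs and newlines
  • \w+ matches one or more word characters (letters, digits and underscore)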
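
As a rough illustration of how the pattern above behaves, here is a minimal sketch. The sample tweet and the variable names are made up for this example. Note that preg_match_all() already returns every match, so no g modifier is needed, and that the full match includes the leading whitespace, which is why the sketch trims each result.

```php
<?php
// Hypothetical sample tweet, used only for illustration.
$tweet = "#first word#inner then #second";

// preg_match_all() finds all matches on its own; PHP has no "g" modifier.
preg_match_all('/(^|\s)\#\w+/', $tweet, $matches);

// $matches[0] holds the full matches, including any leading whitespace
// consumed by (^|\s), so trim each one to get the bare hashtag.
$tags = array_map('trim', $matches[0]);

// "#inner" is skipped because it sits in the middle of a word.
print_r($tags);
```

Trimming is one way to drop the whitespace; capturing only the hashtag in its own group is another.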

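Here is the complete code. It uses a non-capturing group (?:...) for the prefix and captures the hashtag itself in group 1, so the leading whitespace is not part of the captured text: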

<?php
$subject = "#hashtag This is a simple #hashtag hello world #hastag2 last string not-a-hash-tag#hashtag3 and yet not -#hashtag";
$pattern = "/(?:^|\s)(\#\w+)/";
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>
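
If the nested array printed by print_r() is hard to read, the following sketch may help. With PREG_OFFSET_CAPTURE, each entry in $matches[1] is a pair of the matched text and its byte offset in the subject. The helper name extractHashtags and the 'tag'/'offset' keys are assumptions made for this example, not part of the code above.

```php
<?php
// Hypothetical helper, introduced only for illustration.
function extractHashtags(string $text): array
{
    preg_match_all('/(?:^|\s)(\#\w+)/', $text, $matches, PREG_OFFSET_CAPTURE);

    $result = [];
    // Group 1 holds the hashtags; each entry is [matched text, byte offset].
    foreach ($matches[1] as [$tag, $offset]) {
        $result[] = ['tag' => $tag, 'offset' => $offset];
    }
    return $result;
}

$found = extractHashtags("#hashtag This is a simple #hashtag not-a-hash-tag#hashtag3");
print_r($found);
```

If the offsets are not needed, dropping the PREG_OFFSET_CAPTURE flag makes $matches[1] a plain list of hashtag strings.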

Haralan Dobrev answered Jun 03 '26 20:06


