
Regular Expression for Parsing Hashtags in Java

Here is the Twitter message I am trying to parse (as you can see, some of the # characters do not start a tag; they are just part of a URL or other text):

#anothertag Arrogance and bad PR http://www.adobe.com/index.html#anchor1. John 
Nack on &#Adobe: Information about Photoshop© CS3 on Snow Leopard 
#fail #design

This regular expression is what I have so far, but it still picks up some of the URL fragments:

[##]+([A-Za-z0-9-_]+)
Daniel Dura asked Aug 27 '09 17:08



1 Answer

Ironically, as soon as I posted this I found an answer. If you are looking for a pattern that does this, the following seems to work:

(?:\s|\A)[##]+([A-Za-z0-9-_]+)

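The (?:\s|\A) prefix only allows a tag at the very start of the input or directly after whitespace, so the #anchor1 in the URL and the # in &#Adobe are skipped. I am going to do a lot more testing with this to look for edge cases the expression does not cover, and will report back if I find any.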
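For anyone who wants to try the pattern in Java, here is a minimal sketch rather than a definitive implementation. The class name HashtagExtractor and the method extract are made up for illustration. Two small liberties are taken with the pattern: the hyphen is moved to the end of the character class so it cannot be read as a range, and [##]+ is written as #+, which matches the same text as the class is written here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original post.
class HashtagExtractor {

    // A tag must either start the input (\A) or follow whitespace (\s).
    // Group 1 captures the tag name without the leading '#'.
    private static final Pattern HASHTAG =
            Pattern.compile("(?:\\s|\\A)#+([A-Za-z0-9_-]+)");

    // Returns the tag names in order of appearance.
    static List<String> extract(String message) {
        List<String> tags = new ArrayList<>();
        Matcher m = HASHTAG.matcher(message);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        // The sample message from the question, joined onto one line.
        String tweet = "#anothertag Arrogance and bad PR "
                + "http://www.adobe.com/index.html#anchor1. John Nack on "
                + "&#Adobe: Information about Photoshop\u00a9 CS3 on Snow Leopard "
                + "#fail #design";
        System.out.println(extract(tweet)); // prints [anothertag, fail, design]
    }
}
```

One limitation to keep in mind: because the pattern insists on whitespace or start of input before the #, a tag that directly follows punctuation, such as (#tag, is not matched.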

Daniel Dura answered Sep 23 '22 15:09
