
Regex for a valid hashtag

I need a regular expression for validating a hashtag. Each hashtag should start with a hash sign ("#").

Valid inputs:

1. #hashtag_abc

2. #simpleHashtag

3. #hashtag123

Invalid inputs:

1. #hashtag#

2. #hashtag@hashtag

I have been trying the regex /#[a-zA-z0-9]/, but it also accepts the invalid inputs.

Any suggestions for how to do it?

Ashok asked Feb 06 '17 10:02




2 Answers

The current accepted answer fails in a few places:

  • It accepts hashtags that have no letters in them (e.g. "#11111" and "#___" both pass).
  • It will exclude hashtags that are separated by spaces ("hey there #friend" fails to match "#friend").
  • It doesn't allow you to place a min/max length on the hashtag.
  • It doesn't offer a lot of flexibility if you decide to add other symbols/characters to your valid input list.

Try the following regex:

/(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,30})(\b|\r)/g

It'll close up the above edge cases, and furthermore:

  • You can change {1,30} to your desired min/max
  • You can add other symbols to the [0-9_] and [a-zA-Z0-9_] blocks if you wish to later

Here's a link to the demo.
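For anyone who wants to try this locally, here is a minimal sketch of how the regex above could be used to pull hashtags out of a longer string. The helper name extractHashtags and the sample text are made up for illustration; only the pattern itself comes from this answer.

```javascript
// Pattern copied from the answer above; everything else is illustrative.
const hashtagPattern = /(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,30})(\b|\r)/g;

// Hypothetical helper: returns every hashtag found in the text.
function extractHashtags(text) {
  // Capture group 2 holds the tag body without the leading "#".
  return Array.from(text.matchAll(hashtagPattern), (m) => "#" + m[2]);
}

const found = extractHashtags("hey there #friend, see #11111 and #___ and #tag_1");
console.log(found); // [ '#friend', '#tag_1' ]
```

"#11111" and "#___" are skipped by the negative lookahead, while "#friend" is still picked up after a space.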

Nate Kimball answered Sep 17 '22 22:09


To answer the current question...

There are 2 issues:

  • [A-z] allows more than just letter chars (it also matches [, \, ], ^, _ and `)
  • There is no quantifier after the character class, so it only matches 1 char
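Both issues are easy to reproduce. The snippet below is just a quick check, using the pattern exactly as posted in the question; the variable names are my own.

```javascript
// The pattern from the question, unchanged.
const original = /#[a-zA-z0-9]/;

// Issue 1: the A-z range spans ASCII 65..122, which also covers [ \ ] ^ _ and `.
const extras = ["[", "\\", "]", "^", "_", "`"].filter((ch) => /[A-z]/.test(ch));
console.log(extras.length); // 6, i.e. all six punctuation chars are matched

// Issue 2: with no quantifier and no anchors, test() only needs "#" plus one
// allowed char somewhere in the string, so the invalid inputs pass.
console.log(original.test("#hashtag#"));        // true
console.log(original.test("#hashtag@hashtag")); // true
```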

Since you are validating the whole string, you also need anchors (^ and $) to ensure a full string match:

/^#\w+$/

See the regex demo.
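As a rough sketch, the anchored pattern can be checked against the inputs from the question like this (isValidHashtag is just a name I picked for the example):

```javascript
// Hypothetical helper wrapping the anchored pattern from this answer.
const isValidHashtag = (input) => /^#\w+$/.test(input);

// Sample inputs taken from the question.
const valid = ["#hashtag_abc", "#simpleHashtag", "#hashtag123"];
const invalid = ["#hashtag#", "#hashtag@hashtag"];

console.log(valid.every(isValidHashtag));  // true
console.log(invalid.some(isValidHashtag)); // false
```

Note that this pattern also accepts digit-only or underscore-only input such as "#123", since \w+ has no requirement for a letter.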

If you want to extract specific valid hashtags from longer texts...

This is a bonus section as a lot of people seek to extract (not validate) hashtags, so here are a couple of solutions for you. Just keep in mind that \w in JavaScript (and in a lot of other regex libraries) is equal to [a-zA-Z0-9_]:

  • #\w{1,30}\b - a # char followed with one to thirty word chars followed with a word boundary
  • \B#\w{1,30}\b - a # char that is either at the start of string or right after a non-word char, then one to thirty word (i.e. letter, digit, or underscore) chars followed with a word boundary
  • \B#(?![\d_]+\b)(\w{1,30})\b - # that is either at the start of string or right after a non-word char, then one to thirty word (i.e. letter, digit, or underscore) chars (that cannot be just digits/underscores) followed with a word boundary
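To see how the three patterns differ, here is a small comparison on a made-up string. The keys loose, bounded and strict are only labels for this example, not established names.

```javascript
// Illustrative input: "a#b" has a "#" glued to a word, "#2024" is digits only.
const text = "mail a#b, tags: #one #2024 #two_2";

// The three patterns from the list above, with the g flag added for extraction.
const patterns = {
  loose: /#\w{1,30}\b/g,
  bounded: /\B#\w{1,30}\b/g,
  strict: /\B#(?![\d_]+\b)(\w{1,30})\b/g,
};

const results = {};
for (const [name, re] of Object.entries(patterns)) {
  // With the g flag, match() returns all full matches, or null if there are none.
  results[name] = text.match(re) || [];
}

console.log(results.loose);   // [ '#b', '#one', '#2024', '#two_2' ]
console.log(results.bounded); // [ '#one', '#2024', '#two_2' ]
console.log(results.strict);  // [ '#one', '#two_2' ]
```

The leading \B drops the "#b" inside "a#b", and the lookahead additionally drops the digits-only "#2024".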

And last but not least, here is a Twitter hashtag regex from https://github.com/twitter/twitter-text/tree/master/js... Sorry, too long to paste in the SO post, here it is: https://gist.github.com/stribizhev/715ee1ee2dc1439ffd464d81d22f80d1.

Wiktor Stribiżew answered Sep 21 '22 22:09