
Replacing usernames with links in Javascript with regular expressions

I'm trying to match usernames within a string like:

"user: hi, has anyone seen user today user"

The cases to match:

  • the username is the first word and is followed by a space, is in the middle surrounded by spaces, or is the last word and is preceded by a space
  • the following characters are allowed to trail the username but must not be returned as part of the match: ":;,"

The following matches all the cases but returns unwanted spaces and characters (I only want to replace usernames):

/(^(user)[\s|:|;|,])|(\s(user)[\s|:|;|,]?\s)|(\s(user))/gi

In the end I want to replace only username with links.

EDIT: Note that the username can't be matched if it's part of url or other string, except cases when special characters are trailing it.

jorilallo asked Nov 17 '11 21:11



2 Answers

Depending upon how transparent you want it to be to users (or what your eventual goal is), you may consider requiring someone to put a symbol (such as @) before a user name, so that they can elect whether or not to have a link to the user...

Aside from that, your expression has several potential errors. Character classes (denoted by []) treat nearly all characters literally, including |, so [\s|:|;|,] also accepts a literal pipe. In addition, the third alternative ((\s(user))) puts no constraint on what follows the name, so it will match the user in userSmith or userJones and not just user on its own, which is something I think you specifically want to disallow.
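A quick way to see both problems is to run the pattern from the question against two small strings. This is only an illustrative sketch; the sample strings and variable names are made up here and are not from the question.

```javascript
// The pattern from the question, unchanged.
const original = /(^(user)[\s|:|;|,])|(\s(user)[\s|:|;|,]?\s)|(\s(user))/gi;

// Problem 1: inside [], "|" is a literal character,
// so a pipe after the name is accepted and returned.
const pipeMatches = "user| hello".match(original);

// Problem 2: the third alternative has nothing after "user",
// so it matches the start of a longer name (and keeps the space).
const prefixMatches = "hello userSmith".match(original);

console.log(pipeMatches);   // [ 'user|' ]
console.log(prefixMatches); // [ ' user' ]
```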

I think you are asking for something like this:

(^|\s)(user)(?=[:;,\s]|$)

this breaks down to:

(^|\s)      # either assert that this is the beginning, or capture a whitespace character; capture into back-reference #1
(user)      # capture the username 'user' exactly 
(?=         # look-ahead to verify that the following CAN be matched
   [:;,\s]  #    one character that is      :  ;  ,  <or whitespace>
   |        #     -OR-
   $        #    the end of the string
)           # end look-ahead
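As a rough check, here is that expression applied to the example string from the question. The variable names are only for illustration. Because group 1 may hold a whitespace character, the username itself starts at the match index plus the length of group 1.

```javascript
const withLookahead = /(^|\s)(user)(?=[:;,\s]|$)/gi;
const questionSample = "user: hi, has anyone seen user today user";

// Collect where each username starts, skipping the captured whitespace.
const usernameStarts = [];
for (const m of questionSample.matchAll(withLookahead)) {
  usernameStarts.push(m.index + m[1].length);
}

console.log(usernameStarts); // [ 0, 26, 37 ]

// Longer names and URL paths are not matched.
console.log("userSmith says hi".match(withLookahead));           // null
console.log("see http://example.com/user".match(withLookahead)); // null
```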

However, there are a few cases that you might want to consider. By not allowing several types of punctuation after the username, you will exclude results from strings like "Let me know if you see user.", "have you seen user?" or "I really like user!". The rejection of URLs is already accomplished by requiring whitespace (or the beginning of the string) before user, so restricting the punctuation afterwards only rejects cases I think you will want to match. You could simply add in this extra punctuation:

(^|\s)(user\b)(?=[;:,.?!)"\s]|$)

But I would suggest something more like the following (removing the following-punctuation requirement):

(^|\s)(user\b)

I've put all three suggestions on jsFiddle, to show you what you get and allow you to put some of your own strings in.
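For readers without access to the fiddle, the following is a rough stand-in for that kind of comparison. It is not the fiddle's code; the sample strings and the names patterns, comparisonSamples and countMatches are invented for illustration.

```javascript
// The three suggested expressions, labelled with made-up names.
const patterns = {
  strict: /(^|\s)(user)(?=[:;,\s]|$)/gi,
  morePunctuation: /(^|\s)(user\b)(?=[;:,.?!)"\s]|$)/gi,
  boundaryOnly: /(^|\s)(user\b)/gi,
};

const comparisonSamples = [
  "have you seen user?",
  "I like user's idea",
  "see http://example.com/user",
  "userSmith said hi",
];

// matchAll works on a copy of the regex, so the shared /g patterns are safe to reuse.
function countMatches(pattern, text) {
  return [...text.matchAll(pattern)].length;
}

const counts = {};
for (const text of comparisonSamples) {
  counts[text] = {};
  for (const [label, pattern] of Object.entries(patterns)) {
    counts[text][label] = countMatches(pattern, text);
  }
}

console.log(counts);
```

With these samples, only the question mark case separates the first two expressions, and only the apostrophe case separates the last two; the URL and userSmith are rejected by all three.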

Whichever way you prefer, these expressions would be used in a find-and-replace in which the whitespace consumed before the user's name is put back by the replacement expression:

source.replace(/(^|\s)(user\b)/gi, '$1<a href="/linkToProfile?n=$2">$2</a>')
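If the username comes from a variable rather than being typed into the pattern, something along these lines might work. It is a sketch under assumptions: escapeRegExp and linkUsername are hypothetical helper names, the /linkToProfile?n= URL is simply reused from the line above, and the \b only behaves as intended when the username ends in a word character (a letter, digit or underscore).

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every standalone occurrence of `username` in a profile link.
function linkUsername(source, username) {
  const pattern = new RegExp("(^|\\s)(" + escapeRegExp(username) + "\\b)", "gi");
  return source.replace(pattern, (match, before, name) => {
    // `before` is the whitespace (or empty string) consumed in front of the name.
    const href = "/linkToProfile?n=" + encodeURIComponent(name);
    return `${before}<a href="${href}">${name}</a>`;
  });
}

console.log(linkUsername("user: hi, has anyone seen user today user", "user"));
```

A replacer function is used instead of a '$1...$2' string so the name can be URL-encoded before it goes into the href.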

Though I'm pretty sure I answered the question, please let me know if there are cases you specified that aren't covered!

Code Jockey answered Sep 30 '22 16:09


I think you are looking for \b which means "word boundary":

/\buser\b/gi
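One thing to keep in mind with a plain word boundary is that punctuation such as "/" also counts as a boundary, so a username at the end of a URL path still matches. A small sketch (the sample strings are made up):

```javascript
const boundaryPattern = /\buser\b/gi;

// Longer names are rejected, as intended.
const inLongerName = "userSmith said hi".match(boundaryPattern);

// But "/" followed by "u" is a word boundary too, so the URL path matches.
const inUrl = "see http://example.com/user".match(boundaryPattern);

console.log(inLongerName); // null
console.log(inUrl);        // [ 'user' ]
```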

Edit after your comment:

You can easily add the required characters after your username with a lookahead:

/\buser(?=[:;,\s]|$)/gi

Unfortunately you can't do the same for restrictions on the characters before the username in JavaScript engines without lookbehind support (lookbehind assertions were only standardized in ES2018). But perhaps this is good enough for your needs?

If not, as a workaround you can capture the characters that must occur before the string and replace them with themselves.
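A minimal sketch of that workaround: group 1 captures whatever must precede the name, and the replacement puts it back with $1. The square brackets are just a visible stand-in for whatever link markup you would actually insert. The second variant assumes an engine with ES2018 lookbehind support, which is an assumption about your runtime rather than something that can be relied on everywhere.

```javascript
const workaroundText = "user, ping user; see http://example.com/user";

// Capture the preceding whitespace (or start of string) and re-insert it.
const viaCapture = workaroundText.replace(
  /(^|\s)(user)(?=[:;,\s]|$)/gi,
  "$1[$2]"
);

// Assumes lookbehind support: nothing before the name is consumed,
// so the whole match ($&) is just the username.
const viaLookbehind = workaroundText.replace(
  /(?<=^|\s)user(?=[:;,\s]|$)/gi,
  "[$&]"
);

console.log(viaCapture);    // [user], ping [user]; see http://example.com/user
console.log(viaLookbehind); // [user], ping [user]; see http://example.com/user
```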

Mark Byers answered Sep 30 '22 15:09