Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regexp to search/replace only text, not in HTML attribute

I'm using JavaScript to do some regular expression. Considering I'm working with well-formed source, and I want to remove any space before[,.] and keep only one space after [,.], except that [,.] is part of a number. Thus I use:

text = text.replace(/ *(,|\.) *([^ 0-9])/g, '$1 $2');

The problem is that this replaces also text in the html tag attributes. For example my text is (always wrapped with a tag):

<p>Test,and test . Again <img src="xyz.jpg"> ...</p>

Now it adds a space like this src="xyz. jpg" that is not expected. How can I rewrite my regular expression? What I want is

<p>Test, and test. Again <img src="xyz.jpg"> ...</p>

Thanks!

like image 387
jcisio Avatar asked Aug 11 '10 15:08

jcisio


People also ask

Can I use regex in replace?

How to use RegEx with . replace in JavaScript. To use RegEx, the first argument of replace will be replaced with regex syntax, for example /regex/ . This syntax serves as a pattern where any parts of the string that match it will be replaced with the new substring.

What is $1 in regex replace?

For example, the replacement pattern $1 indicates that the matched substring is to be replaced by the first captured group. For more information about numbered capturing groups, see Grouping Constructs.


1 Answers

You can use a lookahead to make sure the match isn't occurring inside a tag:

text = text.replace(/(?![^<>]*>) *([.,]) *([^ \d])/g, '$1 $2');

The usual warnings apply regarding CDATA sections, SGML comments, SCRIPT elements, and angle brackets in attribute values. But I suspect your real problems will arise from the vagaries of "plain" text; HTML's not even in the same league. :D

like image 107
Alan Moore Avatar answered Sep 23 '22 18:09

Alan Moore