Is it possible to use replaceAll() with wildcards?

Question

Good morning. I realize there are a ton of questions out there regarding replace() and replaceAll(), but I haven't seen this one.

What I'm looking to do is parse a string (which contains valid HTML, to a point). After I see the second instance of <p> in the string, I want to remove everything that starts with & and ends with ; until I see the next </p>.

To do the second part, I was hoping to use something along the lines of s.replaceAll("&*;","")

That doesn't work, but hopefully it gets my point across: I am looking to replace anything that starts with & and ends with ;

Brian · Accepted Answer

You should probably leave the parsing to a DOM parser (see this question). I can almost guarantee you'll have to do this to find text within the <p> tags.

For the replacement logic, String.replaceAll uses regular expressions, which can do the matching you want.

The "wildcard" in regular expressions that you want is the .* expression. Using your example:

String ampStr = "This &escape;String";
String removed = ampStr.replaceAll("&.*;", "");
System.out.println(removed);

This outputs This String. This is because the . matches any character (except line terminators, by default), and the * means "the preceding element, 0 or more times." So .* basically means "any number of characters." However, feeding it:

"This &escape;String &anotherescape;Extended"

will probably not do what you want: it outputs This Extended, because .* is greedy and matches everything up to the last semicolon in the string. To fix this, specify exactly what you want to match instead of using the . character. This is done with [^;], which means "any character that's not a semicolon":

String removed = ampStr.replaceAll("&[^;]*;", "");

This has performance benefits over the reluctant form &.*?; for non-matching strings, so I highly recommend using this version, especially since not all HTML files will contain a &abc; token, and the &.*?; version can become a serious performance bottleneck as a result.
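To make the difference concrete, here is a minimal sketch that runs the greedy, reluctant, and negated-class patterns against the two-entity string from above. The class name EntityStripDemo and its method names are made up for illustration; only String.replaceAll is the real API.

```java
public class EntityStripDemo {
    // Greedy: .* runs on to the LAST semicolon in the input.
    static String stripGreedy(String s) {
        return s.replaceAll("&.*;", "");
    }

    // Reluctant: .*? stops at the first semicolon after each ampersand.
    static String stripReluctant(String s) {
        return s.replaceAll("&.*?;", "");
    }

    // Negated class: [^;]* can never cross a semicolon.
    static String stripNegated(String s) {
        return s.replaceAll("&[^;]*;", "");
    }

    public static void main(String[] args) {
        String input = "This &escape;String &anotherescape;Extended";
        System.out.println(stripGreedy(input));    // This Extended
        System.out.println(stripReluctant(input)); // This String Extended
        System.out.println(stripNegated(input));   // This String Extended
    }
}
```

The reluctant and negated-class versions produce the same result on input like this; they differ in how much work the regex engine does, not in what gets removed.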

Jon Lin · Answer

The expression you want is:

s.replaceAll("&.*?;","");

But do you really want to be parsing HTML this way? You may be better off using an XML parser.
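If you do stick with plain string operations for a quick one-off, a rough sketch of the "second <p> until the next </p>" part could look like the following. It assumes the tags appear exactly as <p> and </p> (lowercase, no attributes), which real HTML won't always honor, so treat it as an illustration and not a substitute for a parser. The class and method names are hypothetical.

```java
public class SecondParagraphDemo {
    // Removes &...; tokens only between the second "<p>" and the next "</p>".
    // Assumption: tags are written literally as <p> and </p>.
    static String stripEntitiesInSecondParagraph(String html) {
        int first = html.indexOf("<p>");
        if (first < 0) return html;                 // no paragraph at all
        int second = html.indexOf("<p>", first + 3);
        if (second < 0) return html;                // no second paragraph
        int close = html.indexOf("</p>", second);
        if (close < 0) return html;                 // second paragraph never closes

        int bodyStart = second + 3;                 // skip past "<p>"
        String cleaned = html.substring(bodyStart, close)
                             .replaceAll("&[^;]*;", "");
        // Reassemble: untouched prefix + cleaned body + untouched remainder.
        return html.substring(0, bodyStart) + cleaned + html.substring(close);
    }

    public static void main(String[] args) {
        String html = "<p>a&amp;b</p><p>c&nbsp;d&lt;e</p><p>f&gt;g</p>";
        System.out.println(stripEntitiesInSecondParagraph(html));
        // <p>a&amp;b</p><p>cde</p><p>f&gt;g</p>
    }
}
```

Only the second paragraph's body is passed to replaceAll, so entities elsewhere in the string are left alone.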
