Good morning. I realize there are a ton of questions out there regarding replace() and replaceAll(), but I haven't seen this one.
What I'm looking to do is parse a string (which contains valid HTML, to a point). After I see the second instance of <p> in the string, I want to remove everything that starts with & and ends with ; until I see the next </p>.
To do the second part, I was hoping to use something along the lines of s.replaceAll("&*;","")
That doesn't work, but hopefully it gets my point across: I am looking to replace anything that starts with & and ends with ;
You should probably leave the parsing to a DOM parser (see this question). I can almost guarantee you'll have to do this to find text within the <p> tags.
For the replacement logic, String.replaceAll uses regular expressions, which can do the matching you want.
The "wildcard" in regular expressions that you want is the .* expression. Using your example:
String ampStr = "This &escape;String";
String removed = ampStr.replaceAll("&.*;", "");
System.out.println(removed);
This outputs "This String". This is because the . represents any character (except line terminators, by default), and the * means "the preceding element, 0 or more times." So .* basically means "any number of characters." However, feeding it:
"This &escape;String &anotherescape;Extended"
will probably not do what you want: it will output "This Extended", because .* is greedy and matches all the way to the last semicolon in the string. To fix this, you specify exactly what you want to look for instead of the . character. This is done using [^;], which means "any character that's not a semicolon":
String removed = ampStr.replaceAll("&[^;]*;", "");
This has performance benefits over &.*?; for non-matching strings, so I highly recommend using this version, especially since not all HTML files will contain a &abc; token, and the &.*?; version can become a serious performance bottleneck as a result.
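To make the difference between the three patterns concrete, here is a small sketch that runs each one over the two-entity example string from above. The class and method names are made up for illustration; only String.replaceAll comes from the standard library.

```java
public class EntityPatterns {
    // Greedy: .* runs to the LAST semicolon in the string.
    static String greedy(String s) {
        return s.replaceAll("&.*;", "");
    }

    // Reluctant: .*? stops at the first semicolon after each ampersand.
    static String reluctant(String s) {
        return s.replaceAll("&.*?;", "");
    }

    // Negated character class: consumes only non-semicolon characters.
    static String negated(String s) {
        return s.replaceAll("&[^;]*;", "");
    }

    public static void main(String[] args) {
        String input = "This &escape;String &anotherescape;Extended";
        System.out.println(greedy(input));    // This Extended
        System.out.println(reluctant(input)); // This String Extended
        System.out.println(negated(input));   // This String Extended
    }
}
```

The reluctant and negated versions produce the same result here; they differ only in how the regex engine gets there.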
The expression you want is:
s.replaceAll("&.*?;","");
But do you really want to be parsing HTML this way? You may be better off using an XML parser.
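If a real parser is not an option, the "only inside the second <p>" part of the question can be roughly sketched with plain string searching. This is a minimal sketch, not a robust solution: it assumes bare <p> and </p> tags (no attributes, no nesting, lowercase only), and the class and method names are hypothetical.

```java
public class SecondParagraphCleaner {
    // Removes every "&...;" run between the second <p> and the next </p>.
    // Assumption: tags are written exactly as "<p>" and "</p>".
    static String stripEntitiesInSecondParagraph(String html) {
        final String open = "<p>";
        final String close = "</p>";

        int first = html.indexOf(open);
        if (first < 0) {
            return html; // no <p> at all
        }
        int second = html.indexOf(open, first + open.length());
        if (second < 0) {
            return html; // only one <p>
        }

        int start = second + open.length();
        int end = html.indexOf(close, start);
        if (end < 0) {
            end = html.length(); // unclosed paragraph: clean to end of input
        }

        String cleaned = html.substring(start, end).replaceAll("&[^;]*;", "");
        return html.substring(0, start) + cleaned + html.substring(end);
    }

    public static void main(String[] args) {
        String html = "<p>A &amp; B</p><p>C &nbsp;D &lt;E</p><p>F &gt; G</p>";
        System.out.println(stripEntitiesInSecondParagraph(html));
        // <p>A &amp; B</p><p>C D E</p><p>F &gt; G</p>
    }
}
```

Note that the pattern also removes text between a bare & and a later ; that is not actually an entity, so it is only safe when ampersands in the input are always part of an escape.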