Pattern.DOTALL with String.replaceAll

I have a multiline HTML document that I am trying to extract some content from. I'm using Java's regex (I know, XML parsers bla bla bla, just bear with me here please :) ).

    dfahfadhadaaaa<object classid="java:com.sun.java.help.impl.JHSecondaryViewer" width="14" height="14"> <param name="content" value="../Glossary/glInterlinkedTask.html">  <param name="text" value="interlinked task"> <param name="viewerActivator" value="javax.help.LinkLabel"> <param name="viewerStyle" value="javax.help.Popup"> <param name="viewerSize" value="390,340"> <param name="textFontFamily" value="SansSerif"> <param name="textFontWeight" value="plain"> <param name="textFontStyle" value="italic"> <param name="textFontSize" value="12pt"> <param name="textColor" value="blue">  <param name="iconByID" value=""> </object> sjtsjsrjrsjsrjsrj

I've got this HTML in a string named input.

    input = input.replaceAll("<object classid=\"java:com.sun.java.help.impl.JHSecondaryViewer.*?object>", "buh bye!"); 

Obviously, it's not working. However, I can get a match if I use Pattern.compile with Pattern.DOTALL.

So, my question is: how can I do something like Pattern.DOTALL with String.replaceAll?

arcologies asked Jun 27 '11 23:06


1 Answer

Prepend (?s) to your pattern:

    input = input.replaceAll("(?s)<object classid=\"java:com\\.sun\\.java\\.help\\.impl\\.JHSecondaryViewer.*?object>", "buh bye!");

From the Javadoc:

Dotall mode can also be enabled via the embedded flag expression (?s). (The s is a mnemonic for "single-line" mode, which is what this is called in Perl.)
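To make the equivalence concrete, here is a minimal sketch comparing the embedded flag with an explicitly compiled pattern. The class name, the helper method names, and the shortened sample input are made up for illustration; only `String.replaceAll`, `Pattern.compile`, `Pattern.DOTALL`, and `Matcher.replaceAll` come from the JDK.

```java
import java.util.regex.Pattern;

public class DotallReplaceDemo {
    // Hypothetical sample input: an <object> element that spans several lines.
    static final String INPUT =
        "before\n"
        + "<object classid=\"java:com.sun.java.help.impl.JHSecondaryViewer\" width=\"14\">\n"
        + "  <param name=\"text\" value=\"interlinked task\">\n"
        + "</object>\n"
        + "after";

    // Same pattern as in the answer, minus the flag.
    static final String REGEX =
        "<object classid=\"java:com\\.sun\\.java\\.help\\.impl\\.JHSecondaryViewer.*?object>";

    // Without dotall mode, '.' does not match line terminators,
    // so the pattern cannot reach "object>" and nothing is replaced.
    static String withoutFlag(String input) {
        return input.replaceAll(REGEX, "buh bye!");
    }

    // Embedded flag: usable directly with String.replaceAll.
    static String withEmbeddedFlag(String input) {
        return input.replaceAll("(?s)" + REGEX, "buh bye!");
    }

    // Explicit form: compile with Pattern.DOTALL, then replace via the Matcher.
    static String withCompiledPattern(String input) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(input).replaceAll("buh bye!");
    }

    public static void main(String[] args) {
        System.out.println(withoutFlag(INPUT).equals(INPUT));
        System.out.println(withEmbeddedFlag(INPUT));
        System.out.println(withEmbeddedFlag(INPUT).equals(withCompiledPattern(INPUT)));
    }
}
```

If the same pattern is applied many times, compiling it once and reusing the `Pattern` is probably the better choice, since `String.replaceAll` compiles the regex on every call.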

Other flags work this way as well:

    Special constructs (non-capturing)
    ...
    (?idmsux-idmsux)    Nothing, but turns match flags i d m s u x on - off
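As a rough illustration of combining flags from that list, the sketch below turns on both case-insensitive and dotall matching with `(?is)` and compares it with the corresponding `Pattern` constants. The class name, method names, input string, and the simplified `<object.*?</object>` pattern are hypothetical and not taken from the question.

```java
import java.util.regex.Pattern;

public class EmbeddedFlagsDemo {
    // Hypothetical input: mixed-case tags spread over two lines.
    static final String INPUT = "a <OBJECT id=\"x\">\n</Object> b";

    // (?is) enables CASE_INSENSITIVE (i) and DOTALL (s) for the whole pattern.
    static String embedded(String input) {
        return input.replaceAll("(?is)<object.*?</object>", "[removed]");
    }

    // The same two flags passed as a bit mask to Pattern.compile.
    static String compiled(String input) {
        return Pattern.compile("<object.*?</object>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
                      .matcher(input)
                      .replaceAll("[removed]");
    }

    public static void main(String[] args) {
        System.out.println(embedded(INPUT));
        System.out.println(embedded(INPUT).equals(compiled(INPUT)));
    }
}
```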

On a side note, if your goal is to remove unsafe objects from HTML from an untrusted source, please don't use regular expressions, and please don't blacklist tags.

Mike Samuel answered Oct 15 '22 10:10
