
Java, escaping (using) quotes in a regex

I'm trying to use the following regex in Java, which is supposed to match any lang="2-char-lang-name":

String lang = "lang=\"" + L.detectLang(inputText) +"\"";
shovel.replaceFirst("lang=\"[..]\"", lang);

I know that a single backslash would be interpreted by the regex as a literal backslash and not as an escape character (so my code doesn't work), but if I escape the backslash, the " is no longer escaped and I get a syntax error.

In other words, how can I include a " in the regex? "lang=\\"[..]\\"" won't work. I've also tried three backslashes, and that didn't produce any matches either.

I am also aware of the general rule that you don't use regex to parse XML/HTML (and shovel is an XML document). However, all I'm doing is looking for a lang attribute within the first 30 characters of the XML so that I can replace it. Is it really a bad idea to use regex in this case? I don't think using DOM would be any better or more efficient.

Spectraljump asked Dec 08 '25 02:12


1 Answer

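Three backslashes would be correct: in a Java string literal, \\ becomes \ and \" becomes ", so the regex engine sees \". (Update: it turns out that isn't even necessary. A single backslash also works, because " is not a special character in a regex and only needs escaping for the Java string literal.) The real problem is your use of [..]. Square brackets mean "any one of the characters in here", and inside them a . is a literal dot, so [..] matches exactly one . character rather than two arbitrary characters.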

Drop the [] and you should be getting what you want:

String ab = "foo=\"bar\" lang=\"AB\"";
String regex = "lang=\\\"..\\\"";
String cd = ab.replaceFirst(regex, "lang=\"CD\"");
System.out.println(cd);

Output:

foo="bar" lang="CD"
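
To make the difference between the three patterns concrete, here is a small sketch that compares them side by side. The class name LangAttrDemo and the helper replaceLang are hypothetical names chosen for illustration. The call to Matcher.quoteReplacement is an addition that is not part of the answer above; it assumes the detected language code could, in principle, contain $ or \ characters, which would otherwise be treated specially in the replacement string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the original post.
public class LangAttrDemo {
    // \" in Java source is a plain " by the time the regex engine sees it.
    static final Pattern TWO_CHARS = Pattern.compile("lang=\"..\"");
    // Same match, with a redundant regex-level escape in front of each quote.
    static final Pattern ESCAPED = Pattern.compile("lang=\\\"..\\\"");
    // The character class from the question: matches exactly one literal dot.
    static final Pattern DOT_CLASS = Pattern.compile("lang=\"[..]\"");

    // Replaces the first lang="xx" attribute with the given language code.
    static String replaceLang(String xml, String code) {
        String replacement = "lang=\"" + code + "\"";
        // quoteReplacement is an assumption-driven safeguard (see lead-in).
        return TWO_CHARS.matcher(xml).replaceFirst(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String xml = "foo=\"bar\" lang=\"AB\"";
        System.out.println(replaceLang(xml, "CD"));                  // foo="bar" lang="CD"
        System.out.println(ESCAPED.matcher(xml).find());             // true
        System.out.println(DOT_CLASS.matcher(xml).find());           // false
        System.out.println(DOT_CLASS.matcher("lang=\".\"").find());  // true
    }
}
```

The one-backslash and three-backslash patterns behave identically, while the [..] version only matches when the attribute value is a single dot.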
Dan Tao answered Dec 10 '25 15:12


