Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replacing double backslashes with single backslash

I have a string "\\u003c", which belongs to UTF-8 charset. I am unable to decode it to unicode because of the presence of double backslashes. How do i get "\u003c" from "\\u003c"? I am using java.

I tried with,

myString.replace("\\\\", "\\");

but could not achieve what i wanted.

This is my code,

String myString = FileUtils.readFileToString(file);
String a = myString.replace("\\\\", "\\");
byte[] utf8 = a.getBytes();

// Convert from UTF-8 to Unicode
a = new String(utf8, "UTF-8");
System.out.println("Converted string is:"+a);

and content of the file is

\u003c

like image 672
Vinay thallam Avatar asked Jun 13 '12 09:06

Vinay thallam


1 Answers

You can use String#replaceAll:

String str = "\\\\u003c";
str= str.replaceAll("\\\\\\\\", "\\\\");
System.out.println(str);

It looks weird because the first argument is a string defining a regular expression, and \ is a special character both in string literals and in regular expressions. To actually put a \ in our search string, we need to escape it (\\) in the literal. But to actually put a \ in the regular expression, we have to escape it at the regular expression level as well. So to literally get \\ in a string, we need write \\\\ in the string literal; and to get two literal \\ to the regular expression engine, we need to escape those as well, so we end up with \\\\\\\\. That is:

String Literal        String                      Meaning to Regex
−−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−
\                     Escape the next character   Would depend on next char
\\                    \                           Escape the next character
\\\\                  \\                          Literal \
\\\\\\\\              \\\\                        Literal \\

In the replacement parameter, even though it's not a regex, it still treats \ and $ specially — and so we have to escape them in the replacement as well. So to get one backslash in the replacement, we need four in that string literal.

like image 161
mtyson Avatar answered Sep 22 '22 05:09

mtyson