
Java: remove a pattern from a string using regex

Tags:

java

regex

I need to remove the following substrings from my string:

\n

\uXXXX (X being a digit or a character)

e.g. "OR\n\nThe Central Site Engineering\u2019s \u201cfrontend\u201d, where developers turn to"

-> "OR The Central Site Engineering frontend , where developers turn to"

I tried using the String method replaceAll, but I don't know how to handle the \uXXXX case, and it didn't work for \n either:

String s = "\\n";  
data=data.replaceAll(s," ");

How does this regex look in Java?

Thanks for the help.

D.Shefer asked Aug 02 '15 17:08


1 Answer

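The problem with string.replaceAll("\\n", " ") is that replaceAll expects a regular expression, and \ is a special character in regex: it introduces predefined character classes such as \d (digits) and escapes regex metacharacters such as +. The Java string literal "\\n" therefore produces the regex \n, which matches a line-feed character rather than a backslash followed by n.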

So if you want to match a literal \ with Java's regex you need to escape it twice:

  • once for the regex engine: \\
  • and once more for the Java string literal: "\\\\".

For example: replaceAll("\\\\n", " ").

You can also avoid the regex escaping altogether and use the replace method, which treats its arguments as literal text:

replace("\\n"," ")
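
To make the difference concrete, here is a minimal sketch comparing the two calls on a string that contains literal backslash-n pairs. The class and method names (BackslashNDemo, withReplaceAll, withReplace) are made up for illustration.

```java
class BackslashNDemo {
    // Regex version: the source literal "\\\\n" is the regex \\n,
    // which matches a backslash followed by the letter n.
    static String withReplaceAll(String s) {
        return s.replaceAll("\\\\n", " ");
    }

    // Literal version: replace() does no regex parsing,
    // so "\\n" is simply a backslash followed by the letter n.
    static String withReplace(String s) {
        return s.replace("\\n", " ");
    }

    public static void main(String[] args) {
        // Contains two literal backslash-n pairs, not real line feeds.
        String data = "OR\\n\\nThe Central";

        System.out.println(withReplaceAll(data));
        System.out.println(withReplace(data));

        // For comparison: the regex \n only matches a real line feed,
        // so this call finds nothing to replace.
        System.out.println(data.replaceAll("\\n", " ").equals(data));
    }
}
```

Both helper calls replace each pair with a space, while the last line prints true because the original attempt leaves the string unchanged.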

Now, to remove \uXXXX, match a literal \u followed by exactly four hexadecimal digits:

replaceAll("\\\\u[0-9a-fA-F]{4}","")


Also remember that Strings are immutable, so a str.replace(..) call doesn't change the value of str; it returns a new String. If you want str to hold that new string, you need to assign the result back:

str = str.replace(..)
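
A small sketch of that pitfall, assuming a hypothetical ImmutableStringDemo class whose two helpers differ only in whether the result of replace is kept:

```java
class ImmutableStringDemo {
    // Calls replace() but discards the returned String, so the input comes back unchanged.
    static String ignoringResult(String str) {
        str.replace("\\n", " ");
        return str;
    }

    // Assigns the returned String back, so the replacement is kept.
    static String keepingResult(String str) {
        str = str.replace("\\n", " ");
        return str;
    }

    public static void main(String[] args) {
        String input = "a\\nb"; // the characters a, backslash, n, b
        System.out.println(ignoringResult(input)); // still contains the backslash-n pair
        System.out.println(keepingResult(input));  // the pair is now a single space
    }
}
```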

So your solution could look like this:

String text = "\"OR\\n\\nThe Central Site Engineering\\u2019s \\u201cfrontend\\u201d, where developers turn to\"";

text = text.replaceAll("(\\\\n)+", " ")
           .replaceAll("\\\\u[0-9A-Fa-f]{4}", "");
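
If the same cleanup runs on many strings, one option is to compile the two patterns once and reuse them. The following is only a sketch under that assumption; EscapeCleaner and clean are hypothetical names, and the sample text is the question's example without the surrounding quotes.

```java
import java.util.regex.Pattern;

class EscapeCleaner {
    // One or more consecutive literal backslash-n pairs (not real line feeds).
    private static final Pattern LITERAL_NEWLINES = Pattern.compile("(\\\\n)+");
    // A literal backslash, the letter u, then exactly four hexadecimal digits.
    private static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u[0-9a-fA-F]{4}");

    static String clean(String text) {
        // Collapse each run of backslash-n pairs into a single space first...
        String withoutNewlines = LITERAL_NEWLINES.matcher(text).replaceAll(" ");
        // ...then drop every escaped code point.
        return UNICODE_ESCAPE.matcher(withoutNewlines).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "OR\\n\\nThe Central Site Engineering\\u2019s "
                + "\\u201cfrontend\\u201d, where developers turn to";
        System.out.println(clean(text));
    }
}
```

For the sample input this prints: OR The Central Site Engineerings frontend, where developers turn to. Note that removing \u2019 joins "Engineering" and "s", since the escape is deleted rather than replaced with a space.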

Pshemo answered Oct 04 '22 02:10