
The correct regex for replacing an em-dash with a basic "-" in Java

Tags:

java

My question concerns the replaceAll method of the String class.

My goal is to replace all the em-dashes in a text with a basic "-". I know the Unicode escape for the em-dash is \u2014.

I tried it in the following way:

String s = "asd – asd";
s = s.replaceAll("\u2014", "-");

Still, the em-dash is not replaced. What am I doing wrong?

user975705 asked Nov 20 '11 18:11


2 Answers

Minor edit after question edit:

You might not be using an em-dash at all. If you're not sure which dash you have, a simple solution is to find and replace all dashes, em or otherwise. You can use the Unicode dash punctuation property, \p{Pd}, which matches any character in the dash punctuation category (written as "\\p{Pd}" in a Java string literal):

String s = "asd – asd";
s = s.replaceAll("\\p{Pd}", "-");

The code above replaces em dashes, en dashes, and regular hyphens alike.
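To make the \p{Pd} approach easier to try, here is a minimal, self-contained sketch. The class name DashNormalizer and the method normalizeDashes are hypothetical names chosen for illustration. The dashes are written as Unicode escapes so the source file's encoding cannot silently change them.

```java
public class DashNormalizer {

    // Hypothetical helper: replaces every character in the Unicode
    // "dash punctuation" category (Pd) with an ASCII hyphen-minus.
    // Pd covers the hyphen-minus itself, the en dash (U+2013) and
    // the em dash (U+2014), among others.
    public static String normalizeDashes(String text) {
        return text.replaceAll("\\p{Pd}", "-");
    }

    public static void main(String[] args) {
        // \u2013 is an en dash, \u2014 is an em dash.
        String sample = "en \u2013 em \u2014 hyphen - done";
        System.out.println(normalizeDashes(sample)); // prints "en - em - hyphen - done"
    }
}
```

Note that the mathematical minus sign (U+2212) belongs to the symbol category Sm, not Pd, so this pattern leaves it untouched.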

References:
public String replaceAll(String regex, String replacement)
Unicode Regular Expressions

Peter Ajtai answered Sep 19 '22 15:09


Based on what you posted, the problem may not actually lie with your code, but with the dash you assume you have. What you have looks like an en dash (the width of a capital N) rather than an em dash (the width of a capital M). The Unicode code point for the en dash is U+2013; try using that instead and see whether the replacement works.
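If you are unsure which dash a string really contains, printing its code point settles the question. The following is a rough sketch: DashInspector and firstNonAscii are hypothetical names, and the sample string assumes the text holds an en dash, as it appears to in the question.

```java
public class DashInspector {

    // Hypothetical helper: returns the first non-ASCII code point in the
    // text formatted as "U+XXXX", or null if the text is pure ASCII.
    public static String firstNonAscii(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                return String.format("U+%04X", cp);
            }
            i += Character.charCount(cp);
        }
        return null;
    }

    public static void main(String[] args) {
        String s = "asd \u2013 asd"; // assumed input: an en dash, not an em dash
        System.out.println(firstNonAscii(s));          // prints "U+2013"
        System.out.println(s.replace('\u2014', '-'));  // em dash pattern: string unchanged
        System.out.println(s.replace('\u2013', '-'));  // prints "asd - asd"
    }
}
```

Since a single dash is a literal character and not a pattern, String.replace is enough here; replaceAll is only needed when a real regular expression such as \p{Pd} is involved.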

Charlie answered Sep 21 '22 15:09