Split String using Unicode delimiter

Tags: java, string, unicode, character-properties

I need to split a string in Java using "-" as the delimiter, e.g. "Single Room - Enjoy your stay".

I get the same data in English and German depending on the locale, so I cannot use the usual string.split("-"). The Unicode code point for the "-" character is 8212 (decimal), or 0x2014 (hex). How do I split the string on that Unicode character?
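A minimal sketch of splitting on that one code point, assuming the separator really is U+2014 EM DASH. The class name, method name, and sample strings are hypothetical. The doubled backslash in the regex string means java.util.regex.Pattern, not the compiler, decodes the escape.

```java
import java.util.Arrays;

public class SplitOnEmDash {
    // Splits only on U+2014 EM DASH, also dropping any ASCII whitespace around it.
    // The doubled backslash hands the escape to Pattern, which decodes it itself.
    static String[] splitOnEmDash(String s) {
        return s.split("\\s*\\u2014\\s*");
    }

    public static void main(String[] args) {
        // Hypothetical input using an EM DASH as the separator.
        System.out.println(Arrays.toString(splitOnEmDash("Single Room \u2014 Enjoy your stay")));
        // An ASCII HYPHEN-MINUS is a different character, so this string is not split.
        System.out.println(Arrays.toString(splitOnEmDash("Single Room - Enjoy your stay")));
    }
}
```

The first call prints [Single Room, Enjoy your stay]; the second prints the whole string unchanged, which is exactly the problem if the incoming data does not use the character you expect.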


asked Mar 08 '12 04:03

Bhavya

1 Answer

You may be mistaken about which Unicode dash character you’re getting. As of Unicode 6.1, there are 27 code points with the \p{Dash} property:


U+002D ‭ -  HYPHEN-MINUS
U+058A ‭ ֊  ARMENIAN HYPHEN
U+05BE ‭ ־  HEBREW PUNCTUATION MAQAF
U+1400 ‭ ᐀  CANADIAN SYLLABICS HYPHEN
U+1806 ‭ ᠆  MONGOLIAN TODO SOFT HYPHEN
U+2010 ‭ ‐  HYPHEN
U+2011 ‭ ‑  NON-BREAKING HYPHEN
U+2012 ‭ ‒  FIGURE DASH
U+2013 ‭ –  EN DASH
U+2014 ‭ —  EM DASH
U+2015 ‭ ―  HORIZONTAL BAR
U+2053 ‭ ⁓  SWUNG DASH
U+207B ‭ ⁻  SUPERSCRIPT MINUS
U+208B ‭ ₋  SUBSCRIPT MINUS
U+2212 ‭ −  MINUS SIGN
U+2E17 ‭ ⸗  DOUBLE OBLIQUE HYPHEN
U+2E1A ‭ ⸚  HYPHEN WITH DIAERESIS
U+2E3A ‭ ⸺  TWO-EM DASH
U+2E3B ‭ ⸻  THREE-EM DASH
U+301C ‭ 〜 WAVE DASH
U+3030 ‭ 〰 WAVY DASH
U+30A0 ‭ ゠ KATAKANA-HIRAGANA DOUBLE HYPHEN
U+FE31 ‭ ︱ PRESENTATION FORM FOR VERTICAL EM DASH
U+FE32 ‭ ︲ PRESENTATION FORM FOR VERTICAL EN DASH
U+FE58 ‭ ﹘ SMALL EM DASH
U+FE63 ‭ ﹣ SMALL HYPHEN-MINUS
U+FF0D ‭ － FULLWIDTH HYPHEN-MINUS
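To check which of these your German data actually contains, a small diagnostic like the one below prints the code point and Unicode name of every character that is not a letter, digit, or whitespace. It is a sketch assuming Java 8 or later (String.codePoints, Character.getName); the class name and sample string are hypothetical.

```java
public class DashInspector {
    // Describes every character that is neither a letter, a digit, nor whitespace,
    // one per line, e.g. "U+2014 EM DASH", so you can see exactly which dash is present.
    static String describePunctuation(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
         .filter(cp -> !Character.isLetterOrDigit(cp) && !Character.isWhitespace(cp))
         .forEach(cp -> sb.append(String.format("U+%04X %s%n", cp, Character.getName(cp))));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input; replace it with the string your application actually receives.
        System.out.print(describePunctuation("Single Room \u2013 Enjoy your stay"));
    }
}
```

Run against this sample, it reports U+2013 EN DASH rather than U+2014, the kind of mismatch that makes a split on a single code point silently fail.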

In Perl or ICU you could just split directly on \p{Dash}, but since Java’s java.util.regex.Pattern class doesn’t support that Unicode property, you have to synthesize it with an enumerated, square-bracketed character class. So splitting on the pattern:


string.split("[\u002D\u058A\u05BE\u1400\u1806\u2010-\u2015\u2053\u207B\u208B\u2212\u2E17\u2E1A\u2E3A\u2E3B\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]")

should do the trick for you. You can also double the backslashes if you’d rather not rely on the Java compiler translating each \u escape into a literal character before the regex engine sees it; java.util.regex.Pattern understands the same \uXXXX notation itself.
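A minimal sketch of that character class in use, assuming Java 8 or later: it precompiles the class with doubled backslashes and also swallows the spaces around the dash. DashSplitter, splitOnDash, and the sample strings (including the German one) are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class DashSplitter {
    // The 27 Dash code points from Unicode 6.1, written with doubled backslashes
    // so that Pattern, not javac, decodes the escapes. Surrounding whitespace is consumed.
    private static final Pattern DASH = Pattern.compile(
            "\\s*[\\u002D\\u058A\\u05BE\\u1400\\u1806\\u2010-\\u2015\\u2053\\u207B\\u208B"
            + "\\u2212\\u2E17\\u2E1A\\u2E3A\\u2E3B\\u301C\\u3030\\u30A0"
            + "\\uFE31\\uFE32\\uFE58\\uFE63\\uFF0D]\\s*");

    // Splits on any single dash character, discarding whitespace around it.
    static String[] splitOnDash(String s) {
        return DASH.split(s);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Single Room - Enjoy your stay",               // HYPHEN-MINUS
            "Single Room \u2014 Enjoy your stay",          // EM DASH
            "Einzelzimmer \u2013 Angenehmer Aufenthalt"    // EN DASH
        };
        for (String s : inputs) {
            System.out.println(Arrays.toString(splitOnDash(s)));
        }
    }
}
```

Each line prints a two-element array such as [Single Room, Enjoy your stay]. Because the class includes U+002D HYPHEN-MINUS, a hyphen inside a word (as in "Sea-View Room") splits too; using \s+ instead of \s* avoids that if your data always puts spaces around the separator. If an approximation is enough, Pattern also accepts the general category \p{Pd} (Dash_Punctuation), for example s.split("\\s*\\p{Pd}\\s*"), but Pd is not the same set as Dash: it leaves out U+2212 MINUS SIGN, for one.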


answered Oct 05 '22 15:10

tchrist
