Java remove non Latin-basic characters from string

Let's say I have the following code:

String description = "★★★★★  ♫ ♬ This description ✔✔  ▬ █ ✖  is a mess. ♫ ♬ ★★★★★";

I'd like to remove the non-Latin characters: ✔, ▬, █, ✖, ♫, ♬ and ★.

And have it become this: This description is a mess.

I know there are probably tons of these Wingdings-like characters, so instead of specifying what I'd like to remove, I think it's better to list what I want to keep: Basic Latin and Latin-1 Supplement characters.

I found that I can use the following code to remove everything but the Basic Latin characters:

String clean_description = description.replaceAll("[^\\x00-\\x7F]", "").trim();

But is there a way to also preserve the Latin-1 supplement characters?

asked Mar 16 '16 14:03

RoboticR

1 Answer

Looking at the character range you provided, "Basic Latin" and "Latin-1 Supplement" are adjacent blocks (0x00-0x7F and 0x80-0xFF).

So you can use the same regex you provided, just extended to include the "Latin-1 Supplement" characters. That would look like this:

String clean_description = description.replaceAll("[^\\x00-\\xFF]", "").trim();
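
To make the effect of the wider range concrete, here is a minimal, self-contained sketch. The class name Latin1Filter, the method name keepLatin1 and the sample string are made up for illustration; only the regex comes from the line above.

```java
final class Latin1Filter {

    // Hypothetical helper: removes every character outside U+0000..U+00FF
    // (Basic Latin plus Latin-1 Supplement), then trims both ends.
    static String keepLatin1(String input) {
        return input.replaceAll("[^\\x00-\\xFF]", "").trim();
    }

    public static void main(String[] args) {
        // Accented letters such as é and à sit in the Latin-1 Supplement
        // block, so they survive; the symbols do not.
        String description = "★ Café ✔ déjà vu ♫";

        // prints "Café  déjà vu" (two spaces remain where ✔ was removed)
        System.out.println(keepLatin1(description));
    }
}
```

With the original [^\\x00-\\x7F] pattern, the é and à in this sample would have been stripped as well.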

As Quinn pointed out in the comments, this leaves behind the spaces that surrounded the removed characters, so the result contains runs of extra spaces (which may or may not be what you want). If you want those spaces removed as well, Quinn's regex may work for you. I'm reproducing it here in case the comment is deleted: [^(\\x00-\\xFF)]+(?:$|\\s*)
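
If a single regex feels hard to read, a two-step variant is another option. This is a sketch of a different technique, not Quinn's regex: it first strips the unwanted characters, then collapses every run of whitespace to one space. The class and method names (Latin1Cleaner, cleanAndCollapse) are hypothetical. Note that the second step also collapses whitespace that was already in the text, such as intentional double spaces or line breaks, so it only fits when that is acceptable.

```java
final class Latin1Cleaner {

    // Hypothetical helper: keeps only U+0000..U+00FF, then squeezes each
    // run of whitespace down to a single space and trims the ends.
    static String cleanAndCollapse(String input) {
        String latinOnly = input.replaceAll("[^\\x00-\\xFF]", "");
        return latinOnly.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String description = "★★  ♫ This description ✔✔  ▬ is a mess. ♫ ★";

        // prints "This description is a mess."
        System.out.println(cleanAndCollapse(description));
    }
}
```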

answered Oct 16 '22 15:10

resueman

Tags: java, regex, unicode
