
Determine if characters in a string are all of a specific character set


I need to take a string in Java and determine whether all of the characters it contains are in a specified character set (e.g. ISO-8859-1). I've looked around quite a bit for a simple way to do this (including playing around with a CharsetDecoder), but have not found one yet.

What is the best way to take a string and determine if all the characters are within a given character set?

Michael asked Oct 30 '12 17:10




2 Answers

The CharsetEncoder class in package java.nio.charset offers a canEncode method, which tests whether a given character or character sequence can be encoded in the charset.

Michael basically did something like this:

Charset.forName( CharEncoding.ISO_8859_1 ).newEncoder().canEncode("string")

Note that CharEncoding.ISO_8859_1 relies on Apache Commons Lang; it can be replaced by the string literal "ISO-8859-1".
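For reference, here is a minimal sketch of the same check using only the JDK. The class name CharsetCheck and the helper isEncodable are made up for illustration, and StandardCharsets assumes Java 7 or later:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    // Hypothetical helper: true if every character of text can be
    // encoded in the given charset.
    static boolean isEncodable(String text, Charset charset) {
        // Encoders are stateful and not thread-safe, so use a fresh one per call.
        CharsetEncoder encoder = charset.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is part of ISO-8859-1.
        System.out.println(isEncodable("caf\u00e9", StandardCharsets.ISO_8859_1)); // true
        // U+20AC (euro sign) is not part of ISO-8859-1.
        System.out.println(isEncodable("5 \u20ac", StandardCharsets.ISO_8859_1));  // false
    }
}
```

Passing a Charset rather than a name avoids the UnsupportedCharsetException that Charset.forName can throw for a charset the JVM does not know.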

Aubin answered Nov 24 '22 00:11


I think the easiest way is to build a table of the Unicode characters that can be represented in the target character set encoding and then test each character in the string against it. For the ISO-8859 family, the table can usually be represented by one or a few ranges of Unicode characters, making the test relatively easy. It's a lot of hand work, but it needs to be done only once.

EDIT: or use Aubin's answer if the charset is supported in your Java implementation. :)
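As a rough sketch of the table idea for ISO-8859-1 only: that charset covers exactly U+0000 through U+00FF, so a single range is enough. The names RangeTableCheck, ISO_8859_1_RANGES and allInRanges are invented for this example, and String.codePoints() assumes Java 8 or later. Other members of the ISO-8859 family would need their own, hand-built range tables.

```java
public class RangeTableCheck {
    // ISO-8859-1 maps exactly the code points U+0000..U+00FF.
    // Each row is an inclusive {first, last} range.
    static final int[][] ISO_8859_1_RANGES = { {0x0000, 0x00FF} };

    // Hypothetical helper: true if every code point of text falls
    // inside at least one of the given inclusive ranges.
    static boolean allInRanges(String text, int[][] ranges) {
        return text.codePoints().allMatch(cp -> {
            for (int[] range : ranges) {
                if (cp >= range[0] && cp <= range[1]) {
                    return true;
                }
            }
            return false;
        });
    }

    public static void main(String[] args) {
        System.out.println(allInRanges("caf\u00e9", ISO_8859_1_RANGES)); // true
        System.out.println(allInRanges("5 \u20ac", ISO_8859_1_RANGES));  // false
    }
}
```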

Ted Hopp answered Nov 24 '22 00:11