How do I write a regex for a Cyrillic string? I want to use it roughly like this:
String.replaceAll("Кириллица","")
Of course it doesn't work. What should I do to make it work?
OK, I see that the method itself works, but it doesn't work for me. How can I find out why the call has no effect?
...
Hm, I tried using s1 = s1.replaceAll("[\\p{InCyrillic}]", "");
on the string I receive through the socket. It works great: all Cyrillic characters disappear, including the word "Экзамен",
but if I try s1 = s1.replaceAll("Экзамен","")
nothing happens.
Yet s1 = s1.replaceAll("Экзамен","")
worked in the same program for a static string defined in the program itself. I guess the problem may be a wrong charset, but I still can't understand what I am doing wrong. The charset of the string is windows-1251.
I tried experimenting with the charset in the program (it is a JSP now), using the methods
System.setProperty("file.encoding", "windows-1251");
response.setCharacterEncoding("windows-1251");
and I also tried converting the string from one charset to another. Nothing changes.
It might be clearer if you showed the result you get with @Henry's answer. I suspect the issue is in the characters or in the encoding. You can check whether the string is really Cyrillic with this code:
String s1 = "Экзaмен";
s1 = s1.replaceAll("[\\p{InCyrillic}]", "");
System.out.println(s1);
The code removes all Cyrillic characters, so whatever is left shows which characters are not Cyrillic or were decoded incorrectly.
If the result is something like "a", "e" or "ae", your string contains Latin characters that look like Cyrillic ones, so you should replace using this regex:
s1 = s1.replaceAll("Экз[аa]м[еe]н", "");
where each character class holds the Cyrillic letter followed by its Latin look-alike ([аa] is Cyrillic а, then Latin a; [еe] is Cyrillic е, then Latin e), and so on.
If the result is the unchanged "Экзaмен", the issue is in the encoding, and I hope this link will help you:
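A minimal sketch of the look-alike idea, in case it helps. The class name HomoglyphReplace and the method removeExam are made up for illustration. The word is written with Unicode escapes so that the encoding of the source file cannot change what the pattern contains.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original program.
final class HomoglyphReplace {
    // The word from the question, spelled with Unicode escapes. The 4th and 6th
    // letters accept either the Cyrillic letter (U+0430, U+0435) or the
    // Latin look-alike (a, e).
    static final Pattern EXAM = Pattern.compile(
            "\u042D\u043A\u0437[\u0430a]\u043C[\u0435e]\u043D");

    static String removeExam(String s) {
        return EXAM.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String allCyrillic = "60,3\u042D\u043A\u0437\u0430\u043C\u0435\u043D";
        String latinA = "60,3\u042D\u043A\u0437a\u043C\u0435\u043D"; // Latin 'a' (U+0061)
        System.out.println(removeExam(allCyrillic)); // prints 60,3
        System.out.println(removeExam(latinA));      // prints 60,3
    }
}
```

Only two look-alike pairs are covered here; other letters (for example к/k or м/m in some fonts) would need their own character classes if they can occur in your data.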
How to determine if a String contains invalid encoded characters
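To see exactly what is in the string, it may be easier to print its code points than to guess from the rendered text. Below is a rough sketch; the class CodePointDump and its methods are hypothetical names. It uses Character.UnicodeScript, which classifies by script rather than by block, so it is close to, but not identical to, \p{InCyrillic}.

```java
import java.lang.Character.UnicodeScript;
import java.util.stream.Collectors;

// Hypothetical diagnostic helper, not part of the original program.
final class CodePointDump {
    // Lists every code point with its Unicode script, e.g. "U+0430 CYRILLIC".
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X %s", cp, UnicodeScript.of(cp)))
                .collect(Collectors.joining(", "));
    }

    // True when every code point belongs to the Cyrillic script
    // (also true for an empty string).
    static boolean isAllCyrillic(String s) {
        return s.codePoints()
                .allMatch(cp -> UnicodeScript.of(cp) == UnicodeScript.CYRILLIC);
    }

    public static void main(String[] args) {
        String suspicious = "\u042D\u043A\u0437a\u043C\u0435\u043D"; // Latin 'a' inside
        System.out.println(describe(suspicious));
        System.out.println(isAllCyrillic(suspicious)); // prints false
    }
}
```

If the dump of the socket string shows LATIN entries, the look-alike case applies. If it shows code points that do not correspond to the letters you expect at all, the bytes were most likely decoded with the wrong charset.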
Just tried this:
String s1 = "Введение в специальность (Б.3.2.1-ПиКО)60,3Экзамен";
String s2 = s1.replaceAll("Экзамен", "");
System.out.println(s2);
The output is:
Введение в специальность (Б.3.2.1-ПиКО)60,3
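Since the literal works and the socket string does not, one plausible difference is the step where bytes become a String. The sketch below assumes the peer really sends windows-1251 bytes (as the question states) and uses a ByteArrayInputStream as a stand-in for socket.getInputStream(). The class DecodeThenReplace and the method readLine are invented for the example. As far as I know, calling System.setProperty("file.encoding", ...) at runtime does not reliably change how the JVM decodes bytes, and response.setCharacterEncoding only affects the response being written, so the charset has to be given where the stream is read.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example, not part of the original program.
final class DecodeThenReplace {
    static final Charset CP1251 = Charset.forName("windows-1251");
    // The word from the question, as Unicode escapes.
    static final String EXAM = "\u042D\u043A\u0437\u0430\u043C\u0435\u043D";

    // Reads one line, decoding the bytes with the charset given by the caller.
    static String readLine(InputStream in, Charset cs) {
        try {
            return new BufferedReader(new InputStreamReader(in, cs)).readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulated wire data: what a windows-1251 sender would put on the socket.
        byte[] wire = ("60,3" + EXAM + "\n").getBytes(CP1251);

        String good = readLine(new ByteArrayInputStream(wire), CP1251);
        String bad = readLine(new ByteArrayInputStream(wire), StandardCharsets.ISO_8859_1);

        System.out.println(good.replaceAll(EXAM, ""));             // prints 60,3
        System.out.println(bad.replaceAll(EXAM, "").equals(bad));  // prints true
    }
}
```

The second call shows the failure mode: the same bytes decoded with the wrong charset produce a string that no longer contains the word, so replaceAll finds nothing to replace.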