When I try to compile a Java file, the compiler reports "illegal character \u3000".
After searching, I found that U+3000 is the IDEOGRAPHIC SPACE, the full-width space used in Chinese, Japanese and Korean (CJK) text. Instead of deleting these spaces by hand, I decided to write a small Java program that finds and removes them.
However, the program below does not report any index for the offending character. How can I write code to eliminate this special space?
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.File;
import java.io.IOException;
import java.util.*;

public class BufferReadAFile {
    public static void main(String[] args) {
        //BufferedReader br = null;
        String sCurrentLine;
        String message = "";
        try {
            /*br = new BufferedReader(new FileReader("/Users/apple/Test/Instance1.java"));
            while ((sCurrentLine = br.readLine()) != null) {
                message += sCurrentLine;
            }
            */
            String content = new Scanner(new File("/Users/apple/Coding/Instance1.java")).useDelimiter("\\Z").next();
            //System.out.println(content);
            searchSubString(content.toCharArray(),"\\u3000".toCharArray());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void searchSubString(char[] text, char[] ptrn) {
        int i = 0, j = 0;
        // pattern and text lengths
        int ptrnLen = ptrn.length;
        int txtLen = text.length;
        // initialize new array and preprocess the pattern
        int[] b = preProcessPattern(ptrn);
        while (i < txtLen) {
            while (j >= 0 && text[i] != ptrn[j]) {
                j = b[j];
            }
            i++;
            j++;
            // a match is found
            if (j == ptrnLen) {
                System.out.println("found substring at index:" + (i - ptrnLen));
                j = b[j];
            }
        }
    }

    public static int[] preProcessPattern(char[] ptrn) {
        int i = 0, j = -1;
        int ptrnLen = ptrn.length;
        int[] b = new int[ptrnLen + 1];
        b[i] = j;
        while (i < ptrnLen) {
            while (j >= 0 && ptrn[i] != ptrn[j]) {
                // if there is mismatch consider the next widest border
                // The borders to be examined are obtained in decreasing order from
                // the values b[i], b[b[i]] etc.
                j = b[j];
            }
            i++;
            j++;
            b[i] = j;
        }
        return b;
    }
}
I don't think "\\u3000" is what you want. Print the string and you can see its content for yourself. You should use "\u3000" instead; note the single backslash.
System.out.println("\\u3000"); // This prints out \u3000
System.out.println("\u3000"); // This prints out the CJK space
Alternatively, you could just use the actual CJK space character directly as in one of the if checks in your CheckEmpty class.
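To make the difference concrete, here is a small sketch comparing the two literals and using String.indexOf to locate the character. The class name EscapeDemo and its members are made up for illustration and are not part of the original question.

```java
// Hypothetical demo class; the names here are illustrative assumptions.
class EscapeDemo {
    // An escaped backslash followed by u, 3, 0, 0, 0: six chars in total.
    static final String SIX_CHARS = "\\u3000";
    // A Unicode escape: the compiler turns it into the single char U+3000.
    static final String ONE_CHAR = "\u3000";

    // Returns the index of the first full-width space, or -1 if there is none.
    static int firstFullWidthSpace(String text) {
        return text.indexOf('\u3000');
    }

    public static void main(String[] args) {
        System.out.println(SIX_CHARS.length()); // prints 6
        System.out.println(ONE_CHAR.length());  // prints 1
        System.out.println(firstFullWidthSpace("int\u3000x;")); // prints 3
    }
}
```

With the six-character pattern, a search only succeeds if the file literally contains a backslash followed by u3000, which is why the original program printed nothing.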
In my question, I was trying to use the KMP algorithm to find the index of a pattern in my Java file.
If we use "\\u3000".toCharArray(), the search looks for six separate characters (a backslash, u, 3, 0, 0, 0), which is not what we want. What we want is U+3000, a special whitespace character: the full-width space used in Chinese, Japanese and Korean text.
If we write a sentence using the full-width space, it looks like this:
Here　is　Full-width　demonstration.
The space is very distinctive here, but it is not so visible in a Java file. That inspired me to write the code below:
import java.util.*;
import java.io.*;

public class CheckEmpty {
    public static void main(String[] args) {
        try {
            String content = new Scanner(new File("/Users/apple/Coding/Instance1.java")).useDelimiter("\\Z").next();
            if (content.contains(" ")) {
                System.out.println("English Space");
            }
            // Matches only the literal six characters: backslash, u, 3, 0, 0, 0
            if (content.contains("\\u3000")) {
                System.out.println("Backslash 3000");
            }
            if (content.contains("　")) { // notice the space is a SPECIAL SPACE (U+3000)
                System.out.println("C J K fullwidth");
                // Chinese Japanese Korean white space
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
As expected, the output includes both "English Space" and "C J K fullwidth", which means the Java file contains both the normal and the full-width space.
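Knowing that the character is present is only half the job; to inspect it by hand you also want to know where it is. Here is a minimal sketch that reports the line and column of every U+3000, assuming the file content has already been read into a String. The names FullWidthSpaceLocator and locate are hypothetical, positions are 1-based, and a carriage return is simply counted as one column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the names are illustrative assumptions.
class FullWidthSpaceLocator {
    // Returns one "line:column" entry (1-based) per U+3000 found in content.
    static List<String> locate(String content) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int column = 1;
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c == '\u3000') {
                hits.add(line + ":" + column);
            }
            if (c == '\n') {
                // A newline starts the next line at column 1.
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The second line has a full-width space right after "int".
        System.out.println(locate("int a;\nint\u3000b;\n")); // prints [2:4]
    }
}
```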
After that, I wrote another Java program to delete all the special spaces:
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

public class DeleteTheSpecialSpace {
    public static void main(String[] args) {
        String path = "/Users/apple/Coding/Instance1.java";
        try {
            String content;
            // Read the whole file and close the Scanner before overwriting the file.
            try (Scanner in = new Scanner(new File(path))) {
                content = in.useDelimiter("\\Z").next();
            }
            // Strings are immutable, so the result must be assigned back.
            content = content.replace("　", ""); // notice the first argument is a SPECIAL SPACE (U+3000)
            //System.out.println(content);
            // try-with-resources closes the writer, which flushes the output to disk.
            try (PrintWriter out = new PrintWriter(path)) {
                out.println(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Finally, the amazing thing happened: there is no longer any error in "Instance1.java", since all full-width spaces have been eliminated.

Compile SUCCESS :)
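One caveat: Scanner and PrintWriter, when given no charset, use the platform default encoding, which may not match the encoding of the source file. The sketch below is one possible variant, assuming the file is UTF-8 and that its path is passed as a command-line argument. The names FullWidthSpaceFixer, normalize and fixFile are hypothetical. It also swaps each U+3000 for an ordinary space rather than deleting it, a deliberate deviation from the code above so that two tokens separated only by a full-width space are not merged.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; the names are illustrative assumptions.
class FullWidthSpaceFixer {
    // Replaces every U+3000 with an ordinary ASCII space.
    static String normalize(String content) {
        return content.replace('\u3000', ' ');
    }

    // Rewrites the file in place, assuming it is encoded as UTF-8.
    static void fixFile(Path file) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, normalize(content).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FullWidthSpaceFixer <file>");
            return;
        }
        fixFile(Paths.get(args[0]));
    }
}
```

Because the file is overwritten in place, it is worth keeping a backup or running this on a copy first.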