
Eliminate the "\u3000" error in Java

When I try to compile a Java file, the compiler reports "illegal character \u3000".

After searching, I found that U+3000 is the ideographic (full-width) space used in Chinese, Japanese, and Korean text. Instead of deleting every such space by hand, I decided to write a small Java program that finds and removes them.

However, my program does not report the index of any match. How can I write code that eliminates this special space?

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.File;
import java.io.IOException;
import java.util.*;
public class BufferReadAFile {
    public static void main(String[] args) {

        //BufferedReader br = null;
        String sCurrentLine;
        String message = "";
        try {

            /*br = new BufferedReader(new FileReader("/Users/apple/Test/Instance1.java"));

            while ((sCurrentLine = br.readLine()) != null) {
                message += sCurrentLine;
            }
            */
            String content = new Scanner(new File("/Users/apple/Coding/Instance1.java")).useDelimiter("\\Z").next();
            //System.out.println(content);
            searchSubString(content.toCharArray(),"\\u3000".toCharArray());

        } catch (IOException e) {
            e.printStackTrace();
        } 

    }


    public static void searchSubString(char[] text, char[] ptrn) {
        int i = 0, j = 0;
        // pattern and text lengths
        int ptrnLen = ptrn.length;
        int txtLen = text.length;

        // initialize new array and preprocess the pattern
        int[] b = preProcessPattern(ptrn);

        while (i < txtLen) {
            while (j >= 0 && text[i] != ptrn[j]) {
                j = b[j];
            }
            i++;
            j++;

            // a match is found
            if (j == ptrnLen) {
                System.out.println("found substring at index:" + (i - ptrnLen));
                j = b[j];
            }
        }
    }


    public static int[] preProcessPattern(char[] ptrn) {
        int i = 0, j = -1;
        int ptrnLen = ptrn.length;
        int[] b = new int[ptrnLen + 1];

        b[i] = j;
        while (i < ptrnLen) {
            while (j >= 0 && ptrn[i] != ptrn[j]) {
                // if there is a mismatch, consider the next widest border
                // The borders to be examined are obtained in decreasing order from
                // the values b[i], b[b[i]] etc.
                j = b[j];
            }
            i++;
            j++;
            b[i] = j;
        }
        return b;
    }


}
Sheldon asked Apr 28 '26 01:04


2 Answers

I don't think "\\u3000" is what you want. If you print the string, you will see that it contains the six literal characters \u3000 rather than the space. Use "\u3000" instead; note the single backslash.

System.out.println("\\u3000"); // This prints out \u3000
System.out.println("\u3000");  // This prints out the CJK space

Alternatively, you could just use the actual CJK space character directly as in one of the if checks in your CheckEmpty class.
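To make the difference concrete, here is a minimal sketch that prints the length and the code units of both literals. The class name EscapeDemo and the helper describe are made up for illustration and are not part of the original code.

```java
public class EscapeDemo {

    // Hypothetical helper: returns the string's length followed by each char as U+XXXX.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder("length=" + s.length() + ":");
        for (char c : s.toCharArray()) {
            sb.append(String.format(" U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Double backslash: six characters (backslash, 'u', '3', '0', '0', '0')
        System.out.println(describe("\\u3000"));
        // prints: length=6: U+005C U+0075 U+0033 U+0030 U+0030 U+0030

        // Single backslash: one character, the full-width space
        System.out.println(describe("\u3000"));
        // prints: length=1: U+3000
    }
}
```

A search pattern built from the first literal can only ever match the text "backslash u 3 0 0 0" written out in the file, never the space itself.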

neurite answered Apr 30 '26 14:04


In my question, I was trying to use the KMP algorithm to find the index of a pattern in my Java file.

If we use "\\u3000".toCharArray(), the pattern consists of the six literal characters \, u, 3, 0, 0, 0, and the search compares the text against each of them, which is not what we want. \u3000 is a special whitespace character: the full-width space used in Chinese, Japanese, and Korean text.

If we write a sentence using full-width spaces, it looks like this:

Here　is　Full-width　demonstration.

The space is very distinctive here, but it is hard to spot in a Java file. That inspired me to write the code below:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class CheckEmpty {
    public static void main(String[] args) {
        // The "\\Z" delimiter makes next() return essentially the whole file in one piece.
        try (Scanner in = new Scanner(new File("/Users/apple/Coding/Instance1.java"))) {
            String content = in.useDelimiter("\\Z").next();
            if (content.contains(" ")) {
                System.out.println("English Space");
            }
            // Looks for the six literal characters: backslash, u, 3, 0, 0, 0
            if (content.contains("\\u3000")) {
                System.out.println("Backslash 3000");
            }
            if (content.contains("　")) { // notice the space is a SPECIAL SPACE (U+3000)
                System.out.println("C J K fullwidth");
                // Chinese Japanese Korean white space
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

As expected, the output shows that the Java file contains both normal and full-width spaces.
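The question also asked for the index of each offending character. Since the pattern is a single character, KMP is not needed; a rough sketch using String.indexOf is enough. The class name FindFullWidthSpace and the method locate are hypothetical names chosen for this example, and the sample text is invented.

```java
import java.util.ArrayList;
import java.util.List;

public class FindFullWidthSpace {

    // Returns "line:column" (both 1-based) for every U+3000 found in the text.
    static List<String> locate(String text) {
        List<String> hits = new ArrayList<>();
        String[] lines = text.split("\n", -1);
        for (int row = 0; row < lines.length; row++) {
            int col = lines[row].indexOf('\u3000');
            while (col >= 0) {
                hits.add((row + 1) + ":" + (col + 1));
                col = lines[row].indexOf('\u3000', col + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Invented sample with one full-width space on each line.
        String sample = "int\u3000x = 1;\nint y\u3000= 2;";
        System.out.println(locate(sample)); // prints [1:4, 2:6]
    }
}
```

To run it against a real file, pass the file's content (read as in CheckEmpty) to locate instead of the sample string.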

After that, I wrote another Java program to delete all the special spaces:

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

public class DeleteTheSpecialSpace {

    public static void main(String[] args) {
        String path = "/Users/apple/Coding/Instance1.java";
        try {
            String content;
            // Read the file first and close the Scanner before overwriting the same file.
            try (Scanner in = new Scanner(new File(path))) {
                content = in.useDelimiter("\\Z").next();
            }

            // Strings are immutable, so the result must be assigned back.
            // notice the left parameter is the SPECIAL SPACE (U+3000)
            content = content.replace("\u3000", "");
            //System.out.println(content);

            // try-with-resources closes the writer, which flushes the content to disk.
            try (PrintWriter out = new PrintWriter(path)) {
                out.println(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Finally, "Instance1.java" compiles without errors, since all full-width spaces have been eliminated.
Compile SUCCESS :)
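One caveat: deleting the character outright can merge two tokens if the full-width space was the only separator between them (for example, "int" and "x" would become "intx"). A possible variation, sketched below, replaces it with an ordinary space instead and reads the file with an explicit charset. The class name ReplaceFullWidthSpace and the method normalize are hypothetical, and the sketch assumes the file is UTF-8 encoded and its path is passed as the first command-line argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReplaceFullWidthSpace {

    // Replaces every full-width space (U+3000) with an ordinary space (U+0020).
    static String normalize(String source) {
        return source.replace('\u3000', ' ');
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReplaceFullWidthSpace <file>");
            return;
        }
        // Assumption: the file is UTF-8; adjust the charset if it is not.
        Path path = Paths.get(args[0]);
        String source = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        Files.write(path, normalize(source).getBytes(StandardCharsets.UTF_8));
    }
}
```

Note that this also replaces full-width spaces inside string literals and comments, where they were legal; whether that is acceptable depends on the file.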

Sheldon answered Apr 30 '26 13:04