
Why is this character 口 causing my scanner to fail?

I'm using the Java Scanner.

I have a .txt file with this text saved in it.

PriceDB = {
    ["profileKeys"] = {
        ["Name - 回音山"] = "Name - 回音山",
    },
    ["char"] = {
        ["Name - 回音山"] = {
            ["CurrentValue"] = "一口价:|cffffffff70,197|TInterface\\MoneyFrame\\UI-GoldIcon:0:0:2:0|t|r",
        },
    },
}

All I am trying to do is open this file with a Scanner, extract the "CurrentValue" of 70,197, and save it as an int. However, every time I open the file, the scanner fails to read a line and throws a NoSuchElementException with the message "No line found". After removing the Chinese characters one by one, I narrowed it down to this little guy: 口. For some reason, Scanner does not like that character. Is there some encoding setting I need to change, or am I going to have to use BufferedReader instead? I'm honestly not sure what's going on, except that I guess there's an encoding error. So what's happening here?

Edit: Here's the initialization of my scanner.

Scanner scanner;
if (region.equals("US")) {
    scanner = new Scanner(new File("C:\\Program Files\\World of Warcraft\\WTF\\Account\\313023286#1\\SavedVariables\\WoWTokenPrice.lua"));
} else if (region.equals("EU")) {
    scanner = new Scanner(new File("C:\\Program Files\\World of Warcraft\\WTF\\Account\\313495228#1\\SavedVariables\\WoWTokenPrice.lua"));
} else if (region.equals("China")) {
    File file = new File("C:\\Program Files\\World of Warcraft\\WTF\\Account\\232241227#1\\SavedVariables\\WoWTokenPrice.lua");
    System.out.println(file.exists());
    scanner = new Scanner(file);
} else {
    System.exit(1);
    break;
}

I just copied it as is. In the failing case, region is "China".

david2278 asked May 06 '15 09:05


1 Answer

You must specify the correct encoding when creating your Scanner; otherwise it decodes the file with the platform's default charset. One constructor that accepts a charset is:

public Scanner(InputStream source, String charsetName)

Constructs a new Scanner that produces values scanned from the specified input stream. Bytes from the stream are converted into characters using the specified charset.

Find out which charset your file uses; I'd guess UTF-16, but I'm not an expert in foreign characters :).

Scanner scanner = new Scanner(is, StandardCharsets.UTF_16.name());
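For the extraction the question asks about, a minimal sketch might look like the following. It assumes the file is saved as UTF-8 rather than UTF-16 (an assumption; check how your file is actually encoded) and that the price always follows an eight-hex-digit colour code, as in the sample line. The names TokenPriceReader, readCurrentValue and PRICE are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question or answer.
final class TokenPriceReader {

    // Assumption: the price sits between "|c" + 8 hex digits and "|T",
    // e.g. "|cffffffff70,197|T" in the sample line.
    private static final Pattern PRICE =
            Pattern.compile("\\|c[0-9a-fA-F]{8}([0-9][0-9,]*)\\|T");

    static int readCurrentValue(Path file) throws IOException {
        // Assumption: the file is UTF-8. The charset is passed explicitly so
        // the platform default is never used.
        try (Scanner scanner = new Scanner(file, StandardCharsets.UTF_8.name())) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (!line.contains("\"CurrentValue\"")) {
                    continue;
                }
                Matcher m = PRICE.matcher(line);
                if (m.find()) {
                    // "70,197" -> "70197" -> 70197
                    return Integer.parseInt(m.group(1).replace(",", ""));
                }
            }
            // Scanner swallows read errors; surface one if it occurred.
            IOException hidden = scanner.ioException();
            if (hidden != null) {
                throw hidden;
            }
        }
        throw new IllegalStateException("CurrentValue not found in " + file);
    }
}
```

You would call it with something like TokenPriceReader.readCurrentValue(Paths.get(pathToYourLuaFile)). Because Scanner hides I/O errors and simply stops producing input, checking scanner.ioException() is one way to see whether a decoding problem is behind a "No line found".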
Jordi Castilla answered Sep 22 '22 11:09