
Java Scanner Class bad character "®"

Tags: java, unicode

I use a Scanner to read a file into a string. Any file that contains the character "®" causes the read to fail. I'm new to Java. Is there a better way to read the file so that this character is accepted?

public void readFile(String fileName)
{
    fileText = "";

    try
    {
        Scanner file = new Scanner(new File(fileName));
        while (file.hasNextLine())
        {
            String line = file.nextLine();
            fileText += line + "\r\n";
        }
        file.close();
    }
    catch (Exception e)
    {
        System.out.println(e);
    }
}
Minerbob asked Nov 17 '16 20:11


2 Answers

By default, Scanner uses the platform's default character encoding, which might not match the character encoding of the file. The JavaDoc for the Scanner(File) constructor states:

Constructs a new Scanner that produces values scanned from the specified file. Bytes from the file are converted into characters using the underlying platform's default charset.
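To make the mismatch concrete, here is a small sketch of how "®" is stored under two common encodings. It assumes the file was saved as ISO-8859-1 while the platform default is UTF-8, which the question does not confirm; the class name RegisteredSignBytes and the helper hex are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class RegisteredSignBytes {
    // Formats bytes as space-separated uppercase hex, e.g. "C2 AE".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sign = "\u00AE"; // the registered sign, "®"

        byte[] latin1 = sign.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = sign.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes: " + hex(latin1)); // AE
        System.out.println("UTF-8 bytes:      " + hex(utf8));   // C2 AE

        // Assumed scenario: ISO-8859-1 bytes decoded as UTF-8.
        // The lone byte AE is not valid UTF-8, so the String constructor
        // substitutes the replacement character U+FFFD.
        String misdecoded = new String(latin1, StandardCharsets.UTF_8);
        System.out.println("\uFFFD".equals(misdecoded)); // true
    }
}
```

The same single byte that is a valid "®" in ISO-8859-1 is malformed input for a UTF-8 decoder, which is one plausible reason a read would stop at that character.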

First, determine which character encoding your file uses; on Linux, the command-line utility file -i reports it. Then pass that encoding to the Scanner. Java 7 added predefined constants for some well-known character sets in java.nio.charset.StandardCharsets. Note that the Scanner(File, Charset) constructor used here was added in Java 10; on Java 7 to 9, pass the charset name instead, for example StandardCharsets.UTF_8.name().

Scanner file = new Scanner(new File(fileName), StandardCharsets.UTF_8);
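Putting this together, a minimal sketch of the question's readFile with an explicit charset might look like the following. The class name ExplicitCharsetRead, the method signature, and the temporary demo file are assumptions for illustration, not the asker's actual code. It passes charset.name() so that it also compiles on Java versions before 10, and it checks Scanner.ioException() because Scanner records read errors instead of throwing them.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Scanner;

class ExplicitCharsetRead {
    // Reads the whole file, decoding its bytes with the given charset
    // and normalizing line endings to "\r\n" as in the question.
    static String readFile(File file, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        // The (File, String) constructor has existed since Java 5.
        try (Scanner scanner = new Scanner(file, charset.name())) {
            while (scanner.hasNextLine()) {
                text.append(scanner.nextLine()).append("\r\n");
            }
            // Scanner swallows IOExceptions from the underlying reader;
            // rethrow the last one so a decoding problem is not silent.
            IOException readError = scanner.ioException();
            if (readError != null) {
                throw readError;
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical demo file containing the registered sign.
        File tmp = File.createTempFile("charset-demo", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(),
                "Acme\u00AE widget\n".getBytes(StandardCharsets.UTF_8));

        System.out.print(readFile(tmp, StandardCharsets.UTF_8));
    }
}
```

Using a StringBuilder instead of repeated string concatenation is a side choice here; it avoids copying the accumulated text on every line but does not affect the encoding behavior.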
Adam answered Nov 01 '22 16:11


Specify the encoding when you create the scanner.

Scanner file = new Scanner(new File(fileName), "utf-8");

Jerin Joseph answered Nov 01 '22 18:11