
How to use encoding in supercsv getHeader

Tags: java, supercsv

I'm using Super CSV 2.1.0 to parse a CSV file that contains German words.

The CSV file has a header on its first line. The header contains umlauts such as Ä, ä, Ü and ö. For example: Betrag;Währung;Info

In my code I read the header of the CSV like this:

ICsvBeanReader inFile = new CsvBeanReader(new InputStreamReader(new FileInputStream(file), "UTF8"), CsvPreference.EXCEL_NORTH_EUROPE_PREFERENCE);

final String[] header = inFile.getHeader(true);

My problem is with the header array: every header that contains an umlaut comes out garbled, even though I specify the UTF-8 charset.

Is there a way to read the header correctly?

Here is a pseudo unit test:

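import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

import org.junit.Test;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class TestSuperCSV {

    @Test
    public void test() {
        String path = "C:\\Umsatz.csv";
        File file = new File(path);

        try {
            // The reader decodes the file's bytes as UTF-8
            ICsvBeanReader inFile = new CsvBeanReader(new InputStreamReader(
                    new FileInputStream(file), "UTF-8"),
                    CsvPreference.EXCEL_NORTH_EUROPE_PREFERENCE);
            final String[] header = inFile.getHeader(true);
            System.out.println(header[9]); // prints "W?hrung", but "Währung" is expected
        } catch (UnsupportedEncodingException | FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}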

Kind regards, Alex

Alexander Geppart asked Oct 29 '13 09:10



1 Answer

It sounds like your file isn't actually using UTF-8 encoding.

I can reproduce your scenario by saving the CSV file in ISO-8859-1 encoding and running your code unchanged: the header then appears as W?hrung.
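The effect can be shown without a file or Super CSV at all. The following is a minimal sketch, not the original test: the class name HeaderEncodingDemo and the helper decode are made up for illustration. It encodes the example header as ISO-8859-1 and then decodes those bytes as UTF-8. The byte for ä (0xE4) is not valid UTF-8 when followed by h, so Java substitutes the replacement character U+FFFD, which a console that cannot display it will typically print as ?.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HeaderEncodingDemo {

    // Hypothetical helper: decodes raw bytes with the given charset.
    // Like InputStreamReader, new String(...) replaces malformed input
    // with U+FFFD instead of throwing an exception.
    public static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "Währung" written with a Unicode escape so the source file's
        // own encoding does not matter.
        String header = "Betrag;W\u00e4hrung;Info";

        // Simulate a file that was saved as ISO-8859-1.
        byte[] latin1Bytes = header.getBytes(StandardCharsets.ISO_8859_1);

        String wrong = decode(latin1Bytes, StandardCharsets.UTF_8);
        String right = decode(latin1Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(wrong.contains("\uFFFD")); // true: the umlaut was lost
        System.out.println(right.equals(header));     // true: the header is intact
    }
}
```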

If I then change the InputStreamReader to use "ISO-8859-1" as the encoding, the header appears correctly as Währung.
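If you cannot be sure in advance which encoding a file uses, one possible approach is to test whether its bytes are valid UTF-8 before choosing the charset. The sketch below is an assumption of mine, not part of Super CSV: the class CharsetGuess and the method guessCharset are hypothetical names, and falling back to ISO-8859-1 is only a guess that suits this particular case (the file could just as well be windows-1252, for instance).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetGuess {

    // Hypothetical helper: returns UTF-8 if the bytes decode cleanly as UTF-8,
    // otherwise falls back to ISO-8859-1 (an assumption, not a certainty).
    public static Charset guessCharset(byte[] raw) {
        try {
            // REPORT makes the decoder throw on invalid input instead of
            // silently inserting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return StandardCharsets.ISO_8859_1;
        }
    }

    public static void main(String[] args) {
        String header = "Betrag;W\u00e4hrung;Info";
        System.out.println(guessCharset(header.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(guessCharset(header.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The resulting Charset can be passed straight to the reader, for example new InputStreamReader(new FileInputStream(file), guessCharset(Files.readAllBytes(file.toPath()))). Note that this reads the whole file into memory once, which is fine for small files but may not be for large ones.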

James Bassett answered Oct 20 '22 11:10
