
What is the proper encoding to use with item Reader

I'm using Spring Batch to read CSV files. When I open these files in Notepad++, it reports their encoding as ANSI. When reading a line from a file, I notice that accented characters are not shown correctly. For example, take this line:

Données issues de la reprise des données

It is transformed into something like this, with special characters in place of the accented ones:

[image: the same line with garbled accented characters]

As a first attempt, I set the encoding of my item reader to UTF-8, but the problem still exists.

  • I thought that with UTF-8 encoding all my accented characters would be recognized. Is that not true? From what I have heard, UTF-8 is the best encoding for handling all characters, on a web page for example.

After setting my item Reader encoding to ISO-8859-1:

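public class TestItemReader extends FlatFileItemReader<TestFileRow> {

    private static final Logger log = LoggerFactory.getLogger(TestItemReader.class);

    // The constructor name must match the class name.
    public TestItemReader(String path) {
        this.setResource(new FileSystemResource(path + "/Test.csv"));
        // Decode the input file as ISO-8859-1.
        this.setEncoding("ISO-8859-1");
        // (the rest of the constructor is not shown in the original post)
    }
}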

I can see that these characters are now displayed correctly.

  • For the output, I have to write with UTF-8 as the encoding. Is it correct to use ISO-8859-1 as the input encoding and UTF-8 as the output encoding?
Feres.o asked Nov 15 '17 09:11


1 Answer

I had the same problem: the input file is ANSI, and "ü" gets displayed as a square in the output.

That's because your input file is encoded in ANSI, while Spring Batch assumes ISO-8859-1 by default, according to the reference documentation (section 6.6.2, FlatFileItemReader).

Therefore, you have to set your reader's encoding to "Cp1252" with setEncoding("Cp1252"). That is Java's name for the Windows-1252 code page, which is what "ANSI" usually means on a Western European Windows system.
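To see why the reader's charset matters, here is a minimal plain-Java sketch, with no Spring Batch involved. The class name DecodeDemo and its decode helper are made up for illustration, and it assumes that "ANSI" in Notepad++ means Windows-1252, which depends on the system code page. It decodes the same raw bytes with different charsets:

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of Spring Batch.
class DecodeDemo {

    // Decodes the given raw bytes using the named charset.
    // Malformed input is replaced with U+FFFD instead of throwing.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "Données" as stored in a Windows-1252 file: é is the single byte 0xE9.
        byte[] raw = {'D', 'o', 'n', 'n', (byte) 0xE9, 'e', 's'};

        // Cp1252 and ISO-8859-1 agree on 0xE9, so both decode é correctly.
        System.out.println(decode(raw, "Cp1252").equals("Donn\u00e9es"));     // true
        System.out.println(decode(raw, "ISO-8859-1").equals("Donn\u00e9es")); // true

        // 0xE9 followed by 'e' is not valid UTF-8, so é becomes U+FFFD.
        System.out.println(decode(raw, "UTF-8").contains("\uFFFD"));          // true

        // The two single-byte charsets differ in the range 0x80 to 0x9F:
        // 0x80 is the euro sign in Cp1252 but a control character in ISO-8859-1.
        byte[] euro = {(byte) 0x80};
        System.out.println(decode(euro, "Cp1252").equals("\u20ac"));          // true
        System.out.println(decode(euro, "ISO-8859-1").equals("\u0080"));      // true
    }
}
```

This also suggests why ISO-8859-1 appeared to work in the question: é has the same byte value in both charsets. The two only disagree on bytes 0x80 to 0x9F (the euro sign, curly quotes, and so on), so Cp1252 is the safer choice for a file that Notepad++ reports as ANSI.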

Furthermore, you will have to set your writer's encoding to "UTF-8". I'm not entirely sure why it doesn't work with other encodings that are generally able to represent "ü", such as ISO-8859-1, but it works with UTF-8, so that's what I'm using.
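As for the question of whether the input and output encodings may differ: they can, because once a line has been decoded it is an ordinary Java String, and the writer simply encodes that String again. Here is a rough plain-Java sketch of the same read-as-Cp1252, write-as-UTF-8 idea, without Spring Batch. TranscodeDemo, transcode and roundTrip are hypothetical names used only for this illustration:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Spring Batch.
class TranscodeDemo {

    // Assumption: "ANSI" means Windows-1252 (Java alias "Cp1252").
    static final Charset ANSI = Charset.forName("Cp1252");

    // Reads the input file as Cp1252 and writes the same lines as UTF-8.
    static void transcode(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, ANSI);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    // Writes one line to a temporary Cp1252 file, transcodes it,
    // and returns the result decoded as UTF-8 (trailing newline trimmed).
    static String roundTrip(String line) {
        try {
            Path in = Files.createTempFile("ansi-input", ".csv");
            Path out = Files.createTempFile("utf8-output", ".csv");
            try {
                Files.write(in, line.getBytes(ANSI));
                transcode(in, out);
                return new String(Files.readAllBytes(out), StandardCharsets.UTF_8).trim();
            } finally {
                Files.deleteIfExists(in);
                Files.deleteIfExists(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = "Donn\u00e9es issues de la reprise des donn\u00e9es";
        // The accented characters survive the Cp1252 -> UTF-8 conversion.
        System.out.println(roundTrip(line).equals(line));
    }
}
```

In a Spring Batch job the two charsets would be set with setEncoding on the reader and on the writer instead; the sketch only shows that mixing an ANSI input with a UTF-8 output is not a problem in itself.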

PixelMaster answered Oct 12 '22 10:10