
FlatFileItemReader tab delimiter not working

I checked out this project from Spring: https://github.com/spring-guides/gs-batch-processing

Source: https://spring.io/guides/gs/batch-processing/

I replaced the ',' with a tab in 'sample-data.csv':

Jill    Doe
Joe Doe
Justin  Doe
Jane    Doe
John    Doe

Then I added the new delimiter to the reader:

@Bean
public FlatFileItemReader<Person> reader() {
    return new FlatFileItemReaderBuilder<Person>()
        .name("personItemReader")
        .resource(new ClassPathResource("sample-data.csv"))
        .delimited()
        .delimiter(DelimitedLineTokenizer.DELIMITER_TAB) // NEW DELIMITER
        .names(new String[]{"firstName", "lastName"})
        .fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
            setTargetType(Person.class);
        }})
        .build();
}

When I launch it, I get this error:

Caused by: org.springframework.batch.item.file.transform.IncorrectTokenCountException: Incorrect number of tokens found in record: expected 2 actual 1
    at org.springframework.batch.item.file.transform.AbstractLineTokenizer.tokenize(AbstractLineTokenizer.java:142) ~[spring-batch-infrastructure-4.0.1.RELEASE.jar:4.0.1.RELEASE]
    at org.springframework.batch.item.file.mapping.DefaultLineMapper.mapLine(DefaultLineMapper.java:43) ~[spring-batch-infrastructure-4.0.1.RELEASE.jar:4.0.1.RELEASE]
    at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:180) ~[spring-batch-infrastructure-4.0.1.RELEASE.jar:4.0.1.RELEASE]
    ... 50 common frames omitted

I tried an '@' delimiter and it works. For some reason, I can't make it work with the tab delimiter.

Of course, in my real project the input file uses tab separators.

Any solution here?

Tyvain asked Jan 27 '23 15:01


2 Answers

You can't set the tab delimiter that way. A tab ("\t") is whitespace, so the StringUtils.hasText check in the static DelimitedBuilder class of FlatFileItemReaderBuilder.java treats it as containing no text and never passes it to the DelimitedLineTokenizer, which keeps its default comma delimiter. Any non-whitespace delimiter can be set with the code given in the question.

FlatFileItemReaderBuilder source code

This is how the LineTokenizer instance is built in FlatFileItemReaderBuilder.java.

public DelimitedLineTokenizer build() {
        Assert.notNull(this.fieldSetFactory, "A FieldSetFactory is required.");
        Assert.notEmpty(this.names, "A list of field names is required");

        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();

        tokenizer.setNames(this.names.toArray(new String[this.names.size()]));

        // hasText returns false for a whitespace-only string, so a tab delimiter is never applied.

        if(StringUtils.hasText(this.delimiter)) {
            tokenizer.setDelimiter(this.delimiter);
        }
// more code
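
The effect of that guard can be reproduced without Spring on the classpath. The sketch below is only an illustration: hasText is a simplified stand-in for Spring's StringUtils.hasText, and effectiveDelimiter and tokenCount are hypothetical helpers, not part of Spring Batch. It assumes the tokenizer's default delimiter is a comma.

```java
import java.util.regex.Pattern;

public class TabDelimiterFallbackDemo {

    // Simplified stand-in for Spring's StringUtils.hasText: true only if the
    // string contains at least one non-whitespace character.
    static boolean hasText(String s) {
        if (s == null) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper mirroring the guard in the builder excerpt: the
    // requested delimiter is used only when hasText accepts it.
    static String effectiveDelimiter(String requested) {
        String delimiter = ","; // assumed default delimiter of the tokenizer
        if (hasText(requested)) {
            delimiter = requested;
        }
        return delimiter;
    }

    // Hypothetical helper: counts fields by splitting on the literal delimiter.
    static int tokenCount(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1).length;
    }

    public static void main(String[] args) {
        String line = "Jill\tDoe";
        // Tab requested through the guard: falls back to ",", so one token.
        System.out.println(tokenCount(line, effectiveDelimiter("\t")));
        // Tab applied directly: two tokens.
        System.out.println(tokenCount(line, "\t"));
        // A non-whitespace delimiter such as "@" passes the guard.
        System.out.println(effectiveDelimiter("@"));
    }
}
```

With the fallback to a comma, a tab-separated line yields a single token, which matches the "expected 2 actual 1" message in the question.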

To fix this, provide a bean of type DelimitedLineTokenizer that is explicitly configured with the tab delimiter.

Use the code below in your Spring configuration class to set the tab delimiter:

@Bean
public FlatFileItemReader<Person> reader() {
    return new FlatFileItemReaderBuilder<Person>().name("personItemReader")
            .resource(new ClassPathResource("sample-data.csv"))
            .lineMapper(lineMapper()).build();
}

@Bean
public DefaultLineMapper<Person> lineMapper() {
    DefaultLineMapper<Person> lineMapper = new DefaultLineMapper<>();
    lineMapper.setLineTokenizer(lineTokenizer());
    lineMapper.setFieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
        setTargetType(Person.class);
    }});
    return lineMapper;
}

@Bean
public DelimitedLineTokenizer lineTokenizer() {
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer(DelimitedLineTokenizer.DELIMITER_TAB);
    tokenizer.setNames(new String[] { "firstName", "lastName" });
    return tokenizer;
}

Sangam Belose answered Jan 31 '23 08:01


A simpler way is to pass a preconfigured tokenizer directly to the builder:

@Bean
public FlatFileItemReader<Person> reader() {
    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            .resource(new ClassPathResource("sample-data.csv"))
            .lineTokenizer(new DelimitedLineTokenizer(DelimitedLineTokenizer.DELIMITER_TAB) {{
                setNames(new String[]{"firstName", "lastName"});
            }})
            .fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
                setTargetType(Person.class);
            }})
            .build();
}

Pavel answered Jan 31 '23 09:01