
Spring Batch - Is it possible to have dynamic columns in FlatFileItemReader?

I'm dealing with many CSV files that don't have a fixed header or set of columns. For example, file1.csv might have 10 columns and file2.csv might have 50.

I can't know the number of columns in advance, and I can't create a specific job for each file type. My input is a black box: a bunch of CSV files, each with anywhere from 10 to an unbounded number of columns.

I want to use Spring Batch to import these CSVs automatically. Is that possible? As far as I know, I need a fixed number of columns, because of the processor and because I have to map my data into a POJO before passing it on to a writer.

Could my processor handle an array? Instead of passing one simple object, could I get an array of objects, so that at the end of my job I have an array of arrays of objects?

What do you think?

Thanks

Asked by TheCyberXP on Nov 29 '16 at 18:11


2 Answers

I came across this old post with exactly the same question. I eventually built a FlatFileItemReader with dynamic columns using the skippedLinesCallback, so I'm leaving it here:

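import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

@Bean
public FlatFileItemReader<Person> reader() {
    // The tokenizer starts without column names; they are set from the header line
    DelimitedLineTokenizer delimitedLineTokenizer = new DelimitedLineTokenizer();

    // Bind each tokenized line to a Person by matching column names to bean properties
    BeanWrapperFieldSetMapper<Person> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(Person.class);

    DefaultLineMapper<Person> lineMapper = new DefaultLineMapper<>();
    lineMapper.setLineTokenizer(delimitedLineTokenizer);
    lineMapper.setFieldSetMapper(fieldSetMapper);

    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            .resource(new FileSystemResource(inputFile)) // inputFile is assumed to be a field holding the CSV path
            .linesToSkip(1) // skip the header line...
            // ...but pass it to the callback, which turns it into the tokenizer's column names
            .skippedLinesCallback(line -> delimitedLineTokenizer.setNames(line.split(",")))
            .lineMapper(lineMapper)
            .build();
}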

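In the callback, you set the tokenizer's column names from the header line. You could also add some validation logic there. With this approach, there is no need to write your own LineTokenizer implementation. Note that the header names still have to correspond to properties of Person, because BeanWrapperFieldSetMapper binds columns to the target bean by name.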
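To make the callback idea concrete, here is a dependency-free sketch of what it does, plus the kind of validation the answer mentions. `HeaderDrivenMapper`, `parseHeader` and `mapRow` are hypothetical names, not Spring Batch API; the array returned by `parseHeader` is what you would hand to `DelimitedLineTokenizer.setNames(...)`, and `mapRow` models name-based mapping into a `Map` instead of a fixed POJO. Like `line.split(",")` in the answer, it does not handle quoted fields.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Spring Batch: a plain-Java model of the
// header callback (with validation) and of name-based row mapping.
public class HeaderDrivenMapper {

    // Parse and validate the header line; the result is what you would pass
    // to DelimitedLineTokenizer.setNames(...) inside skippedLinesCallback.
    public static String[] parseHeader(String line) {
        if (line == null || line.isBlank()) {
            throw new IllegalArgumentException("Header line is empty");
        }
        // Naive comma split (no quoted fields); limit -1 keeps trailing empty names
        String[] names = Arrays.stream(line.split(",", -1))
                .map(String::trim)
                .toArray(String[]::new);
        Set<String> seen = new HashSet<>();
        for (String name : names) {
            if (name.isEmpty()) {
                throw new IllegalArgumentException("Blank column name in header: " + line);
            }
            if (!seen.add(name)) {
                throw new IllegalArgumentException("Duplicate column name: " + name);
            }
        }
        return names;
    }

    // Map one data line to column name -> value, requiring the same number
    // of tokens as header names.
    public static Map<String, String> mapRow(String[] names, String line) {
        String[] values = line.split(",", -1);
        if (values.length != names.length) {
            throw new IllegalArgumentException(
                    "Expected " + names.length + " tokens but found " + values.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            row.put(names[i], values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        String[] names = parseHeader("id,name,email");
        // prints {id=1, name=Alice, email=alice@example.com}
        System.out.println(mapRow(names, "1,Alice,alice@example.com"));
    }
}
```

Failing fast on a blank or duplicate header name surfaces a malformed file when the reader opens rather than as confusing binding errors later in the step.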

Answered by pepevalbe on Nov 19 '22 at 14:11


Create your own LineTokenizer implementation. DelimitedLineTokenizer expects a predefined number of columns; if you write your own, you can be as dynamic as you want. You can read more about LineTokenizer in the documentation: http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/item/file/transform/LineTokenizer.html
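As a rough illustration of "as dynamic as you want", the sketch below models the core logic of such a tokenizer in plain Java so it runs without Spring on the classpath. `DynamicTokenizer`, `tokens` and `namesFor` are hypothetical names. It accepts any number of columns, using header names where known and generated names (`column3`, `column4`, ...) beyond them. In an actual Spring Batch class, this logic would sit inside `LineTokenizer`'s single method, `FieldSet tokenize(String line)`, which could return `new DefaultFieldSet(values, names)`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, dependency-free model of a "dynamic" tokenizer: it never
// expects a fixed column count. Column i (1-based) takes its name from the
// header when one is known, otherwise "column" + i.
public class DynamicTokenizer {

    private final List<String> headerNames;

    public DynamicTokenizer(String... headerNames) {
        this.headerNames = Arrays.asList(headerNames);
    }

    // Column names to use for a line that produced tokenCount tokens.
    public String[] namesFor(int tokenCount) {
        String[] names = new String[tokenCount];
        for (int i = 0; i < tokenCount; i++) {
            names[i] = i < headerNames.size() ? headerNames.get(i) : "column" + (i + 1);
        }
        return names;
    }

    // Naive comma split (no quoted fields); any number of columns is accepted.
    public String[] tokens(String line) {
        return line == null ? new String[0] : line.split(",", -1);
    }

    public static void main(String[] args) {
        DynamicTokenizer tokenizer = new DynamicTokenizer("id", "name");
        String[] values = tokenizer.tokens("7,Bob,extra1,extra2");
        String[] names = tokenizer.namesFor(values.length);
        // prints [id, name, column3, column4] -> [7, Bob, extra1, extra2]
        System.out.println(Arrays.toString(names) + " -> " + Arrays.toString(values));
    }
}
```

Because names are derived per line, rows with more or fewer tokens than the header never cause a token-count error; whether that leniency is acceptable depends on how strictly your files need to be validated.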

Answered by Michael Minella on Nov 19 '22 at 14:11