Spring Batch process an encoded zipped file

Tags:

spring-batch

I’m investigating the use of Spring Batch to process records from an encoded, zipped file. The records are variable-length, with nested variable-length data fields encoded within them.

I’m new to Spring and Spring Batch; this is how I plan to structure the batch configuration:

  • The ItemReader would need to read a single record from the zipped (*.gz) file input stream into a POJO (byte array); the length of each record is contained in its first two bytes.
  • The ItemProcessor would decode the byte array and store the information in the relevant attributes of the POJO.
  • The ItemWriter would populate a database.
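For the ItemProcessor step, decoding the nested fields could look like the sketch below. The question does not describe the inner encoding, so this assumes, purely for illustration, that each nested field is itself prefixed by a two-byte unsigned big-endian length and holds text in a single-byte charset. NestedFieldDecoder is a hypothetical name; a real processor would map the returned fields onto the POJO's attributes, and if "encoded" means something like an EBCDIC code page, the charset passed in would change accordingly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical decoder for one record's nested fields.
 * ASSUMPTION: each field = 2-byte unsigned big-endian length + that many payload bytes.
 */
public class NestedFieldDecoder {
    private final Charset charset;

    public NestedFieldDecoder(Charset charset) {
        this.charset = charset;
    }

    /** Splits a record into its nested fields, decoding each payload with the configured charset. */
    public List<String> decode(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record); // ByteBuffer is big-endian by default
        List<String> fields = new ArrayList<>();
        while (buf.hasRemaining()) {
            // getShort() throws BufferUnderflowException if only one byte is left
            int length = Short.toUnsignedInt(buf.getShort());
            if (length > buf.remaining()) {
                throw new IllegalArgumentException("Field length " + length
                        + " exceeds the " + buf.remaining() + " bytes left in the record");
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            fields.add(new String(payload, charset));
        }
        return fields;
    }
}
```

Validating each inner length against the bytes actually remaining makes a corrupt record fail with a clear message instead of producing garbage attributes further down the job.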

My initial problem is understanding how to set up the ItemReader. I’ve looked at some examples of using a FlatFileItemReader, but my difficulty is that it expects a LineMapper, and I don't see how I can use one in my case (there is no concept of a line in the file).
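Since the file has no lines, one option is to bypass FlatFileItemReader and read one length-prefixed record at a time straight from a GZIPInputStream. A minimal, Spring-free sketch, assuming the two-byte prefix is an unsigned big-endian length that counts only the payload (not the prefix itself); LengthPrefixedRecordReader is a hypothetical name. Its read() returns null at end of input, the same convention Spring Batch's ItemReader.read() uses, so it could be wrapped in a class implementing ItemReader<byte[]> (plus ItemStream for opening and closing the file).

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/**
 * Hypothetical reader: each call to read() returns one length-prefixed record.
 * ASSUMPTION: 2-byte unsigned big-endian length that excludes the prefix itself.
 */
public class LengthPrefixedRecordReader implements AutoCloseable {
    private final DataInputStream in;

    public LengthPrefixedRecordReader(InputStream gzipped) throws IOException {
        // decompress on the fly; buffering avoids many tiny reads on the gzip stream
        this.in = new DataInputStream(new BufferedInputStream(new GZIPInputStream(gzipped)));
    }

    /** Returns the next record, or null at end of input (like ItemReader.read()). */
    public byte[] read() throws IOException {
        // read the prefix byte by byte so a clean end of stream can be told
        // apart from a truncated prefix (readUnsignedShort would throw in both cases)
        int high = in.read();
        if (high == -1) {
            return null;
        }
        int low = in.read();
        if (low == -1) {
            throw new EOFException("Truncated length prefix");
        }
        byte[] record = new byte[(high << 8) | low];
        in.readFully(record); // throws EOFException if the record is cut short
        return record;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

If the format's prefix turns out to include its own two bytes (some record formats count it), subtract 2 from the computed length before allocating the array.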

There are some articles suggesting the use of a custom BufferedReaderFactory, but it would be great to see a worked example of this.

Help would be appreciated.

Hugh Lacey asked Aug 13 '15 09:08


1 Answer

If the gzipped file is a simple text file (i.e. line-oriented), you only need a custom BufferedReaderFactory; the LineMapper then receives the String of the current line.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import org.springframework.batch.item.file.BufferedReaderFactory;
import org.springframework.core.io.Resource;

public class GZipBufferedReaderFactory implements BufferedReaderFactory {

    /** Default file name suffixes that mark a resource as gzip-compressed. */
    private List<String> gzipSuffixes = new ArrayList<>(Arrays.asList(".gz", ".gzip"));

    /**
     * Creates a BufferedReader for a gzip-compressed Resource; resources
     * without a gzip suffix are read as plain (uncompressed) text.
     *
     * @param resource the input resource
     * @param encoding the character encoding of the (decompressed) text
     * @return a reader over the resource's text content
     * @throws UnsupportedEncodingException if the encoding is not supported
     * @throws IOException if the resource cannot be opened or is not valid gzip
     */
    @Override
    public BufferedReader create(Resource resource, String encoding)
            throws UnsupportedEncodingException, IOException {
        // getFilename() may return null for resources that have no file name
        String filename = resource.getFilename();
        for (String suffix : gzipSuffixes) {
            // test filename and description; the description is used when
            // handling itemStreamResources
            if ((filename != null && filename.endsWith(suffix))
                    || resource.getDescription().endsWith(suffix)) {
                return new BufferedReader(new InputStreamReader(
                        new GZIPInputStream(resource.getInputStream()), encoding));
            }
        }
        // no gzip suffix: read the resource as-is
        return new BufferedReader(new InputStreamReader(resource.getInputStream(), encoding));
    }

    public List<String> getGzipSuffixes() {
        return gzipSuffixes;
    }

    public void setGzipSuffixes(List<String> gzipSuffixes) {
        this.gzipSuffixes = gzipSuffixes;
    }
}

A simple ItemReader configuration; it is step-scoped so that the input file can be taken from the input.file job parameter:

 <bean id="itemReader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
  <property name="resource" value="#{jobParameters['input.file']}" />
  <property name="lineMapper">
    <bean class="org.springframework.batch.item.file.mapping.PassThroughLineMapper" />
  </property>
  <property name="strict" value="true" />
  <property name="bufferedReaderFactory">
    <bean class="your.custom.GZipBufferedReaderFactory" />
  </property>
</bean>
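To see what the configured reader hands to PassThroughLineMapper, the sketch below rebuilds the same stream stack the factory creates for a gzip resource (GZIPInputStream, then InputStreamReader, then BufferedReader) using only the JDK, with an in-memory gzip in place of the input file. GzipLineDemo and its helper methods are hypothetical names for illustration. It also shows why this approach fits text only: every record boundary comes from readLine(), which splits on CR/LF, and the bytes are decoded as characters, so binary length-prefixed records could be split at arbitrary bytes or altered by charset decoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLineDemo {
    /** Gzips text in memory; stands in for a *.gz input file. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // closing the writer also finishes the gzip stream
        try (Writer w = new OutputStreamWriter(new GZIPOutputStream(bos), StandardCharsets.UTF_8)) {
            w.write(text);
        }
        return bos.toByteArray();
    }

    /** Reads lines through the same stack the factory builds for gzip resources. */
    static List<String> readLines(byte[] gz) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gz)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // PassThroughLineMapper returns this String unchanged
            }
        }
        return lines;
    }
}
```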
Michael Pralow answered Oct 07 '22 05:10