Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Validate every field in a single pass with SuperCSV

Tags:

java

csv

supercsv

I'm trying to write a large number of rows (~2 million) from a database to a CSV file using SuperCSV. I need to perform validation on each cell as it is written, and the built-in CellProcessors do very nicely. I want to capture all the exceptions that are thrown by the CellProcessors so that I can go back to the source data and make changes.

The problem is that when there are multiple errors in a single row (e.g. The first value is out of range, the second value is null but shouldn't be), only the first CellProcessor will execute, and so I'll only see one of the errors. I want to process the whole file in a single pass, and have a complete set of exceptions at the end of it.

This is the kind of approach I'm trying:

for (Row row : rows) {
    try {
        csvBeanWriter.write(row, HEADER_MAPPINGS, CELL_PROCESSORS);
    } catch (SuperCsvCellProcessorException e) {
        log(e);
    }
}

How can I achieve this? Thanks!

EDIT: Here is the code I wrote that's similar to Hound Dog's, in case it helps anyone:

import java.util.List;

import org.supercsv.cellprocessor.CellProcessorAdaptor;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCsvCellProcessorException;
import org.supercsv.util.CsvContext;

public class ExceptionCapturingCellProcessor extends CellProcessorAdaptor {

    private final List<Exception> exceptions;

    private final CellProcessor current;

    public ExceptionCapturingCellProcessor(CellProcessor current, CellProcessor next, List<Exception> exceptions) {
        super(next);
        this.exceptions = exceptions;
        this.current = current;
    }

    @Override
    public Object execute(Object value, CsvContext context) {
        // Check input is not null
        try {
            validateInputNotNull(value, context);
        } catch (SuperCsvCellProcessorException e) {
            exceptions.add(e);
        }

        // Execute wrapped CellProcessor
        try {
            current.execute(value, context);
        } catch (SuperCsvCellProcessorException e) {
            exceptions.add(e);
        }

        return next.execute(value, context);
    }
}
like image 815
Conan Avatar asked Nov 30 '12 14:11

Conan


1 Answers

I'd recommend writing a custom CellProcessor to achieve this. The following processor can be placed at the start of each CellProcessor chain - it will simply delegate to the processor chained after it, and will suppress any cell processing exceptions.

package example;

import java.util.ArrayList;
import java.util.List;
import org.supercsv.cellprocessor.CellProcessorAdaptor;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCsvCellProcessorException;
import org.supercsv.util.CsvContext;

public class SuppressException extends CellProcessorAdaptor {

    public static List<SuperCsvCellProcessorException> SUPPRESSED_EXCEPTIONS = 
            new ArrayList<SuperCsvCellProcessorException>();

    public SuppressException(CellProcessor next) {
        super(next);
    }

    public Object execute(Object value, CsvContext context) {
        try {
            // attempt to execute the next processor
            return next.execute(value, context);

        } catch (SuperCsvCellProcessorException e) {
            // save the exception
            SUPPRESSED_EXCEPTIONS.add(e);

            // and suppress it (null is written as "")
            return null;
        }
    }
}

And here it is in action:

package example;

import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.constraint.StrMinMax;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCsvCellProcessorException;
import org.supercsv.io.CsvBeanWriter;
import org.supercsv.io.ICsvBeanWriter;
import org.supercsv.prefs.CsvPreference;

public class TestSuppressExceptions {

    private static final CellProcessor[] PROCESSORS = {
            new SuppressException(new StrMinMax(0, 4)),
            new SuppressException(new NotNull()) };

    private static final String[] HEADER = { "name", "age" };

    public static void main(String[] args) throws Exception {

        final StringWriter stringWriter = new StringWriter();
        ICsvBeanWriter beanWriter = null;
        try {
            beanWriter = new CsvBeanWriter(stringWriter,
                    CsvPreference.STANDARD_PREFERENCE);

            beanWriter.writeHeader(HEADER);

            // set up the data
            Person valid = new Person("Rick", 43);
            Person nullAge = new Person("Lori", null);
            Person totallyInvalid = new Person("Shane", null);
            Person valid2 = new Person("Carl", 12);
            List<Person> people = Arrays.asList(valid, nullAge, totallyInvalid,
                    valid2);

            for (Person person : people) {
                beanWriter.write(person, HEADER, PROCESSORS);

                if (!SuppressException.SUPPRESSED_EXCEPTIONS.isEmpty()) {
                    System.out.println("Suppressed exceptions for row "
                                        + beanWriter.getRowNumber() + ":");
                    for (SuperCsvCellProcessorException e :
                                        SuppressException.SUPPRESSED_EXCEPTIONS) {
                        System.out.println(e);
                    }
                    // clear ready for next row
                    SuppressException.SUPPRESSED_EXCEPTIONS.clear();
                }

            }

        } finally {
            beanWriter.close();
        }

        // CSV will have empty columns for invalid data
        System.out.println(stringWriter);

    }

}

Here's the suppressed exceptions output (row 4 has two exceptions, one for each column):

Suppressed exceptions for row 3:
org.supercsv.exception.SuperCsvConstraintViolationException: null value 
encountered processor=org.supercsv.cellprocessor.constraint.NotNull
context={lineNo=3, rowNo=3, columnNo=2, rowSource=[Lori, null]}
Suppressed exceptions for row 4:
org.supercsv.exception.SuperCsvConstraintViolationException: the length (5) 
of value 'Shane' does not lie between the min (0) and max (4) values (inclusive)
processor=org.supercsv.cellprocessor.constraint.StrMinMax
context={lineNo=4, rowNo=4, columnNo=2, rowSource=[Shane, null]}
org.supercsv.exception.SuperCsvConstraintViolationException: null value 
encountered processor=org.supercsv.cellprocessor.constraint.NotNull
context={lineNo=4, rowNo=4, columnNo=2, rowSource=[Shane, null]}

And the CSV output

name,age
Rick,43
Lori,
,
Carl,12

Notice how the invalid values were written as "" because the SuppressException processor returned null for those values (not that you'd use the CSV output anyway, as it's not valid!).

like image 68
James Bassett Avatar answered Oct 19 '22 23:10

James Bassett