
skip malformed csv row

I have been trying to read a CSV file and add its fields to a data structure. One of the rows is not formed properly, and I am aware of that; I just want to skip that row and move on to the next. But even though I am catching the exception, it still breaks the loop. Any idea what I am missing here?

My csv:

"id","name","email"
121212,"Steve","[email protected]"
121212,"Steve","[email protected]",,
121212,"Steve","[email protected]"

My code:

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

public static void main(String[] args) throws Exception{
    Path path = Paths.get("list2.csv");
    CsvMapper mapper = new CsvMapper();
    CsvSchema schema = CsvSchema.emptySchema().withHeader();
    MappingIterator<Object> it = mapper.reader(Object.class)
            .with(schema)
            .readValues(path.toFile());

    try{
        while(it.hasNext()){
            Object row;
            try{
                row = it.nextValue();
            } catch (IOException e){
                e.printStackTrace();
                continue;
            }
        }
    } catch (ArrayIndexOutOfBoundsException e){
        e.printStackTrace();
    }

}

Exception:

com.fasterxml.jackson.core.JsonParseException: Too many entries: expected at most 3 (value #3 (0 chars) "")
 at [Source: java.io.InputStreamReader@12b3519c; line: 3, column: 38]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1486)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518)
    at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntryExpectEOL(CsvParser.java:601)
    at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntry(CsvParser.java:587)
    at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:474)
    at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapObject(UntypedObjectDeserializer.java:592)
    at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:440)
    at com.fasterxml.jackson.databind.MappingIterator.nextValue(MappingIterator.java:188)
    at CSVTest.main(CSVTest.java:24)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
java.lang.ArrayIndexOutOfBoundsException: 3
    at com.fasterxml.jackson.dataformat.csv.CsvSchema.column(CsvSchema.java:941)
    at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNamedValue(CsvParser.java:614)
    at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:476)
    at com.fasterxml.jackson.databind.MappingIterator.hasNextValue(MappingIterator.java:158)
    at CSVTest.main(CSVTest.java:21)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
asked Sep 24 '15 by notacyborg

2 Answers

With Jackson 2.6, handling of readValues() has been improved to try to recover from processing errors, so that in many cases you can simply try again to read the following valid rows. Make sure to use at least version 2.6.2.

Earlier versions did not recover as well, usually rendering the rest of the content unprocessable; this may be what happened in your case.

Another possibility, given that your problem is not invalid CSV as such, but rather content that is not mappable as POJOs (at least the way the POJO is defined), is to read the content as a sequence of String[] and handle the mapping manually. Jackson's CSV parser itself does not mind any number of columns; it is the higher-level databinding that does not like finding "extra" content it does not recognize.
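The manual-mapping idea can be sketched even without Jackson, just to show the skip logic. This is a deliberately naive illustration (plain JDK, comma splitting, no quote handling; the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.List;

public class RowSkipDemo {
    // Split CSV text into rows and keep only rows that have exactly
    // the expected number of columns, skipping the rest.
    static List<String[]> parseSkippingBad(String csv, int expectedCols) {
        List<String[]> rows = new ArrayList<>();
        for (String line : csv.split("\n")) {
            if (line.isEmpty()) continue;
            String[] cols = line.split(",", -1); // -1 keeps trailing empty fields
            if (cols.length == expectedCols) {
                rows.add(cols);
            }
            // else: malformed row -> skip it and keep going
        }
        return rows;
    }

    public static void main(String[] args) {
        String csv = "121212,Steve,a@b.com\n"
                   + "121212,Steve,a@b.com,,\n"  // 5 fields -> skipped
                   + "121212,Steve,a@b.com\n";
        System.out.println(parseSkippingBad(csv, 3).size()); // prints 2
    }
}
```

With Jackson you would do the same thing at the String[] level: iterate the raw rows, check the column count yourself, and only then map the good ones to your POJO.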

answered Sep 28 '22 by StaxMan


Your CSV is not necessarily malformed; in fact, it is very common to have rows with varying numbers of columns.

univocity-parsers handles this without any trouble.

The easiest way would be:

import com.univocity.parsers.common.processor.BeanListProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.FileReader;
import java.nio.file.Paths;
import java.util.List;

BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);

CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);

CsvParser parser = new CsvParser(parserSettings);
parser.parse(new FileReader(Paths.get("list2.csv").toFile()));

// The BeanListProcessor provides a list of objects extracted from the input.
List<TestBean> beans = rowProcessor.getBeans();

If you want to discard elements built from rows with an inconsistent number of columns, override the beanProcessed method and use the ParsingContext object to analyse your data and decide whether to keep or drop the row.
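A rough sketch of that override, assuming a hypothetical TestBean with a getEmail() accessor; the validity check here is just an example, so adapt it to whatever makes a row acceptable in your data:

```java
// Anonymous subclass of BeanListProcessor: decide per bean whether to keep it.
BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class) {
    @Override
    public void beanProcessed(TestBean bean, ParsingContext context) {
        // hypothetical check: only keep beans whose email field was populated
        if (bean.getEmail() != null && !bean.getEmail().isEmpty()) {
            super.beanProcessed(bean, context); // keep: adds the bean to getBeans()
        }
        // else: silently drop the row
    }
};
```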

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

answered Sep 28 '22 by Jeronimo Backes