
Spock: Reading Test Data from CSV File

I'm trying to write an elegant Spock specification that reads a very large set of test data from a CSV file without loading all of it into memory. I'd like feedback on how to improve on what I currently have.

Let's assume my simplified CSV file looks like this:

1,2
3,4
5,6

The assertion is "column 1" + 1 == "column 2"

I'm using OpenCSV for the CSV parsing because the actual CSV file contains strings with special characters such as double quotes and commas, so rudimentary parsing that just splits each line on commas will not work.

<dependency>
    <groupId>net.sf.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>2.3</version>
</dependency>
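To illustrate why splitting on commas is not enough, here is a minimal sketch in plain Java. The class name NaiveSplitDemo and the sample line are hypothetical and only serve as an illustration; the point is that a quoted field containing a comma is cut into two pieces.

```java
import java.util.Arrays;

class NaiveSplitDemo {
    /** Splits on every comma and ignores quoting entirely. */
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // Two logical fields: the quoted name and the number.
        String line = "\"Doe, Jane\",42";
        // The comma inside the quotes is treated as a separator,
        // so the result has three pieces instead of two.
        System.out.println(Arrays.toString(naiveSplit(line)));
    }
}
```

A quote-aware parser such as OpenCSV would return two fields for the same line.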

Attempt 1

My first attempt is to loop through the CSV and assert on every row. While this approach works, I can't use @Unroll to isolate each assertion into a separate, independent test.

def "read from csv"() {
    expect:
    def reader = new CSVReader(...)
    def fields

    while ((fields = reader.readNext()) != null) {
        def firstNum = Integer.valueOf(fields[0])
        def secondNum = Integer.valueOf(fields[1])

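        // Inside a loop, Spock does not treat the expression as an implicit condition,
        // so an explicit assert is needed
        assert firstNum + 1 == secondNum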
    }
}

Attempt 2

This attempt lets me use @Unroll, but it requires loading the entire data set into memory, which is what I'm trying to avoid in the first place.

@Unroll
def "read from csv"() {
    expect:
    Integer.valueOf(firstNum as String) + 1 == Integer.valueOf(secondNum as String)

    where:
    [firstNum, secondNum] << new CSVReader(...).readAll()
}

Attempt 3

After reading http://spock-framework.readthedocs.org/en/latest/data_driven_testing.html#data-pipes , I realized I can create an object that implements Iterable, and Spock will ask the data provider for the next value only when it is needed, which is exactly what I want.

@Unroll
def "read from csv"() {
    given:
    CSVParser csvParser = new CSVParser()

    expect:
    def fields = csvParser.parseLine(line as String)
    def firstNum = Integer.valueOf(fields[0])
    def secondNum = Integer.valueOf(fields[1])

    firstNum + 1 == secondNum

    where:
    line << new Iterable() {
        @Override
        Iterator iterator() {
            return new Scanner(...)
        }
    }
}

This attempt isn't too bad, but it looks odd that I have to do CSV parsing in the expect block, which clutters the actual intent: performing the assertion.

Attempt 4

My final attempt pretty much creates an iterator wrapper that will return the fields as separate variables, but the code is rather ugly to read unless I extract the Iterable class into a separate API.

@Unroll
def "read from csv"() {
    expect:
    Integer.valueOf(firstNum as String) + 1 == Integer.valueOf(secondNum as String)

    where:
    [firstNum, secondNum] << new Iterable() {
        @Override
        Iterator iterator() {
            new Iterator() {
                def reader = new CSVReader(...)

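                def fields   // buffered row; null until one has been read

                @Override
                boolean hasNext() {
                    // Read ahead only when no row is buffered, so that
                    // calling hasNext() twice does not skip a row
                    if (fields == null) {
                        fields = reader.readNext()
                    }
                    return fields != null
                }

                @Override
                Object next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException()
                    }
                    def row = fields
                    fields = null
                    return row
                }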

                @Override
                void remove() {
                    throw new UnsupportedOperationException()
                }
            }
        }
    }
}

Question

My question is: how would you approach this problem? Is there a better way (or a better CSV library)? As far as I know, Apache Commons CSV is the only parser that implements Iterable, but it has been a SNAPSHOT for a long time.

Thanks much.

limc asked Aug 07 '14 18:08


2 Answers

Write a utility class CSVFile that implements Iterable<Iterable<String>> (or Iterable<Iterable<Integer>>). Then use where: [firstNum, secondNum] << new CSVFile("path/to/file").
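A minimal sketch of what such a utility class might look like, written in plain Java so it can be called from a Spock specification. This is not the answerer's code: the name CsvFile, the Supplier-based constructor, and the hand-written parseLine helper are assumptions made for illustration. In particular, parseLine is only a stand-in that handles quoted fields and doubled quotes on a single line; in a real project it would likely be replaced by a call to a proper parser such as OpenCSV.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

/** Hypothetical utility: iterates over a CSV source one row at a time. */
class CsvFile implements Iterable<List<String>> {
    private final Supplier<Reader> source; // opens a fresh reader for each iteration

    CsvFile(Supplier<Reader> source) {
        this.source = source;
    }

    CsvFile(String path) {
        this(() -> {
            try {
                return Files.newBufferedReader(Paths.get(path));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    @Override
    public Iterator<List<String>> iterator() {
        BufferedReader reader = new BufferedReader(source.get());
        return new Iterator<List<String>>() {
            private String nextLine; // look-ahead buffer, null when empty
            private boolean done;    // true once the source is exhausted

            @Override
            public boolean hasNext() {
                // Read ahead only when nothing is buffered, so repeated calls are safe.
                if (nextLine == null && !done) {
                    try {
                        nextLine = reader.readLine();
                        if (nextLine == null) {
                            done = true;
                            reader.close();
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
                return nextLine != null;
            }

            @Override
            public List<String> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String line = nextLine;
                nextLine = null;
                return parseLine(line);
            }
        };
    }

    /** Stand-in parser: commas separate fields, "" inside quotes is a literal quote. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // doubled quote is an escaped quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false;  // closing quote
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;       // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }
}
```

Only one line is buffered at a time, and hasNext() can be called repeatedly without skipping rows. Each row is a List of strings, so the values would still need converting with Integer.valueOf in the expect block, as in the question.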

Peter Niederwieser answered Oct 12 '22 09:10


Probably GroovyCSV will do what you are looking for:

GroovyCSV is a library to make csv processing just a little bit Groovier. The library uses opencsv behind the scenes and merely tries to add a thin layer of “Groovy-ness” to the mix.

Its CsvParser methods return iterators.

Jeff answered Oct 12 '22 10:10