
Spock: Reading Test Data from CSV File

I'm trying to write an elegant Spock specification that reads a very large set of test data from a CSV file without loading all of it into memory. I'm looking for feedback on how you might do this better than what I currently have here.

Let's assume my simplified CSV file looks like the below:


The assertion is "column 1" + 1 == "column 2"

I'm using OpenCSV to do my CSV parsing simply because the actual CSV file contains strings with special characters like double quotes and commas, and rudimentary parsing through splitting the string by comma and such will not work.
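
As an illustration of why a plain comma split is not enough, here is a minimal Java sketch. The sample row and the class name NaiveSplitDemo are hypothetical and not taken from the actual data; the only point is that a quoted field containing a comma gets cut in two.

```java
final class NaiveSplitDemo {
    // Splits on every comma, with no awareness of CSV quoting.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // Hypothetical row: one quoted field that contains a comma, then a number.
        String line = "\"Smith, John\",42";
        // A CSV-aware parser would report 2 fields; the naive split reports 3.
        System.out.println(naiveSplit(line).length); // prints 3
    }
}
```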


Attempt 1

My first attempt is to loop through the CSV and assert on every row. While this approach works, I can't use @Unroll to isolate each assertion into a separate, independent test.

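def "read from csv"() {
    expect:
    def reader = new CSVReader(...)
    def fields

    while ((fields = reader.readNext()) != null) {
        def firstNum = Integer.valueOf(fields[0])
        def secondNum = Integer.valueOf(fields[1])

        // Inside a loop the comparison is not an implicit Spock condition,
        // so it needs an explicit assert.
        assert firstNum + 1 == secondNum
    }
}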

Attempt 2

This attempt lets me use @Unroll, but it requires loading all the data into memory, which is what I'm trying to avoid in the first place.

def "read from csv"() {
    expect:
    Integer.valueOf(firstNum as String) + 1 == Integer.valueOf(secondNum as String)

    where:
    // readAll() materializes every row up front
    [firstNum, secondNum] << new CSVReader(...).readAll()
}

Attempt 3

After reading http://spock-framework.readthedocs.org/en/latest/data_driven_testing.html#data-pipes , I realized I can just create an object that implements Iterable, and Spock will ask the data provider for the next value only when it is needed, which is exactly what I want.

def "read from csv"() {
    expect:
    CSVParser csvParser = new CSVParser()

    def fields = csvParser.parseLine(line as String)
    def firstNum = Integer.valueOf(fields[0])
    def secondNum = Integer.valueOf(fields[1])

    firstNum + 1 == secondNum

    where:
    // Each data value is one raw line; it is parsed in the expect block above
    line << new Iterable() {
        Iterator iterator() {
            return new Scanner(...)
        }
    }
}

This attempt isn't too bad, but it looks weird that I have to do some CSV parsing in the expect block that clutters the actual intent here, which is to perform the assertion.
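
One caveat with using a Scanner as the line iterator: if the Scanner is left at its default delimiter (an assumption, since its construction is elided above), next() returns whitespace-separated tokens rather than whole lines, so a row containing a space would be split. The Java sketch below shows the difference using a hypothetical two-line input and a made-up class name, ScannerLinesDemo.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class ScannerLinesDemo {
    // Collects every token the Scanner yields; wholeLines switches the
    // delimiter from the default (whitespace) to line breaks.
    static List<String> tokens(String text, boolean wholeLines) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(new StringReader(text))) {
            if (wholeLines) {
                scanner.useDelimiter("\\R"); // \R matches any line break (Java 8+)
            }
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "1,2\nhello world,3"; // hypothetical input, two lines
        System.out.println(tokens(text, false).size()); // prints 3
        System.out.println(tokens(text, true).size());  // prints 2
    }
}
```

Note that a line-based iterator still cannot handle quoted fields that span several lines; only a CSV-aware reader can.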

Attempt 4

My final attempt pretty much creates an iterator wrapper that will return the fields as separate variables, but the code is rather ugly to read unless I extract the Iterable class into a separate API.

def "read from csv"() {
    expect:
    Integer.valueOf(firstNum as String) + 1 == Integer.valueOf(secondNum as String)

    where:
    [firstNum, secondNum] << new Iterable() {
        Iterator iterator() {
            new Iterator() {
                def reader = new CSVReader(...)

                def fields

                // Note: advances the reader on every call, so this only works
                // if hasNext() is called exactly once per row.
                boolean hasNext() {
                    fields = reader.readNext()
                    return fields != null
                }

                Object next() {
                    return fields
                }

                void remove() {
                    throw new UnsupportedOperationException()
                }
            }
        }
    }
}


My question is: how would you approach this problem? Is there a better way (or a better CSV library)? As far as I know, Apache Commons CSV is the only parser that implements Iterable, but it has been a SNAPSHOT for a long time.

Thanks much.

limc asked Aug 07 '14 18:08


2 Answers

Write a utility class CSVFile that implements Iterable<Iterable<String>> (or Iterable<Iterable<Integer>>). Then use where: [firstNum, secondNum] << new CSVFile("path/to/file").
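
A minimal sketch of the kind of wrapper described here, written in plain Java so it can sit on the test classpath next to a Groovy spec. It is not the CSVFile class itself: the name LazyRows and the Callable-based constructor are assumptions made so the example stays free of any CSV library. The Callable stands in for a "return the next row, or null at end of input" call such as the CSVReader.readNext() used in the question.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;

/** Hypothetical helper: exposes a "next row or null" source as a lazy Iterable. */
final class LazyRows implements Iterable<String[]> {
    private final Callable<String[]> readNext;

    LazyRows(Callable<String[]> readNext) {
        this.readNext = readNext;
    }

    @Override
    public Iterator<String[]> iterator() {
        return new Iterator<String[]>() {
            private String[] buffered; // row read ahead but not yet handed out
            private boolean done;      // true once the source has returned null

            @Override
            public boolean hasNext() {
                // Read ahead at most once, so repeated calls never skip a row.
                if (buffered == null && !done) {
                    try {
                        buffered = readNext.call();
                    } catch (Exception e) {
                        throw new IllegalStateException("Failed to read next row", e);
                    }
                    done = (buffered == null);
                }
                return buffered != null;
            }

            @Override
            public String[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String[] row = buffered;
                buffered = null;
                return row;
            }
        };
    }
}
```

From a spec, a Groovy closure like { reader.readNext() } could be passed as the Callable, since Groovy closures implement Callable. The iterator holds at most one row at a time, and calling hasNext() twice does not skip data. Each LazyRows instance is single-pass, because every iterator() call shares the same underlying source.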

Peter Niederwieser answered Oct 12 '22 09:10


Probably GroovyCSV will do what you are looking for:

GroovyCSV is a library to make csv processing just a little bit Groovier. The library uses opencsv behind the scenes and merely tries to add a thin layer of “Groovy-ness” to the mix.

Its CsvParser methods return iterators.

Jeff answered Oct 12 '22 10:10
