I'm trying to write an elegant Spock specification that reads a very large set of test data from a CSV file without loading all of it into memory. I'd like your feedback on how you might do it better than what I currently have here.
Let's assume my simplified CSV file looks like this:
1,2
3,4
5,6
The assertion is that the first column plus 1 equals the second column: "column 1" + 1 == "column 2"
I'm using OpenCSV for the parsing because the actual CSV file contains strings with special characters such as double quotes and commas, so rudimentary parsing that splits each line on commas will not work.
<dependency>
    <groupId>net.sf.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>2.3</version>
</dependency>
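To illustrate why splitting on commas is not enough for such data, here is a small self-contained Java sketch. The class NaiveSplitDemo and both of its methods are hypothetical names made up for this illustration. The quote-aware version is a simplified stand-in, not OpenCSV's implementation, and it ignores cases such as quoted fields that span several lines.

```java
import java.util.ArrayList;
import java.util.List;

class NaiveSplitDemo {

    /** Splits on every comma and ignores quoting entirely. */
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    /**
     * Simplified quote-aware split: commas inside double quotes stay in the field,
     * and a doubled quote ("") inside a quoted field is read as one literal quote.
     */
    static List<String> quoteAwareSplit(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote
                    i++;                 // skip the second quote of the pair
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field
        return fields;
    }
}
```

For the line "Smith, John",42 the naive split yields three pieces, while the quote-aware split yields the two intended fields.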
Attempt 1
My first attempt loops through the CSV and performs the assertion on every row. While this approach works, I can't use @Unroll to isolate each assertion into a separate, independent test.
def "read from csv"() {
    expect:
    def reader = new CSVReader(...)
    def fields
    while ((fields = reader.readNext()) != null) {
        def firstNum = Integer.valueOf(fields[0])
        def secondNum = Integer.valueOf(fields[1])
        // explicit assert: Spock only treats top-level statements as implicit conditions
        assert firstNum + 1 == secondNum
    }
}
Attempt 2
This attempt lets me use @Unroll, but it requires loading all the data into memory, which is what I'm trying to avoid in the first place.
@Unroll
def "read from csv"() {
    expect:
    Integer.valueOf(firstNum as String) + 1 == Integer.valueOf(secondNum as String)
    where:
    [firstNum, secondNum] << new CSVReader(...).readAll()
}
Attempt 3
After reading http://spock-framework.readthedocs.org/en/latest/data_driven_testing.html#data-pipes , I realized I can just create an object that implements Iterable, and Spock will ask the data provider for the next value only when it is needed, which is exactly what I want.
@Unroll
def "read from csv"() {
    given:
    CSVParser csvParser = new CSVParser()
    expect:
    def fields = csvParser.parseLine(line as String)
    def firstNum = Integer.valueOf(fields[0])
    def secondNum = Integer.valueOf(fields[1])
    firstNum + 1 == secondNum
    where:
    line << new Iterable() {
        @Override
        Iterator iterator() {
            return new Scanner(...)
        }
    }
}
This attempt isn't too bad, but it looks weird that I have to do some CSV parsing in the expect block. That clutters the actual intent here, which is to perform the assertion.
Attempt 4
My final attempt pretty much creates an iterator wrapper that will return the fields as separate variables, but the code is rather ugly to read unless I extract the Iterable class into a separate API.
@Unroll
def "read from csv"() {
    expect:
    Integer.valueOf(firstNum as String) + 1 == Integer.valueOf(secondNum as String)
    where:
    [firstNum, secondNum] << new Iterable() {
        @Override
        Iterator iterator() {
            new Iterator() {
                def reader = new CSVReader(...)
                // read one row ahead so that hasNext() has no side effects
                // and can safely be called more than once per row
                def fields = reader.readNext()
                @Override
                boolean hasNext() {
                    return fields != null
                }
                @Override
                Object next() {
                    def current = fields
                    fields = reader.readNext()
                    return current
                }
                @Override
                void remove() {
                    throw new UnsupportedOperationException()
                }
            }
        }
    }
}
Question
My question is: how would you approach this problem? Is there a better way (or a better CSV library)? As far as I know, Apache Commons CSV is the only parser that implements Iterable, but it has been a SNAPSHOT for a long time.
Thanks much.
Write a utility class CSVFile that implements Iterable<Iterable<String>> (or Iterable<Iterable<Integer>>). Then use where: [firstNum, secondNum] << new CSVFile("path/to/file").
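One way such a utility might look, as a minimal sketch in plain Java so it can be called from the Groovy specification. The names LazyRows and RowSource are hypothetical, not part of Spock or OpenCSV. The sketch assumes the underlying reader returns null at the end of the input, as CSVReader.readNext() does in the question, and it yields String[] rows (as in Attempt 4) rather than Iterable<String>. It reads one row at a time, and hasNext() can be called repeatedly without skipping rows.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical lazy wrapper: exposes a "read next row or null" source as an Iterable. */
class LazyRows implements Iterable<String[]> {

    /** Hypothetical abstraction: returns the next row, or null at end of input. */
    interface RowSource {
        String[] readNext() throws IOException;
    }

    private final RowSource source;

    LazyRows(RowSource source) {
        this.source = source;
    }

    /** Single-pass: every iterator draws from the same underlying source. */
    @Override
    public Iterator<String[]> iterator() {
        return new Iterator<String[]>() {
            private String[] pending;  // row read ahead, not yet handed out
            private boolean fetched;   // true once `pending` reflects the next read

            // Reads at most one row ahead, so only one row is held in memory.
            private void fetch() {
                if (!fetched) {
                    try {
                        pending = source.readNext();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    fetched = true;
                }
            }

            @Override
            public boolean hasNext() {
                fetch();
                return pending != null;
            }

            @Override
            public String[] next() {
                fetch();
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                String[] row = pending;
                pending = null;
                fetched = false;
                return row;
            }
        };
    }
}
```

In the specification this could then be wired up as where: [firstNum, secondNum] << new LazyRows({ reader.readNext() } as LazyRows.RowSource), where reader is the question's CSVReader and the closure is coerced to the interface with Groovy's as operator.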
Probably GroovyCSV will do what you are looking for:
GroovyCSV is a library to make csv processing just a little bit Groovier. The library uses opencsv behind the scenes and merely tries to add a thin layer of “Groovy-ness” to the mix.
Its CsvParser methods return iterators.