
How do I read a large CSV file with Scala Stream class?

How do I read a large CSV file (> 1 GB) with a Scala Stream? Do you have a code example? Or would you read a large CSV file some other way, without loading it into memory first?

Jan Willem Tulp asked Nov 23 '10 10:11


2 Answers

Just use Source.fromFile(...).getLines as you already stated.

That returns an Iterator, which is already lazy. (You'd use a Stream as a lazy collection when you want previously retrieved values to be memoized, so that you can read them again.)

If you're getting memory problems, then the problem will lie in what you're doing after getLines. Any operation like toList, which forces a strict collection, will cause the problem.
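
As a minimal sketch of that advice: the example below folds over the Iterator returned by getLines directly, so only one line is held at a time. The object name LazyCsvRead, the helper sumSecondColumn, and the two-column "id,amount" layout are assumptions made up for illustration. The naive split on "," also assumes there are no quoted fields.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

object LazyCsvRead {
  // Sums the second column of a CSV file (hypothetical layout: id,amount).
  // The Iterator from getLines is consumed once and never materialized,
  // so memory use does not grow with the number of lines.
  def sumSecondColumn(path: String): Long = {
    val source = Source.fromFile(path)
    try {
      source.getLines()
        .drop(1)                        // skip the header row
        .map(_.split(",", -1))          // naive split: assumes no quoted commas
        .map(cols => cols(1).trim.toLong)
        .sum
    } finally source.close()            // always release the file handle
  }

  def main(args: Array[String]): Unit = {
    // Write a small sample file so the example is self-contained.
    val tmp = File.createTempFile("sample", ".csv")
    tmp.deleteOnExit()
    val out = new PrintWriter(tmp)
    try {
      out.println("id,amount")
      (1 to 1000).foreach(i => out.println(s"$i,${i * 2}"))
    } finally out.close()

    println(sumSecondColumn(tmp.getPath))
  }
}
```

Replacing the final sum with toList (or any other strict collection) would pull every line into memory at once, which is the failure mode described above.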

Kevin Wright answered Sep 23 '22 05:09


I hope that by Stream you don't mean Scala's collection.immutable.Stream. That is not what you want: Stream is lazy, but it memoizes every element it has already evaluated.
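
To illustrate the memoization, here is a small sketch that counts how often the mapping function runs when the same Stream is traversed twice. The object name StreamMemoDemo and its helper are hypothetical. Note that Stream is deprecated since Scala 2.13 in favour of LazyList, which also memoizes, so this may compile with a deprecation warning.

```scala
object StreamMemoDemo {
  // Returns how many times the mapping function was evaluated
  // after traversing the same Stream twice.
  def evaluationsAfterTwoPasses(): Int = {
    var evaluations = 0
    val s = (1 to 5).toStream.map { n =>
      evaluations += 1   // side effect only to make evaluation visible
      n * 2
    }
    s.toList             // first pass forces all five elements
    s.toList             // second pass reuses the cached elements
    evaluations
  }

  def main(args: Array[String]): Unit =
    println(evaluationsAfterTwoPasses())
}
```

The count stays at five after the second pass because the evaluated cells are cached. For a file with millions of lines, that same cache is what keeps every line reachable for as long as a reference to the head of the Stream is held.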

I don't know what you plan to do, but just reading the file line-by-line should work very well without using high amounts of memory.

getLines should evaluate lazily and should not crash (as long as your file does not have more than 2³² lines, afaik). If it does, ask on #scala or file a bug ticket (or do both).

soc answered Sep 21 '22 05:09