 

Idiomatic (functional) file processing pipeline in Scala

Tags:

scala

I would like to build an elegant pipeline for converting textual input into JSON output. The flow should go something like this:

(input file)                 // concatenated HTML documents and URLs
Collection[String]           // unit: line
Collection[(String, String)] // unit: (url, html doc)
Collection[MyObj]            // unit: parsed MyObj
(output file)                // JSON representation of parsed objects

Currently I do this with nested for loops, but I would like to write this in a more functional style. Is there a standard way of doing this, or typical libraries I should have a look at? Note: the data is fairly large, so I can't have it entirely in memory.

mitchus asked Nov 01 '22 05:11


1 Answer

Perhaps you can use scalaz-stream. The library aims to provide compositionality, expressiveness, resource safety, and speed for I/O processing. It also runs in constant memory, which is very useful for handling large data. Here is the GitHub repository:

https://github.com/scalaz/scalaz-stream

YouTube talks about it:

https://www.youtube.com/watch?v=GSZhUZT7Fyc

https://www.youtube.com/watch?v=nCxBEUyIBt0
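
For comparison, here is a minimal sketch of the same flow using only the standard library's lazy `Iterator` instead of scalaz-stream. It is not the scalaz-stream API, just an illustration of the lines -> (url, html) -> MyObj -> JSON stages with one record in memory at a time. Several things are assumptions, since the question does not specify them: the input format (a line starting with `URL: ` opens each record), the fields of `MyObj`, the placeholder `parse` function, and the hand-rolled `toJson` (which escapes only backslashes and quotes; a real JSON library would be a better fit).

```scala
import java.io.PrintWriter
import scala.io.Source

object Pipeline {
  // Hypothetical parsed type; the question does not say what MyObj contains.
  final case class MyObj(url: String, htmlLength: Int)

  // Assumed input format: a line "URL: <url>" opens each record, and the
  // lines up to the next such marker are that record's HTML.
  val Marker = "URL: "

  // Lazily groups lines into (url, html) pairs; only one record is buffered.
  def records(lines: Iterator[String]): Iterator[(String, String)] = {
    val in = lines.buffered
    new Iterator[(String, String)] {
      def hasNext: Boolean = {
        // Skip any lines that do not belong to a record.
        while (in.hasNext && !in.head.startsWith(Marker)) in.next()
        in.hasNext
      }
      def next(): (String, String) = {
        if (!hasNext) throw new NoSuchElementException("no more records")
        val url = in.next().stripPrefix(Marker).trim
        val html = new StringBuilder
        while (in.hasNext && !in.head.startsWith(Marker))
          html.append(in.next()).append('\n')
        (url, html.toString)
      }
    }
  }

  // Placeholder for the real HTML parsing step (assumption).
  def parse(url: String, html: String): MyObj = MyObj(url, html.length)

  // Minimal JSON rendering: escapes only backslashes and double quotes.
  def toJson(o: MyObj): String = {
    val escaped = o.url.replace("\\", "\\\\").replace("\"", "\\\"")
    s"""{"url":"$escaped","htmlLength":${o.htmlLength}}"""
  }

  // Wires the stages together and writes one JSON object per output line.
  def run(inPath: String, outPath: String): Unit = {
    val src = Source.fromFile(inPath, "UTF-8")
    val out = new PrintWriter(outPath, "UTF-8")
    try {
      records(src.getLines())
        .map { case (url, html) => parse(url, html) }
        .map(toJson)
        .foreach(line => out.println(line))
    } finally {
      out.close()
      src.close()
    }
  }
}
```

Calling `Pipeline.run("input.txt", "output.json")` (both file names are made up) would stream the file through the stages. Each `map` on an `Iterator` is lazy, so nothing is read until `foreach` pulls records through. What a streaming library adds on top of this is mainly resource safety and composable error handling, which here is done by hand in the `try`/`finally`.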

Xiaohe Dong answered Nov 04 '22 10:11