I would like an elegant pipeline for converting a textual input into a JSON output. The flow should go something like this:
(input file) // concatenated htmls and url
Collection[String] // unit: line
Collection[(String, String)] // unit: (url, html doc) pair
Collection[MyObj] // unit: parsed MyObj
(output file) // json representation of parsed objects
Currently I do this with nested for loops, but I would like to write this in a more functional style. Is there a standard way of doing this, or typical libraries I should have a look at? Note: the data is fairly large, so I can't have it entirely in memory.
Perhaps you can use scalaz-stream. The library provides compositionality, expressiveness, resource safety, and speed for IO processing. It also runs in constant memory, which is very useful for handling large data. Here is the GitHub repo:
https://github.com/scalaz/scalaz-stream
Youtube talk about it:
https://www.youtube.com/watch?v=GSZhUZT7Fyc
https://www.youtube.com/watch?v=nCxBEUyIBt0
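To make this concrete, here is a minimal sketch of your pipeline with scalaz-stream. It assumes each record arrives as one `url<TAB>html` line, and the `MyObj`, `parse`, and `toJson` definitions are hypothetical placeholders for your own parsing and serialization logic:

```scala
import scalaz.concurrent.Task
import scalaz.stream._

case class MyObj(url: String, title: String)

// Placeholder: replace with your real HTML parsing
def parse(url: String, html: String): MyObj =
  MyObj(url, html.takeWhile(_ != '<'))

// Placeholder: replace with a real JSON library (argonaut, play-json, ...)
def toJson(obj: MyObj): String =
  s"""{"url":"${obj.url}","title":"${obj.title}"}"""

val pipeline: Task[Unit] =
  io.linesR("input.txt")                          // Collection[String], read lazily line by line
    .map { line =>
      val Array(url, html) = line.split("\t", 2)  // Collection[(String, String)]
      parse(url, html)                            // Collection[MyObj]
    }
    .map(toJson)                                  // JSON string per object
    .intersperse("\n")
    .pipe(text.utf8Encode)
    .to(io.fileChunkW("output.json"))             // streamed to the output file
    .run

// Nothing happens until the Task is actually run:
// pipeline.run
```

Because `io.linesR` produces a lazy `Process`, only a bounded amount of the input is in memory at any time, which addresses your "data is fairly large" constraint. If your real input interleaves a URL line with a multi-line HTML block rather than one record per line, you would swap the `split` step for a chunking combinator that groups lines into documents.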