We are developing a small image processing library for Scala (student project). The library is completely functional (i.e. no mutability). The raster of image is stored as Stream[Stream[Int]]
to exploit the benefits of lazy evaluation with least efforts. However upon performing a few operations on an image the heap gets full and an OutOfMemoryError
is thrown. (for example, up to 4 operations can be performed on a jpeg image sized 500 x 400, 35 kb before JVM heap runs out of space.)
The approaches we have thought of are:
Stream[Stream[Int]]
, the one which is more suited to the task of image processing. (Again we do not have much idea about the functional data structures beyond the simple List
and Stream
.)The last option we have is giving up on immutability and making it a mutable library (like the popular image processing libraries), which we don't really want to do. Please suggest us some way to keep this library functional and still functional, if you know what I mean.
Thank you,
Siddharth Raina.
ADDENDUM:
For an image sized 1024 x 768, the JVM runs out of heap space even for a single mapping operation. Some example code from our test:
val image = Image from "E:/metallica.jpg"
val redded = image.map(_ & 0xff0000)
redded.display(title = "Redded")
And the output:
"C:\Program Files (x86)\Java\jdk1.6.0_02\bin\java" -Didea.launcher.port=7533 "-Didea.launcher.bin.path=C:\Program Files (x86)\JetBrains\IntelliJ IDEA Community Edition 10.0.2\bin" -Dfile.encoding=windows-1252 -classpath "C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\charsets.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\deploy.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\javaws.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\jce.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\jsse.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\management-agent.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\plugin.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\resources.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\rt.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\ext\dnsns.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\ext\localedata.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\ext\sunjce_provider.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\ext\sunmscapi.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\ext\sunpkcs11.jar;C:\new Ph\Phoebe\out\production\Phoebe;E:\Inventory\Marvin.jar;C:\scala-2.8.1.final\lib\scala-library.jar;C:\scala-2.8.1.final\lib\scala-swing.jar;C:\scala-2.8.1.final\lib\scala-dbc.jar;C:\new Ph;C:\scala-2.8.1.final\lib\scala-compiler.jar;E:\Inventory\commons-math-2.2.jar;E:\Inventory\commons-math-2.2-sources.jar;E:\Inventory\commons-math-2.2-javadoc.jar;E:\Inventory\jmathplot.jar;E:\Inventory\jmathio.jar;E:\Inventory\jmatharray.jar;E:\Inventory\Javax Media.zip;E:\Inventory\jai-core-1.1.3-alpha.jar;C:\Program Files (x86)\JetBrains\IntelliJ IDEA Community Edition 10.0.2\lib\idea_rt.jar" com.intellij.rt.execution.application.AppMain phoebe.test.ImageTest
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at scala.collection.Iterator$class.toStream(Iterator.scala:1011)
at scala.collection.IndexedSeqLike$Elements.toStream(IndexedSeqLike.scala:52)
at scala.collection.Iterator$$anonfun$toStream$1.apply(Iterator.scala:1011)
at scala.collection.Iterator$$anonfun$toStream$1.apply(Iterator.scala:1011)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:565)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:557)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:168)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:168)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:565)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:557)
at scala.collection.immutable.Stream$$anonfun$flatten1$1$1.apply(Stream.scala:453)
at scala.collection.immutable.Stream$$anonfun$flatten1$1$1.apply(Stream.scala:453)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:565)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:557)
at scala.collection.immutable.Stream.length(Stream.scala:113)
at scala.collection.SeqLike$class.size(SeqLike.scala:221)
at scala.collection.immutable.Stream.size(Stream.scala:48)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:388)
at scala.collection.immutable.Stream.toArray(Stream.scala:48)
at phoebe.picasso.Image.force(Image.scala:85)
at phoebe.picasso.SimpleImageViewer.<init>(SimpleImageViewer.scala:10)
at phoebe.picasso.Image.display(Image.scala:91)
at phoebe.test.ImageTest$.main(ImageTest.scala:14)
at phoebe.test.ImageTest.main(ImageTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115)
Process finished with exit code 1
Currently, the cats library and its ecosystem is the cornerstone of functional programming in Scala. For many, they will be the first choice when choosing stack for the new functional project.
Scala is a language that features full support of Functional Programming as well as the Object-Oriented Paradigm. It is one of the most widely utilized languages by the Functional community, up to par with F#, and Haskell, Clojure, among others.
Scala is also a functional language in the sense that every function is a value. Scala provides a lightweight syntax for defining anonymous functions, it supports higher-order functions, it allows functions to be nested, and it supports currying.
Spark has a functional programming API in multiple languages that provides more operators than map and reduce, and does this via a distributed data framework called resilient distributed datasets or RDDs.
If I understood correctly, you store each individual pixel in one Stream
element, and this can be inefficient. What you can do is create your custom LazyRaster
class which contains lazy references to blocks of the image of some size (for instance, 20x20). The first time some block is written, its corresponding array is initialized, and from there on changing a pixel means writing to that array.
This is more work, but may result in better performance. Furthermore, if you wish to support stacking of image operations (e.g. do a map - take - map), and then evaluating the image in "one-go", the implementation could get tricky - stream implementation is the best evidence for this.
Another thing one can do is ensure that the old Stream
s are being properly garbage collected. I suspect image
object in your example is a wrapper for your streams. If you wish to stack multiple image operations (like mapping) together and be able to gc the references you no longer need, you have to make sure that you don't hold any references to a stream - note that this is not ensured if:
image
in the example)Image
wrapper contains such a reference.Without knowing more about the exact use cases, its hard to say more.
Personally, I would avoid Stream
s altogether, and simply use some immutable array-based data structure which is both space-efficient and avoids boxing. The only place where I potentially see Stream
s being used is in iterative image transformations, like convolution or applying a stack of filters. You wouldn't have a Stream
of pixels, but a Stream
of images, instead. This could be a nice way to express a sequence of transformations - in this case, the comments about gc in the link given above apply.
If you process large streams, you need to avoid holding onto a reference to the head of the stream. This will prevent garbage collection.
It's possible that calling certain methods on Stream
will internally hold onto the head. See the discussion here: Functional processing of Scala streams without OutOfMemory errors
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With