Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala memory issue on List vs. Vector

I wrote a solution to project Euler problem #59 in Scala and I do not understand why switching between Vector and List adds what I think is a memory leak.

Here is a working, brute force solution using Vectors.

val code = scala.io.Source.fromFile("e59.txt").getLines()
    .flatMap(l => l.split(',')).map(_.toInt).toVector

val commonWords = scala.io.Source.fromFile("common_words.txt").getLines().toVector

def decode(k: Int)(code: Vector[Int])(pswd: Vector[Int]): Vector[Int] = {
    code.grouped(k).flatMap(cs => cs.toVector.zip(pswd).map(t => t._1 ^ t._2)).toVector
}

def scoreText(text: Vector[Int]): Int = {
    if (text.contains((c: Int) => (c < 0 || c > 128))) -1
    else {
        val words = text.map(_.toChar).mkString.toLowerCase.split(' ')
        words.length - words.diff(commonWords).length
    }
}

lazy val psswds = for {
    a <- (97 to 122);
    b <- (97 to 122);
    c <- (97 to 122)
    } yield Vector(a, b, c)

val ans = psswds.toStream.map(decode(3)(code))
    .map(text => (text, scoreText(text)))
    .maxBy(_._2)._1.sum

println(ans)

I store original code (a collection of ordered ints), each password and some common English words as Vectors.

However, if I replace Vector with List, my program slows down with each checked password and eventually crashes:

val code = scala.io.Source.fromFile("e59.txt").getLines()
    .flatMap(l => l.split(',')).map(_.toInt).toList

val commonWords = scala.io.Source.fromFile("common_words.txt").getLines().toList

def decode(k: Int)(code: List[Int])(pswd: List[Int]): List[Int] = {
    println(pswd)
    code.grouped(k).flatMap(cs => cs.toList.zip(pswd).map(t => t._1 ^ t._2)).toList
}

def scoreText(text: List[Int]): Int = {
    if (text.contains((c: Int) => (c < 0 || c > 128))) -1
    else {
        val words = text.map(_.toChar).mkString.toLowerCase.split(' ')
        words.length - words.diff(commonWords).length
    }
}

lazy val psswds = for {
    a <- (97 to 122);
    b <- (97 to 122);
    c <- (97 to 122)
    } yield List(a, b, c)

val ans = psswds.toStream.map(decode(3)(code))
    .map(text => (text, scoreText(text)))
    .maxBy(_._2)._1.sum

println(ans)

Error:

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.String.valueOf(String.java:2861)
    at java.lang.Character.toString(Character.java:4439)
    at java.lang.String.valueOf(String.java:2847)
    at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
    at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:349)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:342)
    at scala.collection.AbstractTraversable.addString(Traversable.scala:104)
    at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:308)
    at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
    at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:310)
    at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
    at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:312)
    at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
    at Main$$anon$1.Main$$anon$$scoreText(e59_list.scala:14)
    at Main$$anon$1$$anonfun$5.apply(e59_list.scala:26)
    at Main$$anon$1$$anonfun$5.apply(e59_list.scala:26)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1222)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1212)
    at scala.collection.immutable.Stream.foreach(Stream.scala:595)
    at scala.collection.TraversableOnce$class.maxBy(TraversableOnce.scala:227)
    at scala.collection.AbstractTraversable.maxBy(Traversable.scala:104)
    at Main$$anon$1.<init>(e59_list.scala:27)
    at Main$.main(e59_list.scala:1)
    at Main.main(e59_list.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at scala.reflect.internal.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:70)

Files used: common_words.txt

a
able
about
across
after
all
almost
also
am
among
an
and
any
are
as
at
be
because
been
but
by
can
cannot
could
dear
did
do
does
either
else
ever
every
for
from
get
got
had
has
have
he
her
hers
him
his
how
however
i
if
in
into
is
it
its
just
least
let
like
likely
may
me
might
most
must
my
neither
no
nor
not
of
off
often
on
only
or
other
our
own
rather
said
say
says
she
should
since
so
some
than
that
the
their
them
then
there
these
they
this
tis
to
too
twas
us
wants
was
we
were
what
when
where
which
while
who
whom
why
will
with
would
yet
you
your

e59.txt

79,59,12,2,79,35,8,28,20,2,3,68,8,9,68,45,0,12,9,67,68,4,7,5,23,27,1,21,79,85,78,79,85,71,38,10,71,27,12,2,79,6,2,8,13,9,1,13,9,8,68,19,7,1,71,56,11,21,11,68,6,3,22,2,14,0,30,79,1,31,6,23,19,10,0,73,79,44,2,79,19,6,28,68,16,6,16,15,79,35,8,11,72,71,14,10,3,79,12,2,79,19,6,28,68,32,0,0,73,79,86,71,39,1,71,24,5,20,79,13,9,79,16,15,10,68,5,10,3,14,1,10,14,1,3,71,24,13,19,7,68,32,0,0,73,79,87,71,39,1,71,12,22,2,14,16,2,11,68,2,25,1,21,22,16,15,6,10,0,79,16,15,10,22,2,79,13,20,65,68,41,0,16,15,6,10,0,79,1,31,6,23,19,28,68,19,7,5,19,79,12,2,79,0,14,11,10,64,27,68,10,14,15,2,65,68,83,79,40,14,9,1,71,6,16,20,10,8,1,79,19,6,28,68,14,1,68,15,6,9,75,79,5,9,11,68,19,7,13,20,79,8,14,9,1,71,8,13,17,10,23,71,3,13,0,7,16,71,27,11,71,10,18,2,29,29,8,1,1,73,79,81,71,59,12,2,79,8,14,8,12,19,79,23,15,6,10,2,28,68,19,7,22,8,26,3,15,79,16,15,10,68,3,14,22,12,1,1,20,28,72,71,14,10,3,79,16,15,10,68,3,14,22,12,1,1,20,28,68,4,14,10,71,1,1,17,10,22,71,10,28,19,6,10,0,26,13,20,7,68,14,27,74,71,89,68,32,0,0,71,28,1,9,27,68,45,0,12,9,79,16,15,10,68,37,14,20,19,6,23,19,79,83,71,27,11,71,27,1,11,3,68,2,25,1,21,22,11,9,10,68,6,13,11,18,27,68,19,7,1,71,3,13,0,7,16,71,28,11,71,27,12,6,27,68,2,25,1,21,22,11,9,10,68,10,6,3,15,27,68,5,10,8,14,10,18,2,79,6,2,12,5,18,28,1,71,0,2,71,7,13,20,79,16,2,28,16,14,2,11,9,22,74,71,87,68,45,0,12,9,79,12,14,2,23,2,3,2,71,24,5,20,79,10,8,27,68,19,7,1,71,3,13,0,7,16,92,79,12,2,79,19,6,28,68,8,1,8,30,79,5,71,24,13,19,1,1,20,28,68,19,0,68,19,7,1,71,3,13,0,7,16,73,79,93,71,59,12,2,79,11,9,10,68,16,7,11,71,6,23,71,27,12,2,79,16,21,26,1,71,3,13,0,7,16,75,79,19,15,0,68,0,6,18,2,28,68,11,6,3,15,27,68,19,0,68,2,25,1,21,22,11,9,10,72,71,24,5,20,79,3,8,6,10,0,79,16,8,79,7,8,2,1,71,6,10,19,0,68,19,7,1,71,24,11,21,3,0,73,79,85,87,79,38,18,27,68,6,3,16,15,0,17,0,7,68,19,7,1,71,24,11,21,3,0,71,24,5,20,79,9,6,11,1,71,27,12,21,0,17,0,7,68,15,6,9,75,79,16,15,10,68,16,0,22,11,11,68,3,6,0,9,72,16,71,29,1,4,0,3,9,6,30,2,79,12,14,2,68,16,7,1,9,79,12,2,79,7,6,2,1,73,79,85,86,79,33,17,10,10,71,6,10,71,7,13,20,79,11,16,1,68,11,14,10,3,79,5,9,11,68,6,2,11,9,8,68,15,6,23,71,0,19,9,79,20,2,0,20,11,10,72,71,7,1,71,24,5,20,79,10,8,27,68,6,12,7,2,31,16,2,11,74,71,94,86,71,45,17,19,79,16,8,79,5,11,3,68,16,7,11,71,13,1,11,6,1,17,10,0,71,7,13,10,79,5,9,11,68,6,12,7,2,31,16,2,11,68,15,6,9,75,79,12,2,79,3,6,25,1,71,27,12,2,79,22,14,8,12,19,79,16,8,79,6,2,12,11,10,10,68,4,7,13,11,11,22,2,1,68,8,9,68,32,0,0,73,79,85,84,79,48,15,10,29,71,14,22,2,79,22,2,13,11,21,1,69,71,59,12,14,28,68,14,28,68,9,0,16,71,14,68,23,7,29,20,6,7,6,3,68,5,6,22,19,7,68,21,10,23,18,3,16,14,1,3,71,9,22,8,2,68,15,26,9,6,1,68,23,14,23,20,6,11,9,79,11,21,79,20,11,14,10,75,79,16,15,6,23,71,29,1,5,6,22,19,7,68,4,0,9,2,28,68,1,29,11,10,79,35,8,11,74,86,91,68,52,0,68,19,7,1,71,56,11,21,11,68,5,10,7,6,2,1,71,7,17,10,14,10,71,14,10,3,79,8,14,25,1,3,79,12,2,29,1,71,0,10,71,10,5,21,27,12,71,14,9,8,1,3,71,26,23,73,79,44,2,79,19,6,28,68,1,26,8,11,79,11,1,79,17,9,9,5,14,3,13,9,8,68,11,0,18,2,79,5,9,11,68,1,14,13,19,7,2,18,3,10,2,28,23,73,79,37,9,11,68,16,10,68,15,14,18,2,79,23,2,10,10,71,7,13,20,79,3,11,0,22,30,67,68,19,7,1,71,8,8,8,29,29,71,0,2,71,27,12,2,79,11,9,3,29,71,60,11,9,79,11,1,79,16,15,10,68,33,14,16,15,10,22,73
like image 200
hilberts_drinking_problem Avatar asked Dec 19 '22 00:12

hilberts_drinking_problem


1 Answers

Large amount of Lists create more load on GC comparing to the same Vectors. But your problem is not about right choice of collections, but about wrong use of Stream.

Scala's streams can be very memory inefficient if used improperly. In your case, I assume, you were trying to use Stream to avoid eager computation of the transformed passwds collection, but you actually made the things worse (as Stream not only memoized your elements, it created extra overhead with Stream wrappers of these elements).

What you had to do is just to replace toStream with view. It will create collection wrapper which makes nearly all transformations lazy (basically what you tried to achieve).

val ans = psswds.view.map(decode(3)(code))
 .map(text => (text, scoreText(text)))
 .maxBy(_._2)._1.sum

After this tiny fix you program runs fine even with -Xmx5m (I checked).

There are also many other things to optimize in your program (try to avoid creating excessive collections), but I'll leave it to you.

like image 102
Aivean Avatar answered Feb 01 '23 17:02

Aivean