
Scala Garbage Collection?

I'm relatively new to Scala.

If I have a construct like this,

sampleFile.map(line => line.map { word =>
  val myObj = new MyClass(word)
  myObj.func()
})

I create an object of MyClass and call a method on it (func()). I repeat this for every line in the file (through map), so I create a new object at each step of the iteration. myObj goes out of scope when the next iteration starts. Are these objects destroyed at the end of the block, or are they left orphaned in memory? My questions are: when is garbage collection triggered? Is it expensive to create an object at every step of the iteration? Does this have any performance implications when the number of lines grows to 1 million?

Learner asked Oct 21 '13 05:10



2 Answers

Your objects should all become eligible for garbage collection fairly quickly (assuming myObj.func() does not store a reference to myObj somewhere else). On the JVM, any unreachable object can be garbage collected, and your last reference to the new object disappears as soon as myObj goes out of scope.

Garbage collection of short-lived objects is generally very cheap and efficient, so you probably shouldn't worry about it (at least until you have benchmarks or measured performance problems that prove otherwise).
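If you want to measure this rather than take it on trust, a rough sketch along these lines may help. It is not a rigorous benchmark: the MyClass below is a hypothetical stand-in for the one in the question, the input is generated in memory instead of being read from a file, and the names AllocationSketch and process are made up for illustration.

```scala
// Hypothetical stand-in for the MyClass in the question.
final class MyClass(word: String) {
  def func(): Int = word.length
}

object AllocationSketch {
  // Creates one short-lived MyClass per word and keeps only the numeric result,
  // so each object becomes unreachable right after func() returns.
  def process(lines: Iterator[String]): Long =
    lines
      .flatMap(_.split("\\s+").iterator)
      .filter(_.nonEmpty)
      .map(word => new MyClass(word).func().toLong)
      .sum

  def main(args: Array[String]): Unit = {
    // One million synthetic lines (assumption: real input would come from a file).
    val lines = Iterator.fill(1000000)("the quick brown fox")
    val start = System.nanoTime()
    val total = process(lines)
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(s"total=$total, elapsed=$elapsedMs ms")
  }
}
```

Timings from a single run like this are noisy because of JIT warm-up, so treat the number as a sanity check only. For anything you intend to act on, a proper benchmark harness is the safer choice.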

In particular, since you appear to be doing IO (reading from a sample file?), I expect the overhead of GC to be negligible compared to the cost of your disk IO operations.

mikera answered Sep 20 '22 17:09


Garbage collection is the responsibility of the JVM, not Scala. So the precise details depend on which JVM you're running. There is no defined time at which garbage collection is triggered; the JVM tries to do it when it is opportune or necessary.
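Although you cannot predict when a collection will happen, you can observe how many have happened so far through the JVM's standard management beans. Here is a minimal sketch, assuming Scala 2.13 or later (for scala.jdk.CollectionConverters); the object name GcStats and its method are hypothetical helpers, not part of any library.

```scala
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._

object GcStats {
  // Sums the collection counts reported by every collector in the running JVM.
  // getCollectionCount returns -1 when a collector does not report a count,
  // so those values are filtered out.
  def totalCollections(): Long =
    ManagementFactory.getGarbageCollectorMXBeans.asScala
      .map(_.getCollectionCount)
      .filter(_ >= 0)
      .sum

  def main(args: Array[String]): Unit = {
    val before = totalCollections()
    // Allocate many short-lived arrays; whether this triggers a collection
    // is up to the JVM and its heap settings.
    var i = 0
    while (i < 1000000) { new Array[Byte](1024); i += 1 }
    val after = totalCollections()
    println(s"collections during loop: ${after - before}")
  }
}
```

The printed count depends on the JVM, the collector in use, and the heap size, which is exactly the point: the timing is the JVM's decision, not something Scala code controls.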

Someone more knowledgeable than I am about GC algorithms and JVM tuning could probably give you a more concrete answer to your performance concerns, but in general I'd say you should just trust that JVMs are pretty good at garbage collecting "intelligently".

Chris Martin answered Sep 22 '22 17:09