 

Using scala-native for in-memory data processing

Tags:

scala-native

I'm wondering whether it is possible to leverage scala-native for performing large in-memory jobs.

For instance, imagine you have a Spark job that needs 150GB of RAM, so you'd have to run 5x30GB executors in a Spark cluster, since JVM garbage collectors can't keep up with heaps much bigger than 30GB.

Imagine that 99% of the data being processed are Strings in collections.

Do you think that scala-native would help here? I mean, as an alternative to Spark?

How does it treat String? Does it carry the same overhead as on the JVM, where String is a class?

What are its GC heap limits, compared to the classic 30GB in the case of the JVM? Would I also end up with a limit like 30GB?

Or is it generally a bad idea to use scala-native for in-memory data processing? My guess is that scala-offheap is a better way to go.

asked Nov 08 '22 by lisak

1 Answer

In-memory data processing is a use case where scala-native will shine compared to Scala on the JVM.

SN supports all types of memory allocation: static allocation (you can define a global variable in C and return a pointer to it from a C function), stack allocation, dynamic allocation based on C malloc/free, and garbage-collected dynamic allocation (Scala new).
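For illustration, here is a minimal sketch of the stack, manual, and GC-managed styles. It assumes the scala.scalanative.unsafe and scala.scalanative.libc.stdlib APIs of the later 0.4/0.5 lines (the early 0.x releases used scala.scalanative.native instead, and older versions write stackalloc[CInt] without parentheses), so treat the exact identifiers as assumptions:

```scala
import scala.scalanative.unsafe._
import scala.scalanative.libc.stdlib

object AllocationStyles {
  def main(args: Array[String]): Unit = {
    // Stack allocation: reclaimed automatically when this frame returns.
    val onStack: Ptr[CInt] = stackalloc[CInt]()
    !onStack = 1

    // Manual dynamic allocation via C malloc/free: invisible to the GC,
    // so you must free it yourself.
    val onHeap: Ptr[CInt] = stdlib.malloc(sizeof[CInt]).asInstanceOf[Ptr[CInt]]
    !onHeap = 2
    println(!onStack + !onHeap)
    stdlib.free(onHeap.asInstanceOf[Ptr[Byte]])

    // Garbage-collected dynamic allocation: a plain Scala `new`.
    val managed = new Array[Int](1024)
    managed(0) = 3
  }
}
```

The practical consequence for large jobs is that data held behind malloc never inflates the GC heap, which is exactly what the question is after.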

For Strings, you can use 8-bit-per-char C strings, Java-style 16-bit-per-char Strings, or you can implement your own small string optimization as seen in C++, using @struct and pointers.
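As a rough sketch of the 8-bit versus 16-bit trade-off (again assuming the scala.scalanative.unsafe API; a real small string optimization built on CStruct types is left out), toCString and fromCString convert between the two representations inside a Zone:

```scala
import scala.scalanative.unsafe._
import scala.scalanative.libc.string

object StringRepr {
  def main(args: Array[String]): Unit = Zone { implicit z =>
    // Java-style String: 16 bits per char, GC-managed.
    val javaStyle: String = "hello"

    // 8-bit C string: one byte per char, allocated in the zone
    // and freed when the zone closes.
    val cstr: CString = toCString(javaStyle)
    println(string.strlen(cstr)) // 5 bytes of character data instead of 10

    // Round-trip back to a managed Scala String.
    println(fromCString(cstr))
  }
}
```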

Of course, there are temporary drawbacks, such as SN still being a pre-0.1 release and the lack of Java libraries ported to Scala Native.

answered Jan 04 '23 by Francois Bertrand