I'm wondering whether it is possible to leverage scala-native for performing large in-memory jobs.
For instance, imagine you have a Spark job that needs 150 GB of RAM, so you would have to run 5 × 30 GB executors in a Spark cluster, since JVM garbage collectors struggle to keep up with heaps bigger than that.
Imagine that 99% of the data being processed are Strings in collections.
Do you think that scala-native would help here? I mean, as an alternative to Spark?
How does it treat String? Does it carry the same overhead as on the JVM, where String is a class?
What are its heap/GC limits? Would I also end up with something like the classic 30 GB limit of the JVM?
Or is using scala-native for in-memory data processing generally a bad idea? My guess is that scala-offheap is the better way to go.
In-memory data processing is a use case where scala-native will shine compared to Scala on the JVM.
SN supports all the usual kinds of memory allocation: static allocation (you can define a global variable in C and return a pointer to it from a C function), stack allocation, manual dynamic allocation based on C malloc/free, and garbage-collected dynamic allocation (Scala new).
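For illustration, here is a minimal sketch of the last three allocation styles. It roughly follows the `scala.scalanative.unsafe` API of later Scala Native releases; exact imports and signatures have changed between versions, so treat the details as assumptions rather than the API of the pre-0.1 release discussed here.

```scala
import scala.scalanative.unsafe._
import scala.scalanative.libc.stdlib

object AllocationDemo {
  def main(args: Array[String]): Unit = {
    // Stack allocation: the slot disappears automatically when this function returns.
    val onStack: Ptr[CInt] = stackalloc[CInt]()
    !onStack = 1

    // Manual dynamic allocation via C malloc/free: invisible to the GC,
    // so it adds no GC pressure, but you must free it yourself.
    val raw: Ptr[Byte] = stdlib.malloc(sizeof[CInt])
    !raw = 2.toByte
    println((!onStack) + (!raw).toInt) // 3
    stdlib.free(raw)

    // Garbage-collected allocation: ordinary Scala `new`, managed by SN's GC.
    val managed = new Array[Int](1024)
    println(managed.length) // 1024
  }
}
```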
For Strings, you can use 8-bit-per-char C strings, Java-style 16-bit-per-char Strings, or you can implement your own Small String Optimization as seen in C++, using @struct and pointers.
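A hedged sketch of the first two options follows, using the `c"..."` literal and the `toCString`/`fromCString` helpers as named in more recent Scala Native releases (again, an assumption about the exact API, not the pre-0.1 one):

```scala
import scala.scalanative.unsafe._
import scala.scalanative.libc.string

object StringDemo {
  def main(args: Array[String]): Unit = {
    // C-style string: 8 bits per char, just a Ptr[CChar] (here in static memory).
    val cstr: CString = c"hello"
    println(string.strlen(cstr)) // 5

    // Java-style String: 16 bits per char, an ordinary GC-managed object.
    val jstr: String = "hello"
    println(jstr.length) // 5

    // Converting between the two copies the bytes inside an allocation Zone.
    Zone { implicit z =>
      val copy: CString = toCString(jstr)
      println(fromCString(copy)) // hello
    }
  }
}
```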
Of course, there are temporary drawbacks, such as SN still being a pre-0.1 version and much of the Java library not yet having been ported to Scala.