
Cloud Dataflow - Increase JVM Xmx Value

We are trying to run a Google Cloud Dataflow job in the cloud but we keep getting "java.lang.OutOfMemoryError: Java heap space".

We are trying to process 610 million records from a BigQuery table and write the processed records to 12 different outputs (main + 11 side outputs).

We have tried increasing our number of instances to 64 n1-standard-4 instances but we are still getting the issue.

The Xmx value on the VMs seems to be set to ~4 GB (-Xmx3951927296), even though the instances have 15 GB of memory. Is there any way to increase the Xmx value?

The job ID is 2015-06-11_21_32_32-16904087942426468793

DarrenCibis asked Jun 12 '15 05:06




1 Answer

You can't directly set the heap size. Dataflow, however, scales the heap size with the machine type. You can pick a machine with more memory by setting the flag "--machineType". The heap size should increase linearly with the total memory of the machine type.

Dataflow deliberately limits the heap size to avoid negatively impacting the shuffler.
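To check whether a larger machine type actually gives the workers a larger heap, one option is to log the JVM's own view of its maximum heap. The sketch below is plain Java and does not use the Dataflow SDK; the class name HeapReport and the helper toGiB are hypothetical names chosen for illustration. It also converts the -Xmx value quoted in the question, which works out to roughly 3.68 GiB. Note that Runtime.maxMemory() can report slightly less than the configured -Xmx, depending on the garbage collector.

```java
public class HeapReport {
    // Convert a byte count to gibibytes (1 GiB = 2^30 bytes).
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Maximum heap this JVM will attempt to use (roughly the -Xmx setting).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("JVM max heap: %d bytes (%.2f GiB)%n", maxHeap, toGiB(maxHeap));

        // The -Xmx value reported in the question, for comparison.
        long reported = 3951927296L;
        System.out.printf("Reported -Xmx: %d bytes (%.2f GiB)%n", reported, toGiB(reported));
    }
}
```

Running the same check on two machine types would show whether the heap grows with total memory as described above.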

Is your code explicitly accumulating values from multiple records in memory? Do you expect 4GB to be insufficient for any given record?

Dataflow's memory requirements should scale with the size of individual records and the amount of data your code is buffering in memory. Dataflow's memory requirements shouldn't increase with the number of records.
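As an illustration of the difference between buffering and per-record processing, here is a minimal sketch in plain Java (not Dataflow code; the class and method names are hypothetical). The first method holds every record in a list before using it, so its memory use grows with the number of records. The second keeps only a running total, so its memory use stays constant regardless of how many records pass through.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.LongStream;

public class BufferingVsStreaming {
    // Buffers every record before summing: memory grows with the record count.
    static long sumBuffered(Iterator<Long> records) {
        List<Long> buffer = new ArrayList<>();
        while (records.hasNext()) {
            buffer.add(records.next());
        }
        long total = 0;
        for (long r : buffer) {
            total += r;
        }
        return total;
    }

    // Keeps only a running total: memory is independent of the record count.
    static long sumStreaming(Iterator<Long> records) {
        long total = 0;
        while (records.hasNext()) {
            total += records.next();
        }
        return total;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println(sumBuffered(LongStream.rangeClosed(1, n).iterator()));
        System.out.println(sumStreaming(LongStream.rangeClosed(1, n).iterator()));
    }
}
```

Both methods return the same result; only the first one risks exhausting the heap as the input grows. If the job's processing code resembles the buffered version, restructuring it is likely to help more than a larger heap.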

Jeremy Lewi answered Oct 13 '22 00:10