 

How can you calculate the size of an Apache Spark DataFrame using PySpark?

Is there a way to calculate the size in bytes of an Apache Spark DataFrame using PySpark?

asked Jul 04 '16 at 08:07 by Mihai Tache


People also ask

How do you find the size of a list in PySpark?

Spark/PySpark provides the size() SQL function to get the size of array and map type columns in a DataFrame, i.e. the number of elements in an ArrayType or MapType column.
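For illustration, a minimal sketch with made-up data (a single ArrayType column named letters):

from pyspark.sql import SparkSession
from pyspark.sql.functions import size

spark = SparkSession.builder.getOrCreate()

# Hypothetical example data: one ArrayType column.
df = spark.createDataFrame([(["a", "b", "c"],), (["x"],)], ["letters"])

# size() counts the elements in each array/map cell, not bytes.
df.select(size("letters").alias("letters_len")).show()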

How do you size a table in PySpark?

You can determine the size of a table by summing the sizes of the individual files in its underlying directory. You can also use queryExecution.analyzed.stats to return the size.
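From PySpark, that Scala-side statistic is usually reached through the DataFrame's private _jdf handle. A rough sketch, assuming Spark 2.3+ internals (not a stable public API, so the attribute chain may change between versions):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1000)  # any DataFrame; made-up example

# Catalyst's size estimate for the analyzed logical plan, in bytes
# (an estimate, not an exact serialized size).
estimated_bytes = df._jdf.queryExecution().analyzed().stats().sizeInBytes()
print(f"estimated size: {estimated_bytes} bytes")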

How do I find the length of a string in a Spark data frame?

char_length(expr) - Returns the character length of string data or the number of bytes of binary data. The length of string data includes trailing spaces; the length of binary data includes binary zeros.
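In the DataFrame API the usual equivalent is pyspark.sql.functions.length; a short sketch with made-up data:

from pyspark.sql import SparkSession
from pyspark.sql.functions import length

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("hello",), ("hi",)], ["name"])  # made-up data

# length() returns the number of characters in each string value.
df.select(length("name").alias("name_len")).show()

# The same via the SQL function quoted above:
spark.sql("SELECT char_length('hello') AS len").show()  # len = 5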


1 Answer

Why don't you just cache the DataFrame, then look in the Spark UI under the Storage tab and convert the units to bytes?

df.cache()
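Note that cache() is lazy, so the DataFrame only shows up under Storage after an action has run. A small follow-up sketch (the UI address assumes a local driver on the default port):

df.count()  # any action works; it forces the cached blocks to materialize

# Then open the Spark UI (http://localhost:4040 by default), go to the
# Storage tab, and read "Size in Memory" / "Size on Disk" for this DataFrame.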
answered Oct 05 '22 at 18:10 by thePurplePython