Is there a way to calculate the size in bytes of an Apache Spark DataFrame using PySpark?
Solution: Get Size/Length of Array & Map DataFrame Columns. Spark/PySpark provides the size() SQL function to get the size of ArrayType and MapType columns in a DataFrame, i.e. the number of elements in each array or map, not the size in bytes.
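For illustration, a minimal sketch of that usage (the DataFrame and its `values` column here are hypothetical, just to show the call):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame with an ArrayType column named "values".
df = spark.createDataFrame([(1, [10, 20, 30]), (2, [40])], ["id", "values"])

# size() returns the element count per row, not the byte size of the column.
df.select("id", F.size("values").alias("num_elements")).show()
```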
You can determine the size of a table by summing the sizes of the individual files in its underlying directory. You can also use queryExecution.analyzed.stats to return the size.
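One way to reach those statistics from PySpark is through the DataFrame's underlying JVM object. This is only a sketch: the `_jdf` handle and the Catalyst queryExecution API are internal, not part of the public Python API, and the exact plan you read stats from (analyzed vs. optimized) can differ between Spark versions; sizeInBytes is an optimizer estimate, not an exact measurement.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(1_000_000)  # any DataFrame

# Read the Catalyst plan statistics via the underlying Java DataFrame.
size_in_bytes = df._jdf.queryExecution().optimizedPlan().stats().sizeInBytes()
print(int(str(size_in_bytes)))
```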
char_length(expr) - Returns the character length of string data or the number of bytes of binary data. The length of string data includes trailing spaces. The length of binary data includes binary zeros.
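Note that this measures individual values, not a whole DataFrame. A small sketch of the built-in SQL function:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# char_length counts characters for strings and bytes for binary values.
spark.sql(
    "SELECT char_length('Spark SQL') AS str_len, "
    "char_length(CAST('Spark SQL' AS BINARY)) AS bin_len"
).show()
```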
Why don't you just cache the DataFrame, then look in the Spark UI under Storage and convert the reported size to bytes?
df.cache()
df.count()  # caching is lazy, so run an action to materialize it before checking the Storage tab
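If you would rather read that same number programmatically instead of from the UI, one option is the JVM SparkContext's RDD storage info. This is a sketch relying on a developer/internal API reached through `_jsc`, so treat the exact accessors as an assumption that may change between versions:

```python
# After df.cache() and an action, the cached size can also be read from the
# JVM SparkContext's storage info (memSize/diskSize are reported in bytes).
for rdd_info in spark.sparkContext._jsc.sc().getRDDStorageInfo():
    print(rdd_info.name(), rdd_info.memSize(), rdd_info.diskSize())
```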