I have a Spark DataFrame which I can convert to a pandas DataFrame using the toPandas() method available in PySpark.
I have the following queries regarding this.
Thanks
Yes, once toPandas() is called on a Spark DataFrame, the data leaves the distributed system and the new pandas DataFrame lives entirely in the memory of the cluster's driver node.
If the Spark DataFrame is huge and does not fit into driver memory, the driver will crash, typically with an out-of-memory error.
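One way to reduce the risk of a crash is to cap how many rows are pulled to the driver before converting. This is only a minimal sketch: the helper name to_pandas_capped and its max_rows parameter are made up for illustration, and it relies only on the DataFrame.limit() and DataFrame.toPandas() methods from PySpark.

```python
def to_pandas_capped(sdf, max_rows=100_000):
    """Convert at most max_rows rows of a Spark DataFrame to pandas.

    Hypothetical helper, not part of PySpark. `sdf` is assumed to be a
    pyspark.sql.DataFrame; `max_rows` is an illustrative default, not a
    recommended value.
    """
    # limit() is applied on the cluster, so only up to max_rows rows
    # are sent to the driver when toPandas() collects the result.
    return sdf.limit(max_rows).toPandas()
```

For example, pdf = to_pandas_capped(spark_df, max_rows=10_000) would return a pandas DataFrame with at most 10,000 rows. The result is a truncated copy, so this fits inspection or sampling rather than computations that need the full data.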