I am facing a lot of issues integrating PySpark DataFrames into existing Pandas code.
1) If I convert Pandas DataFrames to PySpark DataFrames, many operations do not translate well, since PySpark DataFrames do not seem to be as rich as Pandas DataFrames.
2) If I use PySpark DataFrames and Pandas to handle different datasets within the same code, PySpark transformations (like map) do not seem to work at all when the function called through map references any Pandas DataFrames.
I have existing Python code that uses Pandas and NumPy and works fine on a single machine. My initial attempt to translate the entire code to Spark DataFrames failed, since Spark DataFrames do not support many of the operations that Pandas does.
Now I am trying to apply PySpark to the existing code to benefit from PySpark's distributed computation. I am using Spark 2.1.0 (Cloudera parcel) and the Anaconda distribution with Python 2.7.14.
Are PySpark and Pandas certified to work together? Are there any good references where I can find documentation and examples of using them together?
Your responses will be highly appreciated.
Edit (incorporating comments): My challenge is that I have existing Pandas-based Python code that I want to run in a distributed way; hence the need to use Pandas within the PySpark framework.
I don't think PySpark is a replacement for Pandas. As per my understanding:
PySpark and Pandas both call their data structure a 'dataframe', but they are different platforms at runtime: Pandas runs in a single process on one machine, while PySpark distributes data and computation across a cluster.
The suggested approach is to rewrite the application from Pandas to PySpark. If any functionality is not available in PySpark, it can be implemented with a UDF or UDAF, as in the sketch below.
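For example, here is a minimal sketch of covering a missing piece of functionality with a UDF; the clip_value helper is hypothetical and purely for illustration:
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

def clip_value(v):
    # Plain-Python logic with no direct built-in PySpark equivalent.
    return min(max(float(v), 0.0), 100.0)

clip_udf = udf(clip_value, DoubleType())  # wrap the function as a column-level UDF

df = spark.createDataFrame([(1, -5.0), (2, 42.0), (3, 250.0)], ["id", "score"])
df.withColumn("clipped", clip_udf(df["score"])).show()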
An alternative is to convert a Pandas DataFrame to a PySpark DataFrame, but that is generally not recommended, because a Pandas DataFrame is not distributed and building it first can become a bottleneck as the data grows.
Example (Pandas to PySpark):
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # predefined as `spark` in the pyspark shell

pandas_df = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))
spark_df = spark.createDataFrame(pandas_df)  # the Pandas data must fit in driver memory
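The reverse direction also exists: spark_df.toPandas() collects the entire distributed DataFrame back onto the driver, so it is only practical for results small enough to fit in memory.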