Converting pandas DataFrames to Spark DataFrames in Zeppelin

Tags: pandas, dataframe, apache-spark, apache-zeppelin

I am new to Zeppelin. I have a use case in which I have a pandas DataFrame, and I need to visualize its contents using Zeppelin's built-in charts. I do not have a clear approach here. My understanding is that Zeppelin can visualize data if it is in RDD format. So I wanted to convert the pandas DataFrame into a Spark DataFrame, run some queries on it (using SQL), and then visualize the result. To start with, I tried to convert the pandas DataFrame to a Spark DataFrame, but it failed:

%pyspark
import pandas as pd
from pyspark.sql import SQLContext
print sc
df = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))
print type(df)
print df
sqlCtx = SQLContext(sc)
sqlCtx.createDataFrame(df).show()

And I got the following error:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark.py", line 162, in <module>
    eval(compiledCode)
  File "<string>", line 8, in <module>
  File "/home/bala/Software/spark-1.5.0-bin-hadoop2.6/python/pyspark/sql/context.py", line 406, in createDataFrame
    rdd, schema = self._createFromLocal(data, schema)
  File "/home/bala/Software/spark-1.5.0-bin-hadoop2.6/python/pyspark/sql/context.py", line 322, in _createFromLocal
    struct = self._inferSchemaFromList(data)
  File "/home/bala/Software/spark-1.5.0-bin-hadoop2.6/python/pyspark/sql/context.py", line 211, in _inferSchemaFromList
    schema = _infer_schema(first)
  File "/home/bala/Software/spark-1.5.0-bin-hadoop2.6/python/pyspark/sql/types.py", line 829, in _infer_schema
    raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <type 'str'>

Can someone please help me out here? Also, correct me if I am wrong anywhere.

asked Oct 06 '15 09:10

Bala

1 Answer

The following works for me with Zeppelin 0.6.0, Spark 1.6.2 and Python 3.5.2:

%pyspark
import pandas as pd
df = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))
z.show(sqlContext.createDataFrame(df))

which renders as:

[screenshot of the Zeppelin output; image not preserved]
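As a side note on the error in the question: the message "Can not infer schema for type: <type 'str'>" is consistent with the pandas DataFrame being treated as a generic iterable, because iterating a pandas DataFrame yields its column labels (strings) rather than its rows. Why that happened in the asker's Spark 1.5.0 setup is not established here. If the direct conversion fails, one possible workaround is to pass plain rows and column names explicitly. The sketch below only uses pandas; `to_rows_and_columns` is a hypothetical helper name, not part of pandas or Spark.

```python
import pandas as pd

pdf = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))

# Iterating a pandas DataFrame yields its column labels, not its rows.
labels = list(pdf)


def to_rows_and_columns(frame):
    """Hypothetical helper: return (rows, columns) as plain Python objects."""
    columns = [str(c) for c in frame.columns]
    # to_records(index=False) drops the index; tolist() converts NumPy
    # scalars into native Python values (int, float, str).
    rows = [tuple(r) for r in frame.to_records(index=False).tolist()]
    return rows, columns


rows, columns = to_rows_and_columns(pdf)
```

In a `%pyspark` paragraph, the result could then be passed as `sqlContext.createDataFrame(rows, columns)`. For the SQL step mentioned in the question, calling `registerTempTable("my_table")` on the resulting Spark DataFrame (the table name is an assumption) should make it queryable from a `%sql` paragraph in Spark 1.x.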

answered Oct 05 '22 23:10

eddies
