Cannot create dataframe from list: pyspark

Tags:

python

apache-spark

apache-spark-sql

pyspark

I have a list that is generated by a function. When I call print on my list:

print(preds_labels)

I obtain:

[(0.,8.),(0.,13.),(0.,19.),(0.,19.),(0.,19.),(0.,19.),(0.,19.),(0.,20.),(0.,21.),(0.,23.)]

But when I try to create a DataFrame with this command:

df = sqlContext.createDataFrame(preds_labels, ["prediction", "label"])

I get an error message:

not supported type: type 'numpy.float64'

If I create the list manually, it works without any problem. Any idea what is going wrong?

asked Aug 07 '16 02:08

a.moussa

1 Answer

PySpark uses its own type system, and unfortunately it does not handle NumPy scalar types well. It does work with built-in Python types, though, so you can manually convert each numpy.float64 to float, like this:

df = sqlContext.createDataFrame(
    # Convert each numpy.float64 to a built-in Python float
    [(float(tup[0]), float(tup[1])) for tup in preds_labels],
    ["prediction", "label"]
)

Note that PySpark will then treat both columns as pyspark.sql.types.DoubleType.
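
If you want to check the conversion on its own before involving Spark, here is a minimal sketch. The helper name to_builtin_floats and the sample values are my own assumptions for illustration, not part of any library. NumPy is imported optionally so the snippet also runs where it is not installed, and the createDataFrame call is left as a comment because it needs a live sqlContext like the one in the question.

```python
# Sketch: normalise NumPy scalars to built-in floats before handing rows to Spark.
try:
    import numpy as np  # optional; only used here to build sample data
except ImportError:
    np = None


def to_builtin_floats(pairs):
    """Return a new list of (float, float) tuples built from numeric pairs.

    Hypothetical helper for illustration; float() accepts numpy.float64
    values as well as ordinary Python numbers.
    """
    return [(float(pred), float(label)) for pred, label in pairs]


# Sample data mimicking the first rows of the list in the question.
raw = [(0.0, 8.0), (0.0, 13.0), (0.0, 19.0)]
if np is not None:
    preds_labels = [(np.float64(p), np.float64(lab)) for p, lab in raw]
else:
    preds_labels = raw

rows = to_builtin_floats(preds_labels)

# Strict check: every value is exactly the built-in float type.
# (isinstance would not be enough, since numpy.float64 subclasses float.)
all_builtin = all(type(v) is float for row in rows for v in row)
print(all_builtin)

# With a live SQLContext named sqlContext (as in the question), the next step would be:
# df = sqlContext.createDataFrame(rows, ["prediction", "label"])
```

The strict type(v) is float check matters because numpy.float64 is a subclass of Python's float, so an isinstance test would pass even for the unconverted values.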

answered Oct 08 '22 19:10

shuaiyuancn
