I am using pyspark 2.0 to create a DataFrame object by reading a csv using:
data = spark.read.csv('data.csv', header=True)
I find the type of the data using
type(data)
The result is
pyspark.sql.dataframe.DataFrame
I am trying to convert some columns in data to LabeledPoint in order to apply classification.
from pyspark.sql.types import *
from pyspark.sql.functions import col
from pyspark.mllib.regression import LabeledPoint

data.select(['label', 'features']) \
    .map(lambda row: LabeledPoint(row.label, row.features))
I came across this problem:
AttributeError: 'DataFrame' object has no attribute 'map'
Any idea on the error? Is there a way to generate a LabeledPoint from a DataFrame in order to perform classification?
The RDD map() transformation is used to apply complex operations such as adding a column, updating a column, or transforming the data; the output of a map transformation always has the same number of records as its input.
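A small illustration of that behaviour, using hypothetical toy data:

rdd = spark.sparkContext.parallelize([1, 2, 3])
rdd.map(lambda x: x * 2).collect()   # [2, 4, 6] -- one output record per input record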
Use .rdd.map instead:

>>> data.select(...).rdd.map(...)

DataFrame.map has been removed in Spark 2.
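As a minimal sketch of the full conversion, assuming label holds a numeric class and the feature columns (here the hypothetical names f1 and f2) hold numeric values; since spark.read.csv with header=True reads every column as a string, the values are cast to float before building each LabeledPoint:

from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint

# Build an RDD[LabeledPoint] from the DataFrame; 'f1' and 'f2' are assumed
# feature column names, and all CSV values are strings until cast.
labeled = (data.select('label', 'f1', 'f2').rdd
               .map(lambda row: LabeledPoint(float(row.label),
                                             Vectors.dense([float(row.f1), float(row.f2)]))))

labeled.take(1)

The resulting RDD can then be passed to an MLlib classifier, for example LogisticRegressionWithLBFGS.train(labeled).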