I wanted to convert the Spark data frame to an RDD using the code below:
from pyspark.mllib.clustering import KMeans

spark_df = sqlContext.createDataFrame(pandas_df)
rdd = spark_df.map(lambda data: Vectors.dense([float(c) for c in data]))
model = KMeans.train(rdd, 2, maxIterations=10, runs=30, initializationMode="random")
The detailed error message is:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-a19a1763d3ac> in <module>()
      1 from pyspark.mllib.clustering import KMeans
      2 spark_df = sqlContext.createDataFrame(pandas_df)
----> 3 rdd = spark_df.map(lambda data: Vectors.dense([float(c) for c in data]))
      4 model = KMeans.train(rdd, 2, maxIterations=10, runs=30, initializationMode="random")

/home/edamame/spark/spark-2.0.0-bin-hadoop2.6/python/pyspark/sql/dataframe.pyc in __getattr__(self, name)
    842         if name not in self.columns:
    843             raise AttributeError(
--> 844                 "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
    845         jc = self._jdf.apply(name)
    846         return Column(jc)

AttributeError: 'DataFrame' object has no attribute 'map'
Does anyone know what I did wrong here? Thanks!
The RDD map() transformation is used to apply complex operations such as adding a column, updating a column, or otherwise transforming the data; the output of a map transformation always has the same number of records as its input.
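For example, a minimal sketch (the SparkContext sc and the sample values are assumptions for illustration):

# map() produces exactly one output element per input element
numbers = sc.parallelize([1, 2, 3, 4])
squared = numbers.map(lambda x: x * x)
print(squared.collect())  # [1, 4, 9, 16] -- same number of records as the input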
The rdd property is used to convert a PySpark DataFrame to an RDD; several transformations that are not available on DataFrame exist only on RDD, so you often need to convert a PySpark DataFrame to an RDD. Since PySpark 1.3, DataFrame has provided the .rdd property for this.
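For instance, a minimal sketch (spark_df stands for any existing DataFrame):

# .rdd exposes the DataFrame's contents as an RDD of Row objects
row_rdd = spark_df.rdd
# each Row can then be transformed with RDD operations such as map()
values_rdd = row_rdd.map(lambda row: [float(c) for c in row])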
Convert PySpark DataFrame to pandas DataFrame

PySpark DataFrame provides a toPandas() method to convert it to a pandas DataFrame. toPandas() collects all records of the PySpark DataFrame to the driver program, so it should be done only on a small subset of the data.
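A minimal sketch (again assuming spark_df is small enough to fit on the driver):

# toPandas() pulls every row to the driver, so use it only on small data
pandas_copy = spark_df.toPandas()
print(pandas_copy.head())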
You can't map a DataFrame, but you can convert the DataFrame to an RDD and map that by doing spark_df.rdd.map(). Prior to Spark 2.0, spark_df.map would alias to spark_df.rdd.map(). With Spark 2.0, you must explicitly call .rdd first.
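Applied to the snippet in the question, that looks roughly like this (a sketch; it also adds the Vectors import the original code needs once the map succeeds):

from pyspark.mllib.clustering import KMeans
from pyspark.mllib.linalg import Vectors

spark_df = sqlContext.createDataFrame(pandas_df)
# convert to an RDD first, then map each Row to a dense vector
rdd = spark_df.rdd.map(lambda data: Vectors.dense([float(c) for c in data]))
model = KMeans.train(rdd, 2, maxIterations=10, runs=30, initializationMode="random")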