I'm new to PySpark. I wrote this code:

def filterOut2(line):
    return [x for x in line if x != 2]

filtered_lists = data.map(filterOut2)
but I get this error:
'list' object has no attribute 'map'
How do I perform a map operation on my data in PySpark so that I keep only the values for which my condition evaluates to true?
The Python "AttributeError: 'list' object has no attribute 'map'" occurs because you are calling a method that lists simply don't have. Here, data is a plain Python list, not a Spark RDD, so it has no .map() method. In plain Python, map is a built-in function that takes the function first and the iterable second.
map(filterOut2, data)
works:
>>> data = [[1, 2, 3, 5], [1, 2, 5, 2], [3, 5, 2, 8], [6, 3, 1, 2], [5, 3, 2, 5], [4, 1, 2, 5]]
>>> def filterOut2(line):
...     return [x for x in line if x != 2]
...
>>> list(map(filterOut2, data))
[[1, 3, 5], [1, 5], [3, 5, 8], [6, 3, 1], [5, 3, 5], [4, 1, 5]]
If you instead get "map() takes exactly 1 argument (2 given)", it looks like you have redefined map somewhere in your session. Try builtins.map(filterOut2, data) (or __builtin__.map in Python 2), or restart the interpreter.
Or, use a list comprehension:
>>> [filterOut2(line) for line in data]
[[1, 3, 5], [1, 5], [3, 5, 8], [6, 3, 1], [5, 3, 5], [4, 1, 5]]
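For completeness, the same per-line filtering can also be written with the built-in filter() instead of a comprehension:

```python
data = [[1, 2, 3, 5], [1, 2, 5, 2], [3, 5, 2, 8],
        [6, 3, 1, 2], [5, 3, 2, 5], [4, 1, 2, 5]]

# filter() keeps the elements for which the predicate is true;
# wrap it in list() because filter() returns an iterator in Python 3.
result = [list(filter(lambda x: x != 2, line)) for line in data]
print(result)  # same output as the map()/comprehension versions
```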