Let's say I have an RDD like:
[(u'Some1', (u'ABC', 9989)),
(u'Some2', (u'XYZ', 235)),
(u'Some3', (u'BBB', 5379)),
(u'Some4', (u'ABC', 5379))]
I am using map to get one tuple at a time, but how can I access an individual element of a tuple, for example to check whether a tuple contains some string? Actually, I want to filter for those that contain some string, here the tuples that contain ABC.
I was trying to do something like this, but it's not helping:
def foo(line):
    if(line[1]=="ABC"):
        return (line)
new_data = data.map(foo)
I am new to Spark and Python as well, please help!
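RDDs can be filtered directly with filter, so map is not needed here. In your attempt, line[1] is the inner tuple (for example (u'ABC', 9989)), so comparing it to "ABC" is never true, and map returns None for every record. The line below gives you all records that have "ABC" at index 0 of the second element of the tuple.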
new_data = data.filter(lambda x: x[1][0] == "ABC")
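To see why the indexing matters, here is a minimal sketch in plain Python that runs without a Spark context. It assumes the records have the shape shown in the question; `records`, `has_abc`, `mapped`, `kept`, and `removed` are illustrative names, not part of the original code. The same predicate could be passed to `data.filter(...)` on the RDD.

```python
# Sample records with the same shape as the RDD in the question.
records = [
    (u'Some1', (u'ABC', 9989)),
    (u'Some2', (u'XYZ', 235)),
    (u'Some3', (u'BBB', 5379)),
    (u'Some4', (u'ABC', 5379)),
]


def foo(line):
    # The attempt from the question: line[1] is the inner tuple,
    # e.g. (u'ABC', 9989), so it never equals the string "ABC".
    if line[1] == "ABC":
        return line
    # Falling through returns None implicitly.


def has_abc(record):
    # Hypothetical helper: record[1] is the inner tuple,
    # record[1][0] is its first item (the string to test).
    return record[1][0] == "ABC"


# map keeps one output per input, so non-matching records become None.
mapped = list(map(foo, records))

# Keeping only the matching records (what filter does on an RDD).
kept = [r for r in records if has_abc(r)]

# Negating the predicate drops the matching records instead.
removed = [r for r in records if not has_abc(r)]

print(mapped)
print(kept)
print(removed)
```

If the goal is to drop the ABC records rather than keep them, negating the predicate (`x[1][0] != "ABC"`) in the filter call should do it.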