Pyspark RDD .filter() with wildcard

I have a PySpark RDD with a text column that I want to filter on, so I wrote the following code:

table2 = table1.filter(lambda x: x[12] == "*TEXT*")

The problem is that, as you can see, I'm using * hoping it will be interpreted as a wildcard, but it doesn't work. Can anyone help with that?

Lucas Mattos asked Aug 31 '16 18:08


1 Answer

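The lambda function is pure Python, and == compares the whole string literally, so * is never treated as a wildcard. A substring test with the in operator, like the one below, would work: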

table2 = table1.filter(lambda x: "TEXT" in x[12])
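
If real wildcard or pattern matching is needed rather than a plain substring test, the standard library's fnmatch and re modules are one option. The following is a minimal sketch, not something from the original answer: the sample rows and the helper names (contains_text, matches_glob, matches_regex) are made up for illustration, and a plain list stands in for the RDD so it runs without Spark.

```python
import fnmatch
import re

# Hypothetical sample rows; only index 12 matters, as in the question.
rows = [
    ["pad"] * 12 + ["some TEXT here"],
    ["pad"] * 12 + ["nothing relevant"],
    ["pad"] * 12 + ["TEXT at the start"],
]

def contains_text(row):
    # Plain substring test, equivalent to the lambda in the answer.
    return "TEXT" in row[12]

def matches_glob(row, pattern="*TEXT*"):
    # fnmatchcase applies shell-style wildcards (*, ?, [seq]) case-sensitively.
    return fnmatch.fnmatchcase(row[12], pattern)

# Compile once; search() matches anywhere in the string.
TEXT_RE = re.compile(r"TEXT")

def matches_regex(row):
    return TEXT_RE.search(row[12]) is not None

# Local stand-ins for rdd.filter(...).collect()
substring_hits = [r[12] for r in rows if contains_text(r)]
glob_hits = [r[12] for r in rows if matches_glob(r)]
regex_hits = [r[12] for r in rows if matches_regex(r)]

# With an actual RDD, the same predicates could be passed to filter, e.g.:
# table2 = table1.filter(matches_glob)
```

For a simple "contains" check the in operator is the most direct choice; fnmatch or re only pay off when the pattern is more than a fixed substring.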
David answered Oct 20 '22 11:10