
map vs filter in Apache Spark

Tags:

apache-spark

From the official documentation for Apache Spark:

http://spark.apache.org/docs/latest/rdd-programming-guide.html

map(func): Return a new distributed dataset formed by passing each element of the source through a function func.

filter(func): Return a new dataset formed by selecting those elements of the source on which func returns true.

Going by the bold words, is it a big difference? And is it really a difference?

Mandroid asked Jun 26 '26 09:06


1 Answer

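From the end user's point of view, the difference is really just in how you use the API. map is meant to take a record as input and return a record that your function has been applied to, so the result has exactly one element per input element. filter is meant to take a record as input and return a boolean; records for which the function returns true are kept unchanged, and the rest are dropped. Internally, Spark executes both the same way, with mapPartitions.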
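To make that concrete, here is a rough pure-Python sketch of the idea. It does not use Spark at all: the list of lists stands in for an RDD's partitions, and map_partitions, rdd_map and rdd_filter are hypothetical helper names invented for this illustration, not Spark functions. It only models the semantics described above, under the assumption that each operation is applied to one partition's iterator at a time.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_partitions(
    partitions: List[List[T]],
    func: Callable[[Iterator[T]], Iterable[U]],
) -> List[List[U]]:
    """Toy stand-in for mapPartitions: run func once per partition iterator."""
    return [list(func(iter(part))) for part in partitions]


def rdd_map(partitions: List[List[T]], f: Callable[[T], U]) -> List[List[U]]:
    # map: every input record yields exactly one (transformed) output record.
    return map_partitions(partitions, lambda it: (f(x) for x in it))


def rdd_filter(partitions: List[List[T]], pred: Callable[[T], bool]) -> List[List[T]]:
    # filter: records pass through unchanged, but only where pred returns True.
    return map_partitions(partitions, lambda it: (x for x in it if pred(x)))


# Hypothetical data: two "partitions" of integers.
partitions = [[1, 2, 3], [4, 5, 6]]

squared = rdd_map(partitions, lambda x: x * x)
evens = rdd_filter(partitions, lambda x: x % 2 == 0)

print(squared)  # [[1, 4, 9], [16, 25, 36]]
print(evens)    # [[2], [4, 6]]
```

Both helpers are the same per-partition loop; the only difference is whether the user function's return value becomes the new record (map) or decides whether the original record is kept (filter).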

Silvio answered Jun 29 '26 15:06


