I'm building a generic function which receives an RDD and does some calculations on it. Since I run more than one calculation on the input RDD, I would like to cache it. For example:
public JavaRDD<String> foo(JavaRDD<String> r) {
    r.cache();
    JavaRDD<String> t1 = r... // Some calculations
    JavaRDD<String> t2 = r... // Other calculations
    return t1.union(t2);
}
My question is: since r is given to me, it may or may not already be cached. If it is cached and I call cache() on it again, will Spark create a new layer of cache, meaning that while t1 and t2 are calculated I will have two instances of r in the cache? Or is Spark aware of the fact that r is already cached and will it ignore the second call?
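For context, this is roughly what I mean; the storage level of r can be inspected explicitly before deciding to cache (the class name below is arbitrary, and the map/filter calls are just stand-ins for my real calculations):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.storage.StorageLevel;

public class FooWithCheck {
    public static JavaRDD<String> foo(JavaRDD<String> r) {
        // Cache only if the caller has not persisted the RDD already.
        // StorageLevel.NONE() is the level an RDD has before any persist()/cache() call.
        if (r.getStorageLevel().equals(StorageLevel.NONE())) {
            r.cache();
        }
        JavaRDD<String> t1 = r.map(String::toUpperCase);   // stand-in for "some calculations"
        JavaRDD<String> t2 = r.filter(s -> !s.isEmpty());  // stand-in for "other calculations"
        return t1.union(t2);
    }
}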
Nothing. If you call cache() on a cached RDD, nothing happens; the RDD will be cached once. Caching, like many other transformations, is lazy:

- when you call cache(), the RDD's storageLevel is set to MEMORY_ONLY
- when you call cache() again, it is set to the same value (no change)
- upon evaluation, when the underlying RDD is actually materialized, Spark checks its storageLevel and, if it requires caching, caches it

So you're safe.
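You can see this by comparing the storage level before and after the calls. A minimal sketch assuming a local JavaSparkContext (the class name and sample data are made up):

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class DoubleCacheDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("double-cache-demo").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> r = sc.parallelize(Arrays.asList("a", "b", "c"));

        System.out.println(r.getStorageLevel().equals(StorageLevel.NONE()));        // true: nothing cached yet
        r.cache();
        System.out.println(r.getStorageLevel().equals(StorageLevel.MEMORY_ONLY())); // true: cache() = persist(MEMORY_ONLY)
        r.cache();  // repeating the same level is a no-op (a *different* level would throw)
        System.out.println(r.getStorageLevel().equals(StorageLevel.MEMORY_ONLY())); // still true, still cached once

        sc.stop();
    }
}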
I just tested this on my cluster, and Zohar is right: nothing happens, the RDD is just cached once. The reason, I think, is that every RDD has an id internally, and Spark uses that id to mark whether an RDD has already been cached, so caching one RDD multiple times does nothing. Below is my code:
### cache and count, then the storage info will show up on the Web UI
raw_file = sc.wholeTextFiles('hdfs://10.21.208.21:8020/user/mercury/names', minPartitions=40)\
    .setName("raw_file")\
    .cache()
raw_file.count()

### cache and count again, then look at the Web UI: nothing changes
raw_file.cache()
raw_file.count()

### change the RDD's name, then cache and count again, to see whether it caches a new RDD under the new name;
### still nothing changes, so I think it uses the RDD id as the mark. For more detail we would need to read
### the documentation or even the source code.
raw_file.setName("raw_file_2")
raw_file.cache().count()
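The same check can be made from the Java API used in the question. A small sketch (class name and sample data are made up), showing that id() stays the same after setName() and a second cache():

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddIdDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("rdd-id-demo").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> rawFile = sc.parallelize(Arrays.asList("x", "y", "z"))
                                    .setName("raw_file")
                                    .cache();
        rawFile.count();
        int idBefore = rawFile.id();

        rawFile.setName("raw_file_2");  // renaming does not create a new RDD...
        rawFile.cache().count();        // ...and neither does caching it again

        System.out.println(idBefore == rawFile.id());  // true: same id, so only one cached copy
        sc.stop();
    }
}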