
How does Spark in Java compare two Keys when doing a join or groupWith?

I am trying to do the following:

JavaPairRDD<JsonObject, JsonObject> rdd1 = ..
JavaPairRDD<JsonObject, String> rdd2 = .. 
JavaPairRDD<JsonObject, Tuple2<Iterable<JsonObject>, Iterable<String>>> groupedRDD = rdd1.groupWith(rdd2);

But I'm not sure how Spark will compare two JsonObject keys.

More generally, how are keys compared when doing a join or groupWith?

asked Jan 23 '26 11:01 by pettinato



1 Answer

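It uses the Java equals() method. With the default hash partitioner, Spark first uses hashCode() to decide which partition a key goes to, and then uses equals() to match keys within that partition.

The catch is that if the JsonObject class you are using does not override equals() and hashCode(), Spark falls back on the default implementations inherited from Object, which compare only object references. Two keys with identical contents are then treated as different keys, so the join or groupWith will not match them.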

From the Javadoc for Object.equals():

"The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any non-null reference values x and y, this method returns true if and only if x and y refer to the same object (x == y has the value true)."
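
To make the difference concrete, here is a minimal sketch that does not need Spark at all. It assumes a HashMap is a fair stand-in for hash-based key matching, and the class names (KeyEqualityDemo, IdentityKey, ValueKey) and the distinctKeys helper are hypothetical, invented for illustration only. IdentityKey keeps the inherited Object behaviour, while ValueKey overrides equals() and hashCode() to compare contents.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class KeyEqualityDemo {

    // Hypothetical key that keeps the default Object.equals()/hashCode(),
    // so two instances are equal only if they are the same object.
    static final class IdentityKey implements Serializable {
        final String id;
        IdentityKey(String id) { this.id = id; }
    }

    // Hypothetical key that compares by content instead of by reference.
    static final class ValueKey implements Serializable {
        final String id;
        ValueKey(String id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValueKey)) return false;
            return id.equals(((ValueKey) o).id);
        }

        // Must be consistent with equals(): equal keys need equal hash codes.
        @Override
        public int hashCode() { return id.hashCode(); }
    }

    // Counts how many distinct keys a hash-based grouping would see.
    static int distinctKeys(Object... keys) {
        Map<Object, Integer> counts = new HashMap<>();
        for (Object k : keys) {
            counts.merge(k, 1, Integer::sum);
        }
        return counts.size();
    }

    public static void main(String[] args) {
        // Same contents, different instances: treated as two separate keys.
        System.out.println(distinctKeys(new IdentityKey("a"), new IdentityKey("a")));
        // Same contents with value-based equals()/hashCode(): one key.
        System.out.println(distinctKeys(new ValueKey("a"), new ValueKey("a")));
    }
}
```

If your JsonObject type turns out to behave like IdentityKey, one option is to key the RDDs by something with well-defined equality, such as a String field extracted from the JSON or a small wrapper class like ValueKey above.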

answered Jan 25 '26 01:01 by vanekjar
