
Cassandra Spark connector: joinWithCassandraTable on fields with different names

I want to join an RDD with a Cassandra table where the join key has a different name on each side. A simplified example:

case class User(id: String, name: String)

and

case class Home(address: String, user_id: String)

I would like to do:

rdd[Home].joinWithCassandraTable("testspark","user").on(SomeColumns("id"))

How can I specify the name of the field on which the join is made? I don't want to map the RDD down to just the matching id, because I need all of the RDD's values to still be available after the joinWithCassandraTable.

crak asked Nov 15 '25 20:11



2 Answers

You can use the "as" syntax, just as in a select, to change which field each join column is mapped to.

For example:

sc.cassandraTable[Home]("ks","home").joinWithCassandraTable("ks","user").on(SomeColumns("id" as "user_id")).collect

This maps the "id" column from the user table to the "user_id" field of the Home case class.
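To show how this keeps every Home value available after the join, here is a minimal sketch rather than a definitive implementation. It assumes a Cassandra node reachable on 127.0.0.1, a testspark.user table whose partition key is "id", and a spark-cassandra-connector version that accepts the aliased column in on(), as in the example above. The object name, app name, and sample rows are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// Case classes from the question; field names match the Cassandra column names.
case class User(id: String, name: String)
case class Home(address: String, user_id: String)

object JoinHomesWithUsers {
  def main(args: Array[String]): Unit = {
    // Assumption: a Cassandra node is reachable on localhost.
    val conf = new SparkConf()
      .setAppName("join-homes-with-users")
      .setMaster("local[*]")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Hypothetical sample data standing in for the question's rdd[Home].
    val homes = sc.parallelize(Seq(
      Home("1 Main St", "u1"),
      Home("2 Oak Ave", "u2")))

    // "id" as "user_id": match the user table's "id" column against Home.user_id.
    val joined = homes
      .joinWithCassandraTable[User]("testspark", "user")
      .on(SomeColumns("id" as "user_id"))

    // Each element is a (Home, User) pair, so no Home field is lost.
    joined.collect().foreach { case (home, user) =>
      println(s"${user.name} lives at ${home.address}")
    }

    sc.stop()
  }
}
```

Because the result pairs each left-hand Home with its matching User row, there is no need to map the RDD down to the key before joining.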

RussS answered Nov 17 '25 17:11



You could try renaming the column when you read in the Cassandra table, so that it matches the RDD field you want to join on.

For example:

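import org.apache.spark.SparkContext
import org.apache.spark.sql.SchemaRDD
import org.apache.spark.sql.cassandra.CassandraSQLContext

val sc: SparkContext = ...
val cc = new CassandraSQLContext(sc)
// Alias the user table's "id" column to "user_id" so it matches the field in Home.
val rdd: SchemaRDD = cc.sql("SELECT id AS user_id, <other columns> FROM testspark.user WHERE ...")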
Jim Meyer answered Nov 17 '25 17:11



