Cassandra Spark connector: joinWithCassandraTable on a field with a different name

Question

I want to join an RDD with a Cassandra table where the same key has a different name on each side, for example (simplified):

case class User(id : String, name : String)

and

case class Home( address : String, user_id : String)

I would like to do:

homes.joinWithCassandraTable("testspark", "user").on(SomeColumns("id"))  // where homes: RDD[Home]

How can I specify which RDD field the join should use? I don't want to map the RDD down to just the id, because I want to keep all of the Home values after the joinWithCassandraTable.

RussS · Accepted Answer

You can use the "as" syntax, just as in a select, to change which RDD field each join column is mapped to.

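For example:

import com.datastax.spark.connector._ // provides cassandraTable, joinWithCassandraTable, SomeColumns and the "as" column syntax
sc.cassandraTable[Home]("ks", "home").joinWithCassandraTable("ks", "user").on(SomeColumns("id" as "user_id")).collect()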

This maps the "id" column of the user table to the "user_id" field of the Home case class.
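The key-mapping idea can be illustrated without a cluster. The following is a minimal, hypothetical sketch in plain Scala collections, not the connector's API: it indexes the user "table" by its `id` column, uses each Home's `user_id` as the lookup value, and keeps the whole Home in every result pair, roughly mirroring the (Home, row) pairs the join above returns. The object name `InMemoryJoin`, the method `joinOnAlias`, and the sample data are assumptions made for illustration.

```scala
// Hypothetical in-memory analogue of
//   homes.joinWithCassandraTable("ks", "user").on(SomeColumns("id" as "user_id"))
// using plain Scala collections instead of Spark and Cassandra.
case class User(id: String, name: String)
case class Home(address: String, user_id: String)

object InMemoryJoin {
  def joinOnAlias(homes: Seq[Home], users: Seq[User]): Seq[(Home, User)] = {
    // Index the "table" by its key column (id).
    val usersById: Map[String, Seq[User]] = users.groupBy(_.id)
    // For every Home, use its user_id field as the value for the id column,
    // keep the whole Home, and pair it with each matching User (inner join).
    for {
      home <- homes
      user <- usersById.getOrElse(home.user_id, Seq.empty)
    } yield (home, user)
  }
}
```

For instance, joining Home("1 Main St", "u1") and Home("3 Elm Rd", "u9") against User("u1", "Alice") yields only (Home("1 Main St", "u1"), User("u1", "Alice")). Because the sketch is an inner join, a Home with no matching user (here u9) is dropped.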

Jim Meyer · Answer

You could instead rename the column when you read the Cassandra table, so that it matches the RDD field you want to join on:

For example:

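import org.apache.spark.SparkContext
import org.apache.spark.sql.SchemaRDD
import org.apache.spark.sql.cassandra.CassandraSQLContext
val sc: SparkContext = ...
val cc = new CassandraSQLContext(sc)
// alias the user table's "id" column as "user_id" so it matches the Home field
val rdd: SchemaRDD = cc.sql("SELECT id AS user_id, <other columns> FROM testspark.user WHERE ...")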
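The rename-then-join idea from this answer can be modelled the same way. In this hypothetical sketch (plain Scala, no Spark SQL), a row is a `Map` from column name to value; `renameColumn` plays the role of `SELECT id AS user_id`, after which both sides share the column name `user_id` and can be joined on it directly. The `RenameThenJoin` object, its methods, and the `Row` alias are illustrative assumptions, not part of any library.

```scala
// Hypothetical model: a row is a Map from column name to value.
object RenameThenJoin {
  type Row = Map[String, String]

  // Analogue of "SELECT id AS user_id, ...": rename one column, keep the rest.
  def renameColumn(rows: Seq[Row], from: String, to: String): Seq[Row] =
    rows.map { row =>
      row.get(from) match {
        case Some(value) => (row - from) + (to -> value)
        case None        => row
      }
    }

  // Inner join two row sets on a column that now has the same name on both sides.
  def joinOn(left: Seq[Row], right: Seq[Row], column: String): Seq[(Row, Row)] = {
    val rightByKey = right.filter(_.contains(column)).groupBy(_(column))
    for {
      l   <- left
      key <- l.get(column).toList
      r   <- rightByKey.getOrElse(key, Seq.empty)
    } yield (l, r)
  }
}
```

Here the renaming happens only on the Cassandra side, so the Home rows keep every field unchanged, which is what the question asks for.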

Tags: scala, cassandra, apache-spark, spark-cassandra-connector, datastax-enterprise

crak
