
Spark Cassandra connector filtering with IN clause

I am having trouble with filtering in the Spark Cassandra connector for Java. Cassandra allows filtering on the last column of the partition key with an IN clause, e.g.:

create table cf_text
(a varchar,b varchar,c varchar, primary key((a,b),c))

Query : select * from cf_text where a ='asdf' and b in ('af','sd');

sc.cassandraTable("test", "cf_text").where("a = ?", "af").toArray.foreach(println)

How can I specify in Spark the IN clause that is used in the CQL query? How can range queries be specified as well?

asked Jun 25 '15 10:06 by 107


1 Answer

Just wondering, but does your Spark code above work? I thought that Spark won't allow a WHERE on partition keys (a and b in your case), since it uses them under the hood (see last answer to this question): Spark Datastax Java API Select statements

In any case, with the Cassandra Spark connector, you are allowed to stack your WHERE clauses, and an IN can be specified with a List<String>.

import java.util.ArrayList;
import java.util.List;

// Values to match with the IN clause
List<String> valuesList = new ArrayList<String>();
valuesList.add("value2");
valuesList.add("value3");

sc.cassandraTable("test", "cf")
    .where("column1 = ?", "value1")
    .where("column2 IN ?", valuesList)
    .keyBy(new Function<MyCFClass, String>() {
        public String call(MyCFClass _myCF) throws Exception {
            return _myCF.getId();
        }
    });

Note that the normal Cassandra/CQL rules for IN still apply here; for example, as the question points out, IN can be used on the last column of the partition key.
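To make the effect of stacking concrete, here is a minimal plain-Java sketch of how an equality condition and an IN condition are expected to combine, assuming stacked conditions are ANDed together. It does not use the connector or Cassandra at all: `Row`, `select`, and the sample data are hypothetical stand-ins for the `cf_text` table from the question, and the filtering runs in memory rather than on the server.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StackedWhereSketch {
    // Hypothetical in-memory stand-in for one row of cf_text (columns a, b, c).
    static final class Row {
        final String a, b, c;
        Row(String a, String b, String c) { this.a = a; this.b = b; this.c = c; }
        @Override public String toString() { return a + "/" + b + "/" + c; }
    }

    // Models "a = ? AND b IN ?": a row is kept only if both conditions hold.
    static List<String> select(List<Row> rows, String aValue, List<String> bValues) {
        return rows.stream()
                .filter(r -> r.a.equals(aValue))     // plays the role of .where("a = ?", aValue)
                .filter(r -> bValues.contains(r.b))  // plays the role of .where("b IN ?", bValues)
                .map(Row::toString)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("asdf", "af", "1"),
                new Row("asdf", "sd", "2"),
                new Row("asdf", "zz", "3"),
                new Row("other", "af", "4"));
        // Prints [asdf/af/1, asdf/sd/2]
        System.out.println(select(rows, "asdf", Arrays.asList("af", "sd")));
    }
}
```

The row with b = "zz" fails the IN condition and the row with a = "other" fails the equality condition, so only the first two rows come back.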

Range queries function in a similar manner:

sc.cassandraTable("test", "person")
    .where("age > ?", "15")
    .where("age < ?", "20")
    .keyBy(new Function<Person, String>() {
        public String call(Person _person) throws Exception {
            return _person.getPersonid();
        }
    });
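As a rough illustration of what the two stacked range conditions select, the following plain-Java sketch applies the same pair of bounds to an in-memory list. It is not connector code: `between` and the sample ages are hypothetical, and it assumes `age` is numeric and that the two conditions are ANDed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class RangeWhereSketch {
    // Models "age > ? AND age < ?": both bounds are exclusive.
    static List<Integer> between(List<Integer> ages, int lower, int upper) {
        return ages.stream()
                .filter(age -> age > lower)  // plays the role of .where("age > ?", lower)
                .filter(age -> age < upper)  // plays the role of .where("age < ?", upper)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints [16, 19]
        System.out.println(between(Arrays.asList(14, 15, 16, 19, 20, 21), 15, 20));
    }
}
```

Because both comparisons are strict, 15 and 20 themselves are excluded; `>=` and `<=` would be needed to include them.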
answered Oct 24 '22 10:10 by Aaron