How do I use "not rlike" in spark-sql?

Question

rlike works fine, but "not rlike" throws an error:

scala> sqlContext.sql("select * from T where columnB rlike '^[0-9]*$'").collect()
res42: Array[org.apache.spark.sql.Row] = Array([412,0], [0,25], [412,25], [0,25])

scala> sqlContext.sql("select * from T where columnB not rlike '^[0-9]*$'").collect()
java.lang.RuntimeException: [1.35] failure: ``in'' expected but `rlike' found


Sample data:

val df = sc.parallelize(Seq(
  (412, 0),
  (0, 25),
  (412, 25),
  (0, 25)
)).toDF("columnA", "columnB")

Or is this a continuation of issue https://issues.apache.org/jira/browse/SPARK-4207?

pleicht17 · Accepted Answer

A concise way to do it in PySpark is:

df.filter(~df.column.rlike(pattern))

Srini · Answer

There is no "not rlike" as such, but regular expressions have a construct called negative lookahead, which lets a pattern match only the strings that do not contain something.

For the query above, you can use a regex like the one below. Say you want only the rows where columnB contains none of the digits 1-9:

sqlContext.sql("select * from T where columnB rlike '^(?!.*[1-9]).*$'").collect() 
Result: Array[org.apache.spark.sql.Row] = Array([412,0])

Overall, what I mean is that you negate the match within the regex itself, not with rlike. rlike simply applies the regex you give it: if the regex is written to exclude something, it excludes it; if it is written to match something, it matches it.
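To make the idea of negating inside the regex concrete, here is a minimal sketch in plain Python rather than Spark. It assumes rlike behaves like a "find anywhere in the string" regex search, and the helper name rlike below is a hypothetical stand-in, not the Spark function. The values are the columnB entries from the question, written as strings.

```python
import re

# columnB values from the question's sample data, as strings
column_b = ["0", "25", "25", "25"]


def rlike(value, pattern):
    """Hypothetical stand-in for rlike: True if pattern is found anywhere in value."""
    return re.search(pattern, value) is not None


positive = r"[1-9]"           # value contains a digit from 1 to 9
negated = r"^(?!.*[1-9]).*$"  # lookahead form: value contains no digit from 1 to 9

# Negating the positive match outside the regex ...
outside = [v for v in column_b if not rlike(v, positive)]
# ... keeps the same values as matching the negative-lookahead regex.
inside = [v for v in column_b if rlike(v, negated)]

print(outside, inside)
```

Both lists contain only "0", which lines up with the single row [412,0] in the result above. The lookahead version is useful when the only tool available is a positive match; where a boolean negation is available, as in the PySpark answer, that form is usually easier to read.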

Tags: scala, apache-spark, apache-spark-sql

WoodChopper
