rlike
works fine but not rlike
throws an error:
scala> sqlContext.sql("select * from T where columnB rlike '^[0-9]*$'").collect()
res42: Array[org.apache.spark.sql.Row] = Array([412,0], [0,25], [412,25], [0,25])
scala> sqlContext.sql("select * from T where columnB not rlike '^[0-9]*$'").collect()
java.lang.RuntimeException: [1.35] failure: ``in'' expected but `rlike' found
val df = sc.parallelize(Seq(
(412, 0),
(0, 25),
(412, 25),
(0, 25)
)).toDF("columnA", "columnB")
Or it is continuation of issue https://issues.apache.org/jira/browse/SPARK-4207 ?
A concise way to do it in PySpark is:
df.filter(~df.column.rlike(pattern))
There is nothing as such not rlike, but in regex you have something called negative lookahead, which means it will give the words that does not match.
For above query, you can use the regex as below. Lets say, you want the ColumnB should not start with digits '0'
Then you can do like this.
sqlContext.sql("select * from T where columnB rlike '^(?!.*[1-9]).*$'").collect()
Result: Array[org.apache.spark.sql.Row] = Array([412,0])
What I meant over all is, you have to do with regex it self to negate the match, not with rlike. Rlike simply matches the regex that you asked to match. If your regex tells it to not match, it applies that, if your regex is for matching then it does that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With