
How to use the LEFT and RIGHT keywords in Spark SQL

I am new to Spark SQL.

In MS SQL we have the LEFT keyword, used for example as CASE WHEN LEFT(Columnname, 1) IN ('D', 'A') THEN 1 ELSE 0 END.

How can I implement the same in Spark SQL?

Miruthan asked Oct 19 '16 16:10


1 Answer

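You can use the substring function with a start position at the beginning of the string to take from the left (positions are 1-based, and Spark treats a pos of 0 the same as 1):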

import org.apache.spark.sql.functions.substring

substring(column, 0, 1)

and negative pos to take from the right:

substring(column, -1, 1)
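
To make the position rules concrete, here is a minimal plain-Python sketch of how the start position is resolved. The function name sql_substring is hypothetical and this is only a model of the behaviour described here, not Spark's implementation: a positive pos counts from the left starting at 1, 0 is treated as the first character, and a negative pos counts back from the end.

```python
def sql_substring(s: str, pos: int, length: int) -> str:
    """Plain-Python model (an assumption, not Spark code) of substring(str, pos, len)."""
    if length <= 0:
        return ""
    n = len(s)
    if pos > 0:
        start = pos - 1      # 1-based position from the left
    elif pos < 0:
        start = n + pos      # count back from the end of the string
    else:
        start = 0            # pos 0 behaves like pos 1
    end = min(start + length, n)
    start = max(start, 0)
    return s[start:end] if start < end else ""


print(sql_substring("foobar", 0, 3))   # foo
print(sql_substring("foobar", -3, 3))  # bar
```

With this model, taking n characters from the left is sql_substring(s, 1, n) and taking n from the right is sql_substring(s, -n, n).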

So in Scala you can define:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.substring

def left(col: Column, n: Int) = {
  assert(n >= 0)
  substring(col, 0, n)
}

def right(col: Column, n: Int) = {
  assert(n >= 0)
  substring(col, -n, n)
}

// toDF and $"..." need the session implicits; spark-shell imports them automatically
val df = Seq("foobar").toDF("str")

df.select(
  Seq(left _, right _).flatMap(f => (1 to 3).map(i => f($"str", i))): _*
).show
+--------------------+--------------------+--------------------+---------------------+---------------------+---------------------+
|substring(str, 0, 1)|substring(str, 0, 2)|substring(str, 0, 3)|substring(str, -1, 1)|substring(str, -2, 2)|substring(str, -3, 3)|
+--------------------+--------------------+--------------------+---------------------+---------------------+---------------------+
|                   f|                  fo|                 foo|                    r|                   ar|                  bar|
+--------------------+--------------------+--------------------+---------------------+---------------------+---------------------+

Similarly in Python:

from pyspark.sql.functions import substring
from pyspark.sql.column import Column

def left(col, n):
    assert isinstance(col, (Column, str))
    assert isinstance(n, int) and n >= 0
    return substring(col, 0, n)

def right(col, n):
    assert isinstance(col, (Column, str))
    assert isinstance(n, int) and n >= 0
    return substring(col, -n, n)
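
For the case in the question (1 when the first character is 'D' or 'A', otherwise 0), one possible approach is to build the Spark SQL expression as a string. This is a sketch only: the helper name left_flag_expr is hypothetical, and Columnname is the placeholder column from the question.

```python
def left_flag_expr(column_name, letters=("D", "A")):
    """Build a Spark SQL expression string: 1 if the first character is in `letters`, else 0."""
    # Keep the sketch simple: only single alphanumeric characters are accepted.
    assert all(len(ch) == 1 and ch.isalnum() for ch in letters)
    quoted = ", ".join("'{}'".format(ch) for ch in letters)
    return "CASE WHEN substring({}, 1, 1) IN ({}) THEN 1 ELSE 0 END".format(
        column_name, quoted
    )


print(left_flag_expr("Columnname"))
# CASE WHEN substring(Columnname, 1, 1) IN ('D', 'A') THEN 1 ELSE 0 END
```

The resulting string can be passed to df.selectExpr(...) or wrapped with pyspark.sql.functions.expr(...) to get a Column.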

zero323 answered Dec 18 '22 14:12