spark sql window function lag

I am looking at the window lag function for a Spark DataFrame in Scala.

I have a DataFrame with columns Col1, Col2, Col3, date, volume and new_col.

Col1    Col2    Col3    date     volume new_col
                        201601  100.5   
                        201602  120.6   100.5
                        201603  450.2   120.6
                        201604  200.7   450.2
                        201605  121.4   200.7

Now I want to add a new column named new_col that contains the volume shifted down by one row, as shown above.

I tried the option below, using the window function.

val windSldBrdrxNrx_df = df.withColumn("Prev_brand_rx", lag("Prev_brand_rx",1))

Do you have any suggestions?

asked Dec 15 '16 by Ramesh

People also ask

What is LAG in Spark SQL?

Spark's LAG function provides access to a row at a given offset that comes before the current row in the window. It can be used in a SELECT statement to compare values in the current row with values in a previous row.
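
For instance, here is a minimal sketch of LAG in a Spark SQL SELECT; the sales view and prev_volume column are illustrative names, not taken from the question:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("lag-example").master("local[*]").getOrCreate()
import spark.implicits._

// Illustrative data: monthly volumes keyed by date
val sales = Seq((201601, 100.5), (201602, 120.6), (201603, 450.2)).toDF("date", "volume")
sales.createOrReplaceTempView("sales")

// LAG(volume, 1) returns the previous row's volume in the ORDER BY date ordering;
// the first row has no previous row and gets NULL
spark.sql(
  """SELECT date, volume,
    |       LAG(volume, 1) OVER (ORDER BY date) AS prev_volume
    |  FROM sales""".stripMargin).show()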

Is LAG a window function in SQL?

SQL Server's LAG() is a window function that provides access to a row at a specified physical offset which comes before the current row.

Does Spark SQL support window functions?

Spark SQL supports three kinds of window functions: ranking functions, analytic functions, and aggregate functions.
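
As an illustration, here is a sketch with one window function of each kind; it assumes the question's df with date and volume columns:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{rank, lag, sum}

val w = Window.orderBy("date")

df.withColumn("rnk", rank().over(w))                   // ranking function
  .withColumn("prev_volume", lag("volume", 1).over(w)) // analytic function
  .withColumn("running_total", sum("volume").over(w))  // aggregate function over a window
  .show()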

How do window functions work in Spark?

Spark window functions are used to calculate results such as the rank or row number over a range of input rows, and they become available by importing org.apache.spark.sql.functions and org.apache.spark.sql.expressions.Window.
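
A minimal sketch of those imports together with a partitioned window spec; the Col1 grouping and the row_num column are assumptions for illustration only:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Assumed: number rows within each Col1 group, ordered by date
val byCol1 = Window.partitionBy("Col1").orderBy("date")

df.withColumn("row_num", row_number().over(byCol1)).show()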


1 Answer

You are doing it correctly; all you missed is over(window expression) on lag:

// Sample data with the question's date and volume columns
val df = sc.parallelize(Seq((201601, 100.5),
  (201602, 120.6),
  (201603, 450.2),
  (201604, 200.7),
  (201605, 121.4))).toDF("date", "volume")

// Window specification ordered by date
val w = org.apache.spark.sql.expressions.Window.orderBy("date")

import org.apache.spark.sql.functions.lag

// lag("volume", 1, 0): the previous row's volume within the window, 0 when there is none
val leadDf = df.withColumn("new_col", lag("volume", 1, 0).over(w))

leadDf.show()

+------+------+-------+
|  date|volume|new_col|
+------+------+-------+
|201601| 100.5|    0.0|
|201602| 120.6|  100.5|
|201603| 450.2|  120.6|
|201604| 200.7|  450.2|
|201605| 121.4|  200.7|
+------+------+-------+

This code was run on Spark shell 2.0.2
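
One caveat worth noting: Window.orderBy without partitionBy pulls all rows into a single partition, so Spark logs a performance warning. If the real data should be shifted independently per group (per Col1, for example, which is an assumption here), a partitioned window spec avoids that:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag

// Assumption: the shift should restart within each Col1 group
val wPart = Window.partitionBy("Col1").orderBy("date")

// The third argument (0) is the default when a row has no predecessor; drop it to get null instead
df.withColumn("new_col", lag("volume", 1, 0).over(wPart)).show()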

answered Nov 10 '22 by mrsrinivas