I hava a DataFrame,the DataFrame hava two column 'value' and 'timestamp',,the 'timestmp' is ordered,I want to get the last row of the DataFrame,what should I do?
this is my input:
+-----+---------+
|value|timestamp|
+-----+---------+
| 1| 1|
| 4| 2|
| 3| 3|
| 2| 4|
| 5| 5|
| 7| 6|
| 3| 7|
| 5| 8|
| 4| 9|
| 18| 10|
+-----+---------+
this is my code:
val arr = Array((1,1),(4,2),(3,3),(2,4),(5,5),(7,6),(3,7),(5,8),(4,9),(18,10))
var df=m_sparkCtx.parallelize(arr).toDF("value","timestamp")
this is my expected result:
+-----+---------+
|value|timestamp|
+-----+---------+
| 18| 10|
+-----+---------+
iloc – Pandas Dataframe. iloc is used to retrieve data by specifying its index. In python negative index starts from the end so we can access the last element of the dataframe by specifying its index to -1.
Method 1: Using tail() method DataFrame. tail(n) to get the last n rows of the DataFrame. It takes one optional argument n (number of rows you want to get from the end). By default n = 5, it return the last 5 rows if the value of n is not passed to the method.
Use tail() action to get the Last N rows from a DataFrame, this returns a list of class Row for PySpark and Array[Row] for Spark with Scala.
Pandas iloc is used to retrieve data by specifying its integer index. In python negative index starts from end therefore we can access the last element by specifying index to -1 instead of length-1 which will yield the same result.
Try this, it works for me.
df.orderBy($"value".desc).show(1)
I would use simply the query that - orders your table by descending order - takes 1st value from this order
df.createOrReplaceTempView("table_df")
query_latest_rec = """SELECT * FROM table_df ORDER BY value DESC limit 1"""
latest_rec = self.sqlContext.sql(query_latest_rec)
latest_rec.show()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With