
Saving a dataframe result value to a string variable?

I created a DataFrame in Spark that finds the max date, and I want to save that result, which is a string, to a variable. I'm just trying to figure out how to get the value out of the DataFrame.

code so far:

sqlDF = spark.sql("SELECT MAX(date) FROM account")
sqlDF.show()

what the results look like:

+--------------------+
|           max(date)|
+--------------------+
|2018-04-19T14:11:...|
+--------------------+

thanks

oharr asked Apr 20 '18 18:04



2 Answers

Assuming you're computing a global aggregate (where the output will have a single row) and are using PySpark, the following should work:

spark.sql("SELECT MAX(date) as maxDate FROM account").first()["maxDate"]

I believe this will return a datetime object if date is a timestamp column. If you need a string, you can either convert the value in your driver code or do a SELECT CAST(MAX(date) AS string) AS maxDate instead.
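To illustrate the driver-side conversion mentioned above, here is a minimal sketch in plain Python. It assumes the value you got back is a datetime.datetime; the max_date value below is a made-up stand-in (the real timestamp in the question is truncated), not actual query output.

```python
from datetime import datetime

# Hypothetical stand-in for the value returned by
# spark.sql(...).first()["maxDate"] when the column is a timestamp.
max_date = datetime(2018, 4, 19, 14, 11, 0)

# ISO 8601 form, with the "T" separator seen in the question's output
as_iso = max_date.isoformat()

# Explicit format string, if you need a specific layout
as_custom = max_date.strftime("%Y-%m-%d %H:%M:%S")

# Default string conversion (space-separated date and time)
as_default = str(max_date)

print(as_iso)
print(as_custom)
print(as_default)
```

If the date column is already stored as a string, first()["maxDate"] should already be a Python str and no conversion is needed.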

Josh Rosen answered Oct 24 '22 00:10



Try something like this:

from pyspark.sql.functions import max as max_

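# get last partition from all deltas
# (alldeltasdir is the path to the JSON input; sqlContext is an existing SQLContext)
alldeltas = sqlContext.read.json(alldeltasdir)
# agg() returns a one-row DataFrame; collect()[0][0] pulls out its single value
last_delta = alldeltas.agg(max_("ingest_date")).collect()[0][0]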

last_delta will hold a single value: in this sample, the maximum value of the ingest_date column in the DataFrame.
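The collect()[0][0] indexing can be sketched without Spark. collect() returns a list of rows, and each row can be indexed by position like a tuple, so the first [0] picks the first row and the second [0] picks its first column. In the sketch below, Row is a hypothetical namedtuple stand-in for pyspark.sql.Row, the field name and date value are made up, and first_value is an illustrative helper, not part of PySpark.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row; the real class is also tuple-like.
Row = namedtuple("Row", ["max_ingest_date"])


def first_value(rows):
    """Return the first column of the first row, or None if there are no rows."""
    if not rows:
        return None
    return rows[0][0]


# Simulated result of .collect() on a one-row aggregate (made-up value)
collected = [Row("2018-04-19")]
last_delta = first_value(collected)
print(last_delta)
```

A global aggregate such as max over an empty DataFrame still yields one row, with a null in it, so the extracted value can be None and is worth checking before use.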

Hauke Mallow answered Oct 24 '22 00:10
