I created a DataFrame in Spark, and after finding the max date I want to save it to a variable. I'm just trying to figure out how to get the result, which is a string, and store it in a variable.
code so far:
sqlDF = spark.sql("SELECT MAX(date) FROM account")
sqlDF.show()
what the result looks like:
+--------------------+
|           max(date)|
+--------------------+
|2018-04-19T14:11:...|
+--------------------+
thanks
Assuming you're computing a global aggregate (where the output will have a single row) and are using PySpark, the following should work:
spark.sql("SELECT MAX(date) as maxDate FROM account").first()["maxDate"]
I believe this will return a datetime object if the column is a timestamp (or a plain str if the column is stored as a string), but you can either convert it to a string in your driver code or do a SELECT CAST(MAX(date) AS string) instead.
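For the "convert it in your driver code" option, here is a minimal sketch using only the standard library. The helper name as_string is made up for illustration, and the sample datetime is a placeholder value, not the real contents of the account table.

```python
from datetime import date, datetime

def as_string(value):
    """Turn a value pulled out of a Row into a string (hypothetical helper)."""
    if value is None:
        # MAX over an empty table comes back as NULL, i.e. None in Python
        return None
    if isinstance(value, date):
        # datetime is a subclass of date, so this covers both types
        return value.isoformat()
    # already a string (or some other type): fall back to str()
    return str(value)

# Placeholder standing in for spark.sql(...).first()["maxDate"]
max_date = as_string(datetime(2018, 4, 19, 14, 11))
print(max_date)
```

If you need a specific layout rather than ISO 8601, value.strftime("%Y-%m-%d %H:%M:%S") works on the datetime as well.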
Try something like this:
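from pyspark.sql.functions import max as max_  # aliased so it does not shadow Python's built-in max
# alldeltasdir is the path to the JSON input (defined elsewhere)
alldeltas = sqlContext.read.json(alldeltasdir)
# get last partition from all deltas
last_delta = alldeltas.agg(max_("ingest_date")).collect()[0][0]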
last_delta will hold a single plain value, in this sample the maximum value of the ingest_date column in the DataFrame.
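The same pattern can be wrapped in a small function if you need it for more than one column. This is only a sketch: scalar_max is a hypothetical name, df is assumed to be a PySpark DataFrame, and it uses the dictionary form of agg so no extra import is needed.

```python
from typing import Any, Optional

def scalar_max(df: Any, column: str) -> Optional[Any]:
    """Return the maximum of `column` as a plain Python value.

    `df` is assumed to be a PySpark DataFrame. The result is None when
    the DataFrame has no rows or the column contains only nulls.
    """
    # Dictionary form of agg: {column name: aggregate function name}
    rows = df.agg({column: "max"}).collect()
    # A global aggregate yields one row with one field: the maximum
    return rows[0][0] if rows else None
```

Usage would look like last_delta = scalar_max(alldeltas, "ingest_date").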