Here's my spark code. It works fine and returns 2517. All I want to do is to print "2517 degrees"...but I'm not sure how to extract that 2517 into a variable. I can only display the dataframe but not extract values from it. Sounds super easy but unfortunately I'm stuck! Any help will be appreciated. Thanks!
df = sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").option("delimiter", "\t").load("dbfs:/databricks-datasets/power-plant/data")
df.createOrReplaceTempView("MyTable")
df = spark.sql("SELECT COUNT (DISTINCT AP) FROM MyTable")
display(df)
In PySpark, the substring() function is used to extract the substring from a DataFrame string column by providing the position and length of the string you wanted to extract. In this tutorial, I have explained with an example of getting substring of a column using substring() from pyspark. sql.
Method 6: Using select() with collect() method This method is used to select a particular row from the dataframe, It can be used with collect() function. where, dataframe is the pyspark dataframe. Columns is the list of columns to be displayed in each row.
You can select the single or multiple columns of the DataFrame by passing the column names you wanted to select to the select() function. Since DataFrame is immutable, this creates a new DataFrame with selected columns. show() function is used to show the Dataframe contents.
here is the alternative:
df.first()['column name']
it will give you the desired output. you can store it in a variable.
I think you're looking for collect
. Something like this should get you the value:
df.collect()[0]['count(DISTINCT AP)']
assuming the column name is 'count(DISTINCT AP)'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With