
Spark SQL: storing the result of a count(*) query

Hello, I use Spark with Python. I ran a basic count(*) query on a DataFrame as follows:

myquery = sqlContext.sql("SELECT count(*) FROM myDF")

The result is:

+--------+
|count(1)|
+--------+
|    3469|
+--------+

How can I save this value in order to perform further operations on it?

For instance, divide 3469 by 24 [whatever 24 means...]

S12000 asked Aug 01 '17 23:08



2 Answers

Given that your query returns a DataFrame such as

+-----+
|count|
+-----+
|3469 |
+-----+

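You need to get the first (and only) row, and then its (only) field, 'count'. Note that this assumes the column is named count; with the query in the question it is named count(1), so either alias the column in the query or read it by position with dataframe.first()[0].

count = dataframe.first()['count']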
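To make the steps above concrete, here is a minimal sketch of the whole round trip, assuming Spark 2.x or later and Python 3. The helper name scalar_result, the demo function, the alias total, and the spark.range stand-in for the asker's data are all hypothetical names introduced for illustration, not part of the original question.

```python
def scalar_result(df):
    """Return the single value of a one-row, one-column DataFrame."""
    row = df.first()  # a Row, or None when the DataFrame is empty
    if row is None:
        raise ValueError("query returned no rows")
    return row[0]  # positional access works whatever the column is named


def demo():
    # pyspark is imported here so the helper above stays usable without it
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("count-demo").getOrCreate()
    # Hypothetical stand-in for the asker's data: 3469 rows in a view named myDF
    spark.range(3469).createOrReplaceTempView("myDF")

    # Aliasing the column is optional here, since the helper reads by position
    total = scalar_result(spark.sql("SELECT count(*) AS total FROM myDF"))
    per_24 = total / 24  # an ordinary Python number from here on
    print(total, per_24)
    spark.stop()
```

Calling demo() runs the query locally. Once the value has been pulled out of the Row it is a plain Python integer, so any further arithmetic happens on the driver without touching Spark again.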
Arthur PICHOT UTRERA answered Sep 29 '22 07:09


Given that you have a DataFrame such as

+-----+
|count|
+-----+
|3469 |
+-----+

You can perform mathematical operations on columns, creating new columns or overwriting existing ones, using the .withColumn API. In Python, refer to the column as df['count'], because df.count is the DataFrame's count() method rather than the column.

df.withColumn('divided', df['count'] / 24).show(truncate=False)

You should get

+-----+------------------+
|count|divided           |
+-----+------------------+
|3469 |144.54166666666666|
+-----+------------------+
Ramesh Maharjan answered Sep 29 '22 09:09