spark sql count(*) query store result

Tags: sql, apache-spark, apache-spark-sql

Hello, I use Spark with Python. I performed a basic count(*) query on a DataFrame as follows:

myquery = sqlContext.sql("SELECT count(*) FROM myDF")

The result is:

+--------+
|count(1)|
+--------+
|    3469|
+--------+

How can I store this value so that I can perform further operations on it?

For instance, divide 3469 by 24 [whatever 24 means...].


asked Aug 01 '17 23:08

S12000

2 Answers

Given that your query returns a DataFrame like this:

+-----+
|count|
+-----+
|3469 |
+-----+

You need to get the first (and only) row, and then its (only) field, 'count':

count = dataframe.first()['count']


answered Sep 29 '22 07:09

Arthur PICHOT UTRERA

Given that you have a DataFrame like this:

+-----+
|count|
+-----+
|3469 |
+-----+

You can perform mathematical operations on columns, creating new columns or overwriting existing ones, with the .withColumn API:

# df.count is the DataFrame.count() method, so select the column with df['count']
df.withColumn('divided', df['count'] / 24).show(truncate=False)

You should get:

+-----+------------------+
|count|divided           |
+-----+------------------+
|3469 |144.54166666666666|
+-----+------------------+

answered Sep 29 '22 09:09

Ramesh Maharjan
