
How do I check for equality using Spark Dataframe without SQL Query?

I want to select only the rows where a column equals a certain value. I am doing this in Scala and having a little trouble.

Here's my code:

df.select(df("state")==="TX").show() 

This returns the state column as Boolean values instead of just the rows where state is TX.

I've also tried:

df.select(df("state")=="TX").show()  

But this doesn't work either.

Instinct asked Jul 09 '15 17:07



2 Answers

I had the same issue, and the following syntax worked for me:

df.filter(df("state")==="TX").show() 

I'm using Spark 1.6.
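To illustrate the difference, here is a minimal sketch, not a definitive implementation. It assumes Spark 2.x or later, where `SparkSession` is available (on Spark 1.6 you would build the DataFrame from a `SQLContext` instead). The sample rows, the `name` column, and the helper `rowsInState` are made up for the example. `select` projects the comparison and gives one Boolean per row, while `filter` uses the same expression as a predicate and keeps only the matching rows.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FilterVsSelect {
  // Hypothetical helper: keep only the rows whose "state" column equals the given value.
  def rowsInState(df: DataFrame, state: String): DataFrame =
    df.filter(df("state") === state)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-vs-select")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data, for illustration only.
    val df = Seq(("Ann", "TX"), ("Bob", "CA"), ("Cruz", "TX")).toDF("name", "state")

    // select projects the comparison itself: one Boolean value per row.
    df.select(df("state") === "TX").show()

    // filter uses the same Column expression as a predicate and keeps matching rows.
    rowsInState(df, "TX").show()

    // Note: df("state") == "TX" is ordinary Scala equality and yields a plain
    // Boolean rather than a Column, so it cannot express a per-row comparison.

    spark.stop()
  }
}
```

If you only want the state column of the matching rows, you can chain the two: `df.filter(df("state") === "TX").select("state")`.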

user3487888 answered Oct 20 '22 00:10


There is another simple, SQL-like option. With Spark 1.6, the following should also work:

df.filter("state = 'TX'") 

This is a new way of specifying SQL-like filters. For a full list of supported operators, check out this class.
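As a rough sketch of how the string form compares with the Column form, assuming Spark 2.x or later with `SparkSession` (the sample rows, the `name` column, and the helper names are hypothetical): inside the expression string, equality is a single `=` and the string literal goes in single quotes, whereas the Column API uses `===`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SqlStringFilter {
  // SQL expression string: single = for equality, single quotes around the literal.
  def byExpression(df: DataFrame): DataFrame =
    df.filter("state = 'TX'")

  // Equivalent predicate written with the Column API.
  def byColumn(df: DataFrame): DataFrame =
    df.filter(col("state") === "TX")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sql-string-filter")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data, for illustration only.
    val df = Seq(("Ann", "TX"), ("Bob", "CA"), ("Cruz", "TX")).toDF("name", "state")

    byExpression(df).show()
    byColumn(df).show()

    // where is an alias for filter and accepts the same expression string.
    df.where("state = 'TX'").show()

    spark.stop()
  }
}
```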

Jegan answered Oct 19 '22 23:10