I created a dataframe that has the following schema:
In [43]: yelp_df.printSchema()
root
|-- business_id: string (nullable = true)
|-- cool: integer (nullable = true)
|-- date: string (nullable = true)
|-- funny: integer (nullable = true)
|-- id: string (nullable = true)
|-- stars: integer (nullable = true)
|-- text: string (nullable = true)
|-- type: string (nullable = true)
|-- useful: integer (nullable = true)
|-- user_id: string (nullable = true)
|-- name: string (nullable = true)
|-- full_address: string (nullable = true)
|-- latitude: double (nullable = true)
|-- longitude: double (nullable = true)
|-- neighborhoods: string (nullable = true)
|-- open: boolean (nullable = true)
|-- review_count: integer (nullable = true)
|-- state: string (nullable = true)
I want to select only the records where the "open" column is true. The following command, run in PySpark, returns nothing:
yelp_df.filter(yelp_df["open"] == "true").collect()
You're comparing data types incorrectly. open is listed as a boolean, not a string, so doing yelp_df["open"] == "true" is incorrect: "true" is a string. Instead you want to do

yelp_df.filter(yelp_df["open"] == True).collect()

This correctly compares the value of open against the Boolean primitive True, rather than the non-Boolean string "true".
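For completeness, here is a minimal, self-contained sketch of the fix; the sample rows and business_id values are made up purely for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Tiny stand-in for yelp_df with the same column types
df = spark.createDataFrame(
    [("b1", True), ("b2", False)],
    ["business_id", "open"],
)

# Comparing against the Python boolean True keeps only the open businesses
df.filter(df["open"] == True).collect()  # -> [Row(business_id='b1', open=True)]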
Alternatively, since a boolean column is itself a valid filter condition, you can pass it to filter() directly:

from pyspark.sql import functions as F

# Rows where the boolean column is True are kept; False and null rows are dropped
filtered_df = df.filter(F.col('my_bool_col'))

Applied to this question, that would be yelp_df.filter(F.col('open')).collect().