I have a DataFrame dfRawData on which I have to apply a filter condition on column X with the values CB, CI, and CR, so I used the code below:
from pyspark.sql.functions import col

df = dfRawData.filter(col("X").between("CB", "CI", "CR"))
But I am getting the following error:
between() takes exactly 3 arguments (4 given)
Please let me know how I can resolve this issue.
The contains() method checks whether a DataFrame string column contains the string specified as an argument (it matches on part of the string), returning true if the substring exists and false if not.
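For example, a minimal sketch assuming the dfRawData DataFrame from the question (the dfContains name is just for illustration):

import pyspark.sql.functions as f

# contains() performs a substring match, so values such as "CB123"
# would also pass; it is not an exact-equality check.
dfContains = dfRawData.where(f.col("X").contains("CB"))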
In PySpark, the select() function is used to select a single column, multiple columns, columns by index, all columns from a list, or nested columns from a DataFrame. select() is a transformation, so it returns a new DataFrame containing only the selected columns.
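As a quick illustration, a sketch of a few select() variants (the column Y and the alias code are assumed for the example):

import pyspark.sql.functions as f

# Each call returns a new DataFrame; dfRawData itself is unchanged.
df1 = dfRawData.select("X")                       # single column
df2 = dfRawData.select("X", "Y")                  # multiple columns
df3 = dfRawData.select(f.col("X").alias("code"))  # column expression with a rename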
The function between is used to check whether a value lies between two values: its inputs are a lower bound and an upper bound. It cannot be used to check whether a column value is in a list. To do that, use isin:
import pyspark.sql.functions as f
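# Keep only the rows whose X value is one of the three wanted codes.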
df = dfRawData.where(f.col("X").isin(["CB", "CI", "CR"]))
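For contrast, here is a sketch of what between is actually meant for, an inclusive range check (the bounds are illustrative):

import pyspark.sql.functions as f

# between(lower, upper) keeps rows where X falls in the inclusive range
# [lower, upper]; on strings the comparison is lexicographic, so it would
# also match values such as "CC" that are not in the wanted list.
dfRange = dfRawData.where(f.col("X").between("CB", "CR"))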