 

Spark SQL case insensitive filter for column conditions

How can I use a Spark SQL filter as a case-insensitive filter?

For example:

dataFrame.filter(dataFrame.col("vendor").equalTo("fortinet"));

returns only rows whose 'vendor' column is exactly equal to 'fortinet', but I want rows whose 'vendor' column equals 'fortinet' or 'Fortinet' or 'foRtinet' or ...

Arman asked Jan 20 '16


People also ask

Is Spark case sensitive for column names?

When spark.sql.caseSensitive is set to false, Spark does case-insensitive column name resolution between the Hive metastore schema and the Parquet schema, so even when column names are in different letter cases, Spark returns the corresponding column values.

How do you check if a column contains a particular value in PySpark?

The contains() method checks whether a DataFrame string column contains a string specified as an argument (it matches on part of the string), and returns true if the string exists and false if not.
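For instance, a minimal sketch in a spark-shell style session, reusing the df ("k", "v") frame built in the first answer below; note that contains() is case sensitive:

df.where($"v".contains("tinet")).show
// +---+--------+
// |  k|       v|
// +---+--------+
// |  1|Fortinet|
// |  2|foRtinet|
// +---+--------+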

How do I make PySpark case sensitive?

Try sqlContext.sql("set spark.sql.caseSensitive=true") in your Python code, which worked for me.
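The Scala equivalent is a one-liner; a sketch assuming spark is the usual SparkSession of a shell session:

spark.sql("set spark.sql.caseSensitive=true")
// or the same setting through the runtime config API:
spark.conf.set("spark.sql.caseSensitive", "true")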

Is Spark schema case sensitive?

Column names that differ only by case are considered duplicates. Delta Lake is case preserving, but case insensitive, when storing a schema. Parquet is case sensitive when storing and returning column information. Spark can be case sensitive, but it is case insensitive by default.

What is a case statement in Spark SQL?

The CASE WHEN ... OTHERWISE expression tests whether any of a sequence of conditions is true and returns the corresponding result for the first true condition. You can write the CASE statement on DataFrame column values, or you can write your own expression to test conditions.
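As a sketch in DataFrame form, again on the df ("k", "v") frame from the first answer (the output column name is_fortinet is made up for illustration):

import org.apache.spark.sql.functions.{when, lower}

df.withColumn("is_fortinet",
  when(lower($"v") === "fortinet", true).otherwise(false))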

How to filter rows on spark dataframe based on multiple conditions?

To filter() rows on a Spark DataFrame based on multiple conditions using AND (&&), OR (||), and NOT (!), you can use either a Column with a condition or a SQL expression. The sketch below is just a simple example; you can extend it with AND (&&), OR (||), and NOT (!) conditional expressions as needed.
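A sketch on the same df, combining the case-insensitive test from the answers with a second condition on k:

import org.apache.spark.sql.functions.lower

df.where(lower($"v") === "fortinet" && $"k" > 1).show
// or the same filter as a single SQL expression string:
df.where("lower(v) = 'fortinet' AND k > 1").show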

How do you select in case insensitive in SQL?

The SQL-standard way to perform case-insensitive queries is to use the SQL upper or lower functions, like this:

select * from users where upper(first_name) = 'FRED';

or this:

select * from users where lower(first_name) = 'fred';
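The same trick carries over to Spark; a sketch assuming a hypothetical users DataFrame with a first_name column, registered as a temporary view:

users.createOrReplaceTempView("users")
spark.sql("select * from users where lower(first_name) = 'fred'").show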

How do you use case when in spark?

Spark SQL supports CASE WHEN directly on a DataFrame. The CASE WHEN and OTHERWISE expression tests whether any of a sequence of conditions is true and returns the corresponding result for the first true condition.




2 Answers

You can either use a case-insensitive regex:

val df = sc.parallelize(Seq(
  (1L, "Fortinet"), (2L, "foRtinet"), (3L, "foo")
)).toDF("k", "v")

df.where($"v".rlike("(?i)^fortinet$")).show
// +---+--------+
// |  k|       v|
// +---+--------+
// |  1|Fortinet|
// |  2|foRtinet|
// +---+--------+

or simple equality with lower / upper:

import org.apache.spark.sql.functions.{lower, upper}

df.where(lower($"v") === "fortinet")
// +---+--------+
// |  k|       v|
// +---+--------+
// |  1|Fortinet|
// |  2|foRtinet|
// +---+--------+

df.where(upper($"v") === "FORTINET")
// +---+--------+
// |  k|       v|
// +---+--------+
// |  1|Fortinet|
// |  2|foRtinet|
// +---+--------+

For simple filters I would prefer rlike, although performance should be similar; for join conditions, equality is a much better choice. See How can we JOIN two Spark SQL dataframes using a SQL-esque "LIKE" criterion? for details.
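For example, a case-insensitive equi-join can normalize both sides with lower, which keeps the equality form the planner can turn into a hash join (vendors is a hypothetical lookup frame built in the same shell session, and lower is already imported above):

val vendors = sc.parallelize(Seq("fortinet")).toDF("name")

df.join(vendors, lower($"v") === lower($"name")).show
// +---+--------+--------+
// |  k|       v|    name|
// +---+--------+--------+
// |  1|Fortinet|fortinet|
// |  2|foRtinet|fortinet|
// +---+--------+--------+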

zero323 answered Sep 18 '22


Try the lower/upper string functions:

import org.apache.spark.sql.functions.{lower, upper}

dataFrame.filter(lower(dataFrame.col("vendor")).equalTo("fortinet"))

or

dataFrame.filter(upper(dataFrame.col("vendor")).equalTo("FORTINET"))
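The same check can also be written as a single SQL expression string, which avoids importing the functions at all (same dataFrame and vendor column as in the question):

dataFrame.filter("lower(vendor) = 'fortinet'")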
Shawn Guo answered Sep 20 '22