I'm trying to implement a join in Spark SQL using a LIKE condition.
The column I am joining on is called revision, and the data looks like this:
Table A:
8NXDPVAE
Table B:
[4,8]NXD_V%
Performing the join on SQL Server (A.revision LIKE B.revision) works just fine, but when doing the same in Spark SQL, the join returns no rows (with an inner join) or null values for Table B (with an outer join).
This is the query I am running:
val joined = spark.sql("SELECT A.revision, B.revision FROM RAWDATA A LEFT JOIN TPTYPE B ON A.revision LIKE B.revision")
The plan looks like this:
== Physical Plan ==
BroadcastNestedLoopJoin BuildLeft, LeftOuter, revision#15 LIKE revision#282, false
:- BroadcastExchange IdentityBroadcastMode
: +- *Project [revision#15]
: +- *Scan JDBCRelation(RAWDATA) [revision#15] PushedFilters: [EqualTo(bulk_id,2016092419270100198)], ReadSchema: struct<revision>
+- *Scan JDBCRelation(TPTYPE) [revision#282] ReadSchema: struct<revision>
Is it possible to perform a LIKE join like this or am I way off?
You are only a little bit off. Spark SQL and Hive follow SQL standard conventions, where the LIKE operator accepts only two special characters:

_ (underscore) - matches an arbitrary single character.
% (percent) - matches an arbitrary sequence of characters.

Square brackets have no special meaning, so [4,8] matches only the literal string [4,8]:
spark.sql("SELECT '[4,8]' LIKE '[4,8]'").show
+----------------+
|[4,8] LIKE [4,8]|
+----------------+
| true|
+----------------+
To match complex patterns you can use the RLIKE operator, which supports Java regular expressions:
spark.sql("SELECT '8NXDPVAE' RLIKE '^[4,8]NXD.V.*$'").show
+-----------------------------+
|8NXDPVAE RLIKE ^[4,8]NXD.V.*$|
+-----------------------------+
| true|
+-----------------------------+
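If you have many SQL Server-style patterns, you can translate them mechanically into Java regexes for use with RLIKE. A minimal sketch, assuming well-formed patterns (likeToRegex is a hypothetical helper written here for illustration, not a Spark API):

```scala
// Translate a SQL Server LIKE pattern into an anchored Java regex:
// keep [...] character classes as-is, map _ -> . and % -> .*,
// and escape regex metacharacters in the remaining literals.
object LikeToRegex {
  def likeToRegex(pattern: String): String = {
    val sb = new StringBuilder("^")
    var i = 0
    while (i < pattern.length) {
      pattern.charAt(i) match {
        case '_' => sb.append('.')              // single-character wildcard
        case '%' => sb.append(".*")             // multi-character wildcard
        case '[' =>                             // copy the class verbatim
          val close = pattern.indexOf(']', i)   // assumes a matching ']'
          sb.append(pattern.substring(i, close + 1))
          i = close
        case c if "\\^$.|?*+()".contains(c) =>  // escape regex specials
          sb.append('\\').append(c)
        case c => sb.append(c)
      }
      i += 1
    }
    sb.append('$').toString
  }

  def main(args: Array[String]): Unit = {
    val re = likeToRegex("[4,8]NXD_V%")
    println(re)                        // ^[4,8]NXD.V.*$
    println("8NXDPVAE".matches(re))    // true
  }
}
```

One option is to store the translated regex in a column of TPTYPE and join with ON A.revision RLIKE B.revision_regex; note that a non-equi join like this still compiles to a BroadcastNestedLoopJoin, as in your plan.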