Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive - LIKE Operator

I can not figure out how I deal with that problem:

This is my Data:

Table1:         Table2:
BRAND           PRODUCT           SOLD
Sony            Sony ABCD         1233
Apple           Sony adv          1233
Google          Sony aaaa         1233
IBM             Apple 123         1233
etc.            Apple 345         1233
                IBM 13123         1233

Is it possible to filter the query that I have a table where stands the brand and the total solds? My idea is:

Select table1.brand, sum(table2.sold) from table1
join table2
on (table1.brand LIKE '%table2.product%')
group by table.1.brand

That was my idea, but i always get an Error

The biggest problem is the Like-Operator or is there any other solution?

like image 866
Daniel Avatar asked Jul 10 '15 11:07

Daniel


People also ask

Can we use like operator in Hive?

Apache Hive RLIKE statementYou can use Hive RLIKE wherever LIKE statement is allowed.

What is the difference between like and Rlike operators in Hive?

Hive has both LIKE (which functions the same as in SQL Server and other environments) and RLIKE, which uses regular expressions. These are mentioned briefly in the LanguageManual UDF documentation.

Is HiveQL similar to SQL?

It is very much similar to SQL and highly scalable. It reuses familiar concepts from the relational database world, such as tables, rows, columns and schema, to ease learning.

Is Hive better than SQL?

Hive is better for analyzing complex data sets. SQL is better for analyzing less complicated data sets very quickly. SQL supports Online Transactional Processing (OLTP). Hive doesn't support OLTP.


2 Answers

I see two issues: First of all, JOINs in hive only work with equality conditions, that like isn't going to work there.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins

Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job.

Instead, that wants to go into a where clause.

Secondly, I also see a problem with the like statement itself: '%table2.product%' is being interpreted as literally the string '%table2.product%'. Additionally, even if this was doing what was intended, it would try to look for table2.product inside of brand, when you seem to want it the other way. To get the evaluation you intended, you need to add the wildcard to the contents of table1.brand; to accomplish this, you want to concatenate your wildcards into your expression.

table2.product LIKE concat('%',table1.brand,'%'))

By doing this, your like will evaluate for strings '%Sony%', '%Apple%'...etc instead of '%table2.product%'.

What you want is Brandon Bell's query, which I've merged into this answer:

SELECT table1.brand, SUM(table2.sold) 
FROM table1, table2
WHERE table2.product LIKE concat('%', table1.brand, '%') 
GROUP BY table1.brand;
like image 146
Brandon McKenzie Avatar answered Nov 02 '22 17:11

Brandon McKenzie


You should be able to accomplish this without a JOIN. See the following query:

SELECT table1.brand, sum(table2.sold) 
FROM table1, table2 
WHERE table2.product LIKE concat('%', table1.brand, '%') 
GROUP BY table1.brand;

This returns

Apple   2466
IBM     1233
Sony    3699

Where my input files are as follows:

Sony
Apple
Google
IBM    

and

Sony ABCD       1233
Sony adv        1233
Sony aaaa       1233
Apple 123       1233
Apple 345       1233
IBM 13123       1233
like image 45
brandon.bell Avatar answered Nov 02 '22 18:11

brandon.bell