 

Hive and regular expression

Tags:

hadoop

hive

I am trying to extract all the IP addresses from the username column, but my query doesn't work properly:

select distinct regexp_extract(username, '^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$', 0) from ips;

The problem is that it also matches plain numbers such as 1000000 as IP addresses. Any idea how to fix it?

asked Mar 27 '26 21:03 by user2523848


1 Answer

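Inside a Hive string literal the backslash is itself an escape character, so in \. the backslash is consumed before the pattern reaches the regex engine. The engine then sees a bare . that matches any character, which is why 1000000 is accepted. You need to double the backslash (\\. or \\s) to pass a literal escape through to the regex. There's more info on the wiki at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF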
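To see the effect concretely, here is a small Java sketch (Hive's regexp functions use Java regular expressions). It assumes, following the UDF wiki's note that '\s' matches the letter s while '\\s' is needed for whitespace, that Hive turns the single-backslash pattern from the question into one with bare dots. UNESCAPED is what the engine would then receive; ESCAPED is what it receives when the query is written with \\. instead. Conveniently, Java string literals follow the same rule, so "\\." in the Java source is the two characters \ and . just as in HiveQL. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class HiveEscapeDemo {
    // Assumed result of writing '\.' in HiveQL: the backslash is consumed,
    // leaving '.', which matches ANY character (including a digit).
    static final String UNESCAPED =
        "^([0-9]{1,3}).([0-9]{1,3}).([0-9]{1,3}).([0-9]{1,3})$";

    // Result of writing '\\.' in HiveQL: the engine sees '\.', a literal dot.
    static final String ESCAPED =
        "^([0-9]{1,3})\\.([0-9]{1,3})\\.([0-9]{1,3})\\.([0-9]{1,3})$";

    // True if the anchored pattern matches the input.
    static boolean hasMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // The bare dots can each consume a digit: 1 0 0 0 0 0 0
        System.out.println(hasMatch(UNESCAPED, "1000000"));    // true
        System.out.println(hasMatch(ESCAPED, "1000000"));      // false
        System.out.println(hasMatch(ESCAPED, "192.168.0.1"));  // true
    }
}
```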

Try something like:

select distinct ip_match
from (
    -- regexp_extract returns '' when the pattern does not match
    select regexp_extract(username, '^([0-9]{1,3})\\.([0-9]{1,3})\\.([0-9]{1,3})\\.([0-9]{1,3})$', 0) as ip_match
    from ips
) t
where ip_match <> '';
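As a rough model of what the query above does, the Java sketch below mimics regexp_extract(..., 0) (whole match, or an empty string when nothing matches), drops the empty results, and de-duplicates. It also shows a limitation: [0-9]{1,3} still accepts 999.999.999.999. STRICT is a hypothetical tighter pattern (not part of the original answer) that limits each octet to 0-255 without leading zeros; in HiveQL it would be written with the same \\. separators. The class, method names, and sample usernames are made up for illustration, and List.of needs Java 9 or later.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IpFilterSketch {
    // Same pattern as the corrected query.
    static final Pattern LOOSE = Pattern.compile(
        "^([0-9]{1,3})\\.([0-9]{1,3})\\.([0-9]{1,3})\\.([0-9]{1,3})$");

    // Hypothetical stricter octet: 250-255 | 200-249 | 100-199 | 0-99.
    static final String OCTET = "(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])";
    static final Pattern STRICT = Pattern.compile(
        "^" + OCTET + "\\." + OCTET + "\\." + OCTET + "\\." + OCTET + "$");

    // Models regexp_extract(s, pattern, 0): the whole match, or "" if none.
    static String extract(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(0) : "";
    }

    // Models SELECT DISTINCT ... WHERE ip_match <> '' (keeps first-seen order).
    static Set<String> distinctIps(Pattern p, List<String> usernames) {
        Set<String> out = new LinkedHashSet<>();
        for (String u : usernames) {
            String ip = extract(p, u);
            if (!ip.isEmpty()) {
                out.add(ip);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> usernames = List.of(
            "alice", "1000000", "10.0.0.1", "10.0.0.1", "999.999.999.999");
        System.out.println(distinctIps(LOOSE, usernames));  // [10.0.0.1, 999.999.999.999]
        System.out.println(distinctIps(STRICT, usernames)); // [10.0.0.1]
    }
}
```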
answered Apr 01 '26 07:04 by Carter Shanklin


