Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Search for a particular text in a string - Hive

/google/gmail/inbox
 /google/drive/map
 /google/apps
 /yahoo/news/cricket
 /yahoo/mail/
 /yahoo/sports
 /wiki/ind/jack
 /wiki/us/jil

I need to get the required page groups. If i search for page group starting with 'google' using hive query, I need to get the data of the first 3 rows.

/google/gmail/inbox
 /google/drive/map
 /google/apps

In this way I need to get the data based on page group.


I searched for the string using like function.

select * from table where field like '%/google/%';
like image 273
Krishna Avatar asked Jul 16 '15 09:07

Krishna


2 Answers

It sounds like you want page group. Which may be google, but it seems like it could also be yahoo perhaps. If you want to extract the page group by search engine you can use a regular expression. You can place multiple websites in the (page1|page2|...|pageN).

Select column from table
where column rlike '.*(google|yahoo).*'

Output:

/google/gmail/inbox
/google/drive/map
/google/apps

You may want to create a new column that has the search engine name it or landing page. It seems like the first place in the path is the landing page. You can extact the landing page in this fashion:

select * from
    (Select column
    , regexp_extract('^(\\/[a-zA-Z]*\\/)',1) as landing_page
    from table) a
  where landing page in ('google','yahoo',...,'bing')
  ;

Output:

column                   new column
/google/gmail/inbox      /google/
/google/drive/map        /google/
/google/apps             /google/
/yahoo/news/cricket      /yahoo/
/yahoo/mail/             /yahoo/
/yahoo/sports            /yahoo/
/bing/meats/delisandwich /bing/
/bing/maps/delis         /bing/

If you don't want /google/ and prefer just google then do:

regexp_extract('^\\/([a-zA-Z]*)\\/',1) as landing_page

Now I am assuming that the landing page comes first in the paths you describe.

like image 145
invoketheshell Avatar answered Oct 25 '22 20:10

invoketheshell


The question is a bit ambiguous, but I believe you are trying to search for the word google in the string and return the rows which contain the word google in the string.

Suppose you have the following table:

create table test (val string);

And it contain the following records:

hive> select * from test;
/google/gmail/inbox
/google/drive/map
/yahoo/mail/

You can select the rows which contains the string google using the below query:

select val from test
where instr(val, 'google') = 2;

This gives the result:

/google/gmail/inbox
/google/drive/map

instr gives the position of the string you searched for. In this case, the position of google is 2. If you are trying to get all the rows which contains google anywhere, then you can use:

select val from test
where instr(val, 'google') != 0;

You can read about the various Hive string functions from the documentation.

like image 38
visakh Avatar answered Oct 25 '22 22:10

visakh