Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to do select based on a substring of a particular column in HiveQL and SQL?

I have a table in hive

S.no  Age  minutes  code  
 1     10   20     75081     
 2     11   114    75080      
 3     21   104    75180     
 4     31   124    75108    
 5     10   20     75083     
 6     11   114    75180    
 7     21   104    75180    

I want to write an hivesql/sql query that gives ranked list of total minutes spoken based on the region i.e first 4 digits of code. How should i go about it? I know that SUBSTRING() gives me the required cut, but I have not been able to go from there.

Select code, minutes as total  
from TableT   
where S.no > 1
group by code 
order by total

Edit: The ranked results based on first 4 digits of zipcode should look something like this

total code

322(i.e 104+114+104) 7518
154(i.e 20+114+20) 7508
124 7510

like image 535
DJ_Stuffy_K Avatar asked Oct 13 '15 21:10

DJ_Stuffy_K


People also ask

How do I SELECT a substring from a column in SQL?

Use the SUBSTRING() function. The first argument is the string or the column name. The second argument is the index of the character at which the substring should begin. The third argument is the length of the substring.

How do I extract part of a string in Hive?

There are two ways to extract part of a string: Getting a substring, using: SUBSTR. Extracting part of a string using a Regular Expression: REGEXP_EXTRACT.

Can I use substring in WHERE clause?

The SUBSTRING SQL function is very useful when you want to make sure that the string values returned from a query will be restricted to a certain length. In the following example, using the 'firstname' column, the last two characters are matched with the word 'on' using the SQL SUBSTRING function in the where clause.

Is there a substring function in SQL?

SUBSTRING() Function in SQL Server The SUBSTRING() function extracts a substring starting from a position in an input string with a given length. In the case of substring, you need an input string and need to mention the starting point and the total length of the string.


1 Answers

Hmmm. I'm thinking you want this:

select substr(zip, 1, 4) as region, sum(minutes) as minutes,
       rank() over (order by sum(minutes) desc) as therank
from tableT
where s.no > 1
group by substr(zip, 1, 4)
order by minutes desc;
like image 167
Gordon Linoff Avatar answered Oct 22 '22 14:10

Gordon Linoff