
How to get the input file name as a column within a Hive query


I have a Hive external table that is mapped to a directory. This directory contains several files.

I want to run a query that finds the name of the file in which a given user, say "abc", appears:

 select file_name , usr from usrs_tables where usr = "abc" 

But of course the data itself doesn't include the file name.

In MapReduce I can do it like this:

FileSplit fileSplit = (FileSplit) context.getInputSplit();
String filename = fileSplit.getPath().getName();
System.out.println("File name " + filename);
System.out.println("Directory and File name" + fileSplit.getPath().toString());

How can I do it in Hive?

asked May 23 '13 13:05 by Julias




1 Answer

Yes, you can retrieve the file the record was found in using the virtual column named INPUT__FILE__NAME, for example:

select INPUT__FILE__NAME, id, name from users where ...; 

yields something like:

hdfs://localhost.localdomain:8020/user/hive/warehouse/users/users1.txt    2    user2
hdfs://localhost.localdomain:8020/user/hive/warehouse/users/users2.txt    42    john.doe

If necessary, use Hive's built-in string functions to trim the host and directories from the URI.
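As a minimal sketch of that trimming, assuming the table and column names from the question (usrs_tables, usr), the built-in regexp_extract function can keep only the part of the path after the last slash. The alias file_name is just an illustrative name, and this has not been run against the asker's data:

```sql
-- Keep only the last path segment of the virtual column.
-- '[^/]+$' matches the trailing run of non-slash characters;
-- index 0 returns the whole match.
select regexp_extract(INPUT__FILE__NAME, '[^/]+$', 0) as file_name,
       usr
from usrs_tables
where usr = 'abc';
```

For a path like the first one in the sample output above, this should leave just users1.txt. Add distinct if you only want each file listed once.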

You can find the documentation on virtual columns here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns

answered Sep 25 '22 18:09 by jkovacs