Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Flag -useHCatalog not working

I installed CDH5.4 in single node following the instructions here, also, I put the hive-metastore in localmode using these instructions and everything works perfectly, except when I tried to connect pig with the metastore:

➜  ~  pig -useHCatalog
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2015-05-01 15:45:08,657 [main] INFO  org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.4.0 (rUnversioned directory) compiled Apr 21 2015, 12:19:15
2015-05-01 15:45:08,658 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/itam/pig_1430495108571.log
2015-05-01 15:45:09,035 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:09,035 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:09,035 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2015-05-01 15:45:09,940 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:09,941 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2015-05-01 15:45:09,941 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:09,999 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,001 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,088 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,089 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,125 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,126 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,160 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,162 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,194 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,195 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,227 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,228 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,261 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,262 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,295 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,296 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

and when I tried to access the table:

grunt> a = load 'ufos' using org.apache.hcatalog.pig.HCatLoader();
2015-05-01 15:46:11,656 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /home/itam/pig_1430495108571.log
grunt> 

Hadoop version

➜  ~  hadoop version
Hadoop 2.6.0-cdh5.4.0
Subversion http://github.com/cloudera/hadoop -r c788a14a5de9ecd968d1e2666e8765c5f018c271 
Compiled by jenkins on 2015-04-21T19:16Z
Compiled with protoc 2.5.0
From source with checksum cd78f139c66c13ab5cee96e15a629025
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.4.0.jar

UPDATE: I just tried with Impala, and It neither sees anything:

➜  ~  impala-shell                                                                                                                                                                                                                             
/usr/lib/python2.7/dist-packages/pkg_resources.py:1049: UserWarning: /home/itam/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extracti
on_path or the PYTHON_EGG_CACHE environment variable).
  warnings.warn(msg, UserWarning)
Starting Impala Shell without Kerberos authentication
Connected to 6b512e41337d:21000
Server version: impalad version 2.2.0-cdh5 RELEASE (build 2ffd73a4255cefd521362ffe1cfb37463f67f75c)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.

Copyright (c) 2012 Cloudera, Inc. All rights reserved.

(Shell build version: Impala Shell v2.2.0-cdh5 (2ffd73a) built on Tue Apr 21 12:09:21 PDT 2015)
[6b512e41337d:21000] > invalidate metadata;
Query: invalidate metadata
[6b512e41337d:21000] > show tables;
Query: show tables

Fetched 0 row(s) in 0.00s

but from beeline:

~  beeline -u jdbc:hive2://
scan complete in 2ms
Connecting to jdbc:hive2://
Connected to: Apache Hive (version 1.1.0-cdh5.4.0)
Driver: Hive JDBC (version 1.1.0-cdh5.4.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.4.0 by Apache Hive
0: jdbc:hive2://> show tables;
OK
+-----------+--+
| tab_name  |
+-----------+--+
| ufos      |
+-----------+--+
1 row selected (0.701 seconds)

It worked... What is happening?

UPDATE: I am running hcatalog too

➜  ~  sudo service hive-webhcat-server status
 * WEBHCat server is running

➜  ~  hcat -e "desc ufos"                    
OK
timestamp               string                  from deserializer   
city                    string                  from deserializer   
state                   string                  from deserializer   
shape                   string                  from deserializer   
duration                string                  from deserializer   
summary                 string                  from deserializer   
posted                  string                  from deserializer   
Time taken: 1.314 seconds

UPDATE: The problem with impala was due that I didn't copy hive-site.xml to /etc/impala/conf, once this is done, impala-shell worked properly.

like image 336
nanounanue Avatar asked May 01 '15 15:05

nanounanue


2 Answers

The loader you are using is deprecated. Instead of using org.apache.hcatalog.pig.HCatLoader, you need to use org.apache.hive.hcatalog.pig.HCatLoader.

From org.apache.hcatalog.pig.HCatLoader:

Deprecated. Use/modify HCatLoader instead

like image 80
Balduz Avatar answered Oct 17 '22 22:10

Balduz


I was facing the issue in HDP 2.3 and Pig 0.15 .

Package name for HCatLoader() class is different in Hortonworks distribution.

The following worked for me

USING org.apache.hive.hcatalog.pig.HCatLoader()

instead of USING org.apache.hcatalog.pig.HCatLoader();

like image 31
minhas23 Avatar answered Oct 17 '22 23:10

minhas23