Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Apache Pig permissions issue

I'm attempting to get Apache Pig up and running on my Hadoop cluster, and am encountering a permissions problem. Pig itself is launching and connecting to the cluster just fine- from within the Pig shell, I can ls through and around my HDFS directories. However, when I try and actually load data and run Pig commands, I run into permissions-related errors:

grunt> A = load 'all_annotated.txt' USING PigStorage() AS (id:long, text:chararray, lang:chararray);
grunt> DUMP A;
2011-08-24 18:11:40,961 [main] ERROR org.apache.pig.tools.grunt.Grunt - You don't have permission to perform the operation. Error from the server: org.apache.hadoop.security.AccessControlException: Permission denied: user=steven, access=WRITE, inode="":hadoop:supergroup:r-xr-xr-x
2011-08-24 18:11:40,977 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A
Details at logfile: /Users/steven/Desktop/Hacking/hadoop/pig/pig-0.9.0/pig_1314230681326.log
grunt> 

In this case, all_annotated.txt is a file in my HDFS home directory that I created, and most definitely have permissions to; the same problem occurs no matter what file I try to load. However, I don't think that's the problem, as the error itself indicates Pig is trying to write somewhere. Googling around, I found a few mailing list posts suggesting that certain Pig Latin statements (order, etc.) need write access to a temporary directory on the HDFS file system whose location is controlled by the hadoop.tmp.dir property in hdfsd-site.xml. I don't think load falls into that category, but just to be sure, I changed hadoop.tmp.dir to point to a directory within my HDFS home directory, and the problem persisted.

So, anybody out there have any ideas as to what might be going on?

like image 840
Steven Bedrick Avatar asked Aug 25 '11 16:08

Steven Bedrick


People also ask

Is Apache Pig still used?

The answer is 'Yes, there is, and that is with Apache Pig'. Apache Pig is a high-level platform for creating programs that run on Hadoop. The language for this platform is called Pig Latin. Pig Latin is a SQL like scripting language, that abstracts the programming concepts of MapReduce.

What is the difference between Hive and Apache Pig?

Apache Pig is 36% faster than Apache Hive for join operations on datasets. Apache Pig is 46% faster than Apache Hive for arithmetic operations. Apache Pig is 10% faster than Apache Hive for filtering 10% of the data. Apache Pig is 18% faster than Apache Hive for filtering 90% of the data.

What is the purpose of using Apache Pig?

Apache Pig reduces the time of development using the multi-query approach. Also, Pig is beneficial for programmers who are not from Java background. 200 lines of Java code can be written in only 10 lines using the Pig Latin language. Programmers who have SQL knowledge needed less effort to learn Pig Latin.


1 Answers

Probably your pig.temp.dir setting. It defaults to /tmp on hdfs. Pig will write temporary result there. If you don't have permission to /tmp, Pig will complain. Try to override it by -Dpig.temp.dir.

like image 113
Daniel Dai Avatar answered Oct 19 '22 12:10

Daniel Dai