Running Pig query over data stored in Hive

1 Answer

Here's what I found out: using HiveColumnarLoader makes sense if you store the data as RCFile. To load a table with it, you need to register some jars first:

register /srv/pigs/piggybank.jar
register /usr/lib/hive/lib/hive-exec-0.5.0.jar
register /usr/lib/hive/lib/hive-common-0.5.0.jar

a = LOAD '/user/hive/warehouse/table' USING org.apache.pig.piggybank.storage.HiveColumnarLoader('ts int, user_id int, url string');
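Once the table is loaded, the relation can be queried like any other Pig relation. Below is a minimal sketch, assuming the same jar paths and the `ts`, `user_id`, `url` columns from the statement above; the aliases `b` and `c` and the field name `hits` are made up for illustration, and the query (rows per user) is only an example, not something from the original question:

```pig
-- Assumes the jars and table path from the example above.
register /srv/pigs/piggybank.jar
register /usr/lib/hive/lib/hive-exec-0.5.0.jar
register /usr/lib/hive/lib/hive-common-0.5.0.jar

a = LOAD '/user/hive/warehouse/table' USING org.apache.pig.piggybank.storage.HiveColumnarLoader('ts int, user_id int, url string');

-- Hypothetical query: count the rows recorded for each user_id.
b = GROUP a BY user_id;
c = FOREACH b GENERATE group AS user_id, COUNT(a) AS hits;

-- Print the result to the console; use STORE to write it to HDFS instead.
DUMP c;
```

Adjust the schema string to match the Hive table definition if your columns differ.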

To load data from a SequenceFile you also have to use PiggyBank (as in the previous example). The SequenceFileLoader from PiggyBank should handle compressed files:

register /srv/pigs/piggybank.jar
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
a = LOAD '/user/hive/warehouse/table' USING SequenceFileLoader AS (int, int);

This doesn't work with Pig 0.7, because it is unable to read the BytesWritable type and cast it to a Pig type, so you get this exception:

2011-07-01 10:30:08,589 WARN org.apache.pig.piggybank.storage.SequenceFileLoader: Unable to translate key class org.apache.hadoop.io.BytesWritable to a Pig datatype
2011-07-01 10:30:08,625 WARN org.apache.hadoop.mapred.Child: Error running child
org.apache.pig.backend.BackendException: ERROR 0: Unable to translate class org.apache.hadoop.io.BytesWritable to a Pig datatype
    at org.apache.pig.piggybank.storage.SequenceFileLoader.setKeyType(SequenceFileLoader.java:78)
    at org.apache.pig.piggybank.storage.SequenceFileLoader.getNext(SequenceFileLoader.java:132)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:142)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:448)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)

How to compile PiggyBank is described in this question: "Unable to build piggybank -> /home/build/ivy/lib does not exist".

answered Sep 18 '22 01:09

wlk

Tags:

hadoop

hive

apache-pig
