I am learning Hive. I have set up a table named records with the following schema:
year : string
temperature : int
quality : int
Here are some sample rows:
1999 28 3
2000 28 3
2001 30 2
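(For context, the CREATE TABLE statement is not shown here; a minimal sketch of one matching this schema, assuming a tab-delimited text table as the answer below indicates, would be:)

CREATE TABLE records (year STRING, temperature INT, quality INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;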
Now I have written a sample MapReduce script in Python, exactly as specified in the book Hadoop: The Definitive Guide:
import re
import sys

for line in sys.stdin:
    (year, tmp, q) = line.strip().split()
    if (tmp != '9999' and re.match("[01459]", q)):
        print "%s\t%s" % (year, tmp)
I run this using the following commands:
ADD FILE /usr/local/hadoop/programs/sample_mapreduce.py;
SELECT TRANSFORM(year, temperature, quality)
USING 'sample_mapreduce.py'
AS year,temperature;
Execution fails. On the terminal I get this:
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-08-23 18:30:28,506 Stage-1 map = 0%, reduce = 0%
2012-08-23 18:30:59,647 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201208231754_0005 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201208231754_0005_m_000002 (and more) from job job_201208231754_0005
Exception in thread "Thread-103" java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://master:50060/tasklog?taskid=attempt_201208231754_0005_m_000000_2&start=-8193
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
at java.net.URL.openStream(URL.java:1010)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
... 3 more
I went to the failed job list, and this is the stack trace:
java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hit error while closing ..
at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:452)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
... 8 more
The same trace is repeated 3 more times.
Can someone please help me with this? What is wrong here? I am following the book exactly. There seem to be two errors: on the terminal it says that it can't read from the task log URL, while in the failed job list the exception says something different. Please help.
I went to the stderr log from the Hadoop admin interface and saw that there was a syntax error from Python. Then I found that when I created the Hive table, the field delimiter was a tab, but in split() I hadn't specified it. So I changed it to split('\t') and it worked alright!
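For reference, here is the corrected script in full (a minimal sketch of the fix; Python 2, as in the book), with the explicit tab delimiter being the only change:

import re
import sys

for line in sys.stdin:
    # Hive passes each row to the TRANSFORM script as tab-separated
    # fields, so split on the tab character explicitly.
    (year, tmp, q) = line.strip().split('\t')
    if (tmp != '9999' and re.match("[01459]", q)):
        print "%s\t%s" % (year, tmp)

Hive serializes the columns passed to TRANSFORM as tab-separated fields on stdin, so splitting explicitly on '\t' matches that format exactly.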