Hive transform using Python: Unable to initialize custom script

Question

I'm trying to test Hive TRANSFORM by using a Python script as the mapper. My Hive script is:

add file /full/path/to/mapper.py;

set mapred.job.queue.name=queue_name;

use my_database;

select transform(s.year, s.month, s.day, s.hour) 
using 'mapper.py' 
from my_table s limit 10;

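and my Python mapper script simply tries to echo the input:

#!/usr/local/bin/python
import sys
# Echo each row unchanged. Each line read from stdin already ends with "\n",
# so sys.stdout.write avoids the extra blank line that `print line` would add.
for line in sys.stdin:
    sys.stdout.write(line)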

I have tried running this with the following combinations:

- Removing the add file ... line from the Hive script and giving the full path to mapper.py in the select ... statement
- Keeping the add file ... line and using the full path for the mapper: /path/to/mapper.py
- Keeping the add file ... line and using a relative path for the mapper: ./mapper.py
- Naming the mapper output with an AS clause (using 'mapper.py' as line)

So far, every one of these attempts has failed with Hive reporting that it cannot initialize my custom script:

FAILED: Execution Error, return code 20000 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Unable to initialize custom script.

I don't understand what this "initialization" involves. Is Hive unable to:

- find my script (i.e., a path issue)?
- locate the Python executable (i.e., a problem with the #! shebang line)?
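Both hypotheses can be checked on a worker node before involving Hive. The sketch below is a hypothetical helper (`check_transform_script` is not part of Hive or Hadoop), written under the assumption of a POSIX system. It reports the usual reasons a script run directly by path fails to start: the file is missing, it lacks the execute bit, it has no shebang, the shebang names an interpreter that does not exist (such as /usr/local/bin/python on a node where Python is installed elsewhere), or the shebang line ends in a Windows carriage return. It only inspects the machine it runs on, so the result is meaningful only on a node set up like the Hive task nodes.

```python
import os
import shutil


def check_transform_script(path):
    """Return reasons why `path` might fail to launch when run directly (no `python` prefix).

    Hypothetical helper, POSIX only. It inspects the local machine, so run it
    on a node whose filesystem and PATH match the Hive task nodes.
    """
    if not os.path.isfile(path):
        return [f"{path} does not exist"]

    problems = []
    # Running `mapper.py` directly requires the execute permission bit.
    if not os.access(path, os.X_OK):
        problems.append("script is not executable (try: chmod +x)")

    with open(path, "rb") as f:
        first_line = f.readline()

    # Without a shebang the kernel cannot tell which interpreter to use.
    if not first_line.startswith(b"#!"):
        problems.append("first line is not a #! shebang")
        return problems

    # CRLF endings leave a trailing \r on the interpreter name, e.g. "python\r".
    if first_line.rstrip(b"\n").endswith(b"\r"):
        problems.append("shebang line ends with \\r (Windows line endings)")

    parts = first_line[2:].decode("utf-8", "replace").split()
    if not parts:
        problems.append("shebang names no interpreter")
        return problems

    interpreter = parts[0]
    if os.path.basename(interpreter) == "env" and len(parts) > 1:
        # "#!/usr/bin/env python": env searches PATH for the named program.
        if shutil.which(parts[1]) is None:
            problems.append(f"{parts[1]} not found on PATH")
    elif not (os.path.isfile(interpreter) and os.access(interpreter, os.X_OK)):
        # e.g. "#!/usr/local/bin/python" on a node where Python lives elsewhere.
        problems.append(f"interpreter {interpreter} is missing or not executable")
    return problems


if __name__ == "__main__":
    import sys

    for script in sys.argv[1:]:
        print(script, check_transform_script(script) or "no obvious problems")
```

Usage would be something like `python check_script.py /full/path/to/mapper.py` on a task node (the file name is arbitrary). An empty result only rules out the common local causes; it does not prove Hive will launch the script.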

I'm following the "Custom map/reduce scripts" section of the Hive tutorial.

RDK · Accepted Answer

Resolved it by running the script through the Python interpreter instead of executing it directly, i.e. by changing my select ... statement to:

add file /full/path/to/mapper.py;
select transform(s.year, s.month, s.day, s.hour)
using 'python mapper.py'   -- changed: was using 'mapper.py'
from my_table s limit 10;
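With 'python mapper.py', the program being launched is the Python interpreter, and mapper.py is just its argument, so the script needs neither the execute bit nor a correct shebang path (though `python` itself still has to be found on each task node). By default TRANSFORM sends each row to the script's stdin as one tab-separated line and reads tab-separated lines back from stdout. That makes it possible to try the same command locally before submitting the query. The sketch below is a minimal, hypothetical harness: `run_like_transform`, `write_temp_script`, and `ECHO_MAPPER_SOURCE` are illustrative names, and `sys.executable` stands in for whatever `python` resolves to on the cluster.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for mapper.py. sys.stdout.write keeps it valid on
# Python 2 and 3 and avoids adding a second newline to each row.
ECHO_MAPPER_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line)\n"
)


def write_temp_script(source):
    """Write `source` to a temporary .py file (mode 0600, no execute bit) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as f:
        f.write(source)
    return path


def run_like_transform(script_path, rows):
    """Pipe rows through `python script_path` as tab-separated lines; return output lines.

    sys.executable plays the role of the `python` that Hive would find on
    the task node's PATH.
    """
    stdin_text = "".join("\t".join(str(v) for v in row) + "\n" for row in rows)
    result = subprocess.run(
        [sys.executable, script_path],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    path = write_temp_script(ECHO_MAPPER_SOURCE)
    try:
        for line in run_like_transform(path, [(2015, 1, 2, 3), (2015, 1, 2, 4)]):
            print(line.split("\t"))
    finally:
        os.remove(path)
```

The temporary script is never marked executable, which mirrors why the 'python mapper.py' form works even when mapper.py lacks the execute bit.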

Tags: python, hadoop, hive