I'm trying to test Hive's TRANSFORM by using a Python script as the mapper. My Hive script is:
add file /full/path/to/mapper.py;
set mapred.job.queue.name=queue_name;
use my_database;
select transform(s.year, s.month, s.day, s.hour)
using 'mapper.py'
from my_table s limit 10;
and my Python mapper script is simply trying to echo the input:
#!/usr/local/bin/python
import sys
for line in sys.stdin:
    print line
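For reference, Hive's TRANSFORM sends each row to the script as one line of tab-separated column values and reads tab-separated lines back. A minimal sketch of an echo mapper that makes this explicit and should run under both Python 2 and 3 (the echo_rows helper name and the /usr/bin/env shebang are illustrative choices, not part of Hive or of the original script):

```python
#!/usr/bin/env python
import sys


def echo_rows(lines, out):
    """Write each tab-separated input row back out unchanged."""
    for line in lines:
        # Strip only the trailing newline so empty trailing fields survive.
        fields = line.rstrip("\n").split("\t")
        out.write("\t".join(fields) + "\n")


if __name__ == "__main__":
    echo_rows(sys.stdin, sys.stdout)
```

Unlike print line in Python 2, this does not append a second newline after each row, so the output contains no blank lines between rows.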
I have tried running this with the following combinations:
- Removing the add file ... line from the Hive script and providing the full path to mapper.py in the select ... statement
- Keeping the add file ... line and using the full path for the mapper: /path/to/mapper.py
- Keeping the add file ... line and using a relative path for the mapper: ./mapper.py
- Selecting the mapper output with an AS clause (using 'mapper.py' as line)
So far, all of the above attempts have resulted in Hive reporting that it cannot initialize my custom script:
FAILED: Execution Error, return code 20000 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Unable to initialize custom script.
I don't understand the nature of this 'initialization.' Is Hive not able to handle the #! (shebang) line? I'm following the "Custom map/reduce scripts" section of the Hive tutorial.
I resolved it by changing the using clause of my select ... statement to call the Python interpreter explicitly:
add file /full/path/to/mapper.py;
select transform(s.year, s.month, s.day, s.hour)
using 'python mapper.py'  -- this line changed
from my_table s limit 10;
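Calling the interpreter explicitly probably works because Hive then no longer has to execute mapper.py directly, which would require the file to be executable and its #! interpreter path (here /usr/local/bin/python) to exist on every task node. As a rough sketch, under that assumption, of a check one could run on a node before submitting the query (the check_script helper is hypothetical and not part of Hive):

```python
import os


def check_script(path):
    """Return a list of reasons why `path` may fail to launch directly."""
    problems = []
    # Direct execution needs the executable bit for the current user.
    if not os.access(path, os.X_OK):
        problems.append("file is not executable")
    with open(path, "rb") as f:
        first = f.readline().decode("utf-8", "replace").strip()
    if not first.startswith("#!"):
        problems.append("no #! line")
    else:
        # The first token after "#!" is the interpreter the kernel will run.
        parts = first[2:].split()
        interpreter = parts[0] if parts else ""
        if not os.path.isfile(interpreter):
            problems.append("interpreter not found: " + interpreter)
    return problems
```

An empty list from check_script('/full/path/to/mapper.py') on a worker node would suggest the script can be launched directly there; any entry points to a reason to prefer using 'python mapper.py'.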