Apologies if this question is poorly worded: I am embarking on a large scale machine learning project and I don't like programming in Java. I love writing programs in Python. I have heard good things about Pig. I was wondering if someone could clarify to me how usable Pig is in combination with Python for mathematically related work. Also, if I am to write "streaming python code", does Jython come into the picture? Is it more efficient if it does come into the picture?
Thanks
P.S.: For several reasons, I would prefer not to use Mahout's code as-is. I might, however, want to use a few of its data structures; it would be useful to know whether that is possible.
We mainly use Apache Pig for its ability to easily build data pipelines. It comes with its own language, Pig Latin, which makes the execution of the code easy to manage, and it offers many of the relational features found in systems such as Hive, traditional DBMSs, and Spark SQL.
Pig is a high-level platform for processing large datasets. It provides a layer of abstraction over MapReduce, along with a scripting language, Pig Latin, which is used to write data analysis jobs.
Pig is a data-flow language for expressing Map/Reduce programs that analyze large datasets stored in HDFS. Pig provides relational (SQL-like) operators such as JOIN and GROUP BY, and it makes it easy to plug in Java functions.
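For instance, a couple of those operators look like this in Pig Latin (a minimal sketch; the input files and schemas are made up for illustration):

users  = LOAD 'users.tsv'  AS (id:int, name:chararray);
clicks = LOAD 'clicks.tsv' AS (user_id:int, url:chararray);
joined = JOIN users BY id, clicks BY user_id;
grouped = GROUP joined BY users::name;
counts  = FOREACH grouped GENERATE group AS name, COUNT(joined) AS num_clicks;
DUMP counts;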
Another option for using Python with Hadoop is PyCascading. Instead of writing only the UDFs in Python/Jython, or using streaming, you can put the whole job together in Python, using Python functions as "UDFs" in the same script where the data processing pipeline is defined. Jython is used as the Python interpreter, and the MapReduce framework for the stream operations is Cascading. The joins, groupings, etc. work similarly to Pig in spirit, so there are no surprises if you already know Pig.
A word counting example looks like this:
from pycascading.helpers import *  # PyCascading's helpers module provides Flow, Hfs, TextLine, etc.

@map(produces=['word'])
def split_words(tuple):
    # This is called for each line of text
    for word in tuple.get(1).split():
        yield [word]

def main():
    flow = Flow()
    input = flow.source(Hfs(TextLine(), 'input.txt'))
    output = flow.tsv_sink('output')
    # This is the processing pipeline
    input | split_words | GroupBy('word') | Count() | output
    flow.run()
When you use streaming in Pig, it doesn't matter what language you use: all Pig is doing is executing a command in a shell (e.g. via bash). You can use Python, just as you could use grep or a C program.
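As a minimal sketch of how that works (the file and field names here are made up for illustration), the streamed script just reads tab-separated tuples on stdin and writes tuples to stdout:

# tokenize.py - one input tuple per line on stdin; emit one word per output line
import sys

for line in sys.stdin:
    for word in line.split():
        print(word)

On the Pig side, you would ship the script to the cluster and stream tuples through it:

DEFINE tokenize `python tokenize.py` SHIP('tokenize.py');
lines = LOAD 'input.txt' AS (line:chararray);
words = STREAM lines THROUGH tokenize AS (word:chararray);

Since the script runs as an ordinary shell command, it uses whatever Python interpreter is installed on the worker nodes, so any CPython libraries installed there (NumPy, for example) are available to it.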
You can now define Pig UDFs in Python natively. These UDFs are called via Jython when they are executed.
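A minimal sketch of such a UDF (the file and function names here are hypothetical):

# myudfs.py - executed by Pig under Jython
@outputSchema("word:chararray")
def to_upper(word):
    return word.upper()

It is then registered and called from Pig Latin like this:

REGISTER 'myudfs.py' USING jython AS myudfs;
lines = LOAD 'input.txt' AS (line:chararray);
upper = FOREACH lines GENERATE myudfs.to_upper(line);

One caveat for mathematically heavy work: because these UDFs run under Jython, CPython C extensions such as NumPy cannot be imported inside them; streaming (above) is the usual workaround when you need those libraries.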