Right now, I use a combination of Python and R for all of my data processing needs. However, some of my datasets are incredibly large and would benefit strongly from multithreaded processing.
For example, if there are two steps that each have to be performed on a set of several million data points, I would like to be able to start the second step while the first is still running, using the part of the data that has already passed through the first step.
From my understanding, neither Python nor R is the ideal language for this type of work (at least, I don't know how to implement it in either language). What would be the best language/implementation for this type of data processing?
While other systems have provided facilities for multithreading (usually via "lightweight process" libraries), building multithreading support into the language as Java has done provides the programmer with a much more powerful tool for easily creating thread-safe multithreaded classes.
Python (on the standard CPython interpreter) does not support true multi-core execution via multithreading: the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time. Python does have a threading library, and the GIL does not prevent threading as such; threads still help with I/O-bound work, but they will not speed up CPU-bound processing.
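To make that concrete, here is a minimal threading sketch (my own illustration, not part of the original answer); io_bound_task is a hypothetical stand-in for something like a network or disk read:

import threading
import time

def io_bound_task(name):
    # Hypothetical I/O-bound step; while a thread waits on I/O (simulated
    # here with sleep), the GIL is released and other threads can run.
    time.sleep(1)
    print(name, "finished")

if __name__ == '__main__':
    threads = [threading.Thread(target=io_bound_task, args=("task-%d" % i,))
               for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

All four tasks finish in roughly one second of wall time, but the same pattern applied to CPU-bound work would run no faster than a single thread under CPython.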
I suggest using C (or C++) as the high-level language, with MPI and OpenMP as the parallel libraries. These languages are standard and portable, and these parallel libraries let you apply parallel and distributed computing across a wide range of parallel systems (from a single multi-core processor to a cluster of many nodes).
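I can't show OpenMP here, but as a rough MPI sketch I'll use the mpi4py Python bindings rather than C, to stay consistent with the other examples in this thread (the data, the slicing, and the squared-sum step are all placeholders):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank processes its own slice of the data (placeholder work).
data = range(1000)
partial = sum(x * x for x in data[rank::size])

# Combine the per-rank results on rank 0.
total = comm.reduce(partial, op=MPI.SUM, root=0)
if rank == 0:
    print("total:", total)

You would run it with something like mpirun -n 4 python script.py; the same pattern scales from one multi-core machine to a cluster.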
Java has great support for multithreaded applications. Java supports multithreading through the Thread class. A Java Thread lets us create a lightweight process that executes some task; we can create multiple threads in our program and start them.
It is possible to do this in Python using the multiprocessing module -- this spawns multiple processes instead of threads, which bypasses the GIL and hence allows true parallelism.
That is not to say that Python is the 'best' language for this job; that's a subjective point which can be argued over. But it is certainly capable of it.
EDIT: Yes, there are several ways to share data between processes. Pipes are the simplest; they are sort-of file-like handles which one process can write to and then another can read from. Straight from the docs:
from multiprocessing import Process, Pipe

def f(conn):
    # Child process: send a message through its end of the pipe.
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())    # prints "[42, None, 'hello']"
    p.join()
You could for instance have one process performing the first step and sending the results down a pipe to another process for the second step.
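A rough sketch of that pipeline (step_one and step_two are placeholders for your real processing; the doubling and incrementing are just filler work):

from multiprocessing import Process, Pipe

def step_one(data, conn):
    # First step: push each result down the pipe as soon as it is ready.
    for x in data:
        conn.send(x * 2)          # placeholder for the real first step
    conn.send(None)               # sentinel: no more data
    conn.close()

def step_two(conn):
    # Second step: starts consuming while step one is still producing.
    results = []
    while True:
        item = conn.recv()
        if item is None:
            break
        results.append(item + 1)  # placeholder for the real second step
    print(len(results), "items processed")

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p1 = Process(target=step_one, args=(range(100), child_conn))
    p2 = Process(target=step_two, args=(parent_conn,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

For millions of data points you would probably batch items into chunks (or use a multiprocessing.Queue) rather than sending them one at a time, since each send carries some pickling overhead.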