
C# Parallel.Foreach equivalent in Python

I have 96 txt files that have to be processed. Right now I am using a for loop and processing them one at a time, which is very slow. The resulting 96 files do not need to be merged. Is there a way to process them in parallel, à la Parallel.ForEach in C#? Current code:

import glob

for src_name in glob.glob(source_dir + '/*.txt'):
   outfile = open(...)
   with open(...) as infile:  # the with block closes infile automatically
      for line in infile:
         --PROCESS--
   for --condition--:
      outfile.write(...)
   outfile.close()

I want this to run in parallel for all files in source_dir.

Reise45 asked Mar 24 '15 15:03




1 Answer

Assuming that the limiting factor is indeed the processing and not the I/O, you can use joblib to easily run your loop on multiple CPUs.

A simple example from their documentation:

>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
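
To connect this to the file loop in the question, here is a minimal sketch of the same "one task per file" pattern. It uses the standard library's concurrent.futures rather than joblib, so it is an alternative, not the joblib API. The names process_file, process_all and out_dir are hypothetical, and the upper-casing stands in for the question's --PROCESS-- step, which is not shown in the original.

```python
import glob
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import repeat


def process_file(src_name, out_dir):
    """Process one input file, write one output file, return the output path."""
    out_name = os.path.join(out_dir, os.path.basename(src_name))
    with open(src_name) as infile, open(out_name, "w") as outfile:
        for line in infile:
            # Hypothetical placeholder for the question's --PROCESS-- step.
            outfile.write(line.upper())
    return out_name


def process_all(source_dir, out_dir, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run process_file on every .txt file in source_dir using a worker pool."""
    os.makedirs(out_dir, exist_ok=True)
    src_names = sorted(glob.glob(os.path.join(source_dir, "*.txt")))
    with executor_cls(max_workers=max_workers) as pool:
        # map() returns results in input order and re-raises any worker
        # exception when the results are consumed.
        return list(pool.map(process_file, src_names, repeat(out_dir)))
```

With the default ProcessPoolExecutor, each file is handled in a separate process, which suits the CPU-bound case assumed above; the call to process_all should then sit under an `if __name__ == "__main__":` guard, since worker processes may re-import the script. If the work turns out to be I/O-bound instead, passing executor_cls=ThreadPoolExecutor is a cheap thing to try. With joblib, the roughly equivalent call would be Parallel(n_jobs=-1)(delayed(process_file)(name, out_dir) for name in src_names).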
Carsten answered Sep 20 '22 05:09