 

parallelize 'for' loop in Python 3

I am trying to do some analysis of MODIS satellite data. My code reads a large number of files (806), each of dimension 1200 by 1200 (806 × 1200 × 1200 values in total). It does this with a for loop and performs mathematical operations on the data.

Following is the general way in which I read files.

import numpy as np
import xarray as xray

mindex = np.zeros((1200, 1200))
for i in range(1200):
    var1 = xray.open_dataset('filename.nc')['variable'][:, i, :].data
    for j in range(1200):
        var2 = var1[:, j]
        ## Mathematical calculations to find var3[i, j] ##
        mindex[i, j] = var3[i, j]

Since there is a lot of data to handle, the process is very slow, and I was considering parallelizing it. I tried doing something with joblib, but I have not been able to make it work.
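For reference, the general pattern of parallelizing a per-file loop can be sketched with the standard library's concurrent.futures (the `process_file` function and the file list here are hypothetical stand-ins for the real per-file computation):

```python
from concurrent.futures import ProcessPoolExecutor

def process_file(path):
    # Stand-in for the real per-file computation; returns
    # (path, result) so results can be matched back to files.
    return (path, len(path))

if __name__ == '__main__':
    paths = ['a.nc', 'b.nc', 'c.nc']  # hypothetical file list
    # Each file is processed in a separate worker process;
    # map() preserves the input order of the paths.
    with ProcessPoolExecutor(max_workers=2) as ex:
        results = list(ex.map(process_file, paths))
```

The same shape works with joblib's `Parallel`/`delayed` or `multiprocessing.Pool`; the key point is that the unit of work is one whole file, not one row.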

I am unsure how to tackle this problem.

asked Jul 13 '18 by Nirav L Lekinwala



1 Answer

My guess is that you want to work on several files at the same time. The best way to do that (in my opinion) is to use multiprocessing. To use it, you need to define an elementary per-file step, which is already present in your code.

import os
import multiprocessing as mp

import numpy as np
import xarray as xray

def f(path):
    mindex = np.zeros((1200, 1200))
    for i in range(1200):
        var1 = xray.open_dataset(path)['variable'][:, i, :].data
        for j in range(1200):
            var2 = var1[:, j]
            ## Mathematical calculations to find var3[i, j] ##
            mindex[i, j] = var3[i, j]
    return (path, mindex)


if __name__ == '__main__':
    N = mp.cpu_count()

    folder = '.'  # directory containing the .nc files
    files = [entry.path for entry in os.scandir(folder)
             if entry.name.endswith('.nc')]

    with mp.Pool(processes=N) as p:
        results = p.map(f, files)

This returns a list, results, in which each element is a tuple containing the file path and the corresponding mindex matrix. With this, you can work on multiple files at the same time. It is particularly efficient when the computation on each file is long.
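Since p.map preserves the input order, the returned list can then be turned into a lookup table keyed by file name, or stacked into a single array for further analysis. A small sketch with dummy 2 × 2 matrices standing in for the 1200 × 1200 results:

```python
import numpy as np

# Hypothetical results as returned by p.map: (filename, mindex) tuples.
results = [('a.nc', np.zeros((2, 2))), ('b.nc', np.ones((2, 2)))]

# Index the per-file matrices by name...
by_file = dict(results)

# ...or stack them into one array of shape (n_files, rows, cols).
stacked = np.stack([m for _, m in results])
```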

answered Oct 10 '22 by Mathieu