Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difficulty getting OpenMP to work with f2py

I am working on some simulation work for my research and have run into a snag importing fortran into my python scripts. As background, I have worked with Python for some years now, and have only toyed around inside of Fortran when the need arose.

I have done some work in the past with Fortran implementing some simple OpenMP functionality. I am no expert in this, but I have gotten the basics going before.

I am now using f2py to create a library I can call on from my python script. When I try to compile openmp, it compiles correctly and runs to completion, but there is zero improvement in speed and looking at top I see that the CPU usage indicates that only one thread is running.

I've scoured the documentation for f2py (which is not very well documented) as well as done the normal web sleuthing for answers. I've included the Fortran code I am compiling as well as a simple python script that calls on it. I also am throwing in the compile command I am using.

Currently I cut down the simulation to 10^4 as a nice benchmark. On my system it takes 3 seconds to run. Ultimately I need to run a number of 10^6 particle simulations though, so I need to bring down the time a bit.

If anyone can point me in the direction of how to get my code working, it would be super appreciated. I can also try to include any detailed information about the system as needed.

Cheers, Rylkan


1) Compile

f2py -c --f90flags='-fopenmp' -lgomp -m calc_accel_jerk calc_accel_jerk.f90

2) Python script to call

import numpy as N
import calc_accel_jerk

# a is a (1e5,7) array with M,r,v information
a        = N.load('../test.npy')
a        = a[:1e4]

out = calc_accel_jerk.calc(a,a.shape[0])
print out[:10]

3) Fortran code

subroutine calc (input_array, nrow, output_array)
implicit none
!f2py threadsafe
include "omp_lib.h"

integer, intent(in) :: nrow
double precision, dimension(nrow,7), intent(in) :: input_array
double precision, dimension(nrow,2), intent(out) :: output_array

! Calculation parameters with set values
double precision,parameter :: psr_M=1.55*1.3267297e20
double precision,parameter :: G_Msun=1.3267297e20
double precision,parameter :: pc_to_m=3.08e16

! Vector declarations
integer :: irow
double precision :: vfac
double precision, dimension(nrow) :: drx,dry,drz,dvx,dvy,dvz,rmag,jfac,az,jz

! Break up the input array for faster access
double precision,dimension(nrow) :: input_M
double precision,dimension(nrow) :: input_rx
double precision,dimension(nrow) :: input_ry
double precision,dimension(nrow) :: input_rz
double precision,dimension(nrow) :: input_vx
double precision,dimension(nrow) :: input_vy
double precision,dimension(nrow) :: input_vz

input_M(:)  = input_array(:,1)*G_Msun
input_rx(:) = input_array(:,2)*pc_to_m
input_ry(:) = input_array(:,3)*pc_to_m
input_rz(:) = input_array(:,4)*pc_to_m
input_vx(:) = input_array(:,5)*1000
input_vy(:) = input_array(:,6)*1000
input_vz(:) = input_array(:,7)*1000

!$OMP PARALLEL DO private(vfac,drx,dry,drz,dvx,dvy,dvz,rmag,jfac,az,jz) shared(output_array) NUM_THREADS(2)
DO irow = 1,nrow
    ! Get the i-th iteration
    vfac  = sqrt(input_M(irow)/psr_M)
    drx   = (input_rx-input_rx(irow))
    dry   = (input_ry-input_ry(irow))
    drz   = (input_rz-input_rz(irow))
    dvx   = (input_vx-input_vx(irow)*vfac)
    dvy   = (input_vy-input_vy(irow)*vfac)
    dvz   = (input_vz-input_vz(irow)*vfac)
    rmag  = sqrt(drx**2+dry**2+drz**2)
    jfac  = -3*drz/(drx**2+dry**2+drz**2)

    ! Calculate the acceleration and jerk
    az = input_M*(drz/rmag**3)
    jz = (input_M/rmag**3)*((dvx*drx*jfac)+(dvy*dry*jfac)+(dvz+dvz*drz*jfac))

    ! Remove bad index
    az(irow) = 0
    jz(irow) = 0

    output_array(irow,1) = sum(az)
    output_array(irow,2) = sum(jz)
END DO
!$OMP END PARALLEL DO

END subroutine calc
like image 716
Rylkan Tiwaz Avatar asked Sep 23 '15 17:09

Rylkan Tiwaz


1 Answers

Here is a simple check to see, wether the OpenMP threads indeed are visible within the Fortran code:

module OTmod
  !$ use omp_lib
  implicit none

  public :: get_threads

contains

  function get_threads() result(nt)
    integer :: nt

    nt = 0
    !$ nt = omp_get_max_threads()

  end function get_threads

end module OTmod

Compilation:

> f2py -m OTfor --fcompiler=gfortran --f90flags='-fopenmp' -lgomp -c OTmod.f90

Execution:

> python
>>> from OTfor import otmod
>>> otmod.get_threads()
12
like image 54
haraldkl Avatar answered Sep 20 '22 07:09

haraldkl