I have written a fairly large program in Fortran 90. It has been working beautifully for quite a while, but today I tried to step it up a notch and increase the problem size (it is a research non-standard FE-solver, if that helps anyone...) Now I get the "stack overflow" error message and naturally the program terminates without giving me anything useful to work with.
The program starts with setting up all relevant arrays and matrices, and after that is done it prints a few lines of stats regarding this to a log-file. Even with my new, larger problem, this works fine (albeit a little slow), but then it fails as the "number crunching" gets going.
What confuses me is that everything at that point is already allocated (and that worked without errors). I'm not entirely sure what the stack is (Wikipedia and several treads here didn't do much since I have only a quite basic knowledge of the "behind the scenes" workings of a computer).
Assume that I for instance have some arrays initialized as:
INTEGER,DIMENSION(64) :: IA
REAL(8),DIMENSION(:,:),ALLOCATABLE :: AA, BB
which after some initialization routines (i.e. read input from file and such) are allocated as (I store some size-integers for easier passing to subroutines in IA of fixed size):
ALLOCATE( AA(N1,N2) , BB(N1,N2) )
IA(1) = N1
IA(2) = N2
This is basically what happens in the initial portion, and so far so good. But when I then call a subroutine
CALL ROUTINE_ONE(AA,BB,IA)
And the routine looks like (nothing fancy):
SUBROUTINE ROUTINE_ONE(AA,BB,IA)
IMPLICIT NONE
INTEGER,DIMENSION(64) :: IA
REAL(8),DIMENSION(IA(1),IA(2)) :: AA, BB
...
do lots of other stuff
...
END SUBROUTINE ROUTINE_ONE
Now I get an error! The output to the screen says:
forrtl: severe (170): Program Exception - stack overflow
However, when I run the program with the debugger it breaks at line 419 in a file called winsig.c
(not my file, but probably part of the compiler?). It seems to be part of a routine called sigreterror:
and it is the default case that has been invoked, returning the text Invalid signal or error
. There is a comment line attached to this which strangely says /* should never happen, but compiler can't tell */
...?
So I guess my question is, why does this happen and what is actually happening? I thought that as long as I can allocate all the relevant memory I should be fine? Does the call to the subroutine make copies of the arguments, or just pointers to them? If the answer is copies then I can see where the problem might be, and if so: any ideas on how to get around it?
The problem I try to solve is big, but not insane in any way. Standard FE-solvers can handle bigger problems than my current one. I run the program on a Dell PowerEdge 1850 and the OS is Microsoft Server 2008 R2 Enterprise. According to systeminfo
at the cmd
prompt I have 8GB of physical memory and almost 16GB virtual. As far as I understand the total of all my arrays and matrices should not add up to more than maybe 100MB - about 5.5M integer(4)
and 2.5M real(8)
(which according to me should be only about 44MB, but let's be fair and add another 50MB for overhead).
I use the Intel Fortran compiler integrated with Microsoft Visual Studio 2008.
Adding some actual source code to clarify a bit
! Update continuum state
CALL UpdateContinuumState(iTask,iArray,posc,dof,dof_k,nodedof,elm,&
bmtrx,detjac,w,mtrlprops,demtrx,dt,stress,strain,effstrain,&
effstress,aa,fi,errmsg)
is the actual call to the routine. Big arrays are posc
, bmtrx
and aa
- all other are at least an order of magnitude smaller (if not more). posc
is INTEGER(4)
and bmtrx
and aa
is REAL(8)
SUBROUTINE UpdateContinuumState(iTask,iArray,posc,dof,dof_k,nodedof,elm,bmtrx,&
detjac,w,mtrlprops,demtrx,dt,stress,strain,effstrain,&
effstress,aa,fi,errmsg)
IMPLICIT NONE
!I/O
INTEGER(4) :: iTask, errmsg
INTEGER(4) :: iArray(64)
INTEGER(4),DIMENSION(iArray(15),iArray(15),iArray(5)) :: posc
INTEGER(4),DIMENSION(iArray(22),iArray(21)+1) :: nodedof
INTEGER(4),DIMENSION(iArray(29),iArray(3)+2) :: elm
REAL(8),DIMENSION(iArray(14)) :: dof, dof_k
REAL(8),DIMENSION(iArray(12)*iArray(17),iArray(15)*iArray(5)) :: bmtrx
REAL(8),DIMENSION(iArray(5)*iArray(17)) :: detjac
REAL(8),DIMENSION(iArray(17)) :: w
REAL(8),DIMENSION(iArray(23),iArray(19)) :: mtrlprops
REAL(8),DIMENSION(iArray(8),iArray(8),iArray(23)) :: demtrx
REAL(8) :: dt
REAL(8),DIMENSION(2,iArray(12)*iArray(17)*iArray(5)) :: stress
REAL(8),DIMENSION(iArray(12)*iArray(17)*iArray(5)) :: strain
REAL(8),DIMENSION(2,iArray(17)*iArray(5)) :: effstrain, effstress
REAL(8),DIMENSION(iArray(25)) :: aa
REAL(8),DIMENSION(iArray(14)) :: fi
!Locals
INTEGER(4) :: i, e, mtrl, i1, i2, j1, j2, k1, k2, dim, planetype, elmnodes, &
Nec, elmpnodes, Ndisp, Nstr, Ncomp, Ngpt, Ndofelm
INTEGER(4),DIMENSION(iArray(15)) :: doflist
REAL(8),DIMENSION(iArray(12)*iArray(17),iArray(15)) :: belm
REAL(8),DIMENSION(iArray(17)) :: jelm
REAL(8),DIMENSION(iArray(12)*iArray(17)*iArray(5)) :: dstrain
REAL(8),DIMENSION(iArray(12)*iArray(17)) :: s
REAL(8),DIMENSION(iArray(17)) :: ep, es, dep
REAL(8),DIMENSION(iArray(15),iArray(15)) :: kelm
REAL(8),DIMENSION(iArray(15)) :: felm
dim = iArray(1)
...
And it fails before the last line above.
As per steabert's request, I'll just summarize the conversation in the comments here where it's a bit more visible, even though M.S.B.'s answer already gets right to the nub of the problem.
In technical programming, where procedures often have large local arrays for intermediate computation, this happens a lot. Local variables are generally stored on the stack, which typically (and quite reasonably) a small fraction of overall system memory -- usually of order 10MB or so. When the local variable sizes exceed the stack size, you see exactly the symptoms described here -- a stack overflow occuring after a call to the relevant subroutine but before its first executable statement.
So when this problem happens, the best thing to do is to find the relevant large local variables, and decide what to do. In this case, at least the variables belm and dstrain were getting quite sizable.
Once the variables are located, and you've confirmed that's the problem, there's a few options. As MSB points out, if you can make your arrays smaller, that's one option. Alternatively, you can make the stack size larger; under linux, that's done with ulimit -s [newsize]
. That really just postpones the problem, though, and you have to do something different on windows machines.
The other class of ways to avoid this problem is not to put the large data on the stack, but in the rest of memory (the "heap"). You can do that by giving the arrays the save
attribute (in C, static
); this puts the variable on the heap and thus makes the values persistent between calls. The downside there is that this potentially changes the behavior of the subroutine, and means the subroutine can't be used recursively, and similarly is non-threadsafe (if you're ever in a position where multiple threads will enter the routine simulatneously, they'll each see the same copy of the local varaiable and potentially overwrite each other's results). The upside is that it's easy and very portable -- it should work everywhere. However, this will only work with fixed-size local variables; if the temporary arrays have sizes that depend on the inputs, you can't do this (since there'd no longer be a single variable to save; it could be different size every time the procedure is called).
There are compiler-specific options which put all arrays (or all arrays of larger than some given size) on the heap rather than on the stack; every Fortran compiler I know has an option for this. For ifort, used in the OPs post, it's -heap-arrays
in linux, or /heap-arrays
for windows. For gfortran, this may actually be the default. This is good for making sure you know what's going on, but it means you have to have different incantations for every compiler to make sure your code works.
Finally, you can make the offending arrays allocatable. Allocated memory goes on the heap; but the variable which points to them is on the stack, so you get the benefits of both approaches. Also, this is completely standard fortran and so totally portable. The downside is that it requires code changes. Also, the allocation process can take nontrivial amounts of time; so if you're going to be calling the routine zillions of times, you may notice this slows things down slightly. (This possible performance regression is easy to fix, though; if you'll be calling it zillions of times with the same size arrays, you can have an optional argument to pass in a pre-allocated local array and use that instead, so that you only allocate/deallocate once).
Allocating/deallocating each time would look like:
SUBROUTINE UpdateContinuumState(iTask,iArray,posc,dof,dof_k,nodedof,elm,bmtrx,&
detjac,w,mtrlprops,demtrx,dt,stress,strain,effstrain,&
effstress,aa,fi,errmsg)
IMPLICIT NONE
!...arguments....
!Locals
!...
REAL(8),DIMENSION(:,:), allocatable :: belm
REAL(8),DIMENSION(:), allocatable :: dstrain
allocate(belm(iArray(12)*iArray(17),iArray(15))
allocate(dstrain(iArray(12)*iArray(17)*iArray(5))
!... work
deallocate(belm)
deallocate(dstrain)
Note that if the subroutine does a lot of work (eg, takes seconds to execute), the overhead from a couple allocate/deallocates should be negligable. If not, and you want to avoid the overhead, using the optional arguments for preallocated worskpace would look something like:
SUBROUTINE UpdateContinuumState(iTask,iArray,posc,dof,dof_k,nodedof,elm,bmtrx,&
detjac,w,mtrlprops,demtrx,dt,stress,strain,effstrain,&
effstress,aa,fi,errmsg,workbelm,workdstrain)
IMPLICIT NONE
!...arguments....
real(8),dimension(:,:), optional, target :: workbelm
real(8),dimension(:), optional, target :: workdstrain
!Locals
!...
REAL(8),DIMENSION(:,:), pointer :: belm
REAL(8),DIMENSION(:), pointer :: dstrain
if (present(workbelm)) then
belm => workbelm
else
allocate(belm(iArray(12)*iArray(17),iArray(15))
endif
if (present(workdstrain)) then
dstrain => workdstrain
else
allocate(dstrain(iArray(12)*iArray(17)*iArray(5))
endif
!... work
if (.not.(present(workbelm))) deallocate(belm)
if (.not.(present(workdstrain))) deallocate(dstrain)
Not all of the memory is created when the program starts. When you call the subroutine the executable is creating the memory that the subroutine needs for local variables. Typically arrays with simple declarations that are local to that subroutine -- neither allocatable, nor pointer -- are allocated on the stack. You could have simply run of of stack space when you reached these declarations. You might have reached a 2GB limit on a 32-bit OS with some array. Sometimes executable statements implicitly create a temporary array on the stack.
Possible solutions: 1) make your arrays smaller (not attractive), 2) make the stack larger), 3) some compilers have options to switch from placing arrays on the stack to dynamically allocating them, similar to the method used for "allocate", 4) identify large arrays and make them allocatable.
The stack is the memory area where the information needed to return from a function, and the information locally defined in a function is stored. So a stack overflow may indicate you have a function that calls another function which in its turn calls another function, etc.
I am not familiar with Fortran (anymore) but another cause might be that those functions declare tons of local variables, or at least variables that need a lot of place.
A last one: the stack is typically rather small, so it's not a priori relevant how much memory the machine has. It should be quite simple to instruct the linker to increase the stack size, at least if you are certain it's just a lack of space, and not a bug in your application.
Edit: do you use recursion in your program? Recursive calls can eat through the stack very quickly.
Edit: have a look at this: (emphasis mine)
On Windows, the stack space to reserved for the program is set using the /Fn compiler option, where n is the number of bytes. Additionally, the stack reserve size can be specified through the Visual Studio IDE which adds the Microsoft Linker option /STACK: to the linker command line. To set this, go to Property Pages>Configuration Properties>Linker>System>Stack Reserve Size. There you can specify the stack size in bytes in either decimal or C-language notation. If not specified, the default stack size is 1MB.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With