Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Debugging strategy for a bug (apparently) affected by timing

I'm fairly inexperienced with python, so I find myself at a loss as to how to approach this bug. I've inherited a python app that mainly just copies files and folders from one place to another. All the files and folders are on the local machine and the user has full admin rights, so there are no networking or security issues here.

What I've found is that a number of files fail to get copied from one directory to another unless I slow down the code somehow. If I just run the program it fails, but if I step through with a debugger or add print statements to the copy loop, it succeeds. The difference there seems to be either the timing of the loop or moving things around in memory.

I've seen this sort of bug in compiled languages before, and it usually indicates either a race condition or memory corruption. However, there is only one thread and no interaction with other processes, so a race condition seems impossible. Memory corruption remains a possibility, but I'm not sure how to investigate that possibility with python.

Here is the loop in question. As you can see, it's rather simple:

def concat_dirs(dir, subdir):
    return dir + "/" + subdir

for name in fullnames:
    install_name = concat_dirs(install_path, name)
    dirname = os.path.dirname(install_name)
    if not os.path.exists(dirname): 
        os.makedirs(dirname)
    shutil.copyfile(concat_dirs(java_path, name), install_name)

That loop usually fails to copy the files unless I either step through it with a debugger or add this statement after the shutil.copyfile line.

    print "copied ", concat_dirs(java_path, name), " to ", install_name

If I add that statement or step through in debug, the loop works perfectly and consistently. I'm tempted to say "good enough" with the print statement but I know that's just masking an underlying problem.

I'm not asking you to debug my code because I know you can't; I'm asking for a debug strategy. How do I approach finding this bug?

like image 955
Carey Gregory Avatar asked Apr 19 '14 01:04

Carey Gregory


1 Answers

You do have a race condition: you check for the existence of dirname and then try to create it. That should cause the program to bomb if something unexpected happens, but...

In Python, we says that it's easier to ask forgiveness than permission. Go ahead and create that directory each time, then apologize if it already exists:

import errno
import os

for name in fullnames:
    source_name = os.path.join(java_path, name)
    dest_name = os.path.join(install_path, name)
    dest_dir = os.path.dirname(dest_name)

    try:
        os.makedirs(dest_dir)
    except OSError as exc:
        if exc.errno != errno.EEXIST:
            raise

    shutil.copyfile(source_name, dest_name)

I'm not sure how I'd troubleshoot this, other than by trying it the non-racy way and seeing what happens. There may be a subtle filesystem issue that's making this run oddly.

like image 118
Kirk Strauser Avatar answered Oct 14 '22 05:10

Kirk Strauser