
Platform-dependent performance issues when selecting a large number of files with gtk.FileChooserDialog

I have a pygtk program designed to run on both Windows and Ubuntu. It's Python 2.7 and gtk2 with the static bindings (i.e. no gobject introspection). The problem I'm experiencing exists on Ubuntu but not on Windows.

My program is supposed to be able to process large numbers of files (here I test with about 200), but the actual processing per file isn't much. I queue the processing up on a per-file basis and present progress to the user.

The problem is that after choosing the files with a gtk.FileChooserDialog (control-A is your friend), the program hangs and gtk events aren't processed for quite some time - even though my callback function has returned. During this time the CPU usage on all cores hangs around 80%, iotop shows that my process is writing to disk at about 20MB per second, and other apps become intermittently unresponsive - Chrome, Xorg, compiz, banshee and gedit all have high CPU usage (having had low usage prior to selecting the files).

Here's some example code. To reproduce, click the button, select about 200 files from somewhere (approx ten screens' worth of holding shift and down) and click OK. It shouldn't matter which files - nothing is done with them.

import gtk,gobject,time

def print_how_long_it_was_frozen():
    print time.time() - start_time

def button_clicked(button):
    dialog = gtk.FileChooserDialog(
                'Select files to add', w, gtk.FILE_CHOOSER_ACTION_OPEN,
                buttons=(gtk.STOCK_CANCEL, gtk.RESPONSE_CANCEL,
                         gtk.STOCK_OPEN, gtk.RESPONSE_OK))
    dialog.set_select_multiple(True)
    dialog.set_default_response(gtk.RESPONSE_OK)
    response = dialog.run()
    files = dialog.get_filenames()
    dialog.destroy()
    for i, f in enumerate(files):
        print i

    global start_time
    start_time = time.time()
    gobject.idle_add(print_how_long_it_was_frozen)


w = gtk.Window() 
b = gtk.Button('Select files')
w.add(b)
b.connect('clicked', button_clicked)
w.show_all()
gtk.main()

This results in a ~60 second hang after the callback has ended, during which time nothing should be happening except the dialog's destruction being processed (which happens partway through the hang).

That's on Ubuntu 11.10. On Windows there is less than a second of hang.

I have my suspicions that this is due to some Gnome or Unity 'recent files' feature, or other activity tracking. The process zeitgeist-daemon has high CPU usage during the hang too, though killing it doesn't fix the problem. Neither does disabling logging with the Zeitgeist Activity Log Manager. Even if Zeitgeist can be disabled, I can't really expect my users to disable it.

Does anyone know how to disable a gtk app's reporting of recent files, or know of anything else that could be causing this?

Extremely large numbers of files will have to be added for processing via a 'select-folder' dialog instead, but for smaller numbers of files the hang time seems to be about half a second per file, which isn't really acceptable for an otherwise responsive app.

(Testing done on 32-bit Windows 7 and 64-bit Ubuntu 11.10. Python 2.7 and pygtk 2.24 on both.)

Chris Billington asked Feb 14 '12 10:02



2 Answers

The slowdown is due to the fact that the gtk.FileChooser widget automatically puts all the files selected into the recently used file list (gtk.RecentManager.add_item()).

Adding this function to the example code, run in a separate thread (which seemingly has no problem acquiring the gtk lock even during the hang):

def log_n_recent_files():
    manager = gtk.recent_manager_get_default()
    manager.purge_items()
    while True:
        time.sleep(1)
        with gtk.gdk.lock:
            items = manager.get_items()
        with open('log.log','a') as f:
            f.write('%f %d\n'%(time.time(), len(items)))

reveals (after being left running overnight) that the delay per file increases as the number of recent files does:

[Figures: number of files added over time; rate of file adding]

Since there is no method to add multiple files to the RecentManager, they are added one at a time.

Each time one is added, other gtk apps get notified that the recent files list (stored in ~/.local/share/recently-used.xbel) has changed. They then parse the file and loop through the items, looking for the n most recent items (where n is app specific), to display them. In determining which files are most recent, a system time call is made for each item.
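As a rough sketch of the per-app work described above, the following parses an XBEL document and picks out the n most recent items. It is not gtk's own code: the sample data, the `most_recent` name, and the timestamp format are assumptions for illustration, and only the `modified` attribute is consulted.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical sample in the XBEL layout used by recently-used.xbel.
SAMPLE_XBEL = """<xbel version="1.0">
  <bookmark href="file:///home/user/a.txt" modified="2012-02-14T10:02:00Z"/>
  <bookmark href="file:///home/user/b.txt" modified="2012-02-14T10:05:00Z"/>
  <bookmark href="file:///home/user/c.txt" modified="2012-02-13T09:00:00Z"/>
</xbel>
"""

def most_recent(xbel_text, n):
    """Return the hrefs of the n most recently modified bookmarks."""
    root = ET.fromstring(xbel_text)
    stamped = []
    # Every item is visited on every change notification, so the cost
    # grows with the total size of the list, not with n.
    for bookmark in root.findall('bookmark'):
        modified = datetime.strptime(bookmark.get('modified'),
                                     '%Y-%m-%dT%H:%M:%SZ')
        stamped.append((modified, bookmark.get('href')))
    stamped.sort(reverse=True)
    return [href for _, href in stamped[:n]]
```

Calling `most_recent(SAMPLE_XBEL, 2)` walks all three entries even though only two are wanted, which is the scaling behaviour at issue here.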

The problem is exacerbated by the fact that recently-used.xbel is able to grow without limit. So if you have 5000 items in recently-used.xbel, and you're selecting 200 files with a gtk.FileChooser, each running gtk app makes the sum over n = 1 to 200 of (5000 + n), or roughly 1 million, system time calls.
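The estimate can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch; `time_calls` is a name made up for illustration, and it assumes one time call per item per notification, as described above.

```python
def time_calls(existing_items, files_added):
    """Total items scanned by one app while files are added one at a time."""
    # After the n-th file is added the list holds existing_items + n
    # entries, and each of them is examined once.
    return sum(existing_items + n for n in range(1, files_added + 1))

total = time_calls(5000, 200)  # 200 * 5000 + (200 * 201) / 2 = 1020100
```

Most of the total comes from the 5000 pre-existing items being rescanned 200 times, so the size of the existing history matters far more than the number of files selected.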

There are properties in gtk.Settings that make your app look for fewer files in the history, gtk-recent-files-limit and gtk-recent-files-max-age, but they don't prevent ~/.local/share/recently-used.xbel from being written to.

To prevent recently-used.xbel from being written to, one can write-protect it, or replace it with a folder. In this case gtk still attempts to add all the files, but each attempt fails. The delay is about 1 second per 200 files - I guess the overhead of making the attempt is still significant.
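A minimal sketch of the write-protect workaround, using only the standard library. The `write_protect` helper is hypothetical, and it changes the user's file permissions, so it is a workaround for testing rather than something to ship.

```python
import os
import stat

def write_protect(path):
    """Clear the write permission bits on path and return the new mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    read_only = mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
    os.chmod(path, read_only)
    return read_only
```

It would be called as `write_protect(os.path.expanduser('~/.local/share/recently-used.xbel'))`. Note that a process running as root can still write to the file.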

Since there seems to be no way to turn off this behaviour of gtk.FileChooser, the only other option is to use a different file chooser widget. Even with 30000 files, there is no perceptible delay when using the deprecated gtk.FileSelection widget instead.

It's an ugly widget, but I think I'm going to have to use it and file a bug report/feature request for being able to disable recent file reporting by gtk.FileChooser.

Chris Billington answered Sep 27 '22 17:09



This might not count as an answer, but it might help.

After looking at why the file chooser dialogs in gtk2 were so slow to open, I found out that gtk.FileChooserDialogs aren't lightweight objects.

You shouldn't create one for a single use and then destroy it. Instead, reuse it: you can just .hide() it, and it will reappear when .run() is called again.

Note that using dialog.set_current_folder(dialog.get_current_folder()) forces the file listing to refresh.

Also note that the items that are selected when the dialog is hidden will remain selected when the dialog reappears, unless the file listing is refreshed or the files no longer exist.


If I change your code to follow that, it becomes:

import gtk,gobject,time

def print_how_long_it_was_frozen():
    print time.time() - start_time

def button_clicked(button):
    response = dialog.run()
    files = dialog.get_filenames()
    dialog.hide()
    for i, f in enumerate(files):
        print i

    global start_time
    start_time = time.time()
    gobject.idle_add(print_how_long_it_was_frozen)


w = gtk.Window() 
b = gtk.Button('Select files')
w.add(b)
b.connect('clicked', button_clicked)
w.show_all()

dialog = gtk.FileChooserDialog(
            'Select files to add', w, gtk.FILE_CHOOSER_ACTION_OPEN,
            buttons=(gtk.STOCK_CANCEL, gtk.RESPONSE_CANCEL,
                     gtk.STOCK_OPEN, gtk.RESPONSE_OK))
dialog.set_select_multiple(True)
dialog.set_default_response(gtk.RESPONSE_OK)

gtk.main()
dialog.destroy()
Dan D. answered Sep 27 '22 17:09
