
Increasing speed of IMAP bulk message deletion in Python

The goal is to remove a batch of email messages using imaplib. The folder receives approximately 300k new messages a month, and only messages older than one month should be deleted. The script below does delete the old messages, but it takes several hours, and the simple for loop does not look efficient. Trying to speed it up with multiprocessing raises an error.

What can you advise to improve the speed of deleting a large number of messages?

import sys
import datetime
from imaplib import IMAP4

# get the date one month before today
monthbefore = (datetime.date.today() - datetime.timedelta(365/12)).strftime("%d-%b-%Y")

m = IMAP4('mail.domain.com')
m.login('[email protected]', 'password')

# select the folder once; data[0] holds the number of messages in it
typ, data = m.select('Folder')
print data

# find old messages
typ, data = m.search(None, '(BEFORE %s)' % (monthbefore))

# delete them
ids = data[0].split()
# len(ids) is the number of matches; ids[-1] would be the last message number
print "Will be removed:\t", len(ids), "messages"
for num in ids:
  m.store(num, '+FLAGS', '\\Deleted')
  sys.stderr.write('\rRemoving message:\t %s' % num)

# now expunge the messages marked for deletion, close the connection and exit
print "\nGet ready for expunge"
m.expunge()
print "Expunged! Quitting."
m.close()
m.logout()

Update: I rewrote part of the code. Here is a variant that runs about 1000 times faster (my server accepts a STORE command for more than 1000 messages at a time):

    def chunks(l, n):
        # yield successive n-sized chunks from l
        for i in xrange(0, len(l), n):
            yield l[i:i+n]

    ids = data[0].split()
    mcount = len(ids)
    print "Will be removed", mcount, "messages"
    done = 0
    for chunk in chunks(ids, 1000):
        # one STORE per chunk of up to 1000 comma-separated message numbers
        m.store(",".join(chunk), '+FLAGS', '\\Deleted')
        done += len(chunk)
        # 100.0 forces float division so the percentage is not truncated to 0
        sys.stderr.write('\rdone {0:.2f}%'.format(100.0 * done / mcount))
insider asked Oct 20 '25 03:10


1 Answer

I think the main problem here is that you're calling STORE for each message. Each one of those round trips to the server takes time and when you're doing lots of deletions this really adds up.

To avoid all those calls to STORE, try calling it with multiple message ids. You can pass a comma-separated list (e.g. "1,2,3,4"), ranges of message ids (e.g. "1:10"), or a combination of both (e.g. "1,2,5,10:20"). Note that most servers seem to have a limit on the number of message ids allowed per call, so you'll probably still need to chunk the ids into blocks (of, say, 200 messages) and call STORE multiple times. This will still be much, much faster than calling STORE per message.
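
As a rough sketch of the chunked approach described above (written for Python 3, not tested against a real server): the helper names `message_sets` and `flag_deleted` are made up for illustration, `conn` is assumed to be a logged-in `imaplib.IMAP4` object with the folder already selected, and the default block size of 200 is only the guess mentioned above, so adjust it to whatever your server accepts.

```python
def message_sets(ids, chunk_size=200):
    """Yield comma-separated message sets of at most chunk_size ids each.

    ids is a list of integer message numbers (hypothetical helper).
    """
    for start in range(0, len(ids), chunk_size):
        yield ",".join(str(i) for i in ids[start:start + chunk_size])


def flag_deleted(conn, ids, chunk_size=200):
    """Flag ids as deleted with one STORE call per chunk.

    conn is assumed to be a logged-in imaplib.IMAP4 with a folder selected.
    Returns the number of STORE calls that were issued.
    """
    calls = 0
    for message_set in message_sets(ids, chunk_size):
        typ, _ = conn.store(message_set, '+FLAGS', '\\Deleted')
        if typ != 'OK':
            raise RuntimeError('STORE failed for %s' % message_set)
        calls += 1
    return calls
```

With the search result from the question, the ids could be prepared as `ids = [int(n) for n in data[0].split()]` and then passed to `flag_deleted(m, ids)`, followed by a single `m.expunge()` once every chunk has been flagged.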

For further reference, see the STORE Command section of RFC 3501. It shows an example of a STORE command taking a range of message ids.
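
The range form from the RFC example (something like `STORE 2:4 +FLAGS (\Deleted)`) can also be generated from a search result. The following is only a sketch, written for Python 3: `compress_ranges` is a hypothetical helper, not part of imaplib, and it only shortens the argument when the matched message numbers happen to be consecutive. Whether a server's limit applies to the number of ids or to the length of the command line is server-dependent, so this may or may not let you send fewer STORE calls.

```python
def compress_ranges(ids):
    """Collapse integer message numbers into IMAP sequence-set syntax.

    Consecutive runs become "start:end"; isolated numbers stay as they are.
    Hypothetical helper for illustration only.
    """
    ids = sorted(set(ids))
    parts = []
    i = 0
    while i < len(ids):
        start = end = ids[i]
        # extend the current run while the next number is consecutive
        while i + 1 < len(ids) and ids[i + 1] == end + 1:
            i += 1
            end = ids[i]
        parts.append(str(start) if start == end else "%d:%d" % (start, end))
        i += 1
    return ",".join(parts)
```

For example, `compress_ranges([1, 2, 3, 7, 9, 10])` gives `"1:3,7,9:10"`, which could then be passed as the first argument to `m.store(...)` in place of a long comma-separated list.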

Menno Smits answered Oct 21 '25 18:10


