I'm using the cloudfiles module to upload files to Rackspace Cloud Files, using something like this pseudocode:
import cloudfiles

username = '---'
api_key = '---'

conn = cloudfiles.get_connection(username, api_key)
testcontainer = conn.create_container('test')

for f in get_filenames():
    obj = testcontainer.create_object(f)
    obj.load_from_filename(f)
My problem is that I have a lot of small files to upload, and it takes too long this way.
Buried in the documentation, I see that there is a class ConnectionPool, which supposedly can be used to upload files in parallel.
Could someone please show how I can make this piece of code upload more than one file at a time?
The ConnectionPool class is meant for a multithreaded application that occasionally has to send something to Rackspace. That way you can reuse your connection, but you don't have to keep 100 connections open if you have 100 threads.
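For completeness, here is a minimal sketch (untested) of how that pattern looks: several threads share one pool, each borrowing a connection with get() and handing it back with put(). I'm assuming ConnectionPool is importable from cloudfiles.connection and exposes get()/put() as the cloudfiles docs describe; get_filenames() is the function from your snippet.

import threading

from cloudfiles.connection import ConnectionPool

USERNAME = '---'
API_KEY = '---'
NUM_THREADS = 10

pool = ConnectionPool(USERNAME, API_KEY)

def upload_batch(filenames):
    """Upload a batch of files, borrowing one connection from the shared pool."""
    conn = pool.get()                       # borrow a connection from the pool
    try:
        container = conn.create_container('test')
        for filename in filenames:
            obj = container.create_object(filename)
            obj.load_from_filename(filename)
    finally:
        pool.put(conn)                      # hand it back so other threads can reuse it

all_files = list(get_filenames())
# Split the files round-robin into one batch per thread
batches = [all_files[i::NUM_THREADS] for i in range(NUM_THREADS)]

threads = [threading.Thread(target=upload_batch, args=(batch,))
           for batch in batches if batch]
for t in threads:
    t.start()
for t in threads:
    t.join()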
You are simply looking for a multithreading/multiprocessing uploader.
Here's an example using the multiprocessing library:
import cloudfiles
import multiprocessing

USERNAME = '---'
API_KEY = '---'

def get_container():
    conn = cloudfiles.get_connection(USERNAME, API_KEY)
    testcontainer = conn.create_container('test')
    return testcontainer

def uploader(filenames):
    '''Worker process to upload the given files'''
    container = get_container()
    # Keep going till you reach STOP
    for filename in iter(filenames.get, 'STOP'):
        # Create the object and upload
        obj = container.create_object(filename)
        obj.load_from_filename(filename)

def main():
    NUMBER_OF_PROCESSES = 16

    # Add your filenames to this queue
    filenames = multiprocessing.Queue()

    # Start worker processes
    for i in range(NUMBER_OF_PROCESSES):
        multiprocessing.Process(target=uploader, args=(filenames,)).start()

    # You can keep adding tasks until you add STOP
    filenames.put('some filename')

    # Stop all child processes
    for i in range(NUMBER_OF_PROCESSES):
        filenames.put('STOP')

if __name__ == '__main__':
    multiprocessing.freeze_support()
    main()
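If you want main() to block until everything is uploaded, keep references to the Process objects, feed the queue from your get_filenames() (the function from the question) instead of the placeholder, and join the workers at the end. A sketch of that variant, meant as a drop-in replacement for main() in the example above:

def main():
    NUMBER_OF_PROCESSES = 16
    filenames = multiprocessing.Queue()

    # Start the workers, keeping references so we can join them later
    processes = [multiprocessing.Process(target=uploader, args=(filenames,))
                 for _ in range(NUMBER_OF_PROCESSES)]
    for p in processes:
        p.start()

    # Feed the queue with the real filenames instead of a placeholder
    for f in get_filenames():
        filenames.put(f)

    # One STOP sentinel per worker, then wait for them all to finish
    for _ in range(NUMBER_OF_PROCESSES):
        filenames.put('STOP')
    for p in processes:
        p.join()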