How does cgi.FieldStorage store files?

Tags:

python

cgi

wsgi

So I've been playing around with raw WSGI, cgi.FieldStorage and file uploads. And I just can't understand how it deals with file uploads.

At first it seemed that it just stores the whole file in memory. I thought, hm, that should be easy to test: a big file should clog up the memory! And it didn't. Still, when I request the file, it's a string, not an iterator, a file object or anything like that.

I've tried reading the cgi module's source and found some things about temporary files, but it returns a freaking string, not a file(-like) object! So... how does it fscking work?!

Here's the code I've used:

import cgi
from wsgiref.simple_server import make_server

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    output = """
    <form action="" method="post" enctype="multipart/form-data">
    <input type="file" name="failas" />
    <input type="submit" value="Varom" />
    </form>
    """
    fs = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
    f = fs.getfirst('failas')
    print type(f)
    # WSGI expects an iterable of strings, so wrap the body in a list.
    return [output]


if __name__ == '__main__':
    httpd = make_server('', 8000, app)
    print 'Serving'
    httpd.serve_forever()

Thanks in advance! :)

asked Jul 27 '11 14:07

Justinas

3 Answers

The cgi module documentation has a paragraph on how to handle file uploads:

If a field represents an uploaded file, accessing the value via the value attribute or the getvalue() method reads the entire file in memory as a string. This may not be what you want. You can test for an uploaded file by testing either the filename attribute or the file attribute. You can then read the data at leisure from the file attribute:

fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1

Regarding your example, getfirst() is just a variant of getvalue(), so it also reads the whole file into a string. Try replacing

f = fs.getfirst('failas')

with

f = fs['failas'].file

This will return a file-like object that is readable "at leisure".
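As a rough illustration of reading "at leisure", the sketch below copies a file-like object to disk one chunk at a time, so only one chunk is in memory at once. It is written for Python 3 and deliberately does not import cgi; the `io.BytesIO` object is only a stand-in for `fs['failas'].file`, and `save_upload`, `dest` and the chunk size are hypothetical names and values chosen for the example.

```python
import io
import os
import tempfile


def save_upload(fileobj, dest_path, chunk_size=64 * 1024):
    """Copy a readable binary file-like object to dest_path in chunks.

    Returns the number of bytes written. At most chunk_size bytes are
    held in memory at any one time.
    """
    written = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = fileobj.read(chunk_size)
            if not chunk:
                break  # end of the upload
            out.write(chunk)
            written += len(chunk)
    return written


# Stand-in for fs['failas'].file so the sketch runs on its own.
fake_upload = io.BytesIO(b"x" * 200000)
dest = os.path.join(tempfile.mkdtemp(), "upload.bin")
size = save_upload(fake_upload, dest)
```

In the WSGI app from the question, the call would presumably look like `save_upload(fs['failas'].file, some_path)`.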

answered Oct 24 '22 17:10

gimel

The best way is NOT to read the file yourself at all (not even a line at a time, as gimel suggested).

You can subclass FieldStorage and override its make_file method. make_file is called when the FieldStorage field is a file.

For your reference, the default make_file looks like this:

def make_file(self, binary=None):
    """Overridable: return a readable & writable file.

    The file will be used as follows:
    - data is written to it
    - seek(0)
    - data is read from it

    The 'binary' argument is unused -- the file is always opened
    in binary mode.

    This version opens a temporary file for reading and writing,
    and immediately deletes (unlinks) it.  The trick (on Unix!) is
    that the file can still be used, but it can't be opened by
    another process, and it will automatically be deleted when it
    is closed or when the current process terminates.

    If you want a more permanent file, you derive a class which
    overrides this method.  If you want a visible temporary file
    that is nevertheless automatically deleted when the script
    terminates, try defining a __del__ method in a derived class
    which unlinks the temporary files you have created.

    """
    import tempfile
    return tempfile.TemporaryFile("w+b")

Rather than creating a temporary file, create a permanent file wherever you want.
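A minimal sketch of what such a replacement could return, assuming you want a uniquely named file in a directory of your choice. It does not import cgi, so it runs on its own under Python 3; `make_permanent_file` and `upload_dir` are hypothetical names. It also walks through the write, seek(0), read sequence that the docstring describes.

```python
import os
import tempfile


def make_permanent_file(upload_dir):
    """Return (file, path) for a readable and writable binary file.

    tempfile.mkstemp creates a uniquely named file and, unlike
    tempfile.TemporaryFile, never deletes it for us.
    """
    fd, path = tempfile.mkstemp(dir=upload_dir, suffix=".upload")
    return os.fdopen(fd, "w+b"), path


upload_dir = tempfile.mkdtemp()  # hypothetical upload directory
f, path = make_permanent_file(upload_dir)

# The usage pattern from the docstring:
f.write(b"uploaded bytes")  # data is written to it
f.seek(0)                   # seek(0)
data = f.read()             # data is read from it
f.close()

still_there = os.path.exists(path)  # the file survives close()
```

Inside an actual make_file override you would return only the file object and keep the path on the instance.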

answered Oct 24 '22 18:10

hasanatkazmi

Using an answer by @hasanatkazmi (utilized in a Twisted app) I got something like:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# -*- indent: 4 spc -*-
import cgi
import tempfile


class PredictableStorage(cgi.FieldStorage):
    def __init__(self, *args, **kwargs):
        # Optional target path; if omitted, a named temp file is used.
        self.path = kwargs.pop('path', None)
        cgi.FieldStorage.__init__(self, *args, **kwargs)

    def make_file(self, binary=None):
        if not self.path:
            # delete=False keeps the file on disk after it is closed.
            tmp = tempfile.NamedTemporaryFile("w+b", delete=False)
            self.path = tmp.name
            return tmp
        return open(self.path, 'w+b')

Be warned that the cgi module does not always create the file. According to these cgi.py lines, it is only created if the content exceeds 1000 bytes:

if self.__file.tell() + len(line) > 1000:
    self.file = self.make_file('')

So you have to check whether the file was actually created by testing the custom class's path attribute, like so:

if file_field.path:
    # The file was already created by cgi; use file_field.path.
    pass
else:
    # Create a named temporary file to store the content.
    import tempfile
    with tempfile.NamedTemporaryFile("w+b", delete=False) as f:
        f.write(file_field.value)
        # You can save f.name for later use.

If Content-Length is also set for the field, which seems to be rare, cgi should create the file as well.

That's it. This way you can store the file predictably, decreasing the memory usage footprint of your app.
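To make the 1000-byte rule easier to picture, here is a small standalone sketch of the same idea: data stays in memory until the next write would push it past a threshold, and only then is a real file created. This is not cgi's actual implementation; `SpillBuffer`, `threshold` and `on_disk` are hypothetical names, and the code is written for Python 3 without importing cgi. The standard library's `tempfile.SpooledTemporaryFile` offers comparable behaviour out of the box.

```python
import io
import tempfile


class SpillBuffer(object):
    """Keep data in memory until it would exceed `threshold` bytes,
    then move everything into a temporary file on disk."""

    def __init__(self, threshold=1000):
        self.threshold = threshold
        self.file = io.BytesIO()  # in-memory buffer to start with
        self.on_disk = False

    def write(self, data):
        # Same shape as the check quoted from cgi.py:
        # current position + incoming length > threshold.
        if not self.on_disk and self.file.tell() + len(data) > self.threshold:
            disk_file = tempfile.TemporaryFile("w+b")
            disk_file.write(self.file.getvalue())  # carry over buffered data
            self.file = disk_file
            self.on_disk = True
        self.file.write(data)


small = SpillBuffer()
small.write(b"a" * 500)      # stays in memory

large = SpillBuffer()
for _ in range(3):
    large.write(b"a" * 500)  # the third write crosses 1000 bytes
```

This is why a small test upload may never trigger make_file, while a large one does.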

answered Oct 24 '22 19:10

Vladius
