Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does cgi.FieldStorage store files?

Tags:

python

cgi

wsgi

So I've been playing around with raw WSGI, cgi.FieldStorage and file uploads. And I just can't understand how it deals with file uploads.

At first it seemed that it just stores the whole file in memory. And I thought hm, that should be easy to test - a big file should clog up the memory!.. And it didn't. Still, when I request the file, it's a string, not an iterator, file object or anything.

I've tried reading the cgi module's source and found some things about temporary files, but it returns a freaking string, not a file(-like) object! So... how does it fscking work?!

Here's the code I've used:

import cgi
from wsgiref.simple_server import make_server

def app(environ,start_response):
    start_response('200 OK',[('Content-Type','text/html')])
    output = """
    <form action="" method="post" enctype="multipart/form-data">
    <input type="file" name="failas" />
    <input type="submit" value="Varom" />
    </form>
    """
    fs = cgi.FieldStorage(fp=environ['wsgi.input'],environ=environ)
    f = fs.getfirst('failas')
    print type(f)
    return output


if __name__ == '__main__' :
    httpd = make_server('',8000,app)
    print 'Serving'
    httpd.serve_forever()

Thanks in advance! :)

like image 332
Justinas Avatar asked Jul 27 '11 14:07

Justinas


People also ask

How many examples of CGI fieldstorage are there in Python?

Python FieldStorage - 30 examples found. These are the top rated real world Python examples of cgi.FieldStorage extracted from open source projects. You can rate examples to help us improve the quality of examples.

What is field storage in CGI module?

Module cgi:: Class FieldStorage [hide private] [frames] | no frames] _ClassType FieldStorage Store a sequence of fields, reading multipart/form-data. This class provides naming, typing, files stored on disk, and more. At the top level, it is accessible like a dictionary, whose keys are the field names.

How to prevent make_file () function from being called on fieldstorage?

The best way is to NOT to read file (or even each line at a time as gimel suggested). You can use some inheritance and extend a class from FieldStorage and then override make_file function. make_file is called when FieldStorage is of type file.

What is the use of CGI in Python?

Python Standard Library Module cgi:: Class FieldStorage [hide private] [frames] | no frames] _ClassType FieldStorage Store a sequence of fields, reading multipart/form-data. This class provides naming, typing, files stored on disk, and more. At the top level, it is accessible like a dictionary, whose keys are the field names.


3 Answers

Inspecting the cgi module description, there is a paragraph discussing how to handle file uploads.

If a field represents an uploaded file, accessing the value via the value attribute or the getvalue() method reads the entire file in memory as a string. This may not be what you want. You can test for an uploaded file by testing either the filename attribute or the file attribute. You can then read the data at leisure from the file attribute:

fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1

Regarding your example, getfirst() is just a version of getvalue(). try replacing

f = fs.getfirst('failas')

with

f = fs['failas'].file

This will return a file-like object that is readable "at leisure".

like image 134
gimel Avatar answered Oct 24 '22 17:10

gimel


The best way is to NOT to read file (or even each line at a time as gimel suggested).

You can use some inheritance and extend a class from FieldStorage and then override make_file function. make_file is called when FieldStorage is of type file.

For your reference, default make_file looks like this:

def make_file(self, binary=None):
    """Overridable: return a readable & writable file.

    The file will be used as follows:
    - data is written to it
    - seek(0)
    - data is read from it

    The 'binary' argument is unused -- the file is always opened
    in binary mode.

    This version opens a temporary file for reading and writing,
    and immediately deletes (unlinks) it.  The trick (on Unix!) is
    that the file can still be used, but it can't be opened by
    another process, and it will automatically be deleted when it
    is closed or when the current process terminates.

    If you want a more permanent file, you derive a class which
    overrides this method.  If you want a visible temporary file
    that is nevertheless automatically deleted when the script
    terminates, try defining a __del__ method in a derived class
    which unlinks the temporary files you have created.

    """
    import tempfile
    return tempfile.TemporaryFile("w+b")

rather then creating temporaryfile, permanently create file wherever you want.

like image 34
hasanatkazmi Avatar answered Oct 24 '22 18:10

hasanatkazmi


Using an answer by @hasanatkazmi (utilized in a Twisted app) I got something like:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# -*- indent: 4 spc -*-
import sys
import cgi
import tempfile


class PredictableStorage(cgi.FieldStorage):
    def __init__(self, *args, **kwargs):
        self.path = kwargs.pop('path', None)
        cgi.FieldStorage.__init__(self, *args, **kwargs)

    def make_file(self, binary=None):
        if not self.path:
            file = tempfile.NamedTemporaryFile("w+b", delete=False)
            self.path = file.name
            return file
        return open(self.path, 'w+b')

Be warned, that the file is not always created by the cgi module. According to these cgi.py lines it will only be created if the content exceeds 1000 bytes:

if self.__file.tell() + len(line) > 1000:
    self.file = self.make_file('')

So, you have to check if the file was actually created with a query to a custom class' path field like so:

if file_field.path:
    # Using an already created file...
else:
    # Creating a temporary named file to store the content.
    import tempfile
    with tempfile.NamedTemporaryFile("w+b", delete=False) as f:
        f.write(file_field.value)
        # You can save the 'f.name' field for later usage.

If the Content-Length is also set for the field, which seems rarely, the file should also be created by cgi.

That's it. This way you can store the file predictably, decreasing the memory usage footprint of your app.

like image 34
Vladius Avatar answered Oct 24 '22 19:10

Vladius