Uploading a csv file with a fixed format

Tags: python, csv

I have a .csv file which my users have to download, fill in with some data, and upload to my site.

Is there a better way of ensuring the data gets uploaded successfully based on my snippet below? What else should I be checking for? Would using a dialect be better?

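import csv

def import_residents(resident_file):

    try:
        path = resident_file.file.path
        # newline='' lets the csv module handle line endings itself
        with open(path, newline='') as csvfile:
            reader = csv.reader(csvfile, delimiter=',', quotechar='"')
            headerline = next(reader)  # skip the header row

            for row in reader:
                try:
                    pass  # do stuff

                except Exception as e:
                    print(e)

    except Exception as e:
        print(e)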

An example of a problem I am running into: when a user opens the file, inputs data and saves it, the delimiter changes from , to ;. How can I cover the various delimiters the document could be saved with, given that it may be opened in different programs, e.g. Excel on Windows, Excel on Mac, OpenOffice on Mac, OpenOffice on Linux, etc.?

Another example of a problem is when the user tries to copy and paste the data into the template provided, all hell breaks loose.

UPDATE: I'm using the Sniffer class now, as mentioned in one of the answers below, but it's still not foolproof.

UPDATED CODE SNIPPET

import csv

from django.http import Http404

# ResidentImportData is the asker's own Django model; its import is omitted here.


def bulk_import_residents(condo, resident_file):

    """
    COL 1       COL 2       COL 3           COL 4           COL 5
    first_name  last_name   contact_number  unit_number     block_number

    """

    file_path = resident_file.file.path
    # newline='' lets the csv module handle line endings itself
    with open(file_path, newline='') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(1024))
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)
        headerline = next(reader)  # skip the header row

        for row in reader:
            try:
                data = ResidentImportData()
                data.condo = condo
                data.file = resident_file
                data.first_name = row[0]
                data.last_name = row[1]
                data.contact_number = row[2]
                data.unit_number = row[3]
                data.block_number = row[4]
                data.save()
            except Exception as e:
                print('{0}'.format(e))
                raise Http404('Wrong template format')
super9 asked Jan 19 '12 16:01


3 Answers

CSV is a non-format. The Sniffer class is not foolproof because it's actually impossible to 100% reliably detect all given dialects.

I think you're going to have to use Sniffer for the 90% of the time it works, and capture invalid input files, analyze them, and extend Sniffer to catch them.
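
As a rough sketch of that approach in Python 3: limit Sniffer to the delimiters spreadsheet programs commonly write, and fall back to plain commas when it gives up, flagging the file so it can be inspected later. The names `sniff_dialect` and `read_rows` and the candidate delimiter list are illustrative assumptions, not part of the question's code. Semicolons typically show up when the spreadsheet program follows regional settings that use the comma as the decimal separator.

```python
import csv
import io


def sniff_dialect(sample, delimiters=",;\t|"):
    """Return (dialect, sniffed) for a text sample of a CSV file.

    sniffed is False when Sniffer gave up and the default dialect was used.
    """
    try:
        # Only the listed characters are considered as possible delimiters.
        return csv.Sniffer().sniff(sample, delimiters=delimiters), True
    except csv.Error:
        # Sniffer could not decide; assume plain comma-separated values.
        return csv.excel, False


def read_rows(text):
    """Parse CSV text into a list of rows using the detected dialect."""
    dialect, sniffed = sniff_dialect(text[:1024])
    if not sniffed:
        # Hypothetical hook: keep or log the file here for later analysis.
        print("could not sniff dialect, assuming commas")
    return list(csv.reader(io.StringIO(text, newline=""), dialect))
```

Files that take the fallback path are the ones worth collecting, since they show which cases the detection still misses.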

nfirvine answered Sep 28 '22 05:09


I completely agree with nfirvine (CSV IS A NON FORMAT). Okay, maybe not that harsh, but it is a minimal format and it's very loose. Expect things to break frequently if you use CSV, which it sounds like you are already experiencing.

I also agree with Mike Bynum - use something like XML.

But I understand that even if there is a better way, there is often the pragmatic way. Maybe you gotta stick with your format for a plethora of reasons... so: two routes.

Route 1: CSV

I've done (am doing) this route now. My users update data on a daily basis (couple thousand records). Given the frequency and # of records updated, I really wish I had gone the second route: when dealing with a significant amount of data or updates, solid data validation is a huge time saver.

That said, when you are stuck with CSV, I suggest you do the following:

  • Provide your users with a good/common definition of CSV, namely RFC 4180. Make sure your customer understands what you expect their file to contain:
    • A header line.
    • Commas as separators.
    • Quotes around any data that contains commas.
  • Along with that definition, give your users a sample of the CSV (which it sounds like you did, good!). Explain that you can't process a CSV file that doesn't conform to your data definition.
  • Make sure the text file type is what you expect it to be before you import it, e.g. convert line endings to/from Unix/Windows as needed.
  • Within your CSV parser you need to adopt the fail fast methodology, and make sure you have a mechanism to notify your users when the CSV file doesn't conform to the standard you expect. Give them as much information as possible (provide the exception details...if not for them, at least for you).
  • The problem you are having with one customer's files suggests that you might want to give your customers some direction on which editors you know work correctly. Excel should work, or OpenOffice. I suggest a spreadsheet application because they do a good job of exporting to CSV and taking care of quoting, etc.

Not that you can't support some oddities, but in general - you want to avoid it, and you want to avoid accidentally importing badly formed data.
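
To make the fail-fast idea concrete, here is a minimal Python 3 sketch. It assumes the template's header row uses exactly the five column names from the question's docstring; `TemplateFormatError` and `parse_residents` are hypothetical names introduced for illustration. It normalises line endings, checks the header, and stops at the first malformed row with a line number the user can act on.

```python
import csv
import io

# Assumed header row, taken from the column names in the question.
EXPECTED_HEADER = ["first_name", "last_name", "contact_number",
                   "unit_number", "block_number"]


class TemplateFormatError(ValueError):
    """Raised when an uploaded file does not match the template."""


def parse_residents(text):
    """Return a list of dicts, or raise TemplateFormatError on the first problem."""
    # Normalise Windows and old Mac line endings before parsing.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    reader = csv.reader(io.StringIO(text))
    header = [cell.strip().lower() for cell in next(reader, [])]
    if header != EXPECTED_HEADER:
        raise TemplateFormatError(
            "line 1: expected header %r, got %r" % (EXPECTED_HEADER, header))
    rows = []
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != len(EXPECTED_HEADER):
            raise TemplateFormatError(
                "line %d: expected %d columns, got %d"
                % (reader.line_num, len(EXPECTED_HEADER), len(row)))
        rows.append(dict(zip(EXPECTED_HEADER, (cell.strip() for cell in row))))
    return rows
```

Nothing is saved until the whole file has parsed, so a bad row cannot leave a half-imported batch behind, and the exception message can be shown to the user as is.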

Route 2: XML

I would suggest you do the following:

  • Define the data your users have to import with a schema definition (XSD). I like to keep the W3C definitions on hand, but there are good tutorials to help you write your own XSD definition.

  • Give your users a sample XML file to fill in, and a suggestion for an editor. There are great commercial ones, and reasonable free ones.

  • You can read your users' XML files and be sure that if a file validates, it's good to go. For that matter, your users can validate before they send it to you.
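
Python's standard library has no XSD validator, so a real schema check needs a third-party library (lxml is one that offers it). As a stand-in that only approximates validation, the sketch below checks structure by hand with `xml.etree.ElementTree`. The `<residents>`/`<resident>` layout, the field names, and `check_residents_xml` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed child elements of each <resident>, mirroring the CSV columns.
REQUIRED_FIELDS = ("first_name", "last_name", "contact_number",
                   "unit_number", "block_number")


def check_residents_xml(xml_text):
    """Return a list of problems found; an empty list means the file passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]
    problems = []
    if root.tag != "residents":
        problems.append("root element is <%s>, expected <residents>" % root.tag)
    for index, resident in enumerate(root.findall("resident"), start=1):
        for field in REQUIRED_FIELDS:
            value = resident.findtext(field)
            # findtext returns None for a missing element, '' for an empty one.
            if value is None or not value.strip():
                problems.append("resident %d: missing <%s>" % (index, field))
    return problems
```

Unlike an XSD, this does not check data types or element order; it only shows the kind of report a validating import can hand back to the user.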

dsummersl answered Sep 28 '22 06:09


Ah, just found the Sniffer class.

import csv

# newline="" lets the csv module handle line endings itself
with open("example.csv", newline="") as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...
super9 answered Sep 28 '22 05:09