I have a .csv file which my users have to download, input some data and upload to my site.
Is there a better way of ensuring the data gets uploaded successfully based on my snippet below? What else should I be checking for? Would using a dialect be better?
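import csv

# "import" is a reserved word in Python, so the function needs another name
def import_residents(resident_file):
    try:
        path = resident_file.file.path
        # newline='' lets the csv module handle line endings itself
        with open(path, newline='') as csvfile:
            reader = csv.reader(csvfile, delimiter=',', quotechar='"')
            headerline = next(reader)
            for row in reader:
                try:
                    pass  # do stuff
                except Exception as e:
                    print(e)
    except Exception as e:
        print(e)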
An example of a problem I am running into is that when a user opens the file, inputs data and saves it, the delimiters change from "," to ";". How can I cover the various types of delimiters that the document could be saved with, given that it may be opened in different programs, e.g. Excel on Windows, Excel on Mac, OpenOffice on Mac, OpenOffice on Linux, etc.?
Another example of a problem is when the user tries to copy and paste the data into the template provided, all hell breaks loose.
UPDATE
I'm using the Sniffer class now, as mentioned in one of the answers below, but it's still not foolproof.
UPDATED CODE SNIPPET
import csv
from django.http import Http404

def bulk_import_residents(condo, resident_file):
    """
    COL 1       COL 2      COL 3           COL 4        COL 5
    first_name  last_name  contact_number  unit_number  block_number
    """
    file_path = resident_file.file.path
    with open(file_path, newline='') as csvfile:
        # Guess the dialect (delimiter, quoting) from the first 1024 characters
        dialect = csv.Sniffer().sniff(csvfile.read(1024))
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)
        headerline = next(reader)
        for row in reader:
            try:
                # ResidentImportData is the application's own model
                data = ResidentImportData()
                data.condo = condo
                data.file = resident_file
                data.first_name = row[0]
                data.last_name = row[1]
                data.contact_number = row[2]
                data.unit_number = row[3]
                data.block_number = row[4]
                data.save()
            except Exception as e:
                print('{0}'.format(e))
                raise Http404('Wrong template format')
To preserve all the digits in text-formatted numbers, you have to import the downloaded CSV file as raw data into a new Excel spreadsheet, set the column datatypes as needed, and then save the new file as an Excel workbook. Excel (XLSX) files will preserve these formats, CSV files won't.
Fixed format means that the fields in your file have a fixed length. For instance first column is always 10 characters, second is 3 characters and third is 20 characters. Delimited format means that there is a character used to separate every column on each line.
CSV files contain only data, as comma-separated values. If you want to keep your formatting changes, save the file as an Excel file (e.g. myfile.xls), using the 'Save As' option in the File menu.
CSV is a non-format. The Sniffer class is not foolproof because it's actually impossible to 100% reliably detect all given dialects.
I think you're going to have to use Sniffer for the 90% of the time it works, and capture invalid input files, analyze them, and extend Sniffer to catch them.
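As a rough sketch of that approach (not a definitive implementation), the sniffing step can be wrapped so that a failure is reported instead of crashing the import. The names detect_dialect and CANDIDATE_DELIMITERS are made up for this illustration, and the delimiter whitelist is an assumption about what spreadsheet programs tend to write.

```python
import csv

# Hypothetical whitelist: delimiters we expect spreadsheet programs to produce.
CANDIDATE_DELIMITERS = ",;\t|"


def detect_dialect(sample, fallback=csv.excel):
    """Return (dialect, sniffed_ok).

    Falls back to `fallback` when the Sniffer cannot decide, so the caller
    can set the file aside for manual analysis instead of guessing.
    """
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=CANDIDATE_DELIMITERS)
        return dialect, True
    except csv.Error:
        # Raised when the Sniffer could not determine a delimiter.
        return fallback, False


sample = "first_name;last_name;contact_number\nJohn;Smith;5551234\nJane;Doe;5555678\n"
dialect, sniffed_ok = detect_dialect(sample)
print(dialect.delimiter, sniffed_ok)
```

When sniffed_ok comes back False, the uploaded file is a candidate for the "capture and analyze" pile rather than for import.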
I completely agree with nfirvine (CSV IS A NON FORMAT) - okay, not that harsh. But it is a minimal format. It's very loose. Expect things to break frequently if you use CSV, as it sounds like you are already experiencing.
I also agree with Mike Bynum - use something like XML.
But I understand that even if there is a better way, there is often the pragmatic way. Maybe you have to stick with your format for a plethora of reasons... so: two routes.
Route 1: CSV
I've done (am doing) this route now. My users update data on a daily basis (couple thousand records). Given the frequency and # of records updated, I really wish I had gone the second route: when dealing with a significant amount of data or updates, solid data validation is a huge time saver.
That said, when you are stuck with CSV, I suggest you do the following:
Not that you can't support some oddities, but in general - you want to avoid it, and you want to avoid accidentally importing badly formed data.
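One way to avoid importing badly formed data is to check the whole file before saving anything. The sketch below is only an illustration: validate_rows and EXPECTED_HEADER are hypothetical names, and the header layout is assumed from the column list in the question's docstring.

```python
import csv
import io

# Assumed template layout, taken from the question's docstring.
EXPECTED_HEADER = ["first_name", "last_name", "contact_number",
                   "unit_number", "block_number"]


def validate_rows(csvfile, dialect=csv.excel):
    """Return (good_rows, errors) without importing anything."""
    reader = csv.reader(csvfile, dialect)
    header = [cell.strip().lower() for cell in next(reader, [])]
    if header != EXPECTED_HEADER:
        return [], ["line 1: unexpected header {!r}".format(header)]

    good_rows, errors = [], []
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # ignore blank lines
        if len(row) != len(EXPECTED_HEADER):
            errors.append("line {}: expected {} columns, got {}".format(
                reader.line_num, len(EXPECTED_HEADER), len(row)))
            continue
        good_rows.append([cell.strip() for cell in row])
    return good_rows, errors


upload = io.StringIO(
    "first_name,last_name,contact_number,unit_number,block_number\n"
    "John,Smith,5551234,12,A\n"
    "Jane,Doe,5555678\n"
)
rows, errors = validate_rows(upload)
print(rows)
print(errors)
```

Collecting every error first means the user can be shown all the bad lines at once, and nothing is saved unless the error list is empty.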
Route 2: XML
I would suggest you do the following:
Define the data your users have to import with a schema definition (XSD). I like to keep the W3C definitions on hand, but there are good tutorials to help you write your own XSD.
Give your users a sample XML file to fill in, and a suggestion for an editor. There are great commercial ones, and reasonable free ones.
You can read your users' XML files and be sure that if a file validates then it's good to go. For that matter, your users can validate before they send it to you.
Ah just found the sniffer class.
import csv

# On Python 3, open in text mode with newline="" (the "rb" mode is Python 2 only)
csvfile = open("example.csv", newline="")
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
# ... process CSV file contents here ...
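To see what this buys you for the delimiter problem in the question, here is a small in-memory sketch. The sample data and the parse helper are invented for illustration; the same rows are written once with commas and once with semicolons, and both versions parse to the same result.

```python
import csv
import io


def parse(text):
    """Sniff the dialect from the start of the text, then read every row."""
    dialect = csv.Sniffer().sniff(text[:1024])
    return list(csv.reader(io.StringIO(text, newline=""), dialect))


comma_version = "first_name,last_name,unit_number\nJohn,Smith,12\nJane,Doe,7\n"
# Simulate a spreadsheet program re-saving the file with ";" as the delimiter.
semicolon_version = comma_version.replace(",", ";")

print(parse(comma_version) == parse(semicolon_version))
print(parse(semicolon_version)[1])
```

This only covers clean, regular data; a file with pasted-in content or ragged rows can still confuse the Sniffer, which is why it is not foolproof.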