Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Read data from CSV file and transform from string to correct data-type, including a list-of-integer column

When I read data back in from a CSV file, every cell is interpreted as a string.

  • How can I automatically convert the data I read in into the correct type?
  • Or better: How can I tell the csv reader the correct data-type of each column?

(I wrote a 2-dimensional list, where each column is of a different type (bool, str, int, list of integer), out to a CSV file.)

Sample data (in CSV file):

IsActive,Type,Price,States True,Cellphone,34,"[1, 2]" ,FlatTv,3.5,[2] False,Screen,100.23,"[5, 1]" True,Notebook, 50,[1] 
like image 997
wewa Avatar asked Jul 26 '12 08:07

wewa


People also ask

What is the data type of reader in CSV file?

A CSV file (Comma Separated Values file) is a type of plain text file that uses specific structuring to arrange tabular data. Because it's a plain text file, it can contain only actual text data—in other words, printable ASCII or Unicode characters.


1 Answers

As the docs explain, the CSV reader doesn't perform automatic data conversion. You have the QUOTE_NONNUMERIC format option, but that would only convert all non-quoted fields into floats. This is a very similar behaviour to other csv readers.

I don't believe Python's csv module would be of any help for this case at all. As others have already pointed out, literal_eval() is a far better choice.

The following does work and converts:

  • strings
  • int
  • floats
  • lists
  • dictionaries

You may also use it for booleans and NoneType, although these have to be formatted accordingly for literal_eval() to pass. LibreOffice Calc displays booleans in capital letters, when in Python booleans are Capitalized. Also, you would have to replace empty strings with None (without quotes)

I'm writing an importer for mongodb that does all this. The following is part of the code I've written so far.

[NOTE: My csv uses tab as field delimiter. You may want to add some exception handling too]

def getFieldnames(csvFile):     """     Read the first row and store values in a tuple     """     with open(csvFile) as csvfile:         firstRow = csvfile.readlines(1)         fieldnames = tuple(firstRow[0].strip('\n').split("\t"))     return fieldnames  def writeCursor(csvFile, fieldnames):     """     Convert csv rows into an array of dictionaries     All data types are automatically checked and converted     """     cursor = []  # Placeholder for the dictionaries/documents     with open(csvFile) as csvFile:         for row in islice(csvFile, 1, None):             values = list(row.strip('\n').split("\t"))             for i, value in enumerate(values):                 nValue = ast.literal_eval(value)                 values[i] = nValue             cursor.append(dict(zip(fieldnames, values)))     return cursor 
like image 175
cortopy Avatar answered Sep 24 '22 03:09

cortopy