 

Changing strings to floats in an imported .csv

Quick question for an issue I haven't managed to solve quickly:

I'm working with a .csv file and can't seem to find a simple way to convert strings to floats. Here's my code:

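import csv

def readLines():
    # 'rU' = universal-newlines mode (Python 2)
    with open('testdata.csv', 'rU') as data:
        reader = csv.reader(data)
        row = list(reader)  # list of rows; each row is a list of strings
        for x in row:
            for y in x:
                # float(y) returns a new float; the list itself is unchanged
                print type(float(y)),
readLines()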

As you can see, it currently prints the type of every element y in each list x of row; this produces a long list of "<type 'float'>". But this doesn't actually change any element to a float, and neither does making the loop just execute float(y) (a type test afterwards still reports a string for each element).

I also tried literal_eval, but that failed as well. The only way I've found to change the list elements to floats is to create a new list, either with a list comprehension or manually, but that loses the original structure of the data (lists with a set number of elements, all within one larger list).
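For what it's worth, a nested list comprehension keeps the list-of-lists shape, and you can also overwrite elements in place by index. A minimal Python 3 sketch, assuming a purely numeric file; the io.StringIO sample is a hypothetical stand-in for testdata.csv (with a real file, use open('testdata.csv', newline='')):

```python
import csv
import io

# Hypothetical stand-in for the contents of testdata.csv.
sample = io.StringIO("1.5,2,3\n4,5.25,6\n")

rows = list(csv.reader(sample))  # every field is still a str here

# Nested comprehension: one inner list per row, so the shape is preserved.
float_rows = [[float(y) for y in x] for x in rows]
print(float_rows)  # [[1.5, 2.0, 3.0], [4.0, 5.25, 6.0]]

# Alternatively, convert in place by assigning back into each row list.
for x in rows:
    for i, y in enumerate(x):
        x[i] = float(y)
print(rows == float_rows)  # True
```

The key point is that float(y) creates a new object; to change the data you have to store that result somewhere, either in a new nested list or back into x[i].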

I suppose the overall question is really just "What's the easiest way to read, organize, and synthesize data in .csv or excel format using Python?"

Thanks in advance to those courteous/knowledgeable enough to help.

asked Sep 18 '13 16:09 by userNaN





1 Answer

You are correct that Python's builtin csv module is very primitive at handling mixed data types: by default every field comes back as a string, its only built-in type conversion happens at read time, and even that offers a very restrictive menu of options, which will mangle many real-world datasets (inconsistent quoting and escaping, missing or incomplete values in Booleans and factors, mismatched Unicode encodings that leave phantom quote or escape characters inside fields, incomplete lines). Fixing CSV import is one of countless benefits of pandas. So your ultimate answer is indeed: stop using the builtin csv import and start using pandas. But let's start with the literal answer to your question.

First you asked how to convert strings to floats on CSV import. The answer to that is to create the reader with csv.reader(..., quoting=csv.QUOTE_NONNUMERIC), as per the csv docs:

csv.QUOTE_NONNUMERIC: Instructs the reader to convert all non-quoted fields to type float.
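A minimal Python 3 sketch of that option, using a hypothetical in-memory sample in place of a real file. Unquoted fields (including the integer 2) come back as floats, while quoted fields stay strings:

```python
import csv
import io

# Hypothetical sample: quoted header and labels, unquoted numbers.
sample = io.StringIO('"x","y","label"\n1.5,2,"north"\n4,5.25,"south"\n')

# QUOTE_NONNUMERIC makes the reader call float() on every unquoted field.
rows = list(csv.reader(sample, quoting=csv.QUOTE_NONNUMERIC))
print(rows)
# [['x', 'y', 'label'], [1.5, 2.0, 'north'], [4.0, 5.25, 'south']]
```

With a real file you would pass open('testdata.csv', newline='') instead of the StringIO object.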

That works if you're OK with all unquoted fields (integer, float, text, Boolean, etc.) being converted to float, which is generally a bad idea for many reasons: integers silently become floats, and Booleans, factors, or missing/NA markers can't survive as what they were. Moreover, it will obviously fail (raise a ValueError) on any unquoted text field. So it's brittle and needs to be protected with try/except.
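One way to add that protection, as a sketch only: the helper name read_numeric_rows is hypothetical, and it simply gives up (returns None) when the reader hits an unquoted field that float() can't parse:

```python
import csv
import io


def read_numeric_rows(f):
    """Hypothetical helper: read all rows with QUOTE_NONNUMERIC, or return
    None if an unquoted field is not a valid float (reader raises ValueError)."""
    try:
        return list(csv.reader(f, quoting=csv.QUOTE_NONNUMERIC))
    except ValueError as exc:
        print("non-numeric unquoted field:", exc)
        return None


good = read_numeric_rows(io.StringIO("1,2\n3,4\n"))
bad = read_numeric_rows(io.StringIO("1,2\nabc,4\n"))  # 'abc' is unquoted text
print(good)  # [[1.0, 2.0], [3.0, 4.0]]
print(bad)   # None
```

Because the exception aborts the whole read, this is all-or-nothing; if you need per-column types you end up writing your own converters, which is where pandas starts to pay off.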

Then you asked: 'I suppose the overall question is really just "What's the easiest way to read, organize, and synthesize data in .csv or excel format using Python?"' The (crappy) csv.reader answer to that is, again, to open the file with csv.reader(..., quoting=csv.QUOTE_NONNUMERIC).

But as @geoffspear correctly replied 'The answer to your "overall question" may be "Pandas", although it's a bit vague.'
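For comparison, a minimal pandas sketch, assuming pandas is installed and the file has a header row (the columns below are hypothetical). read_csv infers a type per column, so floats, integers, Booleans and missing values each come out sensibly without manual conversion; pd.read_excel plays the same role for Excel files, given an engine such as openpyxl:

```python
import io

import pandas as pd

# Hypothetical CSV with a header row and mixed column types.
sample = io.StringIO(
    "name,price,qty,in_stock\n"
    "apple,1.25,3,True\n"
    "pear,,5,False\n"
)

df = pd.read_csv(sample)  # or pd.read_csv('testdata.csv')
print(df.dtypes)          # per-column inferred types

# The missing price becomes NaN rather than raising an exception.
print(df["price"].isna().sum())  # 1
```

Each column keeps its own dtype (float64 for price, an integer type for qty, bool for in_stock), which is exactly what the builtin csv module can't give you.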

answered Sep 20 '22 14:09 by smci
