
Pandas read scientific notation and change

Tags: python, pandas, csv

I have a dataframe in pandas that I'm reading in from a CSV.

One of my columns has values that include NaN, floats, and scientific notation, e.g. 5.3e-23

My trouble is that when I read in the CSV, pandas treats this column as object dtype rather than the float32 it should be, I guess because it thinks the scientific notation entries are strings.

I've tried converting the dtype with df['speed'].astype(float) after the file has been read in, and specifying the dtype while reading with df = pd.read_csv('path/test.csv', dtype={'speed': np.float64}, na_values=['n/a']). The latter throws the error ValueError: cannot safely convert passed user dtype of <f4 for object dtyped data in column ...

So far neither of these methods has worked. Am I missing an incredibly easy fix?

This question seems to suggest I can specify known numbers that might throw an error, but I'd prefer to convert the scientific notation back to a float if possible.

EDITED TO SHOW DATA FROM CSV AS REQUESTED IN COMMENTS

7425616,12375,28,2015-08-09 11:07:56,0,-8.18644,118.21463,2,0,2
7425615,12375,28,2015-08-09 11:04:15,0,-8.18644,118.21463,2,NaN,2
7425617,12375,28,2015-08-09 11:09:38,0,-8.18644,118.2145,2,0.14,2
7425592,12375,28,2015-08-09 10:36:34,0,-8.18663,118.2157,2,0.05,2
65999,1021,29,2015-01-30 21:43:26,0,-8.36728,118.29235,1,0.206836151554794,2
204958,1160,30,2015-02-03 17:53:37,2,-8.36247,118.28664,1,9.49242000872744e-05,7
384739,,32,2015-01-14 16:07:02,1,-8.36778,118.29206,2,Infinity,4
275929,1160,30,2015-02-17 03:13:51,1,-8.36248,118.28656,1,113.318511172611,5

hselbie asked Dec 01 '15 06:12



1 Answer

It's hard to say without seeing your data, but it seems the problem is that some of your rows contain something other than numbers and 'n/a' values. You could load your dataframe and then convert it to numeric, as shown in the answers to that question. If you have pandas version >= 0.17.0, you could use the following:

df1 = df.apply(pd.to_numeric, args=('coerce',))

Then you could drop rows with NA values using dropna, or fill them with zeros using fillna.
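As a rough sketch of the same idea applied to a single column rather than the whole dataframe, something like the following might work. The small dataframe here is a made-up stand-in for the 'speed' column read as strings, and treating infinite values as missing is an assumption, not something the question asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'speed' column as read from the CSV (object dtype)
df = pd.DataFrame(
    {"speed": ["0", "NaN", "0.14", "9.49242000872744e-05", "Infinity", "n/a"]}
)

# Convert to numeric; entries that cannot be parsed become NaN instead of raising
df["speed"] = pd.to_numeric(df["speed"], errors="coerce")

# Assumption: infinite values should also count as missing
df["speed"] = df["speed"].replace([np.inf, -np.inf], np.nan)

# Either drop the rows with missing speed...
dropped = df.dropna(subset=["speed"])

# ...or fill the missing values with zeros
filled = df.fillna({"speed": 0})
```

Converting only the one column leaves the other columns, such as the timestamp, untouched, whereas applying the conversion to every column with 'coerce' would turn non-numeric columns into NaN.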

Anton Protopopov answered Sep 21 '22 19:09