I have a dataframe in pandas that i'm reading in from a csv.
One of my columns has values that include NaN
, floats
, and scientific notation, i.e. 5.3e-23
My trouble is that as I read in the csv, pandas views these data as an object dtype
, not the float32
that it should be. I guess because it thinks the scientific notation entries are strings.
I've tried to convert the dtype using df['speed'].astype(float)
after it's been read in, and tried to specify the dtype as it's being read in using df = pd.read_csv('path/test.csv', dtype={'speed': np.float64}, na_values=['n/a'])
. This throws the error ValueError: cannot safely convert passed user dtype of <f4 for object dtyped data in column ...
So far neither of these methods have worked. Am I missing something that is an incredibly easy fix?
this question seems to suggest I can specify known numbers that might throw an error, but i'd prefer to convert the scientific notation back to a float if possible.
EDITED TO SHOW DATA FROM CSV AS REQUESTED IN COMMENTS
7425616,12375,28,2015-08-09 11:07:56,0,-8.18644,118.21463,2,0,2
7425615,12375,28,2015-08-09 11:04:15,0,-8.18644,118.21463,2,NaN,2
7425617,12375,28,2015-08-09 11:09:38,0,-8.18644,118.2145,2,0.14,2
7425592,12375,28,2015-08-09 10:36:34,0,-8.18663,118.2157,2,0.05,2
65999,1021,29,2015-01-30 21:43:26,0,-8.36728,118.29235,1,0.206836151554794,2
204958,1160,30,2015-02-03 17:53:37,2,-8.36247,118.28664,1,9.49242000872744e-05,7
384739,,32,2015-01-14 16:07:02,1,-8.36778,118.29206,2,Infinity,4
275929,1160,30,2015-02-17 03:13:51,1,-8.36248,118.28656,1,113.318511172611,5
Scientific notations isn't helpful when you are trying to make quick comparisons across your dataset. However, Pandas will introduce scientific notations by default when the data type is a float.
Python can deal with floating point numbers in both scientific and standard notation.
It's hard to say without seeing your data but it seems that problem in your rows that they contain something else except for numbers and 'n/a' values. You could load your dataframe and then convert it to numeric as show in answers for that question. If you have pandas version >= 0.17.0
then you could use following:
df1 = df.apply(pd.to_numeric, args=('coerce',))
Then you could drop row with NA values with dropna
or fill them with zeros with fillna
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With