I am going around in circles and have tried so many different approaches that I suspect my core understanding is wrong. I would be grateful for help in understanding my encoding/decoding issues.
I import the dataframe from SQL, and it seems that some columns that should be float64 are converted to object. As a result, I cannot do any calculations on them, and I have failed to convert the object columns back to float64.
df.head()
Date       WD  Manpower  2nd    CTR    2ndU  T1   T2   T3   T4
2013/4/6   6   NaN       2,645  5.27%  0.29  407  533  454  368
2013/4/7   7   NaN       2,118  5.89%  0.31  257  659  583  369
2013/4/13  6   NaN       2,470  5.38%  0.29  354  531  473  383
2013/4/14  7   NaN       2,033  6.77%  0.37  396  748  681  458
2013/4/20  6   NaN       2,690  5.38%  0.29  361  528  541  381
df.dtypes
WD          float64
Manpower    float64
2nd          object
CTR          object
2ndU        float64
T1           object
T2           object
T3           object
T4           object
T5           object
dtype: object
Use the pandas DataFrame.astype() function to convert a column from string/int to float; you can apply it to a specific column or to an entire DataFrame. To cast to a 64-bit float, you can use numpy.float64.
to_numeric(): A reliable way to convert one or more columns of a DataFrame to numeric values is to use pandas.to_numeric(). This function will try to change non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
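To make the difference between the two functions concrete, here is a minimal sketch. The sample Series and its "n/a" entry are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical sample: numbers stored as strings, with one unparseable entry.
s = pd.Series(["407", "533", "n/a"])

# astype(float) is strict: it raises ValueError on the unparseable "n/a".
try:
    s.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors="coerce" turns unparseable entries into NaN instead.
converted = pd.to_numeric(s, errors="coerce")
print(astype_failed)
print(converted)
```

So astype is the better fit when the column is already clean and any bad value should be an error, while to_numeric with errors="coerce" is more forgiving on messy input.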
You can convert most of the columns by just calling convert_objects:

In [36]: df = df.convert_objects(convert_numeric=True)
         df.dtypes
Out[36]:
Date         object
WD            int64
Manpower    float64
2nd          object
CTR          object
2ndU        float64
T1            int64
T2            int64
T3            int64
T4          float64
dtype: object
For columns '2nd' and 'CTR' we can call the vectorised str methods to replace the thousands separator and remove the '%' sign, and then astype to convert:

In [39]: df['2nd'] = df['2nd'].str.replace(',','').astype(int)
         df['CTR'] = df['CTR'].str.replace('%','').astype(np.float64)
         df.dtypes
Out[39]:
Date         object
WD            int64
Manpower    float64
2nd           int32
CTR         float64
2ndU        float64
T1            int64
T2            int64
T3            int64
T4           object
dtype: object

In [40]: df.head()
Out[40]:
   Date       WD  Manpower  2nd   CTR   2ndU  T1   T2   T3   T4
0  2013/4/6   6   NaN       2645  5.27  0.29  407  533  454  368
1  2013/4/7   7   NaN       2118  5.89  0.31  257  659  583  369
2  2013/4/13  6   NaN       2470  5.38  0.29  354  531  473  383
3  2013/4/14  7   NaN       2033  6.77  0.37  396  748  681  458
4  2013/4/20  6   NaN       2690  5.38  0.29  361  528  541  381
Or you can do the string handling operations above without the call to astype and then call convert_objects to convert everything in one go.
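On pandas versions where convert_objects is not available, the same "strip, then convert" idea can be sketched with pd.to_numeric. The helper name clean_numeric is hypothetical, and the sample rows are copied from the question's df.head(); treat this as a rough illustration rather than the answer's own code.

```python
import pandas as pd

def clean_numeric(col: pd.Series) -> pd.Series:
    """Hypothetical helper: drop ',' and '%' and parse what is left as numbers."""
    stripped = (
        col.str.replace(",", "", regex=False)   # remove thousands separator
           .str.replace("%", "", regex=False)   # remove percent sign
    )
    # Anything that still cannot be parsed becomes NaN instead of raising.
    return pd.to_numeric(stripped, errors="coerce")

# Sample values taken from the question's '2nd' and 'CTR' columns.
df = pd.DataFrame({
    "2nd": ["2,645", "2,118", "2,470"],
    "CTR": ["5.27%", "5.89%", "5.38%"],
})

for name in ["2nd", "CTR"]:
    df[name] = clean_numeric(df[name])

print(df.dtypes)
```

Note that CTR ends up as the percentage value (5.27), not the fraction (0.0527), matching the output shown above.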
UPDATE
Since version 0.17.0, convert_objects is deprecated and there isn't a top-level function to do this, so you need to do:
df.apply(lambda col:pd.to_numeric(col, errors='coerce'))
See the docs and this related question: pandas: to_numeric for multiple columns
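One caveat with applying to_numeric to every column with errors='coerce': string columns that are not numbers at all, such as Date, would be turned into NaN. A minimal sketch of restricting the conversion to chosen columns follows; the list numeric_cols is an assumption about which columns are meant to be numeric.

```python
import pandas as pd

# Small sample modelled on the question's data.
df = pd.DataFrame({
    "Date": ["2013/4/6", "2013/4/7"],
    "T1": ["407", "257"],
    "T2": ["533", "659"],
})

# Coercing the whole frame would turn the Date strings into NaN,
# so convert only the columns that are meant to be numeric.
numeric_cols = ["T1", "T2"]  # assumed list; adjust to the real table
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```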
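convert_objects is deprecated.
For pandas >= 0.17.0, use pd.to_numeric. Values with a thousands separator, such as '2,645', need the comma removed first, otherwise to_numeric raises a ValueError:
df["2nd"] = pd.to_numeric(df["2nd"].str.replace(",", ""))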
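If a conversion keeps failing and it is not obvious why, it can help to list the raw values that to_numeric cannot parse. The following is a small diagnostic sketch; the sample Series is modelled on the question's '2nd' column with an extra missing entry added for illustration.

```python
import pandas as pd

# Sample values modelled on the question's '2nd' column, plus a missing entry.
raw = pd.Series(["2,645", "2,118", None, "2470"])

# Coerce without raising, then flag entries that were present but failed to parse.
parsed = pd.to_numeric(raw, errors="coerce")
failed = raw[parsed.isna() & raw.notna()]

# 'offenders' holds the raw strings that need cleaning before conversion.
offenders = failed.tolist()
print(offenders)
```

Here the values containing a comma are reported, which points directly at the thousands separator as the thing to strip.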