I have the following data in pandas dataframe:
        state       1st          2nd           3rd
    0   California  $11,593,820  $109,264,246  $8,496,273
    1   New York    $10,861,680  $45,336,041   $6,317,300
    2   Florida     $7,942,848   $69,369,589   $4,697,244
    3   Texas       $7,536,817   $61,830,712   $5,736,941

I want to perform some simple analysis (e.g., sum, groupby) with three columns (1st, 2nd, 3rd), but the data type of those three columns is object (or string).
So I used the following code for data conversion:
    data = data.convert_objects(convert_numeric=True)

But the conversion does not work, perhaps because of the dollar sign. Any suggestions?
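The best way to convert one or more columns of a DataFrame to numeric values is to use pandas.to_numeric(). This function tries to convert non-numeric objects (such as strings) to integers or floating-point numbers as appropriate. Note that it does not strip currency symbols or thousands separators, so values such as $11,593,820 have to be cleaned before they can be parsed.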
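As a minimal sketch of that approach on the question's data (the frame is rebuilt by hand here, and the `money_cols` name is just an illustrative choice), the dollar sign and commas can be removed with plain string replacement before calling `pd.to_numeric()`:

```python
import pandas as pd

# Sample frame rebuilt from the question; all three money columns start as strings.
data = pd.DataFrame({
    "state": ["California", "New York", "Florida", "Texas"],
    "1st": ["$11,593,820", "$10,861,680", "$7,942,848", "$7,536,817"],
    "2nd": ["$109,264,246", "$45,336,041", "$69,369,589", "$61,830,712"],
    "3rd": ["$8,496,273", "$6,317,300", "$4,697,244", "$5,736,941"],
})

money_cols = ["1st", "2nd", "3rd"]  # hypothetical helper name

for col in money_cols:
    # Remove the literal "$" and "," characters (regex=False means no pattern matching).
    cleaned = (
        data[col]
        .str.replace("$", "", regex=False)
        .str.replace(",", "", regex=False)
    )
    # Parse the cleaned strings as numbers.
    data[col] = pd.to_numeric(cleaned)

print(data.dtypes)
```

If some cells might still fail to parse, passing `errors="coerce"` to `pd.to_numeric()` turns them into NaN instead of raising an error.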
To encode non-numeric (categorical) data as numbers, you can use scikit-learn's LabelEncoder. It maps each category, such as the values a, b, c in a column COL1, to an integer. enc.fit() learns the categories, and enc.transform() returns the corresponding integer values.
@EdChum's answer is clever and works well. But since there's more than one way to bake a cake, why not use regex? For example:
    df[df.columns[1:]] = df[df.columns[1:]].replace(r'[\$,]', '', regex=True).astype(float)

To me, that is a little bit more readable.
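For completeness, here is a small self-contained sketch of the regex approach followed by the kind of analysis the question asks about. The frame is rebuilt by hand from the question, and the `totals` and `by_state` names are only illustrative:

```python
import pandas as pd

# Sample frame rebuilt from the question; money columns are strings at this point.
df = pd.DataFrame({
    "state": ["California", "New York", "Florida", "Texas"],
    "1st": ["$11,593,820", "$10,861,680", "$7,942,848", "$7,536,817"],
    "2nd": ["$109,264,246", "$45,336,041", "$69,369,589", "$61,830,712"],
    "3rd": ["$8,496,273", "$6,317,300", "$4,697,244", "$5,736,941"],
})

# Strip "$" and "," from every column except the first, then cast to float.
df[df.columns[1:]] = df[df.columns[1:]].replace(r'[\$,]', '', regex=True).astype(float)

# Column totals across all states.
totals = df[["1st", "2nd", "3rd"]].sum()

# Per-state totals (one row per state in this sample, so the values are unchanged).
by_state = df.groupby("state")[["1st", "2nd", "3rd"]].sum()

print(totals)
print(by_state)
```

The raw string prefix (`r'...'`) keeps Python from treating `\$` as a string escape, so the pattern reaches the regex engine unchanged.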