
Converting string objects to int/float using pandas

Tags: python, pandas, csv

import pandas as pd

path1 = "/home/supertramp/Desktop/100&life_180_data.csv"

mydf =  pd.read_csv(path1)

numcigar = {"Never":0 ,"1-5 Cigarettes/day" :1,"10-20 Cigarettes/day":4}

print mydf['Cigarettes']

mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)

print mydf['CigarNum']

mydf.to_csv('/home/supertramp/Desktop/powerRangers.csv')

The CSV file "100&life_180_data.csv" contains columns such as Age, BMI, Cigarettes, Alcohol, etc.

No                int64
Age               int64
BMI             float64
Alcohol          object
Cigarettes       object
dtype: object

The Cigarettes column contains "Never", "1-5 Cigarettes/day" and "10-20 Cigarettes/day". I want to assign weights to these objects (Never, 1-5 Cigarettes/day, ...).

The expected output is a new appended column, CigarNum, which consists only of the numbers 0, 1, 2. CigarNum is as expected up to about row 8 and then shows NaN through to the last row of the column.

0                     Never
1                     Never
2        1-5 Cigarettes/day
3                     Never
4                     Never
5                     Never
6                     Never
7                     Never
8                     Never
9                     Never
10                    Never
11                    Never
12     10-20 Cigarettes/day
13       1-5 Cigarettes/day
14                    Never
...
167                    Never
168                    Never
169     10-20 Cigarettes/day
170                    Never
171                    Never
172                    Never
173                    Never
174                    Never
175                    Never
176                    Never
177                    Never
178                    Never
179                    Never
180                    Never
181                    Never
Name: Cigarettes, Length: 182, dtype: object

The output I get shouldn't give NaN after the first few rows.

0      0
1      0
2      1
3      0
4      0
5      0
6      0
7      0
8      0
9      0
10   NaN
11   NaN
12   NaN
13   NaN
14     0
...
167   NaN
168   NaN
169   NaN
170   NaN
171   NaN
172   NaN
173   NaN
174   NaN
175   NaN
176   NaN
177   NaN
178   NaN
179   NaN
180   NaN
181   NaN
Name: CigarNum, Length: 182, dtype: float64
Asked by codex, Jun 04 '14 12:06



2 Answers

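OK, the first problem is that some values contain stray whitespace (for example a trailing space), so they do not match the dictionary keys and the lookup returns nothing for them.

Fix this using the vectorised str methods. Use strip rather than replace(' ', ''): removing every space would also change values such as "1-5 Cigarettes/day", so they would no longer match the keys in numcigar:

mydf['Cigarettes'] = mydf['Cigarettes'].str.strip()

Now creating your new column should just work: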

mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)
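To see why stray whitespace produces NaN, here is a minimal sketch. The sample values are hypothetical (the real file is not shown), and it uses Series.map with the dict, which behaves like apply(numcigar.get) for this purpose:

```python
import pandas as pd

# Hypothetical sample: two entries carry invisible stray whitespace.
cigs = pd.Series(["Never", "Never ", "1-5 Cigarettes/day", " 10-20 Cigarettes/day"])
numcigar = {"Never": 0, "1-5 Cigarettes/day": 1, "10-20 Cigarettes/day": 4}

# repr() makes the otherwise invisible whitespace visible when printed.
print(cigs.map(repr).tolist())

# Values that do not match a key exactly are mapped to NaN.
before = cigs.map(numcigar)

# Stripping leading/trailing whitespace first lets every value match.
after = cigs.str.strip().map(numcigar)

print(before.isna().sum())
print(after.tolist())
```

Printing the repr of each value is a quick way to check whether a real column has this problem before deciding how to clean it.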

UPDATE

Thanks to @Jeff as always for pointing out superior ways to do things:

So you can call replace instead of calling apply:

mydf['CigarNum'] = mydf['Cigarettes'].replace(numcigar)
# now convert the types
mydf['CigarNum'] = mydf['CigarNum'].convert_objects(convert_numeric=True)

You can also use the factorize method.
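As a rough illustration of factorize (with made-up sample values), note that it assigns ids in order of first appearance, so it is only a substitute for the dict when the specific weights (0, 1, 4) do not matter:

```python
import pandas as pd

# Hypothetical sample values, not taken from the real file.
cigs = pd.Series(["Never", "1-5 Cigarettes/day", "Never", "10-20 Cigarettes/day"])

# codes: integer id per row; uniques: the distinct values, in order of appearance.
codes, uniques = pd.factorize(cigs)

print(list(codes))
print(list(uniques))
```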

Thinking about it, why not just set the dict values to floats in the first place, so you avoid the type conversion altogether?

So:

numcigar = {"Never":0.0 ,"1-5 Cigarettes/day" :1.0,"10-20 Cigarettes/day":4.0}
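A small sketch of that idea, assuming clean (already stripped) hypothetical values and using Series.map in place of replace, suggests the resulting column is float without any further conversion:

```python
import pandas as pd

numcigar = {"Never": 0.0, "1-5 Cigarettes/day": 1.0, "10-20 Cigarettes/day": 4.0}

# Hypothetical sample values standing in for mydf['Cigarettes'].
cigs = pd.Series(["Never", "1-5 Cigarettes/day", "10-20 Cigarettes/day", "Never"])

# Mapping onto float values yields a float column directly.
cigar_num = cigs.map(numcigar)

print(cigar_num.dtype)
print(cigar_num.tolist())
```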

Version 0.17.0 or newer

convert_objects has been deprecated since 0.17.0 and has been replaced by to_numeric:

mydf['CigarNum'] = pd.to_numeric(mydf['CigarNum'], errors='coerce')

Here errors='coerce' returns NaN wherever a value cannot be converted to a number; without it, an exception is raised.
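The difference between the two behaviours can be sketched on a small hypothetical column that contains one non-numeric entry:

```python
import pandas as pd

# Hypothetical object column with one value that is not a number.
raw = pd.Series(["0", "1", "4", "unknown"])

# errors='coerce' turns the unparseable entry into NaN.
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced.tolist())

# With the default (errors='raise'), the same input raises ValueError.
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True
print(raised)
```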

Answered by EdChum, Oct 22 '22 08:10


Try using this function for all problems of this kind:

import numpy as np

def get_series_ids(x):
    '''Return a pandas Series of integer ids, one id per distinct
       value in the input pandas Series x.
       Example:
       get_series_ids(pd.Series(['a','a','b','b','c']))
       returns Series([0,0,1,1,2], dtype=int)'''

    # np.unique returns the sorted distinct values of x
    values = np.unique(x)
    # map each distinct value to its position in the sorted order
    values2nums = dict(zip(values, range(len(values))))
    return x.replace(values2nums)
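Because np.unique returns its values sorted, the function above numbers values in sorted order rather than in order of appearance. A built-in way to get the same numbering is pd.factorize with sort=True, sketched here on hypothetical input:

```python
import pandas as pd

# Hypothetical input whose order of appearance differs from sorted order.
s = pd.Series(["b", "a", "a", "c", "b"])

# sort=True numbers the distinct values in sorted order: a=0, b=1, c=2.
codes, uniques = pd.factorize(s, sort=True)
ids = pd.Series(codes, index=s.index)

print(ids.tolist())
print(list(uniques))
```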
Answered by Apogentus, Oct 22 '22 09:10